Map Reduce: Understanding the reduce function and its limitations. What happens to the first element?
When we have a dataset, it's often necessary to reduce it to single values that correspond to indicators, usually statistical ones. Examples of such indicators are the average, the variance and the standard deviation.
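For reference, these are the formulas behind the three indicators, written the way the variance code in this post computes them (the sample variance, dividing by n - 1). Here x_1, ..., x_n stand for the n elements of the dataset:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
s = \sqrt{s^2}
```

Note that every x_i, including x_1, has to be turned into a squared deviation before it is added to the sum.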
The goal of the reduce function is to help us calculate such indicators. To use the reduce function, we have to import it from the functools module:
from functools import reduce
In its basic form, the reduce function receives two arguments:
a) a function responsible for the reduction. It needs two parameters: the first is a counter (a variable shared across the processing of all iterated elements) and the second is the current element. The function must return the updated value of the counter, which becomes the input for processing the next element.
b) a list of elements to be reduced.
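To make the role of the counter concrete, here is a minimal sketch of what reduce does when it is called with just a function and a list. `simple_reduce` is a hypothetical helper written only for illustration; it is not the actual functools implementation, which also supports an optional initial value.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def simple_reduce(function: Callable[[T, T], T], elements: Iterable[T]) -> T:
    """Hypothetical re-implementation of two-argument reduce, for illustration only."""
    iterator = iter(elements)
    try:
        # The first element is never passed to `function` as the current
        # element: it simply becomes the initial value of the counter.
        counter = next(iterator)
    except StopIteration:
        raise TypeError("simple_reduce() of empty iterable with no initial value") from None
    for element in iterator:
        # Each call returns the updated counter, which feeds the next call.
        counter = function(counter, element)
    return counter


print(simple_reduce(lambda x, y: x + y, [1, 2, 3, 4, 5]))  # 15
```

Written this way, it is easy to see why the function is called one time fewer than there are elements in the list.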
Here is an example of how to use the reduce function to sum all elements of a list:
items = [1, 2, 3, 4, 5]
# Use "total" rather than "sum" so Python's built-in sum() is not shadowed
total = reduce(lambda x, y: x + y, items)  # total is 15
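Since the average is one of the indicators mentioned at the start, here is a hedged sketch that reuses the same sum-by-reduce idea to compute it. Dividing the total by `len(items)` is our assumption of how the average is derived; the author's complete code may compute it differently.

```python
from functools import reduce

items = [1, 2, 3, 4, 5]

# Sum all elements with reduce, then divide by the number of elements.
total = reduce(lambda x, y: x + y, items)
average = total / len(items)

print("AVERAGE", average)  # AVERAGE 3.0
```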
To understand in more depth how it works, we can implement a function that prints each step of the process, as in the example below:
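def product(counter, element):
    # Show the values received at each step before multiplying them
    print("element", element, "counter", counter)
    return counter * element

# Store the result in a new name so the product function is not overwritten
result = reduce(product, items)  # result is 120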
Running this code prints the following:
element 2 counter 1
element 3 counter 2
element 4 counter 6
element 5 counter 24
What can we identify here? The first element is never passed to the function as the current element: it is used directly as the initial value of the counter when the second element is processed. So, in situations where every element, including the first one, must be transformed before being aggregated into the counter, calling reduce with only a function and a list is not suitable. The variance calculation is an example: each element must first be turned into its squared deviation from the average before it is added up. To work around this limitation, we can avoid reduce and use Python's for loop (its "foreach" construct) instead:
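import math

# Length and average of the list defined earlier (computed here so the snippet runs)
length = len(items)
average = reduce(lambda x, y: x + y, items) / length

# variance (sample variance: divide by length - 1)
print("####VARIANCE####")
counter = 0
for element in items:
    counter += ((element - average) ** 2) / (length - 1)
print("VARIANCE", counter)
print("STANDARD DEVIATION", math.sqrt(counter))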
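For completeness, the first-element limitation applies to the two-argument form of reduce. functools.reduce also accepts an optional third argument, an initial value for the counter; when it is given, every element of the list, including the first, is passed to the function. The sketch below is a hedged alternative, not the author's approach: it uses an initial value of 0 to compute the same sample variance and cross-checks it against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics
from functools import reduce

items = [1, 2, 3, 4, 5]
length = len(items)
average = reduce(lambda x, y: x + y, items) / length

# With an initial value of 0, the first element is also passed as `element`,
# so every element is squared before being added to the counter.
squared_deviations = reduce(
    lambda counter, element: counter + (element - average) ** 2, items, 0
)
variance = squared_deviations / (length - 1)

print("VARIANCE", variance)  # VARIANCE 2.5
print("STANDARD DEVIATION", math.sqrt(variance))

# statistics.variance computes the sample variance (dividing by n - 1).
assert math.isclose(variance, statistics.variance(items))
```

The trade-off is readability: the for loop makes the per-element transformation explicit, while the initial-value form keeps everything in a single reduce call.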
The complete code is available for download here:
https://github.com/rafaelqg/code/blob/main/map_reduce_reduce_example.py