Map Reduce: Understanding the reduce function and its limitations. What happens when the first element is processed?

When we have a dataset, it is often necessary to reduce it to single values that act as indicators, usually statistical ones. Examples of such indicators are the average, the variance, and the standard deviation.

The goal of the reduce function is to help us calculate such indicators. To use it, we have to import it from the functools module:

from functools import reduce

To use the reduce function, we need to understand its interface. As used here, it takes two arguments:

a)    a function that is responsible for the reduction. It must take two parameters: the first is a counter (a variable shared across the processing of all iterated elements) and the second is the current element. The function must return the updated value of the counter, which becomes the input for the next element.

b)    a list of elements to be reduced.
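To make this interface concrete, here is a rough pure-Python sketch of what reduce does when it is called with these two arguments. The name simple_reduce is hypothetical and used only for illustration (the real implementation lives in functools), and the sketch does not handle an empty list.

```python
from functools import reduce


def simple_reduce(function, elements):
    """Rough, illustrative equivalent of reduce(function, elements)."""
    iterator = iter(elements)
    # The first element becomes the initial counter; it is never passed
    # to `function` as the "current element".
    counter = next(iterator)
    for element in iterator:
        counter = function(counter, element)
    return counter


def add(counter, element):
    return counter + element


numbers = [1, 2, 3, 4, 5]
print(simple_reduce(add, numbers))  # 15
print(reduce(add, numbers))         # 15
```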

Here is an example of how to use the reduce function to sum all the elements of a list.

items = [1, 2, 3, 4, 5]

total = reduce(lambda x, y: x + y, items)
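To see the intermediate values the counter takes during this sum, one option is itertools.accumulate from the standard library, which applies the same kind of two-parameter function but keeps every partial result. This is only an illustrative sketch; the variable name running_totals is not part of the original code.

```python
from itertools import accumulate

items = [1, 2, 3, 4, 5]

# accumulate yields each intermediate counter value; the last one is
# the same result that reduce would return.
running_totals = list(accumulate(items, lambda x, y: x + y))
print(running_totals)  # [1, 3, 6, 10, 15]
```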

To understand more deeply how it works, we can implement an example that prints each step of the process, as shown below:

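def product(counter, element):
    # show the current element and the counter received from the previous step
    print("element", element, "counter", counter)
    return counter * element

# store the result under a different name so the function is not overwritten
result = reduce(product, items)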

 

Running this code prints the following:

·        element 2 counter 1

·        element 3 counter 2

·        element 4 counter 6

·        element 5 counter 24

What can we identify here? The first element is not processed by the function. It is simply used as the initial counter when the second element is processed. So, when every element, including the first one, has to be transformed before being added to the counter, reduce called with just these two arguments is not suitable. The variance calculation is one such case. To work around this limitation, we can skip reduce and use a plain for loop instead.

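import math

# variance
print("####VARIANCE####")
# length and average are not defined in this excerpt; they are computed here so the snippet runs
length = len(items)
average = reduce(lambda x, y: x + y, items) / length
counter = 0
for element in items:
    # add this element's squared deviation from the average, divided by n - 1 (sample variance)
    counter += ((element - average) ** 2) / (length - 1)
print("VARIANCE", counter)
print("STANDARD DEVIATION", math.sqrt(counter))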
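As an alternative sketch, functools.reduce also accepts an optional third argument, an initial value for the counter. When it is given, the first element is passed to the function like every other element, so the same variance can be computed with reduce. The helper name add_squared_deviation is hypothetical, and the comparison with statistics.variance is only there as a sanity check.

```python
import math
import statistics
from functools import reduce

items = [1, 2, 3, 4, 5]
length = len(items)
average = reduce(lambda counter, element: counter + element, items) / length


def add_squared_deviation(counter, element):
    # Hypothetical helper: adds one element's contribution to the sample variance.
    return counter + ((element - average) ** 2) / (length - 1)


# The third argument (0) is the initial counter, so the first element
# is processed by the function like all the others.
variance = reduce(add_squared_deviation, items, 0)

print("VARIANCE", variance)                        # VARIANCE 2.5
print("STANDARD DEVIATION", math.sqrt(variance))
# Sanity check against the standard library's sample variance.
print(math.isclose(variance, statistics.variance(items)))  # True
```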

 

The complete code is available for download here:

https://github.com/rafaelqg/code/blob/main/map_reduce_reduce_example.py

A video lesson on this topic is also available.


