Build a MapReduce flow in Elixir

6 years ago
Anonymous $roN-uuAfLt

https://hackernoon.com/build-a-mapreduce-flow-in-elixir-f97c317e457e

MapReduce is a common Big Data pattern for analyzing a data set concurrently. This tutorial introduces you to Elixir and the principles behind Hadoop. We will build the MapReduce equivalent of Hello World: a word count program. Map and reduce are also common higher-order functions in functional programming. Map takes a list and a function (usually an anonymous function, or lambda), applies the function to each element, and returns a new list of the results. Reduce takes the same arguments plus, in Elixir, an initial accumulator, but returns a single accumulated value instead of a list. Elixir is a great language for learning concurrency, and MapReduce is both a useful example and a good showcase for many of Elixir’s features.
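To make the two higher-order functions concrete, here is a minimal sketch using `Enum.map/2` and `Enum.reduce/3` from Elixir's standard library. The `words` list and the variable names are made up for illustration and are not taken from the tutorial's later code.

```elixir
# Illustrative input; any list of strings would do.
words = ["hello", "world", "hello"]

# Map: apply a lambda to every element and get back a new list
# of the same length.
lengths = Enum.map(words, fn word -> String.length(word) end)
# lengths is [5, 5, 5]

# Reduce: fold the list into a single accumulated value.
# The second argument (%{}) is the initial accumulator; the lambda
# receives each element together with the accumulator so far.
counts =
  Enum.reduce(words, %{}, fn word, acc ->
    Map.update(acc, word, 1, fn n -> n + 1 end)
  end)
# counts is %{"hello" => 2, "world" => 1}
```

Note that the reduce call already performs a sequential word count on its own. The rest of the pipeline exists to spread that work across many processes.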

MapReduce is a pipeline through which data flows and is processed. It can be broken down into roughly five steps, which correspond to the five modules we will write in Elixir. The first step is the Input Reader. It takes in data, splits it into a form the Map processes can read, and launches those Map processes concurrently. Each Map process reads the data given to it, runs a function on each piece of data, and emits key-value pairs to a Partition/Compare process. The Partition process accumulates key-value pairs from all Map processes, compares the pairs, and spawns a Reduce process for each unique key. Each Reduce process adds up all the values for its key and emits the total to the Output Writer. Finally, the Output Writer yields your data in a format of your choice.
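The five steps can be sketched end to end in a few lines. This is a rough approximation under stated assumptions, not the tutorial's implementation: it collapses the five modules into one hypothetical module, `WordCountSketch`, and swaps in `Task.async_stream/2` for concurrency instead of hand-spawned processes passing messages. All function names here are invented for illustration.

```elixir
defmodule WordCountSketch do
  # Hypothetical module: one function per pipeline step.

  # Input Reader: split the text into lines, then run one Map task
  # per line concurrently.
  def run(text) do
    text
    |> String.split("\n", trim: true)
    |> Task.async_stream(&map_line/1)
    |> Enum.flat_map(fn {:ok, pairs} -> pairs end)
    |> partition()
    # Enumerating the grouped map yields {key, values} tuples,
    # so each unique key gets its own Reduce task.
    |> Task.async_stream(&reduce_key/1)
    |> Enum.map(fn {:ok, result} -> result end)
    |> write_output()
  end

  # Map: emit a {word, 1} pair for every word on the line.
  def map_line(line) do
    line
    |> String.downcase()
    |> String.split()
    |> Enum.map(fn word -> {word, 1} end)
  end

  # Partition/Compare: collect the values emitted for each unique key.
  def partition(pairs) do
    Enum.group_by(pairs, fn {word, _n} -> word end, fn {_word, n} -> n end)
  end

  # Reduce: add up all the values for a single key.
  def reduce_key({word, values}), do: {word, Enum.sum(values)}

  # Output Writer: here the "format of your choice" is a plain map.
  def write_output(results), do: Map.new(results)
end

result = WordCountSketch.run("the quick fox\nthe lazy dog\nthe end")
# result["the"] is 3; every other word maps to 1
```

Using tasks keeps the sketch short, but it hides the message passing between processes that a hand-written version would make explicit.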