Map-Reduce seems to be the standard technology for working with large amounts of data these days. It is most well-known in combination with simple flat, table-like structures, maybe because most beginner tutorials focus on these.

However, using Map-Reduce, you can also do calculations on graphs. This is often done by employing message passing between the mapper and executing the map-reduce steps iteratively until some stop condition occurs. This can either be the length of the longest path in the graph (if known) or minimum a threshold the values must change for the iteration to continue.

The basic algorithm within the map and the reduce job is the same for many appliances. Might even be the same for all. In the mapper you first emit the node itself with it’s node id as the key. By emiting the node, you’ll be able to catch it in the reducer again. Then you iterate over all neighbouring nodes and calculate a message for each of them. You emit each neighbour’s node id together with this message.

The reducer catches these messages and the node itself. It then uses both of them to calculate the new state for the node it just received. It can now emit the node with the new state and this output can be fed into the mapper again.

Depending on the required appliance only the functions calc_message and calc_state have to be implemented. A quite simple example is the number of hops from one node in the graph to all other nodes. From the mapper you send a message to each neighbouring node with this node’s current distance plus one. In the reducer you check all messages a node received and take the minimum. Then you repeat from the beginning.

For a simple example we start with a graph and define node b as the start node. This start node must be initialized to distance 0, while all other nodes are initialized to infinity. The algorithm then starts. In the first iteration, we will emit the old node distances (states) as seen on the first image, but we will also emit the distance of each node + 1 as a message to its neighbours. Let’s check this in more detail for nodes b and f.

When we get b as an input into our mapper, we will emit the node b itself first. Afterwards we iterate over c and d, because they are the outgoing nodes, and emit the message 1 to each of them (current distance of b plus one).

f has one outgoing edge to node g. Of course we also emit the node f itself followed by a message for node g with f’s distance plus one. However, f already has a distance of inifinity, so we will just send the message infinity to g.

In the reducer phase for d we will now collect its messages and calculate the minimum of d’s current distance and all incoming distances and assign this as the new distance for d. Since d has an own distance of inifinity and an incoming message with distance 1, we will assign distance 1 to it. e on the other hand has a distance of infinity and an incoming message of infinity, so we will just assign infinity again.

The resulting state after one iteration is as follows: And the following two iterations will lead to these situations.

After 2 iterations: After 3 iterations: I do not maintain a comments section. If you have any questions or comments regarding my posts, please do not hesitate to send me an e-mail to stefan@eliteinformatiker.de.