Technology and Trends

Joins using MapReduce Framework

There are three types of joins that can be used to join tables in MapReduce: reduce side joins, map side joins, and memory backed joins.

Map Side Join

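A map side join performs the join before the data reaches the map function. It imposes strong prerequisites on the inputs: each data set must be divided into the same number of partitions and sorted by the same key (the join key), so that matching records line up in corresponding partitions.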
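To make the idea concrete, here is a minimal sketch in plain Python (not Hadoop code) of how one pair of matching partitions could be joined when both are already sorted by the join key. The function name `merge_join` and the sample `users` and `orders` data are hypothetical, chosen only for illustration.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs that are already sorted by key."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1  # no match for this left key, move on
        elif lkey > rkey:
            j += 1  # no match for this right key, move on
        else:
            # Find the run of right-side records that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Pair every left record with this key against that run.
            while i < len(left) and left[i][0] == lkey:
                for _, rval in right[j:j_end]:
                    joined.append((lkey, left[i][1], rval))
                i += 1
            j = j_end
    return joined


# Hypothetical partition 0 of each input, both sorted by user id.
users = [(1, "alice"), (2, "bob"), (4, "dana")]
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "desk")]
result = merge_join(users, orders)
```

Because both inputs are sorted, each is read only once and nothing has to be shuffled across the network, which is why the sorting and partitioning requirement matters.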

Reduce Side Join

A reduce side join is performed on the reducer side and is also called a repartitioned join or repartitioned sort-merge join. It is the most commonly used join type in the MapReduce framework. Because the join happens in the reducer, all of the data has to go through the shuffle and sort phase, which incurs network overhead.
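The flow of a reduce side join can be sketched as a small simulation in plain Python rather than with the Hadoop API. The names `map_phase`, `shuffle`, and `reduce_phase`, the source tags, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(records, tag):
    """Emit (join_key, (source_tag, value)) so the reducer can tell the tables apart."""
    for key, value in records:
        yield key, (tag, value)


def shuffle(pairs):
    """Group tagged values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return sorted(groups.items())


def reduce_phase(key, tagged_values):
    """Separate the values by source and pair each left value with each right value."""
    left = [v for tag, v in tagged_values if tag == "users"]
    right = [v for tag, v in tagged_values if tag == "orders"]
    return [(key, l, r) for l in left for r in right]


# Hypothetical input tables keyed by user id.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (2, "lamp"), (1, "pen")]

mapped = list(map_phase(users, "users")) + list(map_phase(orders, "orders"))
joined = []
for key, values in shuffle(mapped):
    joined.extend(reduce_phase(key, values))
```

Every record from both tables passes through `shuffle` here, which corresponds to the network overhead described above.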

Memory Backed Join

We use this join when one of the tables is small enough to fit in the memory of the data nodes.
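As a rough illustration, the sketch below loads the small table into an in-memory dictionary and streams the large table past it. It is plain Python, not Hadoop code, and it assumes the small table has unique keys. The function name `memory_backed_join` and the sample tables are hypothetical.

```python
def memory_backed_join(small_table, large_records):
    """Inner-join a stream of (key, value) records against a small in-memory table."""
    # Assumption: keys in the small table are unique, so a dict is enough.
    lookup = dict(small_table)
    for key, value in large_records:
        if key in lookup:
            yield key, lookup[key], value


# Hypothetical data: a small departments table and a larger employees table.
departments = [(10, "sales"), (20, "engineering")]
employees = [(10, "ann"), (20, "raj"), (30, "lee"), (10, "kim")]

joined = list(memory_backed_join(departments, employees))
```

In Hadoop, the small table is commonly shipped to every mapper through the distributed cache, so each mapper can build a lookup like this one before it processes its split of the large table.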

Among these, the reduce side join is the most straightforward one to apply, because it joins the tables on keys that are shuffled and sorted before they reach the reducer. Hadoop sends identical keys to the same reducer, so by default the data arrives already organized for the join.
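Routing identical keys to the same reducer is done by hash partitioning. The sketch below imitates that idea in plain Python. It uses `zlib.crc32` as a stand-in hash function, which is an assumption for illustration and not what Hadoop itself uses. The `partition` function and the sample keys are hypothetical.

```python
import zlib


def partition(key, num_reducers):
    """Map a string key to a reducer index; equal keys always get the same index."""
    # crc32 is a stand-in for the key's hash code (assumption for this sketch).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# Hypothetical join keys coming from records of two different tables.
keys = ["u1", "u2", "u1", "u3", "u2"]
assignments = [(k, partition(k, 3)) for k in keys]
```

Since the reducer index depends only on the key, records from both tables that share a key end up at the same reducer, whichever table they came from.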

Map Side Join and Its Advantages

Map-side join is a process where two data sets are joined by the mapper.

The advantages of using map side join in MapReduce are as follows:
