By using a Partitioner we can group the output based on a specific column. The column by which the output should be grouped is the one used to compute the partition. In the case below I have used the second value of the TextPair key for grouping.

The reducer (and so the output file) that a record is sent to is given by the hash code of the key modulo the number of reducers.

The custom Partitioner below again makes use of hashCode(): it takes the hash code of the key's second value and keeps the remainder after dividing by the total number of reducers.

PartitionValue = (hashCode value of String & max value of Integer) % total no. of reducers;
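
To make the arithmetic concrete, here is a minimal, Hadoop-free sketch of the same calculation. The names PartitionMath and partitionFor are made up for illustration and are not part of Hadoop or of the code in this post. It also shows why the hash is ANDed with Integer.MAX_VALUE: in Java the % operator keeps the sign of its left operand, so a negative hash code would otherwise give a negative partition number.

```java
// Illustrative only: PartitionMath and partitionFor are hypothetical names, not Hadoop APIs.
final class PartitionMath {

    // Same arithmetic as the partitioner: clear the sign bit, then take the remainder.
    static int partitionFor(int hashCode, int noOfReducers) {
        return (hashCode & Integer.MAX_VALUE) % noOfReducers;
    }

    public static void main(String[] args) {
        int noOfReducers = 3;
        int negativeHash = -7;

        // Plain modulo keeps the sign of the left operand, so this value is negative.
        System.out.println("plain modulo : " + (negativeHash % noOfReducers));
        // After masking, the result always falls in 0 .. noOfReducers - 1.
        System.out.println("masked modulo: " + partitionFor(negativeHash, noOfReducers));
        // For a String key the input would simply be its hashCode().
        System.out.println("string key   : " + partitionFor("abc".hashCode(), noOfReducers));
    }
}
```

Math.abs() is not a safe substitute for the mask, because Math.abs(Integer.MIN_VALUE) is still negative.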

package com.mugil.part;

import org.apache.hadoop.mapreduce.Partitioner;

import com.mugil.avg.LongPair;
import com.mugil.avg.TextPair;

public class FirstPartioner extends Partitioner<TextPair, LongPair>
{

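   @Override
   public int getPartition(TextPair key, LongPair value, int noOfReducers)
   {
	// Clear the sign bit so that a negative hashCode() cannot give a negative partition number
	int nonNegativeHash = key.getSecond().hashCode() & Integer.MAX_VALUE;
	// The remainder is always in the range 0 .. noOfReducers - 1
	return nonNegativeHash % noOfReducers;
   }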
}
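
The effect on grouping can be seen without running a cluster. The following is only a simulation under assumed sample data: PartitionGrouping and assign are hypothetical names, and the strings stand in for the second value of the TextPair key. It applies the same formula to each value and collects the values per partition number, so every record with the same second value ends up in the same bucket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a Hadoop-free simulation; PartitionGrouping and assign are hypothetical names.
final class PartitionGrouping {

    // Buckets each "second value" using the same formula as the partitioner.
    static Map<Integer, List<String>> assign(List<String> secondValues, int noOfReducers) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String second : secondValues) {
            int partition = (second.hashCode() & Integer.MAX_VALUE) % noOfReducers;
            buckets.computeIfAbsent(partition, p -> new ArrayList<>()).add(second);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up sample data: repeated values stand for several records sharing a column value.
        List<String> seconds = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(assign(seconds, 2));
    }
}
```

Note that different values can still share a partition when their hash codes leave the same remainder. In a real job the partitioner is registered in the driver with job.setPartitionerClass(FirstPartioner.class), and the number of reducers is set with job.setNumReduceTasks().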
