For one JVM (Isolated Process)there will be
Job Tracker – one(Controller and scheduler)
Task Tracker – One per Cluster(Monitors task)

The Map Reduce consist of Two Parts

The Map Part
The Reduce Part

Map Part

  1. Function in java which perform some action in some data.The Map reduce is run as a job.During this run of Map Reduce as a job the Java function gets called in each Node where the data lives.
  2. The Map Reduce runs 3 Nodes (default HDFS cluster is replicated 3 Times).
  3. HDFS is self healing.If one goes down other will be used
  4. Once the MapReduce is run the output will be pairs
  5. The second part is the Reduce Part in the pairs

2 Versions of Map Reduce

Map Reduce Version 1

  1. As given by Google
  2. HDFS Triple Replicated
  3. Parallel Processing via Map and Reduce(aggregated)

Coding Steps

  1. Create a Class
  2. Create a static Map class
  3. Create a static Reduce class
  4. Create a Main Function
    1. Create a Job
    2. Job calls the Map and Reduce Classes

Java Coding for MapReduce

  public class MapReduce{
    public static void Main(String[] args)
    {
      //Create Job Runner Instance
      //Call MapInstance on Job Instance
      //Call ReduceInstance on Job Instance
        
    } 
    
    public void Map()
    {
       //write Mapper
    }

    public void Reduce()
    {
       //write Reducer
    }     
  }
  1. In MapReduce the States should not be Shared
  2. Top Down Programming, One Entry Point – One Exit Point

Aspects of MapReduce

  1. Job – Unit of MapReduce
  2. Map Task runs on each node
  3. Reduce Task – runs on some nodes
  4. Source date – HDFS or other location(amazon s3)

In Java while transferring data over network we serialize and deserialize values for security purposes.In MapReduce the Map output is serialized and the input is deserialized in Reduce.Serialized and Deserialized values are called as Writables in MapReduce. To acheive this String in java is replaced with Text and int in Java is replaced with IntWritable which does the serialization on it own.

Hadoop – Map Reduce

>> hadoop jar MapReduce.jar T1 output

hadoop jar MapReduce.jar InputFile OutputFolder

Comments are closed.