The default Mapper and Reducer come from:

import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

When the Mapper and Reducer are not set using job.setMapperClass() and job.setReducerClass(), the default Mapper.class and Reducer.class are used.
The default Mapper is an identity mapper: with TextInputFormat it emits each line unchanged, keyed by the byte offset at which the line starts in the file. The input and output of the default Mapper and Reducer are shown below.

Input

Test
Test
Test

Output

0 Test
5 Test
10 Test

Each key is the line's starting byte offset: "Test" is 4 bytes plus 1 for the line terminator, so successive offsets differ by 5.
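The offset arithmetic above can be checked with a few lines of plain Java (no Hadoop dependency; the class and method names here are illustrative, not part of the Hadoop API):

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetDemo {
    // Mimics how TextInputFormat derives keys: each key is the byte
    // offset of the line start = previous offset + line length + 1
    // byte for the newline.
    static List<Long> lineOffsets(String[] lines) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            offsets.add(offset);
            offset += line.getBytes().length + 1; // +1 for '\n'
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Three lines of "Test" -> keys 0, 5, 10
        System.out.println(lineOffsets(new String[]{"Test", "Test", "Test"}));
    }
}
```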
package com.mugilmapred;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        Test objTest = new Test();
        int result = ToolRunner.run(objTest, args);
        System.exit(result);
    }

    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        job.setJarByClass(Test.class);

        Path inputFilepath = new Path(args[0]);
        Path outputFilepath = new Path(args[1]);
        FileInputFormat.addInputPath(job, inputFilepath);
        FileOutputFormat.setOutputPath(job, outputFilepath);

        // Delete the output directory if it already exists
        FileSystem fs = FileSystem.newInstance(getConf());
        if (fs.exists(outputFilepath)) {
            fs.delete(outputFilepath, true);
        }

        return job.waitForCompletion(true) ? 0 : 1;
    }
}
If you do not call setJarByClass, the job throws:
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.mugilmapred.Test$Map not found
When you run locally this is not required, but when you run on a cluster the class that contains the Mapper must be specified; otherwise the framework does not know which jar file the Mapper is located in.
job.setJarByClass(Test.class);
You can also use setJar, as below:
job.setJar("Test-0.0.1.jar");
Using a Predefined Reducer in Program
. . .
job.setMapperClass(WordMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setReducerClass(LongSumReducer.class);
job.setNumReduceTasks(1);
. . .
LongSumReducer.class takes the mapper output ([count,1] [count,1] [count,1] [count,1]) and groups it into ([count,4]) by summing the values for each key.
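The summing behaviour can be sketched in plain Java (no Hadoop dependency; the class and method names here are illustrative, and this assumes WordMapper emits a (word, 1) pair per word, which the snippet above does not show):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LongSumDemo {
    // Mimics what LongSumReducer does per key: sum all values
    // that the shuffle grouped under that key.
    static Map<String, Long> sumByKey(String[] keys, long[] values) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            out.merge(keys[i], values[i], Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Mapper output ([count,1] [count,1] [count,1] [count,1])
        String[] keys = {"count", "count", "count", "count"};
        long[] ones = {1, 1, 1, 1};
        System.out.println(sumByKey(keys, ones)); // {count=4}
    }
}
```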