MapReduce Program with Default Mapper and Reducer
The default Mapper and Reducer come from:

import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
When the Mapper and Reducer are not set using job.setMapperClass() and job.setReducerClass(), the default Mapper.class and Reducer.class are used. Both are identity implementations that write each input key-value pair straight to the output, so with the default TextInputFormat the output key is the byte offset of each line and the value is the line itself. The input and output of the default Mapper and Reducer are shown below.
Input

Test
Test
Test

Output

0 Test
5 Test
10 Test

Each line holds "Test" (4 characters) plus the line terminator (1 character), so the byte-offset key advances by 5 per line.
package com.mugilmapred;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test extends Configured implements Tool {
	public static void main(String[] args) throws Exception {
		Test objTest = new Test();
		int result = ToolRunner.run(objTest, args);
		System.exit(result);
	}

	public int run(String[] args) throws Exception {
		// No Mapper or Reducer is set, so the identity
		// Mapper.class and Reducer.class defaults are used
		Job job = new Job(getConf());
		job.setJarByClass(Test.class);

		Path inputFilepath = new Path(args[0]);
		Path outputFilepath = new Path(args[1]);
		FileInputFormat.addInputPath(job, inputFilepath);
		FileOutputFormat.setOutputPath(job, outputFilepath);

		// Delete the output directory if it already exists,
		// otherwise the job fails on startup
		FileSystem fs = FileSystem.newInstance(getConf());
		if (fs.exists(outputFilepath)) {
			fs.delete(outputFilepath, true);
		}

		return job.waitForCompletion(true) ? 0 : 1;
	}
}
When you do not set the jar with setJarByClass, the job throws:
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.mugilmapred.Test$Map not found
This is not required when you run locally, but when you run on a cluster the class that contains the Mapper must be specified; otherwise the framework does not know which jar file the Mapper is located in.
job.setJarByClass(Test.class);
You can also use setJar, as below:
job.setJar("Test-0.0.1.jar");
Using a Predefined Reducer in Program
. . .
job.setMapperClass(WordMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setReducerClass(LongSumReducer.class);
job.setNumReduceTasks(1);
. . .
LongSumReducer.class takes the mapper output ([count,1] [count,1] [count,1] [count,1]) and sums the values for each key into ([count,4]).
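The WordMapper referenced above is not shown in the original listing. A minimal sketch of what it might look like, assuming it tokenizes each line and emits (word, 1) pairs matching the Text/LongWritable types configured above:

package com.mugilmapred;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper matching the job configuration above:
// emits (word, 1) so LongSumReducer can sum the counts per word
public class WordMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
	private static final LongWritable ONE = new LongWritable(1);
	private Text word = new Text();

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		StringTokenizer itr = new StringTokenizer(value.toString());
		while (itr.hasMoreTokens()) {
			word.set(itr.nextToken());
			context.write(word, ONE);
		}
	}
}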
Hive Queries
Table Creation
CREATE TABLE HomeNeeds(Type STRING, Product STRING, No INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
Insertion
LOAD DATA LOCAL INPATH '/home/turbo/workspace/Sample Datas/Test.csv' OVERWRITE INTO TABLE HomeNeeds;
Create Table with Partition
CREATE TABLE HomeNeeds(Type String, Product String, No Int)
PARTITIONED BY (Date String, Country String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
The partition columns and the table columns are independent of one another: a partition column must not be repeated in the table's column list, and its values are stored in the directory layout rather than in the data files.
Inserting into Partitioned Table
LOAD DATA LOCAL INPATH '/home/turbo/workspace/Sample Datas/Test.csv' INTO TABLE HomeNeeds PARTITION (Date='2001-01-25', Country='India');
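Once loaded, partition columns can be queried like ordinary columns; a query such as the following (the WHERE values match the partition created above) is pruned to that partition's directory:

SELECT Type, Product, No
FROM HomeNeeds
WHERE Date = '2001-01-25' AND Country = 'India';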
Partition and Bucketing
CREATE TABLE HomeNeeds(Type String, Item String, No Int)
PARTITIONED BY (Area String)
CLUSTERED BY (Type) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
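Note that LOAD DATA does not redistribute rows into buckets. To populate a bucketed table you would typically insert from a plain staging table with bucketing enforced; the staging table name and partition value here are illustrative:

SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE HomeNeeds PARTITION (Area = 'Chennai')
SELECT Type, Item, No FROM HomeNeedsStaging;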
User Defined Function in Pig
package com.mugil.pig;

import java.io.IOException;
import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;

public class FilterType extends FilterFunc {
	@Override
	public Boolean exec(Tuple tuple) throws IOException {
		if (tuple == null || tuple.size() == 0)
			return false;

		try {
			// The first field of the tuple holds the Type column
			Object obj = tuple.get(0);

			if (obj == null)
				return false;

			String type = (String) obj;

			// Keep only the rows whose Type is "Kitchen"
			if (type.equals("Kitchen"))
				return true;
		} catch (Exception e) {
			throw new IOException("Caught exception processing input row " + e.getMessage(), e);
		}

		return false;
	}
}
Registering UDF Function
grunt> REGISTER /usr/local/pig-0.15.0/FilterByType3.jar;
grunt> DEFINE FilterType com.mugil.pig.FilterType();
grunt> filtered_records = FILTER records BY FilterType(Type);
grunt> DUMP filtered_records;
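The records relation above is assumed to have been loaded beforehand. A sketch of such a load, reusing the sample file from the Hive section and a schema matching the HomeNeeds data:

grunt> records = LOAD '/home/turbo/workspace/Sample Datas/Test.csv' USING PigStorage(',') AS (Type:chararray, Item:chararray, No:int);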
Search
Binary search is faster than linear search when the collection is sorted and does not contain duplicate values.
public static void binarySearch(int searchVal) {
	int lowerIndex = 0;
	int higherIndex = arrNumbers.length - 1;

	while (lowerIndex <= higherIndex) {
		int middleIndex = (lowerIndex + higherIndex) / 2;

		if (searchVal < arrNumbers[middleIndex]) {
			// Search value lies in the lower half
			higherIndex = middleIndex - 1;
		} else if (searchVal > arrNumbers[middleIndex]) {
			// Search value lies in the upper half
			lowerIndex = middleIndex + 1;
		} else {
			System.out.println("The element is Found at Index " + middleIndex);
			return;
		}
	}
}
Sorting
Bubble Sort
public void bubbleSort() {
	// After each outer pass the largest remaining value bubbles up to index i
	for (int i = arrNumbers.length - 1; i > 0; i--) {
		for (int j = 0; j < i; j++) {
			if (arrNumbers[j] > arrNumbers[j + 1]) {
				swapValuesAtIndex(j, j + 1);
			}
		}
	}
}
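The swapValuesAtIndex helper used by these sorts is not shown in the listing; a minimal sketch, assuming arrNumbers is the int array field the sorts operate on:

// Hypothetical helper: exchanges the values at the two given indexes
private void swapValuesAtIndex(int first, int second) {
	int temp = arrNumbers[first];
	arrNumbers[first] = arrNumbers[second];
	arrNumbers[second] = temp;
}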
Selection Sort
Selection sort divides the list into two parts, sorted and unsorted. On each pass it takes the first unsorted element as the current minimum, compares it against the remaining unsorted elements to find the true minimum, and swaps that minimum into place.
public void selectionSort() {
	int minElement = 0;

	for (int i = 0; i < arrNumbers.length; i++) {
		// Assume the first unsorted element is the minimum, then
		// scan the rest of the unsorted portion for a smaller one
		minElement = i;

		for (int j = i; j < arrNumbers.length; j++) {
			if (arrNumbers[minElement] > arrNumbers[j]) {
				minElement = j;
			}
		}

		swapValuesAtIndex(minElement, i);
	}
}
Insertion Sort
Insertion sort generally performs best among these simple sorts. The list is divided into a sorted and an unsorted portion. Once a number is selected for comparison, the pass does not end until the number has been placed at its correct location.
public void insertionSort() {
	for (int i = 1; i < arrNumbers.length; i++) {
		int j = i;
		int toCompare = arrNumbers[i]; // holds the number to insert

		// Shift larger sorted elements one slot right
		// until the insertion point is found
		while ((j > 0) && (arrNumbers[j - 1] > toCompare)) {
			arrNumbers[j] = arrNumbers[j - 1];
			j--;
		}

		arrNumbers[j] = toCompare;
	}
}
Algorithms FAQ
- Linear search is faster when searching for an element in a collection where elements are duplicated and occur multiple times. Binary search is efficient when the collection is sorted and its elements are unique.
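For contrast with the binary search above, a minimal linear search sketch over the same arrNumbers array:

// Returns the index of the first match, or -1 if absent; works on
// unsorted arrays and finds the first occurrence of any duplicate
public static int linearSearch(int searchVal) {
	for (int i = 0; i < arrNumbers.length; i++) {
		if (arrNumbers[i] == searchVal) {
			return i;
		}
	}
	return -1;
}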
How to Remove an Element from an Array
public static String[] removeElements(String[] input, String deleteMe) {
	List<String> result = new LinkedList<String>();

	// Copy every element except the ones equal to deleteMe
	for (String item : input) {
		if (!deleteMe.equals(item)) {
			result.add(item);
		}
	}

	// Size the returned array to the filtered list, not the original input
	return result.toArray(new String[result.size()]);
}