View list of Hadoop Files

>>hadoop fs -ls ..

Creating new Folder

>>hadoop fs -mkdir test

The folder created above can be viewed in Hue

Adding Files to Hadoop File System

>>hadoop fs -put Test.txt test

In case files need to be copied from more than one directory, use the put command as below

>>hadoop fs -put Test1 Test2 Test

Getting Files from Hadoop File System

>>hadoop fs -get Test.txt Test1

Deleting a File from Hadoop File System

>>hadoop fs -rm Test1/Test.txt

In the above case the file will be moved to the Trash

Deleting a File Permanently from Hadoop File System (Skipping the Trash)

>>hadoop fs -rm -skipTrash Test1/Test.txt

Deleting a Folder - Recursive Remove

>>hadoop fs -rm -r -skipTrash Test1

View part of Data file

>> hadoop fs -cat /user/training/shakespeare.txt | tail -n5

Hadoop – Map Reduce

>> hadoop jar Test.jar T1 output

hadoop jar MapReduce.jar InputFile OutputFolder

Start hdfs daemons

>>  start-dfs.sh

Start MapReduce daemons:

>>  start-yarn.sh

Verify Hadoop daemons:

>>  jps

Each daemon runs in its own JVM (isolated process). In MapReduce version 1 there will be
Job Tracker – one per cluster (controller and scheduler)
Task Tracker – one per worker node (runs and monitors tasks)

The Map Reduce consist of Two Parts

The Map Part
The Reduce Part

Map Part

  1. The Map part is a function in Java which performs some action on some data. MapReduce is run as a job. During this run the Java function gets called on each node where the data lives.
  2. HDFS replicates each block of data 3 times by default, so the same data is available on 3 nodes.
  3. HDFS is self-healing. If one node goes down, another node holding a replica will be used.
  4. Once the Map part has run, the output will be key-value pairs.
  5. The second part, the Reduce part, works on these pairs.
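
The two parts described above can be sketched in plain Java without any Hadoop classes. This is only an illustration of the idea, using word counting as an assumed example. The class and method names (WordCountSketch, map, reduce, run) are made up for this sketch and everything runs in a single JVM rather than on a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce phases (no Hadoop involved).
class WordCountSketch {

    // Map part: turn one line of input into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce part: sum the values that share the same key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs map over every line, then one reduce over all emitted pairs.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        return reduce(allPairs);
    }
}
```

In a real job the map calls would run on the nodes that hold the data and the pairs would be grouped by key before reaching the reducers. Here both steps are ordinary method calls.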

2 Versions of Map Reduce

Map Reduce Version 1

  1. As given by Google
  2. HDFS Triple Replicated
  3. Parallel Processing via Map and Reduce(aggregated)

Coding Steps

  1. Create a Class
  2. Create a static Map class
  3. Create a static Reduce class
  4. Create a Main Function
    1. Create a Job
    2. Job calls the Map and Reduce Classes

Java Coding for MapReduce

  public class MapReduce{
    public static void main(String[] args)
    {
      //Create Job Runner Instance
      //Call MapInstance on Job Instance
      //Call ReduceInstance on Job Instance
        
    } 
    
    public static class Map
    {
       //write Mapper
    }

    public static class Reduce
    {
       //write Reducer
    }     
  }

  1. In MapReduce the States should not be Shared
  2. Top Down Programming, One Entry Point – One Exit Point

Aspects of MapReduce

  1. Job – Unit of MapReduce
  2. Map Task runs on each node
  3. Reduce Task – runs on some nodes
  4. Source data – HDFS or other location (Amazon S3)

In Java, values are serialized before they are transferred over the network and deserialized on the receiving side, because objects have to be converted to a stream of bytes to be sent. In MapReduce the Map output is serialized and then deserialized as the input of Reduce. The types that can be serialized and deserialized this way are called Writables in MapReduce. To achieve this, String in Java is replaced with Text and int in Java is replaced with IntWritable, which do the serialization on their own.
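
A minimal sketch of what such a type does is given below. IntBox is a hypothetical stand-in written for this note and is not Hadoop's IntWritable. It only mirrors the two methods that Hadoop's Writable interface defines, write(DataOutput) and readFields(DataInput), and the roundTrip helper is an assumption added to show a value surviving the trip through bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a Writable type; not the real IntWritable class.
class IntBox {
    private int value;

    IntBox() { }

    IntBox(int value) { this.value = value; }

    int get() { return value; }

    // Serialize: write the field to a binary stream.
    void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    // Deserialize: overwrite the field from a binary stream.
    void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    // Round trip through a byte array, as if the value had crossed the network.
    static IntBox roundTrip(IntBox original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            IntBox copy = new IntBox();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```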

An aspect is a software entity implementing a specific non-functional part of the application.

Using AOP has 2 Benefits

  1. The logic for each concern is now in one place, as opposed to being scattered all over the code base.
  2. Classes are cleaner since they only contain code for their primary concern (or core functionality) and secondary concerns have been moved to aspects.

OOP and AOP are not mutually exclusive. AOP can be a good addition to OOP. AOP is especially handy for adding standard code like logging, performance tracking, etc. to methods without clogging up the method code with this standard code.

Assume you have a graphical class with many “set…()” methods. After each set method, the data of the graphics changed, thus the graphics changed and thus the graphics need to be updated on screen. Assume to repaint the graphics you must call “Display.update()”. The classical approach is to solve this by adding more code. At the end of each set method you write

 void set...(...) {
    :
    :
    Display.update();
}

If you have 3 set-methods, that is not a problem. If you have 200 (hypothetical), it’s getting real painful to add this everywhere. Also whenever you add a new set-method, you must be sure to not forget adding this to the end, otherwise you just created a bug.

AOP solves this without adding tons of code, instead you add an aspect:

after() : set() {
   Display.update();
}

And that’s it! Instead of writing the update code yourself, you just tell the system that after a set() pointcut has been reached, it must run this code and it will run this code. No need to update 200 methods, no need to make sure you don’t forget to add this code on a new set-method. Additionally you just need a pointcut:

pointcut set() : execution(* set*(*) ) && this(MyGraphicsClass) && within(com.company.*);

What does that mean? That means if a method is named “set*” (* means any name might follow after set), regardless of what the method returns (first asterisk) or what parameters it takes (third asterisk) and it is a method of MyGraphicsClass and this class is part of the package “com.company.*”, then this is a set() pointcut. And our first code says “after running any method that is a set pointcut, run the following code”.

See how AOP elegantly solves the problem here? Actually everything described here can be done at compile time. An AOP preprocessor can just modify your source (e.g. adding Display.update() to the end of every set-pointcut method) before even compiling the class itself.

However, this example also shows one of the big downsides of AOP. AOP is actually doing something that many programmers consider an “Anti-Pattern”. The exact pattern is called “Action at a distance”.

Action at a distance is an anti-pattern (a recognized common error) in which behavior in one part of a program varies wildly based on difficult or impossible to identify operations in another part of the program.

As a newbie to a project, I might just read the code of any set-method and consider it broken, as it seems to not update the display. I don’t see by just looking at the code of a set-method, that after it is executed, some other code will “magically” be executed to update the display. I consider this a serious downside! By making changes to a method, strange bugs might be introduced. Furthermore, understanding the flow of code where certain things seem to work correctly, but are not obvious (as I said, they just magically work… somehow), is really hard.
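
For comparison, plain Java can approximate the after-set behaviour discussed above at runtime with a dynamic proxy (java.lang.reflect.Proxy). This is a different technique from AspectJ weaving and is only a sketch. Shape, SimpleShape and AfterSetAdvice are hypothetical names, and a counter stands in for Display.update().

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface standing in for the graphical class in the text.
interface Shape {
    void setWidth(int width);
    void setHeight(int height);
    int area();
}

class SimpleShape implements Shape {
    private int width;
    private int height;
    public void setWidth(int width) { this.width = width; }
    public void setHeight(int height) { this.height = height; }
    public int area() { return width * height; }
}

class AfterSetAdvice {
    // Counts display updates; stands in for Display.update() in the text.
    static final AtomicInteger updates = new AtomicInteger();

    // Wraps a Shape so that every method whose name starts with "set"
    // is followed by an update, similar in spirit to after() : set().
    static Shape withDisplayUpdate(Shape target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args);
            if (method.getName().startsWith("set")) {
                updates.incrementAndGet();
            }
            return result;
        };
        return (Shape) Proxy.newProxyInstance(
                Shape.class.getClassLoader(), new Class<?>[] { Shape.class }, handler);
    }
}
```

The set-methods of SimpleShape contain no update code at all, which also shows the downside described above: nothing in SimpleShape tells the reader that an update happens.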

AOP addresses the problem of cross-cutting concerns, which would be any kind of code that is repeated in different methods and can’t normally be completely refactored into its own module, such as logging or verification. Take the following program and aspects:

function mainProgram()
{  
   var x =  foo();
   doSomethingWith(x);
   return x;
 }

 aspect logging
 { 
    before (mainProgram is called):
    { 
       log.Write("entering mainProgram");
    }

     after (mainProgram is called):
    {  
      log.Write(  "exiting mainProgram with return value of "
                + mainProgram.returnValue);
    }
 } 

 aspect verification
 { 
  before (doSomethingWith is called):
  { 
       if (doSomethingWith.arguments[0] == null) 
       { 
         throw NullArgumentException();
       }

      if (!doSomethingWith.caller.isAuthenticated)
      { 
         throw Securityexception();
      }
    }
 }

And then an aspect-weaver is used to compile the code into this:

function mainProgram()
 { 
   log.Write("entering mainProgram");

   var x = foo();   

   if (x == null) throw NullArgumentException();
   if (!mainProgramIsAuthenticated()) throw Securityexception();
   doSomethingWith(x);   

   log.Write("exiting mainProgram with return value of "+ x);
   return x;
 }

Cross Cutting Concerns

  1. Database Access
  2. Data Entities
  3. Email/Notification
  4. Error Handling
  5. Logging

A wrapper class is any class which “wraps” or “encapsulates” the functionality of another class or component. Wrappers are useful because they provide a level of abstraction from the implementation of the underlying class or component.

A wrapper class is a class that “wraps” around something else, just like its name.

For example
Wrapper classes provide a way to use the primitive types as objects. For each primitive type there is a wrapper class, such as

int Integer
byte Byte 

Integer and Byte are the wrapper classes of the primitives int and byte. There are situations where primitives have to be used as objects, so wrapper classes provide a mechanism called boxing/unboxing.

The concept can be understood from the following example

double d = 135.0d;

Double doubleWrapper = Double.valueOf(d);

int integerValue = doubleWrapper.intValue();
byte byteValue = doubleWrapper.byteValue();
String stringValue = doubleWrapper.toString();

So this is the way we can use a wrapper class to convert into other primitive types as well. This type of conversion is used when you need to convert a primitive type to an object and use it to get other primitives. This approach needs a lot of code, however. The same can be achieved with a simple cast, as in the snippet below

double d = 135.0;
int integerValue = (int) d ;
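
Boxing and unboxing, mentioned above, are easiest to see when primitives are stored in a collection, since collections can only hold objects. The sketch below is an illustration only; BoxingSketch and sum are names made up for it.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingSketch {
    // Each int is boxed into an Integer when added to the list and
    // unboxed back to int when added to the primitive total.
    static int sum(int... values) {
        List<Integer> boxed = new ArrayList<>();
        for (int v : values) {
            boxed.add(v);      // autoboxing: int -> Integer
        }
        int total = 0;
        for (Integer b : boxed) {
            total += b;        // unboxing: Integer -> int
        }
        return total;
    }
}
```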

In general a wrapper is going to expand on what the wrappee does, without being concerned about the implementation of the wrappee, otherwise there’s no point of wrapping versus extending the wrapped class. A typical example is to add timing information or logging functionality around some other service interface, as opposed to adding it to every implementation of that interface.

This then ends up being a typical example for Aspect programming. Rather than going through an interface function by function and adding boilerplate logging, in aspect programming you define a pointcut, which is a kind of regular expression for methods, and then declare methods that you want to have executed before, after or around all methods matching the pointcut. It’s probably fair to say that aspect programming is a kind of use of the Decorator pattern, which wrapper classes can also be used for, but that both technologies have other uses.
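
The wrapper approach described above, adding logging around a service interface instead of inside every implementation, could look roughly like this. Greeter, PlainGreeter and LoggingGreeter are hypothetical names, and the log is kept in a list only so that the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface used only for this illustration.
interface Greeter {
    String greet(String name);
}

class PlainGreeter implements Greeter {
    public String greet(String name) { return "Hello, " + name; }
}

// Wrapper: adds logging around any Greeter without knowing its implementation.
class LoggingGreeter implements Greeter {
    private final Greeter wrapped;
    final List<String> log = new ArrayList<>();

    LoggingGreeter(Greeter wrapped) { this.wrapped = wrapped; }

    public String greet(String name) {
        log.add("before greet(" + name + ")");
        String result = wrapped.greet(name);
        log.add("after greet -> " + result);
        return result;
    }
}
```

Because LoggingGreeter depends only on the Greeter interface, the same wrapper works for every implementation of that interface.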

A boilerplate is a unit of writing that can be reused over and over without change.

A pointcut is a set of join points.

“Reflection” is a language’s ability to inspect and dynamically call classes, methods, attributes, etc. at runtime.

For example, all objects in Java have the method getClass, which lets you determine an object’s class even if you don’t know it at compile time (like if you declared it as Object) – this might seem trivial, but such reflection is not by default possible in less dynamic languages such as C++.

Why do we need reflection?

Reflection enables us to:

  1. Examine an object’s class at runtime
  2. Construct an object for a class at runtime
  3. Examine a class’s field and method at runtime
  4. Invoke any method of an object at runtime
  5. Change accessibility flag of Constructor, Method and Field
    etc.

Reflection is important since it lets you write programs that do not have to “know” everything at compile time, making them more dynamic, since they can be tied together at runtime. The code can be written against known interfaces, but the actual classes to be used can be instantiated using reflection from configuration files.
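
A minimal sketch of that idea, assuming the configuration is a java.util.Properties object and the configured class has a no-argument constructor. ReflectiveFactory and the key name used in the usage note are made up for this illustration.

```java
import java.util.Properties;

class ReflectiveFactory {
    // Creates an instance of the class whose name is stored under the given key.
    // Assumes the class has an accessible no-argument constructor.
    static Object create(Properties config, String key) {
        String className = config.getProperty(key);
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }
}
```

With a property such as list.impl=java.util.LinkedList, the calling code can cast the result to java.util.List and never mention LinkedList itself.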

Let’s see a simple example of the forName() method.

    class Simple{}  
      
    class Test{  
     public static void main(String args[]) throws ClassNotFoundException{  
      Class<?> c=Class.forName("Simple");  
      System.out.println(c.getName());  
     }  
    }  

For example, say you have an object of an unknown type in Java, and you would like to call a ‘doSomething’ method on it if one exists. Java’s static typing system isn’t really designed to support this unless the object conforms to a known interface, but using reflection, your code can look at the object and find out if it has a method called ‘doSomething’ and then call it if you want to.

So, to give you a code example of this in Java (imagine the object in question is foo) :

Method method = foo.getClass().getMethod("doSomething");
method.invoke(foo);

Getting list of methods in a Class by Reflection

Method[] methods = MyObject.class.getMethods();

for(Method method : methods){
    System.out.println("method = " + method.getName());
}

Details which can be accessed by reflection

  1. Class Name
  2. Class Modifiers (public, private, synchronized etc.)
  3. Package Info
  4. Implemented Interfaces
  5. Superclass
  6. Constructors
  7. Fields
  8. Methods
  9. Annotations
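
Several of the details listed above can be read with a handful of calls on java.lang.Class. The sketch below is illustrative; ClassInspector and describe are names made up for it, and it reports only counts for most items to stay short.

```java
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class ClassInspector {
    // Collects a few of the details listed above for the given class.
    static Map<String, String> describe(Class<?> type) {
        Map<String, String> details = new LinkedHashMap<>();
        details.put("name", type.getName());
        details.put("modifiers", Modifier.toString(type.getModifiers()));
        Class<?> superclass = type.getSuperclass(); // null for Object and interfaces
        details.put("superclass", superclass == null ? "none" : superclass.getName());
        details.put("interfaces", String.valueOf(type.getInterfaces().length));
        details.put("constructors", String.valueOf(type.getDeclaredConstructors().length));
        details.put("fields", String.valueOf(type.getDeclaredFields().length));
        details.put("methods", String.valueOf(type.getDeclaredMethods().length));
        details.put("annotations", String.valueOf(type.getAnnotations().length));
        return details;
    }
}
```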

Reflection allows a programmer to access entities in a program dynamically, i.e. if the programmer does not know a class or its methods while coding an application, the class can still be used dynamically (at run time) through reflection.

It is frequently used in scenarios where a class name changes frequently. If such a situation arises, then it is complicated for the programmer to rewrite the application and change the name of the class again and again.

Drawbacks
Since types are resolved dynamically at runtime, certain JVM optimizations cannot be performed. Consequently, reflective operations have slower performance than their non-reflective counterparts, and should be avoided in sections of code which are called frequently in performance-sensitive applications.

Usage

  1. Reflection is used when code needs to look into other classes at a deeper level. In most cases the code that does this behaves like a container. For instance, dependency injection is mostly done with the use of reflection
  2. Remote procedure calling — treat part of a message received over the network as a method name.
  3. Object-relational mappings — maintain a relationship between fields in an object and columns in a database.
  4. Interfaces with dynamically typed scripting languages — turn a string value produced by a scripting language into a reference to a field or method on an object.
  5. Serialization and deserialization — convert field names to string so you can write the object’s fields to a stream and later convert it back into an object.

One useful real-world use of reflection is when writing a framework that has to interoperate with user-defined classes, where the framework author doesn’t know what the members (or even the classes) will be. Reflection allows them to deal with any class without knowing it in advance. For instance, I don’t think it would be possible to write a complex aspect-oriented library without reflection.

Servlet mapping in web.xml and JUnit annotations are other places where reflection is used.

For example, JUnit uses reflection to look through methods tagged with the @Test annotation, and then calls those methods when running the unit test.
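
The mechanism can be imitated in a few lines. The sketch below is not JUnit code: @Check is a hypothetical annotation standing in for @Test, and MiniRunner and SampleChecks are made-up names. It only shows how annotated methods can be found and invoked through reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation; JUnit's real @Test is a different type.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Check { }

class SampleChecks {
    @Check public void additionWorks() {
        if (1 + 1 != 2) throw new IllegalStateException("math is broken");
    }

    @Check public void alwaysFails() {
        throw new IllegalStateException("expected failure");
    }

    public void notATest() { }
}

class MiniRunner {
    // Invokes every method tagged with @Check on a fresh instance and
    // returns the names of the methods that completed without throwing.
    static List<String> passing(Class<?> testClass) {
        List<String> passed = new ArrayList<>();
        try {
            Object instance = testClass.getDeclaredConstructor().newInstance();
            for (Method method : testClass.getDeclaredMethods()) {
                if (!method.isAnnotationPresent(Check.class)) {
                    continue;
                }
                try {
                    method.invoke(instance);
                    passed.add(method.getName());
                } catch (InvocationTargetException e) {
                    // The annotated method itself threw: count it as a failure.
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return passed;
    }
}
```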

For web frameworks, product developers define their own implementations of interfaces and classes and put them in the configuration files. Using reflection, the framework can dynamically initialize the classes required.

For example, Spring uses bean configuration such as:

<bean id="someID" class="com.programcreek.Foo">
    <property name="someField" value="someValue" />
</bean>

When the Spring context processes this <bean> element, it will use Class.forName(String) with the argument “com.programcreek.Foo” to load that Class and then instantiate it. It will then again use reflection to get the appropriate setter for the <property> element and set its value to the specified value.

The same mechanism is also used for Servlet web applications:

<servlet>
    <servlet-name>someServlet</servlet-name>
    <servlet-class>com.programcreek.WhyReflectionServlet</servlet-class>
</servlet>

In Linux, file access is divided into three categories

 CurrentUser - Users in Group - Other Users
  drwxr-xr-x 


d – the entry is a directory
rwx – CurrentUser can read, write and execute
r-x – Users in Group can read and execute
r-x – Other Users can read and execute

 >>ls -l
-rw-r--r-- 1 root root    0 Apr 14 17:19 test.txt 

The test.txt file has only Read and Write access (rw) for Current User – Read access for Users in Group – Read access for other Users

 >>chmod u+x test.txt

Now the test.txt file has Execute access along with Read and Write as below

 >>ls -l
-rwxr--r-- 1 root root    0 Apr 14 17:19 test.txt

Giving Read right to Group

 >>chmod g+r test.txt

Giving read right to Current User, Group Users and Other Users

 >>chmod a+r test.txt

Giving Execute right to Other Users

 >>chmod o+x test.txt

rwx – 111 7
r-x – 101 5
rw- – 110 6
r-- – 100 4
-wx – 011 3
-w- – 010 2
--x – 001 1
--- – 000 0

Example of numeric (octal) format

 >>chmod 752  test.txt

Current User can rwx
Group User can r-x
Other User can -w-
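
The mapping from each digit to its rwx form follows directly from the bit table above (4 = read, 2 = write, 1 = execute). As an illustration, the conversion can be written as a small Java helper; PermissionBits and its method names are made up for this sketch.

```java
class PermissionBits {
    // Converts one octal digit (0-7) to its rwx form, e.g. 5 -> "r-x".
    static String digitToSymbolic(int digit) {
        if (digit < 0 || digit > 7) {
            throw new IllegalArgumentException("not an octal digit: " + digit);
        }
        return ((digit & 4) != 0 ? "r" : "-")
             + ((digit & 2) != 0 ? "w" : "-")
             + ((digit & 1) != 0 ? "x" : "-");
    }

    // Converts a three-digit mode such as "752" to user, group and other parts.
    static String modeToSymbolic(String mode) {
        if (mode.length() != 3) {
            throw new IllegalArgumentException("expected three digits: " + mode);
        }
        StringBuilder symbolic = new StringBuilder();
        for (char c : mode.toCharArray()) {
            symbolic.append(digitToSymbolic(c - '0'));
        }
        return symbolic.toString();
    }
}
```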

Giving Multiple Permission for Multiple User Types at Once

 >>chmod u+r,o+x test.txt

How folders are organized
When you open the terminal, the default location is the home folder of the current user, which
comes under /home/mugil (the user logged in)

How to Login as a Root User

>> su
Password:

Logout of Root User

>> exit

Current Working Directory

>> pwd

Navigate Backward

>> cd ..
>> cd ../..

Navigate Forward

>> cd home/mugil

Home Directory

>> cd ~

Equivalent to Microsoft User Directory i.e My Documents

How to reset admin password

>> sudo passwd

How to Find Directory Files info

>> ls -l

Touch- Creates a Time Stamp

>> touch test.txt

If the file does not exist it creates a new one, else it just updates the timestamp

Make new Directory

>> mkdir NewFolder

Moving File into a folder that exists

>> mv test.txt NewFolder/

Copying File into a folder that exists
Copies test.txt to NewFolder

>> cp test.txt NewFolder/

Copies test.txt to one folder above

>> cp test.txt ..

Deleting a File that exists

>> rm test.txt 

Deleting a folder that is not empty
-r recursive

>> rm -r NewFolder/

What is Telescoping Constructor Pattern?
In Java, there is no support for default values for constructor parameters. As a workaround, a technique called “Telescoping constructor” is often used. A class has multiple constructors, where each constructor calls a more specific constructor in the hierarchy, which has more parameters than itself, providing default values for the extra parameters.

We’ve all at some point encountered a class with a list of constructors where each addition adds a new optional parameter:

Pizza(int size) { ... }        
Pizza(int size, boolean cheese) { ... }    
Pizza(int size, boolean cheese, boolean pepperoni) { ... }    
Pizza(int size, boolean cheese, boolean pepperoni, boolean bacon) { ... }

Disadvantage
This is called the Telescoping Constructor Pattern. The problem with this pattern is that once constructors are 4 or 5 parameters long it becomes difficult to remember the required order of the parameters as well as what particular constructor you might want in a given situation.
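
The chaining itself, where each constructor delegates to the next longer one and supplies a default for the extra parameter, can be sketched as follows. TelescopingPizza is a hypothetical name and the default of false for every topping is an assumption.

```java
class TelescopingPizza {
    final int size;
    final boolean cheese;
    final boolean pepperoni;
    final boolean bacon;

    // Each shorter constructor calls the next longer one with a default value.
    TelescopingPizza(int size) {
        this(size, false);
    }

    TelescopingPizza(int size, boolean cheese) {
        this(size, cheese, false);
    }

    TelescopingPizza(int size, boolean cheese, boolean pepperoni) {
        this(size, cheese, pepperoni, false);
    }

    // Only the longest constructor assigns the fields.
    TelescopingPizza(int size, boolean cheese, boolean pepperoni, boolean bacon) {
        this.size = size;
        this.cheese = cheese;
        this.pepperoni = pepperoni;
        this.bacon = bacon;
    }
}
```

A call such as new TelescopingPizza(12, true, false, true) shows the readability problem: nothing at the call site says which boolean is which.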

One alternative you have to the Telescoping Constructor Pattern is the JavaBean Pattern where you call a constructor with the mandatory parameters and then call any optional setters after:

Pizza pizza = new Pizza(12);
pizza.setCheese(true);
pizza.setPepperoni(true);
pizza.setBacon(true);

The problem here is that because the object is created over several calls it may be in an inconsistent state partway through its construction. This also requires a lot of extra effort to ensure thread safety.

The better alternative is to use the Builder Pattern.

public class Pizza {
  private final int size;
  private final boolean cheese;
  private final boolean pepperoni;
  private final boolean bacon;

  public static class Builder {
    //required
    private final int size;

    //optional
    private boolean cheese = false;
    private boolean pepperoni = false;
    private boolean bacon = false;

    public Builder(int size) {
      this.size = size;
    }

    public Builder cheese(boolean value) {
      cheese = value;
      return this;
    }

    public Builder pepperoni(boolean value) {
      pepperoni = value;
      return this;
    }

    public Builder bacon(boolean value) {
      bacon = value;
      return this;
    }

    public Pizza build() {
      return new Pizza(this);
    }
  }

  private Pizza(Builder builder) {
    size = builder.size;
    cheese = builder.cheese;
    pepperoni = builder.pepperoni;
    bacon = builder.bacon;
  }
}

Note that Pizza is immutable and that parameter values are all in a single location. Because the Builder’s setter methods return the Builder object they are able to be chained.

Pizza pizza = new Pizza.Builder(12)
                       .cheese(true)
                       .pepperoni(true)
                       .bacon(true)
                       .build();

This results in code that is easy to write and very easy to read and understand. In this example, the build method could be modified to check parameters after they have been copied from the builder to the Pizza object and throw an IllegalStateException if an invalid parameter value has been supplied. This pattern is flexible and it is easy to add more parameters to it in the future. It is really only useful if you are going to have more than 4 or 5 parameters for a constructor. That said, it might be worthwhile in the first place if you suspect you may be adding more parameters in the future.
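
The validation step mentioned above might look like the following reduced sketch. CheckedPizza is a hypothetical, shortened variant of the Pizza class, and the rule that the size must be positive is an assumption chosen only to have something to check.

```java
class CheckedPizza {
    private final int size;
    private final boolean cheese;

    static class Builder {
        private final int size;
        private boolean cheese = false;

        Builder(int size) {
            this.size = size;
        }

        Builder cheese(boolean value) {
            cheese = value;
            return this;
        }

        CheckedPizza build() {
            CheckedPizza pizza = new CheckedPizza(this);
            // Validate after copying, so the check sees the object's own fields.
            if (pizza.size <= 0) {
                throw new IllegalStateException("size must be positive: " + pizza.size);
            }
            return pizza;
        }
    }

    private CheckedPizza(Builder builder) {
        size = builder.size;
        cheese = builder.cheese;
    }

    int getSize() { return size; }

    boolean hasCheese() { return cheese; }
}
```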

Factory Patterns vs Builder Pattern
Consider a restaurant. The creation of “today’s meal” is a factory pattern, because you tell the kitchen “get me today’s meal” and the kitchen (factory) decides what object to generate, based on hidden criteria.

The builder appears if you order a custom pizza. In this case, the waiter tells the chef (builder) “I need a pizza; add cheese, onions and bacon to it!” Thus, the builder exposes the attributes the generated object should have, but hides how to set them.

 Calendar.getInstance().getActualMaximum(Calendar.DAY_OF_MONTH);

The above returns the actual maximum for the current month. For example, if it is currently February of a leap year, it returns 29.

And to get last day as Date object:

Calendar cal = Calendar.getInstance();
cal.set(Calendar.DATE, cal.getActualMaximum(Calendar.DATE));

Date lastDayOfMonth = cal.getTime();
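
The same calls can be tried against a fixed month instead of the current date, which makes the result predictable. MonthEnd and its method names are made up for this sketch. Note that Calendar months are zero-based, so the Calendar.FEBRUARY constant is used rather than the number 2.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

class MonthEnd {
    // Number of days in the given month; month is a Calendar constant (0-based).
    static int daysInMonth(int year, int month) {
        Calendar cal = new GregorianCalendar(year, month, 1);
        return cal.getActualMaximum(Calendar.DAY_OF_MONTH);
    }

    // Date of the last day of the given month, at midnight in the default time zone.
    static Date lastDayOfMonth(int year, int month) {
        Calendar cal = new GregorianCalendar(year, month, 1);
        cal.set(Calendar.DAY_OF_MONTH, cal.getActualMaximum(Calendar.DAY_OF_MONTH));
        return cal.getTime();
    }
}
```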

java beans = default constructor + getters + setters

You want to model a person and in your model each person must have a name and surname.
In java beans convention you would have to
1) create a person and then
2) populate it with name and surname.

But between 1 and 2 you have an existing object in an inconsistent state: it’s a person without a name. In this trivial example it looks like an exaggeration, but in a complex system it starts to matter.

Let’s see a simple example

class Person {
   private String firstName;
   private String lastName;
   public String getFirstName() {return firstName;}
   public void setFirstName(String firstName) {this.firstName = firstName;}
   public String getLastName() {return lastName;}
   public void setLastName(String lastName) {this.lastName = lastName;}
}

The code below creates an instance of Person and initializes it:

Person president = new Person();
president.setFirstName("George");
president.setLastName("Bush");

From the above lines of code the following can be inferred

  1. The object is in a consistent state only when all 3 lines are completed, and is in an inconsistent state before that.
  2. The object is indeed mutable: the values it holds may be changed by invoking a setter.

Our class Person is not thread-safe and therefore we cannot use it directly in a multi-threaded environment without thinking about synchronization.

Here is an example. Several years ago Barack Obama became the President of the US. How can we express this in code?

president.setFirstName("Barack");
president.setLastName("Obama");

In a multi-threaded environment the president object is in a wrong state when setFirstName() has already completed and setLastName() has not been called yet, because the object contains “Barack Bush”, which is obviously wrong.

What is the solution? Let’s make Person immutable:

class Person {
   private final String firstName;
   private final String lastName;
   Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
   }

   public String getFirstName() {return firstName;}
   public String getLastName() {return lastName;}
}

As you can see there is no way to change either the first or the last name stored in the object. The fields are final and do not have setters. So, our example looks like:

Person president = new Person("George", "Bush"); 
// elections..... 
president = new Person("Barack", "Obama");

Since Person is immutable we cannot re-use the old instance of Person and change its attributes. We have to create a new instance instead. Reference assignment is atomic, and if president is declared volatile the new reference is also visible to other threads, so the code is thread safe.

Disadvantage
The problem with constructors is that they are not flexible. Our example has only 2 parameters. But think about the real world, where class Person probably has 20 or more fields. In this case creating such an object is pretty verbose.

Moreover some fields can be optional. In this case you will probably want to create several overloaded constructors with different numbers of parameters. To avoid duplicate assignment code it is a common technique to use so-called telescopic constructors

Few FAQ’s
Is using JavaBeans for data storage a bad practice and should be avoided, or is it perfectly safe?
No, it is not a bad practice. It is not perfectly safe either. It depends on the situation.

The problem with mutable objects ( not with JavaBeans per se ) is using different threads to access them.

You have to synchronize the access to avoid one thread modifying the object while another is accessing it.

Immutable objects don’t have this problem because they can’t change, and thus you don’t have to synchronize anything.

To make sure an object is immutable you have to declare your attributes as final.

class MyBean  {
    private final int i;
}

If you want to assign a reasonable value to MyBean.i you have to specify it in the constructor:

public MyBean( int i ) {
     this.i = i;
 }

Since the variable is final, you can’t use a setter. You can just provide a getter.

This is perfectly thread-safe and, best of all, you don’t have to synchronize the access, because if two threads try to get the value of i they will both always see the value that was assigned on instantiation.

It is neither bad practice nor good practice. Most of us have to work with a single thread, even in multithreaded environments like servlets.

How to set and change values of Immutable Objects
The solution for creating immutable beans while still providing a
bunch of setters is to use Builders, like

Setting Value

Employee e = new EmployeeBuilder()
                  .setName("Oscar")
                  .setLastName("Reyes")
                  .setAge(0x1F)
                  .setEmployeeId("123forme")
                  .build(); 

Which looks pretty similar to the regular setXyz used in regular beans with the benefit of using immutable data.

If you need to change one value, you can use a class method

e = Employee.withName( e, "Mr. Oscar");

Which takes the existing object, copies all the values, and sets a new one.

public static Employee withName( Employee e , String newName ){
      return new Employee( newName, e.lastName, e.age, e.employeeId );
}
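
Put together, the pieces above could form something like the following sketch. The field names follow the snippets above, but the constructor, the getters and the internals of EmployeeBuilder are assumptions, since the original shows only how they are called.

```java
// Hypothetical immutable Employee; field names follow the snippets above.
final class Employee {
    private final String name;
    private final String lastName;
    private final int age;
    private final String employeeId;

    Employee(String name, String lastName, int age, String employeeId) {
        this.name = name;
        this.lastName = lastName;
        this.age = age;
        this.employeeId = employeeId;
    }

    // Copies every value from e except the name, which is replaced.
    static Employee withName(Employee e, String newName) {
        return new Employee(newName, e.lastName, e.age, e.employeeId);
    }

    String getName() { return name; }
    String getLastName() { return lastName; }
    int getAge() { return age; }
    String getEmployeeId() { return employeeId; }
}

// Builder that collects values through setters and creates the Employee once.
class EmployeeBuilder {
    private String name;
    private String lastName;
    private int age;
    private String employeeId;

    EmployeeBuilder setName(String name) { this.name = name; return this; }
    EmployeeBuilder setLastName(String lastName) { this.lastName = lastName; return this; }
    EmployeeBuilder setAge(int age) { this.age = age; return this; }
    EmployeeBuilder setEmployeeId(String employeeId) { this.employeeId = employeeId; return this; }

    Employee build() {
        return new Employee(name, lastName, age, employeeId);
    }
}
```

Only the builder is mutable. Once build() returns, the Employee can no longer change, and withName produces a second object instead of modifying the first.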

But again, in a single-threaded model it is perfectly safe to use getters/setters.