Here we’ll discuss setting up a pseudo-distributed Hadoop cluster on a Linux environment. We are using Hadoop 2.x for this.
- Installing Java 7
- Adding a dedicated user
- Configuring SSH
Step 1: Install Java
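A minimal sketch, assuming a Debian/Ubuntu system where OpenJDK 7 is available through apt-get (use your distribution’s package manager otherwise):

```bash
# Install OpenJDK 7 (Hadoop 2.x runs on Java 7)
sudo apt-get update
sudo apt-get install openjdk-7-jdk

# Verify the installation
java -version
```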
Step 2: Add a dedicated Hadoop user
Though it is not mandatory, we create one to keep the Hadoop installation separate from other software on the machine. The commands below add a user hduser in a new group hadoop.
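A sketch using the Debian/Ubuntu user-management tools (the names hduser and hadoop are conventions, not requirements):

```bash
# Create a 'hadoop' group and an 'hduser' user that belongs to it
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
```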
Step 3: Install SSH
Once it is installed, make sure the ssh service is running.
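Assuming Debian/Ubuntu again, something like:

```bash
# Install the OpenSSH server
sudo apt-get install openssh-server

# Check that the ssh service is up
sudo service ssh status
```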
Step 4: Configure SSH
Hadoop uses SSH to manage its nodes, so SSH must be running and configured for passwordless key-based authentication, even on a single node.
First, generate an SSH key pair for hduser.
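For example (the empty passphrase is what allows passwordless logins later):

```bash
# Switch to hduser and generate an RSA key pair with an empty passphrase
su - hduser
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
```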
Once the key is generated, copy the public key to the authorized keys file.
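For example (the chmod is needed because sshd ignores key files with loose permissions):

```bash
# Append the public key to authorized_keys and restrict its permissions
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```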
Once the key is copied, you should be able to ssh to localhost without a password and continue the Hadoop setup.
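To verify:

```bash
# The first connection will ask you to confirm the host key;
# after that you should get a shell with no password prompt
ssh localhost
```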
Step 5: Set up the Hadoop cluster
Download a Hadoop 2.x release from the Apache download mirrors and extract it into a folder, e.g. /usr/local/hadoop/. Then set JAVA_HOME and the other Hadoop-related environment variables in hduser’s .bash_profile, as shown below.
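A sketch of the download and environment setup; the release version (2.7.3), the archive URL, and the JDK path are placeholders to adjust for your system:

```bash
# Download and extract a Hadoop 2.x release (2.7.3 is only an example)
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
sudo tar -xzf hadoop-2.7.3.tar.gz -C /usr/local/
sudo mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
sudo chown -R hduser:hadoop /usr/local/hadoop
```

And in hduser’s .bash_profile:

```bash
# Adjust JAVA_HOME to wherever your JDK lives
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```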
Hadoop can run in three modes:
1. Standalone (local) mode
2. Pseudo-distributed mode
3. Fully distributed mode
Standalone (local) mode: Hadoop runs in a non-distributed manner as a single Java process, and the local filesystem is used for data storage.
Pseudo-distributed mode: Hadoop runs on a single node where each daemon runs as a separate Java process. This is the mode we are setting up here.
Fully distributed mode: Hadoop runs on multiple nodes in a master-slave architecture, where each daemon runs as a separate Java process.
The following is the minimal configuration you need to add to the configuration files to start a pseudo-distributed cluster.
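For the HDFS daemons, edit etc/hadoop/core-site.xml and etc/hadoop/hdfs-site.xml. The values below follow the standard single-node setup: the filesystem URI points at localhost, and the replication factor is 1 since there is only one DataNode:

```xml
<!-- etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

```xml
<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```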
For the YARN daemons: etc/hadoop/mapred-site.xml and etc/hadoop/yarn-site.xml
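The entries below tell MapReduce to run on YARN and enable YARN’s shuffle service, as in the standard Hadoop 2.x single-node configuration:

```xml
<!-- etc/hadoop/mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

```xml
<!-- etc/hadoop/yarn-site.xml -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```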
Once the above configuration is done, the next step is to format the NameNode.
To format the filesystem:
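```bash
# Run from the Hadoop installation directory ($HADOOP_HOME);
# this initializes the HDFS metadata store
bin/hdfs namenode -format
```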
To start the NameNode and DataNode daemons:
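```bash
# Starts the NameNode, DataNode, and SecondaryNameNode daemons
sbin/start-dfs.sh
```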
Browse the NameNode web interface; by default it is at http://localhost:50070
To start the YARN daemons (ResourceManager and NodeManager), run the following:
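```bash
# Starts the ResourceManager and NodeManager daemons
sbin/start-yarn.sh
```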
You can browse the ResourceManager web interface at http://localhost:8088
If you want to start all daemons together, you can run the following:
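```bash
# Deprecated in Hadoop 2.x in favour of start-dfs.sh and start-yarn.sh,
# but it still works for a single-node setup
sbin/start-all.sh
```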
Your cluster is now up and running. You can see all the Hadoop daemons using the jps command.
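For example (the process IDs will differ on your machine):

```
$ jps
21043 NameNode
21201 DataNode
21398 SecondaryNameNode
21562 ResourceManager
21702 NodeManager
21855 Jps
```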
Now start writing MapReduce jobs!