How To Install Hadoop (Single Node Cluster) On Ubuntu - Invatac - Technology Blog


Friday, 22 March 2013

How To Install Hadoop (Single Node Cluster) On Ubuntu


This tutorial will help you set up (install) a Hadoop single node cluster on your Mac OS X, Ubuntu, or other Linux system. Hadoop is a Java framework for running applications on large clusters of commodity hardware. Using Hadoop's simple programming model, the same program can run on a single machine or scale out to thousands of machines; its storage layer, HDFS, is modelled on the Google File System. Distributed processing is the main concept behind Hadoop. It is a project developed by Apache and is enjoying great success. This tutorial is written by Unmesha Sreeveni U B.




Hadoop Installation on Ubuntu

This tutorial concentrates mainly on Hadoop installation on Ubuntu and other Linux systems, as well as Mac OS X. It was written by my sister, Unmesha Sreeveni U B. Thank you for your hard work. :)


How To Install Hadoop On Ubuntu?


Prerequisites:


  • Java 1.5 or later

( You can install the OpenJDK 7 JDK using the command below in your Ubuntu terminal )




$ sudo apt-get install openjdk-7-jdk


After installing, you can find the installed Java under "/usr/lib/jvm/", for example "/usr/lib/jvm/java-7-openjdk-amd64".
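If you are not sure which path your Java landed in, the sketch below derives it from the `java` binary on your PATH. `strip_bin_java` is a hypothetical helper name, and `readlink -f` is used to resolve the /etc/alternatives symlink chain that Ubuntu puts in front of `java`:

```shell
# Hypothetical helper: turn the resolved path of the `java` binary
# (e.g. /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java) into the
# JDK root directory suitable for JAVA_HOME.
strip_bin_java() {
    p="${1%/bin/java}"   # drop the trailing /bin/java
    echo "${p%/jre}"     # and a trailing /jre, if present
}

# Resolve the real location of `java` behind Ubuntu's alternatives links.
java_bin=$(readlink -f "$(command -v java)" 2>/dev/null || true)
if [ -n "$java_bin" ]; then
    echo "JAVA_HOME candidate: $(strip_bin_java "$java_bin")"
fi
```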



  • SSH Configuration


Hadoop requires SSH access to manage its nodes, so we need to configure SSH.




unmesha:~ root# ssh localhost
Password:
   If you cannot ssh to localhost without a passphrase, execute the following commands:

unmesha:~ root# ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /var/root/.ssh/id_dsa.
Your public key has been saved in /var/root/.ssh/id_dsa.pub.
The key fingerprint is:
d2:00:0d:f7:6d:82:fe:87:f8:1b:fb:41:a8:13:83:53 root@unmesha.local
The key's randomart image is:
+--[ DSA 1024]----+
|    oo.          |
|     o.o .       |
|      E o o      |
|     + o +       |
|    o = S .      |
|     . B o       |
|      + + o      |
|       o + .     |
|        +o.      |
+-----------------+
unmesha:~ root# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

 Try to SSH to localhost again and check whether you still get a password prompt.
unmesha:~ root# ssh localhost
Password:
Password:
Password:
Permission denied (publickey,keyboard-interactive).


 If you still get a password prompt, fix the permissions and ownership:
unmesha:~ root# chmod go-w $HOME $HOME/.ssh
unmesha:~ root# chmod 600 $HOME/.ssh/authorized_keys
unmesha:~ root# chown `whoami` $HOME/.ssh/authorized_keys
unmesha:~ root# ssh localhost
Last login: Thu Mar  1 08:52:51 2013 from localhost
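The SSH steps above can be collected into a small helper that is safe to re-run: it only generates a key when none exists and only authorizes it once. This is a sketch; `setup_ssh_key` is a hypothetical name, and it uses an RSA key rather than the DSA key shown in the transcript, since DSA is deprecated in newer OpenSSH releases:

```shell
# Hypothetical helper: create and authorize a passwordless SSH key in
# the given .ssh directory, generating a key only if none exists yet
# and appending it to authorized_keys only if it is not already there.
setup_ssh_key() {
    dir="$1"
    mkdir -p "$dir"
    chmod 700 "$dir"
    if [ ! -f "$dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -P "" -f "$dir/id_rsa"
    fi
    touch "$dir/authorized_keys"
    grep -qxF "$(cat "$dir/id_rsa.pub")" "$dir/authorized_keys" || \
        cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
    chmod 600 "$dir/authorized_keys"
}

# Typical invocation (commented out so you run it deliberately):
# setup_ssh_key ~/.ssh && ssh localhost
```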



Hadoop Single Node Cluster Installation


Download the latest version of Hadoop from the Apache Hadoop releases page, then untar the downloaded file into your home folder.


unmesha:~ invatac$ sudo -i
Password:
unmesha:~ root# tar xvfz hadoop-1.0.4.tar.gz
unmesha:~ root# cd hadoop-1.0.4

      If you type the ls command you can see the list of files and directories in the Hadoop folder:

unmesha:hadoop-1.0.4 root# ls
-input     c++    ivy
-mapper     conf    ivy.xml
-output     contrib   lib
.profile     docs    libexec
CHANGES.txt    hadoop-ant-1.0.4.jar logs
LICENSE.txt    hadoop-client-1.0.4.jar  sbin
NOTICE.txt    hadoop-core-1.0.4.jar   share
README.txt    hadoop-examples-1.0.4.jar  src
TEMPORARY-DIR-FOR-HADOOPDATASTORE hadoop-minicluster-1.0.4.jar  webapps
bin     hadoop-test-1.0.4.jar
build.xml    hadoop-tools-1.0.4.jar



Now we have extracted the Hadoop files into the home folder.


Update the .bashrc File

Now we have to update the ~/.bashrc file.

Add the following lines to your ~/.bashrc file :

# Set Hadoop-related environment variables
export HADOOP_HOME=$HOME/hadoop-1.0.4   # adjust if you extracted Hadoop elsewhere

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin


Next we have to edit the hadoop-env.sh file in the hadoop/conf folder and set the Java home path there. Add the following line to hadoop/conf/hadoop-env.sh:

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
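Before moving on, it is worth checking that the path you put in hadoop-env.sh really contains a JDK. The sketch below does that; `valid_java_home` is a hypothetical helper, and the path tested is only an example:

```shell
# Hypothetical check: a usable JAVA_HOME must contain an executable
# bin/java underneath it.
valid_java_home() {
    [ -x "$1/bin/java" ]
}

if valid_java_home /usr/lib/jvm/java-7-openjdk-amd64; then
    echo "JAVA_HOME looks good"
else
    echo "adjust JAVA_HOME before starting Hadoop"
fi
```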



Configuration



  • conf/core-site.xml

unmesha:conf root# vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
          <property>
                   <name>hadoop.tmp.dir</name>
                   <value>/Users/invatac/desktop/hadoop/temp</value>
          </property>
          <property>
                   <name>fs.default.name</name>
                   <value>hdfs://localhost:54310</value>
          </property>
</configuration>

Here you can see hadoop.tmp.dir, which is set to /Users/invatac/desktop/hadoop/temp. This property is optional: by default Hadoop creates its own tmp folder, so if you are not creating a temporary folder you can leave it out and use the minimal configuration below.




<configuration>
 <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:54310</value>
   </property>
</configuration>

Or, if you want to create one (it is better that you do), create a temp directory anywhere. I created temp in my local FS:

unmesha:hadoop invatac$ mkdir temp
unmesha:hadoop invatac$ pwd
/Users/invatac/desktop/hadoop/temp
unmesha:hadoop invatac$ chmod 750 /Users/invatac/desktop/hadoop/temp




  • conf/mapred-site.xml





unmesha:conf root#  vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->

<configuration>
   <property>
        <name>mapred.job.tracker</name>
       <value>localhost:54311</value>
 </property>
</configuration>




  • conf/hdfs-site.xml



You can edit the file using the following command ( if you are not using the vi editor, go to the conf folder in hadoop and edit it with a text editor ) : unmesha:conf root# vi hdfs-site.xml




<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
</configuration>



  • Formatting the HDFS filesystem via the NameNode ( using the "namenode -format" command below )




unmesha:hadoop-1.0.4 root# bin/hadoop namenode -format

You will see some output like this:


13/03/06 15:26:25 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = unmesha.local/192.168.171.60
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
13/03/06 15:26:25 INFO util.GSet: VM type       = 64-bit
13/03/06 15:26:25 INFO util.GSet: 2% max memory = 19.83375 MB
13/03/06 15:26:25 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/03/06 15:26:25 INFO util.GSet: recommended=2097152, actual=2097152
13/03/06 15:26:25 INFO namenode.FSNamesystem: fsOwner=root
13/03/06 15:26:25 INFO namenode.FSNamesystem: supergroup=supergroup
13/03/06 15:26:25 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/03/06 15:26:25 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/03/06 15:26:25 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/03/06 15:26:25 INFO namenode.NameNode: Caching file names occuring more than 10 times 
13/03/06 15:26:26 INFO common.Storage: Image file of size 110 saved in 0 seconds.
13/03/06 15:26:26 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
13/03/06 15:26:26 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at unmesha.local/192.168.171.60
************************************************************/


Now we have almost finished the installation. Let's start the single node cluster with the following command.

Starting your single-node cluster


Run the command:

unmesha:hadoop-1.0.4 root# bin/start-all.sh

Once we run this command, all the daemons will be loaded. To see the active daemons, run the jps command.

unmesha:hadoop-1.0.4 root# jps



2329 NameNode
2126 JobTracker
2225 TaskTracker
1725 DataNode
1826 SecondaryNameNode
2496 Jps
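The jps check above can be automated. The sketch below greps a jps listing for the five Hadoop 1.x daemon names; `check_daemons` is a hypothetical helper, and the daemon names differ on the YARN-based Hadoop 2.x line:

```shell
# Hypothetical helper: report which of the expected Hadoop 1.x
# daemons appear in the given `jps` output.
check_daemons() {
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        if printf '%s\n' "$1" | grep -qw "$d"; then
            echo "$d running"
        else
            echo "$d MISSING"
        fi
    done
}

# Typical invocation:
# check_daemons "$(jps)"
```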


If you are not seeing any one of these six items, something went wrong. Try formatting the NameNode once again, or try starting HDFS on its own:

unmesha:hadoop-1.0.4 root# bin/start-dfs.sh
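If a daemon still refuses to start, its log file under the hadoop logs/ directory usually says why. This sketch pulls out the ERROR and Exception lines; `scan_log_for_errors` is a hypothetical helper, and the log path in the invocation is only an example:

```shell
# Hypothetical helper: print numbered ERROR/Exception lines from a
# Hadoop daemon log, or a friendly note when the log is clean.
scan_log_for_errors() {
    grep -nE 'ERROR|Exception' "$1" || echo "no errors found in $1"
}

# Typical invocation (the exact file name depends on your host and user):
# scan_log_for_errors logs/hadoop-root-namenode-unmesha.local.log
```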

That should do the trick. That's it: you have just installed a Hadoop single node cluster on your Ubuntu system. When you are done, you can stop all the daemons with bin/stop-all.sh. Thanks to Sis Unmesha Sreeveni U B for this tutorial.

About Me
Rahul Babu R
I am a computer student looking to make my own name in the field of technology.



