In this blog, we will describe the steps and configurations required to set up a distributed multi-node Apache Hadoop cluster.
Prerequisites
1. Single-node Hadoop cluster
If you have not configured a single-node Hadoop cluster yet, use the link below to configure one first.
How to install single node hadoop cluster
After configuring the single-node Hadoop cluster, clone it to set up the multi-node Hadoop cluster.
Cloning steps:
a) Right-click your Masternode (single-node cluster) virtual machine; you will see a screen like the one below.
b) Select the Clone option.
c) Give a new name to the cloned machine.
Make sure you have selected "Reinitialize the MAC address of all network cards".
d) Select Full clone.
Now click the Clone button; it will take some time to create the new virtual machine (Datanode).
Repeat the same process to create the second Datanode.
Note: Reinitialize the MAC address while cloning.
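Each clone starts out with the Masternode's hostname and IP address, so give every cloned Datanode its own identity before going further. A minimal sketch, assuming a CentOS/RHEL 6-style guest (the same family as the service and iptables commands used later in this post); file and device names (eth0) may differ on your distribution:
# on Datanode 1, for example
sudo vi /etc/sysconfig/network                      # set HOSTNAME=dn1.mycluster.com
sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0   # set BOOTPROTO=static and IPADDR=192.168.10.101
sudo reboot                                         # or restart the network service so the changes take effect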
2. Networking
Networking plays an important role here. Before merging the single-node clusters into a multi-node cluster, we need to make sure that all the nodes can ping each other (they must be on the same network/hub so that every machine can talk to the others).
In this blog, the network configuration for the Hadoop cluster is as follows:
IP address of the Masternode (Namenode) – 192.168.10.100
IP address of Datanode 1 (slave node) – 192.168.10.101
IP address of Datanode 2 (slave node) – 192.168.10.102
Check the communication between the master and the slaves:
a) Ping by IP address:
ping 192.168.10.101
ping 192.168.10.102
b) If they respond, ping them by hostname:
ping dn1.mycluster.com
ping dn2.mycluster.com
Note: Verify pinging from the slave nodes as well, to check whether they can communicate with the Master node. If you receive replies, the nodes can communicate with each other.
c) Verify passwordless SSH login:
ssh dn1.mycluster.com
ssh dn2.mycluster.com
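If passwordless login does not work yet, here is a minimal sketch to set it up from the Masternode, assuming the cluster user is named hadoop (as elsewhere in this post):
ssh-keygen -t rsa                      # press Enter at each prompt to accept the defaults
ssh-copy-id hadoop@dn1.mycluster.com   # copies the public key to Datanode 1
ssh-copy-id hadoop@dn2.mycluster.com   # copies the public key to Datanode 2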
d) Stop iptables on each node (Namenode, Datanode1, Datanode2):
sudo service iptables stop
or
service iptables stop
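Note: on newer systemd-based distributions (CentOS/RHEL 7 and later) the firewall is usually managed by firewalld rather than the iptables service; in that case the equivalent commands would be:
sudo systemctl stop firewalld
sudo systemctl disable firewalld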
Now come to your Master node (Namenode).
Namenode Configuration
Before configuring the Master node (Namenode), make sure you have configured the /etc/hosts file.
To configure the /etc/hosts file:
sudo vi /etc/hosts

192.168.10.100 namenode.mycluster.com
192.168.10.101 dn1.mycluster.com
192.168.10.102 dn2.mycluster.com
Now follow the steps below to make changes on each machine (node).
These are the changes that have to be made on the Master node (Namenode).
1) Log in to your Master node (Namenode) and move to the Hadoop configuration directory:
cd hadoop-2.6.0/etc/hadoop/
2) Open core-site.xml and add the following:
vi core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.mycluster.com:9000</value>
  </property>
</configuration>
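Note: fs.default.name still works in Hadoop 2.x but is deprecated in favour of fs.defaultFS. If you prefer the newer property name, the equivalent entry is:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.mycluster.com:9000</value>
</property>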
3) Open hdfs-site.xml:
vi hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>
  </property>
</configuration>
Note: In <value>/home/hadoop/hadoop/namenode</value>, /home/hadoop is the home directory of the hadoop user; replace it with your own user's home directory. The rest of the path is the directory we create later in this post.
4) Open mapred-site.xml:
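Note: a fresh Hadoop 2.6.0 installation ships only mapred-site.xml.template, not mapred-site.xml. If the file does not exist on your machine yet, create it from the template first:
cp mapred-site.xml.template mapred-site.xml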
vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
5) Open yarn-site.xml and add these entries:
vi yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>namenode.mycluster.com:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>namenode.mycluster.com:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>namenode.mycluster.com:8050</value>
  </property>
</configuration>
See the screenshot below.
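One more thing worth checking on the Master node: the start-dfs.sh and start-yarn.sh scripts used later start the slave daemons only on the hosts listed in the slaves file inside this same configuration directory. If your copy still contains only localhost (the default), list the Datanode hostnames instead, one per line:
vi slaves

dn1.mycluster.com
dn2.mycluster.com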
6) Ensure the SSH service is running by typing the command below:
sudo service sshd start
DataNode Configuration
Before configuring the Datanode, make sure you have configured the /etc/hosts file.
To configure the /etc/hosts file:
sudo vi /etc/hosts

192.168.10.100 namenode.mycluster.com
192.168.10.101 dn1.mycluster.com
192.168.10.102 dn2.mycluster.com
Follow these steps to update the Datanode.
1) Log in to your Datanode and move to the Hadoop configuration directory:
cd hadoop-2.6.0/etc/hadoop/
2) Open core-site.xml and add the following:
vi core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.mycluster.com:9000</value>
  </property>
</configuration>
3) Open hdfs-site.xml:
sudo vi hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hadoop/datanode</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>67108864</value>
  </property>
</configuration>
Note: In <value>/home/hadoop/hadoop/datanode</value>, /home/hadoop is the home directory of the hadoop user; replace it with your own user's home directory. The rest of the path is the directory we create later in this post.
4) Open yarn-site.xml:
vi yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>namenode.mycluster.com:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>namenode.mycluster.com:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>namenode.mycluster.com:8050</value>
  </property>
</configuration>
5) Open mapred-site.xml:
vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
6) Ensure the SSH service is running by typing the command below:
sudo service sshd start
Note: Repeat the same steps on every Datanode.
Create the /home/hadoop/hadoop/namenode directory on the Master node (Namenode) and the /home/hadoop/hadoop/datanode directory on both Datanodes (slave nodes):
mkdir -p /home/hadoop/hadoop/namenode (on the Master node only)
mkdir -p /home/hadoop/hadoop/datanode (on the slave nodes only)
Note: If these directories already exist, remove them and create fresh ones using the commands above.
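For example, on the Master node (be careful: this wipes any existing NameNode metadata in that directory):
rm -rf /home/hadoop/hadoop/namenode
mkdir -p /home/hadoop/hadoop/namenode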
Log in to your Masternode (Namenode) and follow these steps to start your Hadoop cluster.
To start all the daemons, follow the steps below:
1) Format the NameNode first:
hadoop namenode -format
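Note: the hadoop namenode -format form is deprecated in Hadoop 2.x; the preferred equivalent, which you can use instead, is:
hdfs namenode -format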
2) Start the DFS daemons on the Namenode.
Type the command below to start the DFS daemons:
./start-dfs.sh
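The start/stop scripts live in the sbin directory of the Hadoop installation, so if they are not already on your PATH, run them from there. For example, assuming Hadoop was extracted into the hadoop user's home directory as in this post:
cd ~/hadoop-2.6.0/sbin
./start-dfs.sh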
3) Type jps to see the running daemons:
jps
4) Start the YARN and history server daemons:
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
jps
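On the Masternode you would typically expect jps to list NameNode, SecondaryNameNode, ResourceManager and JobHistoryServer (along with Jps itself).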
You can also use start-all.sh to start all the daemons at once:
start-all.sh
5) Log in to your Datanode and verify the running daemons:
jps
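On a Datanode you would typically expect jps to list DataNode and NodeManager (along with Jps itself).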
You can also check the other Datanode in the same way.
Here is a screenshot where you can see the running daemons on each node.
6) Verify the live slave nodes with a hadoop dfsadmin report:
hadoop dfsadmin -report
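Note: hadoop dfsadmin is deprecated in Hadoop 2.x; the equivalent hdfs command is:
hdfs dfsadmin -report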
Now open your browser and enter the address below in the URL bar:
192.168.10.100:50070
You will see a screen like the one below.
This is your GUI (a web server provided by Hadoop) for the Hadoop cluster.
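You can also open the YARN ResourceManager web UI, which by default listens on port 8088:
192.168.10.100:8088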