This blog gives you an overview of rack awareness in Apache Hadoop. HDFS block placement uses rack awareness for fault tolerance: by default, one replica of each block is written to a DataNode on a different rack, so data remains available even if an entire rack is lost to a network switch failure or a partition within the cluster. Rack awareness also helps HDFS place replicas sensibly for the configured replication factor. Configuring rack awareness tells Hadoop which DataNode sits on which rack.
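Before configuring anything, you can check how the NameNode currently maps nodes to racks. The command below is a minimal sketch; without a topology script, every DataNode is reported under the default rack, and the sample output (IP addresses and hostnames) is purely illustrative.

hdfs dfsadmin -printTopology
# Illustrative output when no topology script is configured:
# Rack: /default-rack
#    192.168.10.101:50010 (slave1)
#    192.168.10.102:50010 (slave2)
#    192.168.10.103:50010 (slave3)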
Note: All the changes for rack awareness configuration are made on the NameNode (master node) only.
You can configure rack awareness in 3 steps:
1. Create a topology data file anywhere on the master node (NameNode):

vi topology.data
Next, list your slave nodes (DataNodes) along with their racks in topology.data:
192.168.10.101 /rack1
192.168.10.102 /rack2
192.168.10.103 /rack2
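The NameNode passes DataNode IP addresses (or hostnames, depending on how the nodes registered) to the topology script, so the entries in topology.data should match what the NameNode sees. A common practice, sketched below with hypothetical hostnames slave1/slave2/slave3, is to list both the IP address and the hostname of each node so either lookup succeeds:

192.168.10.101 /rack1
slave1         /rack1
192.168.10.102 /rack2
slave2         /rack2
192.168.10.103 /rack2
slave3         /rack2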
2. Create a topology.sh script file (also called the rack awareness script):

vi topology.sh
#!/bin/bash
# Directory on the master node that contains topology.data
HADOOP_CONF=/home/hadoop

# For each node name/IP passed in by the NameNode, look up its rack
while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec < ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done
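Make the script executable, and then you can test it by hand before wiring it into Hadoop. The sketch below assumes the script and topology.data both live in /home/hadoop and uses the IP addresses from the earlier example; the output should be the rack of each node passed as an argument.

chmod +x /home/hadoop/topology.sh
/home/hadoop/topology.sh 192.168.10.101 192.168.10.103
# Expected output: /rack1 /rack2
/home/hadoop/topology.sh 192.168.10.200
# Unknown node, so the script falls back to: /default/rack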
3. Add this property to core-site.xml on the master node only. (On Hadoop 2.x and later the equivalent property name is net.topology.script.file.name; the older name shown below is still accepted as a deprecated alias.)
<property>
  <name>topology.script.file.name</name>
  <value>/home/hadoop/topology.sh</value>
</property>
Next, start your cluster:

start-dfs.sh
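If HDFS was already running when you edited core-site.xml, note that the NameNode only reads this configuration at startup, so restart the HDFS daemons for the change to take effect. A minimal sketch, assuming the standard sbin scripts are on your PATH:

stop-dfs.sh
start-dfs.sh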
Check the Hadoop admin report to see whether the DataNodes now report their racks:

hadoop dfsadmin -report
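In the report, each DataNode should now show a Rack: line with the value from topology.data. You can also inspect the rack assignments and replica placement directly; the commands below are a sketch, and the path / is just an example:

hdfs dfsadmin -printTopology
# Lists each rack (/rack1, /rack2, ...) with the DataNodes assigned to it
hadoop fsck / -files -blocks -racks
# Shows, for every block, the racks of the DataNodes holding its replicas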
The cluster is now rack aware!
Hope this post was helpful in understanding rack awareness in Hadoop.
In case of any queries, feel free to write to us at support@acadgild.com or comment below, and we will get back to you at the earliest.