10 December 2015

Running MapReduce in Local Mode

In this blog, we explain in detail how to run your MapReduce code locally in Eclipse on any Linux machine.

After reading this blog, you can easily run your MapReduce code in Eclipse without starting any of your Hadoop daemons.

Before getting started, let us learn something about local mode and cluster mode.

Local Mode

Local mode means your program runs on a single machine, without connecting to any other system or network. In local mode you do not need to start your Hadoop daemons, and you do not need to store your files in HDFS; you can just specify your local file paths.
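As a rough illustration, the two settings that keep a job on the local machine can be written down as key/value pairs. The property names `mapreduce.framework.name` and `fs.defaultFS` are standard Hadoop configuration keys. The class name `LocalModeSettings` and the use of `java.util.Properties` are assumptions made only so that this sketch runs without any Hadoop jars; in a real driver the same pairs would be applied to Hadoop's `Configuration` object with `conf.set(key, value)`.

```java
import java.util.Properties;

class LocalModeSettings {
    // Builds the two settings that keep a MapReduce job on the local machine.
    static Properties localModeSettings() {
        Properties settings = new Properties();
        // Use the in-process local job runner instead of submitting to YARN.
        settings.setProperty("mapreduce.framework.name", "local");
        // Resolve input and output paths against the local file system, not HDFS.
        settings.setProperty("fs.defaultFS", "file:///");
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = localModeSettings();
        // In a real driver, each pair would be passed to conf.set(key, value).
        for (String key : settings.stringPropertyNames()) {
            System.out.println(key + " = " + settings.getProperty(key));
        }
    }
}
```

Both values are also Hadoop's defaults when no cluster configuration files are on the classpath, which is why a job started from Eclipse usually runs locally without any extra setup.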

Cluster Mode

A cluster is a collection of systems connected in a network. Cluster mode means running your program across that distributed collection of systems. Here you need to ensure that all your Hadoop daemons are started, and then you run your MapReduce application by building a jar file.

Running in cluster mode is not recommended all the time, because test runs take up HDFS space and compete for cluster resources. Every time you deploy your application in cluster mode, its input and output must be stored in HDFS. The default block size in Hadoop 2.x is 128 MB; a file smaller than that only takes up its actual size on disk, but every file and block still adds to the metadata the NameNode has to track.

For testing your MapReduce program, you can deploy it in local mode rather than cluster mode.
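To make concrete what "everything runs in one process on local files" means, here is a plain Java imitation of a word count. It is not Hadoop code and does not use Hadoop's local job runner. The class name `LocalWordCountSketch` and the input file path are hypothetical. It only mimics the map and reduce steps inside a single JVM, reading from a local file path, which is roughly the situation local mode gives you while testing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalWordCountSketch {
    // "Map" step: emit one (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: sum the counts emitted for each word, sorted by word.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a hypothetical local input file, e.g. /home/user/input.txt
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        run(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

A real MapReduce job would put this logic in Mapper and Reducer classes; the point here is only that nothing in the flow needs a daemon or HDFS when the data is small.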

Follow the procedure below to execute your MapReduce programs locally in Eclipse. This saves HDFS space and the time needed to check your program.

1. Open Eclipse

2. Create a Java Project

3. Create a new package (optional)

4. Create a new class

5. Copy your program into that class

You need to add dependencies to run the program in Eclipse, which means a few more jars need to be added to your project's libraries.