In this blog we explain in detail how to run your MapReduce code locally in Eclipse on any Linux machine.
After reading this blog you will be able to run your MapReduce programs in Eclipse without starting any of your Hadoop daemons.
Before getting started, let us look at the difference between local mode and cluster mode.
Local Mode
Local mode means your program is not connected to any other system or network. In local mode you do not need to start your Hadoop daemons, and you do not need to store your files in HDFS: you can simply specify local file paths.
Cluster Mode
A cluster is a collection of systems connected in a network. Running in cluster mode means running your program on that distributed collection of systems. Here you need to ensure that all your Hadoop daemons are started, and then you run your MapReduce application by building a JAR file.
Running in cluster mode every time is not recommended, because every run consumes HDFS space and cluster resources that other jobs could use. Each deployment stores its input and output files in HDFS, where every block is replicated (three copies by default). The default block size in Hadoop 2.x is 128MB, although a file smaller than one block occupies only its actual size on disk.
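To make the storage arithmetic concrete, the Java sketch below estimates how many blocks a file is split into and how much raw disk it uses across the cluster. It assumes the Hadoop 2.x defaults of a 128MB block size and a replication factor of 3; the class and method names are made up for illustration and are not part of Hadoop.

```java
// Rough HDFS storage arithmetic for a single file (illustrative sketch, not a Hadoop API).
// Assumes Hadoop 2.x defaults: 128 MB block size and replication factor 3.
final class HdfsStorageEstimate {
    static final long MB = 1024L * 1024L;
    static final long DEFAULT_BLOCK_SIZE = 128 * MB; // default dfs.blocksize in Hadoop 2.x
    static final int DEFAULT_REPLICATION = 3;        // default dfs.replication

    // Number of blocks the file is split into (ceiling division); an empty file has none.
    static long blocksFor(long fileBytes, long blockBytes) {
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw disk used on all DataNodes together: each byte is stored once per replica.
    // The last block only holds the remaining bytes, so no full block is reserved.
    static long rawBytesUsed(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long[] sizes = {1 * MB, 300 * MB};
        for (long size : sizes) {
            System.out.printf("%d MB -> %d block(s), %d MB raw disk%n",
                    size / MB,
                    blocksFor(size, DEFAULT_BLOCK_SIZE),
                    rawBytesUsed(size, DEFAULT_REPLICATION) / MB);
        }
    }
}
```

With these assumptions a 1MB file is a single block but occupies about 3MB of raw disk once replicated, and a 300MB file is split into 3 blocks and occupies about 900MB.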
For testing your MapReduce program, you can run it in local mode rather than cluster mode.
Follow the procedure below to execute your MapReduce programs locally in Eclipse; this saves HDFS space and the time it takes to check your program.
1. Open Eclipse.
2. Create a Java project.
3. Create a new package (optional).
4. Create a new class.
5. Copy your program into that class.
To run in Eclipse you need to add some dependencies, which means a few more JARs need to be configured in your project's libraries:
- All the JARs present in the lib folder of Hadoop's common directory.
- The hadoop-core-1.2.1 JAR (needs to be downloaded and added externally).
To add the JAR files:
Right click on the project -> Build Path -> Configure Build Path -> Libraries -> Add External JARs -> open your Hadoop folder -> share -> hadoop -> common -> lib
Select all the JARs in the lib folder and add them.
Then you need to add one more external JAR for dependencies, hadoop-core-1.2.1.jar.
Download that JAR file from the link below:
https://drive.google.com/file/d/0ByJLBTmJojjzM2IwU1FPdmExLUE/view?usp=sharing
After downloading it, add this JAR to your libraries the same way.
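Before running a job, it can help to confirm that the JARs you added are actually on the project's classpath. The sketch below (the class name ClasspathCheck is hypothetical) tries to load two classes that ship in the Hadoop JARs, org.apache.hadoop.conf.Configuration and org.apache.hadoop.mapreduce.Job; if either is reported as MISSING when you run it from the same project, the build path is probably not configured correctly.

```java
// Hypothetical sanity check: are the Hadoop classes visible on this project's classpath?
final class ClasspathCheck {
    // Returns true if the named class can be found, without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Example classes from the Hadoop JARs; extend the list with whatever your program imports.
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job"
        };
        for (String name : required) {
            System.out.println((isOnClasspath(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```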
Now you are ready to run your program in Eclipse.
To run it:
Right click on the project -> Run As -> Run Configurations -> Main
In the Main tab, select your project and your main class.
Then move to the Arguments tab.
Under Program arguments, give your input file path and output file path, separated by whitespace (a space or a tab).
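In a typical driver these two values arrive as args[0] and args[1] of main, and Hadoop refuses to start a job whose output directory already exists, so a second run from Eclipse fails until you delete the old output folder. A small pre-flight check along these lines can catch both mistakes early; it assumes a driver that takes exactly an input path and an output path, and LocalArgsCheck is a hypothetical helper, not part of Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical pre-flight check for the "input output" program arguments.
// Assumes the driver expects exactly two local paths: args[0] = input, args[1] = output.
final class LocalArgsCheck {
    // Returns a description of the first problem found, or empty if the arguments look usable.
    static Optional<String> validate(String[] args) {
        if (args.length != 2) {
            return Optional.of("expected 2 arguments (input path, output path), got " + args.length);
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        if (!Files.exists(input)) {
            return Optional.of("input path does not exist: " + input);
        }
        if (Files.exists(output)) {
            // Hadoop's FileOutputFormat rejects an existing output directory,
            // so delete it (or choose a new name) before every re-run.
            return Optional.of("output path already exists: " + output);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate(args).orElse("arguments look OK"));
    }
}
```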
Now click Run; your program will start running, and you can track its status in the console.
After the job finishes, an output folder will be created at the path you specified.
Inside that folder you will see a part file (such as part-r-00000 or part-00000) and a _SUCCESS file, which indicates that you have executed your program successfully in Eclipse in local mode.
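If you want to check the result without browsing the folder by hand, a sketch like the following reports whether _SUCCESS is present and prints every part file. It uses plain java.nio only; the class name OutputCheck is hypothetical, and it expects the output path as its only argument.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: inspect the output folder written by a local MapReduce run.
final class OutputCheck {
    // True if the job wrote its _SUCCESS marker file.
    static boolean succeeded(Path outputDir) {
        return Files.isRegularFile(outputDir.resolve("_SUCCESS"));
    }

    // Sorted names of the part files; hidden checksum files such as .part-r-00000.crc are skipped.
    static List<String> partFiles(Path outputDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Paths.get(args[0]);
        System.out.println("_SUCCESS present: " + succeeded(outputDir));
        for (String name : partFiles(outputDir)) {
            System.out.println("--- " + name);
            for (String line : Files.readAllLines(outputDir.resolve(name))) {
                System.out.println(line);
            }
        }
    }
}
```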
With this approach:
- you can test your MapReduce code and make changes to it easily before deploying it on a cluster
- you save HDFS space