03 July 2015

10 Big Differences between Hadoop 1 & Hadoop 2

Hadoop, the framework of choice for making sense of the avalanche of Big Data, has come a long way since Google published its papers on the Google File System in 2003 and MapReduce in 2004. It made waves with its strategy of scaling out rather than scaling up. The work of Doug Cutting and his team at Yahoo, together with the Apache Hadoop project, popularized MapReduce programming, which is I/O-intensive and poorly suited to interactive analysis and graph processing. These limitations paved the way for the evolution from Hadoop 1 to Hadoop 2. The following table describes the major differences between them:

No.	Hadoop 1	Hadoop 2
1	Supports the MapReduce (MR) processing model only. Does not support non-MR tools.	Supports MR as well as other distributed computing models such as Spark, Hama, Giraph, Message Passing Interface (MPI) and HBase coprocessors.
2	MR handles both processing and cluster resource management.	YARN (Yet Another Resource Negotiator) handles cluster resource management, and processing is done by different processing models running on top of it.
3	Has limited scalability: up to about 4,000 nodes per cluster.	Has better scalability: up to about 10,000 nodes per cluster.
4	Works on the concept of slots. A slot can run either a Map task or a Reduce task only.	Works on the concept of containers. A container can run generic tasks.
5	A single Namenode manages the entire namespace.	Multiple Namenode servers manage multiple namespaces.
6	Has a Single Point of Failure (SPOF) because there is only one Namenode. A Namenode failure needs manual intervention to recover.	Overcomes the SPOF with a standby Namenode, which can be configured to take over automatically when the active Namenode fails.
7	The MR API is compatible with Hadoop 1.x. A program written for Hadoop 1 runs on Hadoop 1.x without any additional files.	The MR API requires additional files for a program written for Hadoop 1.x to run on Hadoop 2.x.
8	Is limited as a platform for event processing, streaming and real-time operations.	Can serve as a platform for a wide variety of data analytics. It is possible to run event processing, streaming and real-time operations.
9	A Namenode failure affects the whole stack.	The Hadoop stack (Hive, Pig, HBase etc.) is equipped to handle a Namenode failure.
10	Does not support Microsoft Windows.	Supports Microsoft Windows.
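
The slot versus container difference in row 4 is easier to see with a toy model. The sketch below is not Hadoop code and uses no Hadoop API. The class names `SlotNode` and `ContainerNode`, the slot counts and the memory sizes are all hypothetical, chosen only for illustration. It assumes a single node and ignores everything a real scheduler also considers, such as CPU cores, queues and data locality.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """Toy Hadoop 1 style node: a fixed number of map and reduce slots (hypothetical)."""
    map_slots: int
    reduce_slots: int

    def schedule(self, tasks):
        """Return the task kinds that can start now; a slot accepts only its own kind."""
        free = {"map": self.map_slots, "reduce": self.reduce_slots}
        started = []
        for kind in tasks:
            if free.get(kind, 0) > 0:  # non-MR kinds never find a matching slot
                free[kind] -= 1
                started.append(kind)
        return started


@dataclass
class ContainerNode:
    """Toy Hadoop 2 style node: one memory pool carved into generic containers (hypothetical)."""
    memory_mb: int

    def schedule(self, requests):
        """requests is a list of (kind, memory_mb); return the kinds that can start now."""
        free = self.memory_mb
        started = []
        for kind, mem in requests:
            if mem <= free:  # any kind of task fits as long as memory is left
                free -= mem
                started.append(kind)
        return started


# Three map tasks and one non-MR task (labelled "spark" here) are waiting.
pending = ["map", "map", "map", "spark"]

slot_started = SlotNode(map_slots=2, reduce_slots=2).schedule(pending)
container_started = ContainerNode(memory_mb=8192).schedule(
    [(kind, 2048) for kind in pending]
)

print(slot_started)       # ['map', 'map']
print(container_started)  # ['map', 'map', 'map', 'spark']
```

In the slot model the two reduce slots sit idle while a third map task waits, and the non-MR task cannot run at all. In the container model the same node capacity is shared by whatever tasks are pending.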

AcadGild
