Hadoop, the solution for deciphering the avalanche of Big Data, has come a long way since Google published its papers on the Google File System in 2003 and MapReduce in 2004. It created waves with its strategy of scaling out rather than scaling up. The work of Doug Cutting and his team at Yahoo and the Apache Hadoop project popularized MapReduce programming, which is I/O-intensive and constrained in interactive analysis and graphics support. These limitations paved the way for the evolution from Hadoop 1 to Hadoop 2. The following table describes the major differences between them:
| Sl No | Hadoop 1 | Hadoop 2 |
| --- | --- | --- |
| 1 | Supports only the MapReduce (MR) processing model; does not support non-MR tools. | Supports MR as well as other distributed computing models such as Spark, Hama, Giraph, Message Passing Interface (MPI) and HBase coprocessors. |
| 2 | MR handles both processing and cluster resource management. | YARN (Yet Another Resource Negotiator) handles cluster resource management; processing is done by different processing models. |
| 3 | Limited scalability: up to 4,000 nodes per cluster. | Better scalability: up to 10,000 nodes per cluster. |
| 4 | Works on the concept of slots; a slot can run either a Map task or a Reduce task only. | Works on the concept of containers; a container can run generic tasks. |
| 5 | A single Namenode manages the entire namespace. | Multiple Namenode servers manage multiple namespaces. |
| 6 | Has a Single Point of Failure (SPOF) because of the single Namenode; a Namenode failure needs manual intervention to recover. | Overcomes the SPOF with a standby Namenode; can be configured for automatic recovery on Namenode failure. |
| 7 | MR API is compatible with Hadoop 1.x. A program written for Hadoop 1 executes on Hadoop 1.x without any additional files. | MR API requires additional files for a program written for Hadoop 1.x to execute on Hadoop 2.x. |
| 8 | Limited as a platform for event processing, streaming and real-time operations. | Can serve as a platform for a wide variety of data analytics; event processing, streaming and real-time operations are possible. |
| 9 | A Namenode failure affects the stack. | The Hadoop stack (Hive, Pig, HBase etc.) is equipped to handle Namenode failure. |
| 10 | Does not support Microsoft Windows. | Adds support for Microsoft Windows. |
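
The difference between slots and containers (row 4) can be illustrated with a small toy model. The sketch below is not Hadoop code: the function names, the task list and the memory figures are all hypothetical, and real YARN containers are negotiated with the ResourceManager on more than memory alone. It only shows why typed slots can leave a node partly idle while generic containers accept any task that fits.

```python
def schedule_with_slots(tasks, map_slots, reduce_slots):
    """Toy Hadoop 1 node: a fixed number of typed slots (hypothetical model)."""
    free = {"map": map_slots, "reduce": reduce_slots}
    placed = []
    for kind, _memory_mb in tasks:
        # A task is placed only if a free slot of exactly its type exists.
        if free.get(kind, 0) > 0:
            free[kind] -= 1
            placed.append(kind)
    return placed


def schedule_with_containers(tasks, node_memory_mb):
    """Toy Hadoop 2 node: generic containers sized by requested memory."""
    free_mb = node_memory_mb
    placed = []
    for kind, memory_mb in tasks:
        # Any kind of task is placed as long as its memory request fits.
        if memory_mb <= free_mb:
            free_mb -= memory_mb
            placed.append(kind)
    return placed


# Hypothetical workload: (task kind, requested memory in MB).
tasks = [
    ("map", 1024),
    ("map", 1024),
    ("map", 1024),
    ("spark", 2048),
    ("reduce", 1024),
]

slots = schedule_with_slots(tasks, map_slots=2, reduce_slots=2)
containers = schedule_with_containers(tasks, node_memory_mb=8192)

print("slots:     ", slots)
print("containers:", containers)
```

In this model the slot-based node turns away the third Map task and the non-MR task even though a Reduce slot stays unused, while the container-based node places all five tasks because their combined request fits in the node's memory.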