Apache Hadoop 2.2.0 is the GA release of Apache Hadoop 2.x.
Users are encouraged to immediately move to 2.2.0 since this release is significantly more stable and is guaranteed to remain compatible in terms of both APIs and protocols.
To recap, this release has a number of significant highlights compared to Hadoop 1.x:
- YARN – A general-purpose resource management system for Hadoop that allows MapReduce and other data processing frameworks and services to run on the same cluster
- High Availability for HDFS
- HDFS Federation
- HDFS Snapshots
- NFSv3 access to data in HDFS
- Support for running Hadoop on Microsoft Windows
- Binary Compatibility for MapReduce applications built on hadoop-1.x
- A substantial amount of integration testing with the rest of the projects in the ecosystem
A couple of important points to note while upgrading to hadoop-2.2.0:
- HDFS – The HDFS community decided to push the symlinks feature out to a future 2.3.0 release; the feature is currently disabled.
- YARN/MapReduce – Users need to change the ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle.
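
The ShuffleHandler rename above is normally applied in the NodeManager configuration. The fragment below is a minimal sketch, assuming the setting lives in yarn-site.xml and that ShuffleHandler is the only auxiliary service configured; if other auxiliary services are in use, they would be listed alongside it in the same comma-separated value.

```xml
<!-- yarn-site.xml: sketch of the renamed shuffle service (assumes no other aux services) -->
<property>
  <!-- was: mapreduce.shuffle -->
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <!-- the property key embeds the service name, so it changes as well -->
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```

Because the service name is part of the second property's key, both entries need to be updated together, and the NodeManagers restarted, for the change to take effect.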