Spring Hadoop is an integration of the Spring Framework and the Apache Hadoop platform, designed to give developers an option for building distributed processing solutions with Apache Hadoop. The strong focus on cloud computing, and on leveraging the real value of the infrastructure at our disposal, has drawn a lot of attention to high-performance computing.
Apache provides an open-source implementation of the MapReduce framework, known as Hadoop. It includes:
- Hadoop Common: The common utilities that support the other Hadoop subprojects.
- HDFS: A distributed file system that provides high throughput access to application data.
- MapReduce: A software framework for distributed processing of large data sets on compute clusters.
- Avro: A data serialization system.
- Chukwa: A data collection system for managing large distributed systems.
- HBase: A scalable, distributed database that supports structured data storage for large tables.
- Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout: A scalable machine learning and data mining library.
- Pig: A high-level data-flow language and execution framework for parallel computation.
- ZooKeeper: A high-performance coordination service for distributed applications.
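A small example helps make the MapReduce entry in the list above concrete. Below is a minimal single-process word-count sketch in plain Java. It only mimics the data flow that Hadoop distributes across a cluster:

- map each input line to (word, 1) pairs;
- group the pairs by key (the "shuffle");
- reduce each group by summing its values.

It does not use the Hadoop API. The class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Single-process sketch of the MapReduce model (map -> shuffle -> reduce).
 * Hypothetical illustration only: this is NOT the Hadoop API.
 */
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) { // split can yield a leading empty token
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values emitted for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the "shuffle"), then reduce
    // each group. TreeMap keeps keys sorted, since Hadoop also sorts map output by key.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Spring and Hadoop", "Hadoop MapReduce and HDFS");
        System.out.println(wordCount(input)); // prints {and=2, hadoop=2, hdfs=1, mapreduce=1, spring=1}
    }
}
```

In a real Hadoop job, the same two functions would be written as `Mapper` and `Reducer` subclasses. The framework, not a local loop, would then perform the grouping and sorting across the machines of the cluster.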
With the “new wave of data-driven applications”, industry focus has shifted toward so-called Big Data problems. As a result, there is interest in VMware delivering a streamlined programming model that could make Spring a natural way to integrate Hadoop into the enterprise application landscape. “Spring Hadoop brings the benefits of Spring (simplicity, ease-of-use) to Hadoop by providing a comprehensive, lightweight framework that will allow developers to easily build solutions around the Hadoop platform.”
The situation is one of huge data volumes, compounded by the fact that “data access” choices in enterprise applications have grown exponentially. For example, there is now widespread secure access to the corporate data center from smartphones, tablets, laptops, and dedicated mobile devices of all kinds. In many senses, this sums up the challenge brought about by Big Data as we know it today.
VMware has chosen Spring to answer these new data challenges. Spring continues to focus on enabling enterprise Java developers to incorporate new data access patterns into their applications through the Spring Data projects.
Key aspects of Spring Hadoop include:
- Support for configuration, creation, and execution of MapReduce, Streaming, Hive, Pig, and Cascading jobs via the Spring container
- Comprehensive HDFS data access support through JVM scripting languages (Groovy, JRuby, Jython, Rhino, etc.)
- Declarative configuration support for HBase
- Dedicated Spring Batch support for developing powerful workflow solutions incorporating HDFS operations and all types of Hadoop jobs
- Declarative and programmatic support for Hadoop Tools, including FsShell and DistCp
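The Spring Batch bullet above describes workflows that chain HDFS operations and Hadoop jobs. The underlying idea can be illustrated without any Hadoop dependency:

- steps are declared up front in a fixed order;
- a runner executes them one after another;
- the runner stops at the first failure, roughly the way a sequential batch job halts when one of its steps fails.

This is a hypothetical sketch only. `WorkflowSketch`, `runWorkflow`, and the step names are invented for illustration and are not Spring Batch or Spring Hadoop APIs. Real Spring Batch jobs are configured through its own job and step abstractions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of a sequential workflow: ordered steps, stop on first failure.
 * Not a Spring Batch or Spring Hadoop API.
 */
public class WorkflowSketch {

    // Runs steps in declaration order (LinkedHashMap preserves insertion order)
    // and stops at the first step that reports failure. Returns the names of
    // the steps that completed successfully.
    static List<String> runWorkflow(Map<String, BooleanSupplier> steps) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            if (!step.getValue().getAsBoolean()) {
                System.out.println("step failed: " + step.getKey());
                break;
            }
            completed.add(step.getKey());
        }
        return completed;
    }

    public static void main(String[] args) {
        // Hypothetical steps standing in for an HDFS copy, a MapReduce job and a Hive script.
        Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
        steps.put("copyInputToHdfs", () -> true);
        steps.put("wordCountJob", () -> true);
        steps.put("hiveReport", () -> false);
        steps.put("publishResults", () -> true); // never reached: the previous step fails
        System.out.println(runWorkflow(steps)); // prints [copyInputToHdfs, wordCountJob]
    }
}
```

Keeping each step behind a simple success/failure contract is what lets HDFS scripts, MapReduce jobs and Hive or Pig work be mixed freely in one flow.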
VMware is strategically promoting the integration of Spring and Hadoop in its virtualization solutions. We will have to wait and see how this combination performs in real-world production environments. It is clearly gaining market attention, though: I have already seen some e-commerce clients adopting it.