The last company that we met on the second day was Cloudera, during a nice informal dinner.
Cloudera is a US-based software company that provides Apache Hadoop-based software, support, services and training to business customers. It was founded in 2008 by people from Yahoo!, Facebook, Google and Oracle, and it’s also one of the largest contributors to the open source Apache Hadoop ecosystem.
In the introduction, Tom Reilly (Cloudera CEO) explained the company’s origins, vision and mission.
Talking about Hadoop means talking about the main open source project in the big data world. The project includes these main modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
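To give an idea of the programming model behind the MapReduce module, here is a minimal sketch of a word count in plain Python. It does not use Hadoop or any of its APIs: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and everything runs in a single process, while a real cluster would spread the map and reduce steps across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all the values collected for one key."""
    return key, sum(values)


def word_count(lines):
    # Illustration only: in Hadoop each phase would run in parallel on
    # different nodes; here the phases simply run one after the other.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data hub"])
```

The useful part of the model is that each map call only sees one line and each reduce call only sees one key, so the work can be split across machines without the functions knowing about each other.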
Beyond these core modules there are a lot of others. Anyone can start from these open source pieces and build their own big data platform, or simply start from a complete distribution like CDH. CDH includes the core elements of Apache Hadoop plus several additional key open source projects that, when coupled with customer support, management, and governance through a Cloudera Enterprise subscription, can deliver an enterprise data hub.
CDH may be downloaded from Cloudera’s website at no charge, but without technical support or Cloudera Manager. Cloudera Manager is a management application for Apache Hadoop and the enterprise data hub. According to Cloudera, it gives operators granular visibility into and control over every part of the data hub, helping them improve performance, enhance quality of service, increase compliance and reduce administrative costs.
But Cloudera is not only about Hadoop products: it’s also really strong on the services side (and the fact that it started early in the big data world really helps), both for support and for training/certification (for the different certification paths see the related VUE site). Considering that big data does not only mean storing the data, this is a great advantage compared to other, younger companies.
In October 2012, Cloudera also announced the Cloudera Impala project, an open source distributed and massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.
Disclaimer: I was invited to this event by Condor Consulting Group and they paid for my accommodation and travel, but I’m not compensated for my time and I’m not obliged to blog. Furthermore, the content is not reviewed, approved or published by anyone other than me.