
Some weeks ago I was invited as a delegate to the first edition of the Data Field Day (#DFD1) event, held on May 13–15, 2015 in California (Silicon Valley and San Francisco).

The purpose of the Data Field Day event was to cover topics different from those of the other Technical Field Day events: from big data to analytics to hyperscale architecture and cloud security, and all the waves of innovation that are transforming IT. It is a new format for some growing trends, including micro-services, mobile and the Internet-connected world.

As written in a previous post, not all presentations were completely on topic: some were too storage-oriented. But for the first edition of Data Field Day this was acceptable. Cloudera was one of the cases where this issue was not present.

If you talk about Big Data you cannot avoid mentioning Cloudera: not only is it one of the first Hadoop distributions, but it is also really strong in services (and the fact that it started early in the big data world really helps), including Big Data as a Service offerings on different platforms such as Azure, as well as support and training/certification. I had met them previously in 2014 and I was really curious to hear something new and their point of view on how the big data world is transforming.

First, Mike Olson (Cloudera Founder and Chief Strategy Officer) started with a corporate overview, presenting the history of the company, the market for big data and how Cloudera is driving big value with big data.

Cloudera is an American software company that provides Apache Hadoop-based software, support and services, and training to business customers. It was founded in 2008 by people from Yahoo!, Facebook, Google and Oracle, and it is also one of the largest contributors to the open source Apache Hadoop ecosystem.

Talking about Hadoop means talking about the main open source project in the big data world. The project includes these main modules:

  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
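To make the MapReduce idea in the last module more concrete, here is a minimal sketch of the programming model in plain Python. It does not use Hadoop or any of its APIs: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration, and everything runs in a single process rather than across a cluster.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this across nodes; here it is
    just an in-memory dictionary (an assumption made for illustration).
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big value", "data hub"]
    print(word_count(sample))  # {'big': 2, 'data': 2, 'value': 1, 'hub': 1}
```

The point of the model is that the map and reduce functions are independent per line and per key, which is what lets a framework such as Hadoop spread them over many machines.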

Beyond these core modules, there are many others. Anybody can start from all those open source pieces and build their own big data platform, or simply start from a complete distribution like CDH. CDH includes the core elements of Apache Hadoop plus several additional key open source projects that, when coupled with customer support, management, and governance through a Cloudera Enterprise subscription, can deliver an enterprise data hub.

CDH may be downloaded from Cloudera’s website at no charge, but without technical support or Cloudera Manager. Cloudera Manager is a management application for Apache Hadoop and the enterprise data hub. It gives granular visibility into, and control over, every part of the data hub, helping operators to improve performance, enhance quality of service, increase compliance and reduce administrative costs.

Other options among Cloudera’s solutions are Cloudera Express and Cloudera Enterprise, which combine CDH with other services and products.

Justin Erickson (Product Manager at Cloudera) talked about the recent history and development of Hadoop in a really interesting session called “The Hadoop Journey”.

But, as written, the big value of Cloudera is not just building a Hadoop distribution, but being able to bring trust and visibility to Apache Hadoop data. Alex Gutow (Cloudera Product Marketing) talked about Cloudera Enterprise, which adds security, data governance, administration and management to meet business needs. She also gave an overview of Cloudera Navigator for data management and data lifecycle automation, with a major focus on compliance-level governance and protection with encryption.

But Cloudera is not only Hadoop products: as already mentioned, it is really strong in services, both for support and for training/certification (for the different certification paths see the related VUE site). Considering that big data does not only mean storing data, this is a great advantage over younger companies that entered the market more recently.

Cloudera is also the second open source company in history to reach $1B (the first was Red Hat), demonstrating that it is possible to build a successful open source strategy. Considering that open source is not a business model in itself (or at least may not be sustainable on its own), Cloudera built a distribution model (and a licensing model) to make it profitable and also to make its solution valuable for most customers.

The open source model was also one of the main themes of the Data Field Day 1 Delegate Roundtables (after Cloudera’s meeting).

Cloudera has also started new projects (for example, in October 2012 it announced the Cloudera Impala project, an open source, distributed, massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop), and has bought companies (like Xplain.io) in order to expand its solutions portfolio.

Finally, Anupam Singh (co-founder of Xplain.io and Head of Data Management for Cloudera) gave a presentation and demonstration of the Xplain.io solution: an Enterprise Knowledge Hub that gives deeper insight into how Big Data is being used, the workloads, the queries being run, …


Disclaimer: I was invited to this event by Gestalt IT and they paid for accommodation and travel, but I’m not compensated for my time and I’m not obliged to blog. Furthermore, the content is not reviewed, approved or published by any person other than me.


Virtualization, Cloud and Storage Architect. Tech Field delegate. VMUG IT Co-Founder and board member. VMware VMTN Moderator and vExpert 2010-24. Dell TechCenter Rockstar 2014-15. Microsoft MVP 2014-16. Veeam Vanguard 2015-23. Nutanix NTC 2014-20. Several certifications including: VCDX-DCV, VCP-DCV/DT/Cloud, VCAP-DCA/DCD/CIA/CID/DTA/DTD, MCSA, MCSE, MCITP, CCA, NPP.