This post is also available in: Italian
The first company that we met on the fourth day was Qubole, a company in the Big Data arena.
Qubole was founded in 2011 by Ashish Thusoo and Joydeep Sen Sarma, both formerly senior big data engineers at Facebook, where they built the social network's big data infrastructure and Apache Hive (a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis). The company is headquartered in Mountain View, California, and also has an office in Bangalore, India.
The company provides a self-service platform for Big Data Analytics as a Service, and on its home page it claims to have already processed more than 480,000 TB of data.
Qubole's big data platform is an autoscaling Hadoop environment, offered as a service, that scales according to computing needs. It was initially based only on AWS, but it is now also available on Google Cloud Platform and Microsoft Azure. The platform is also OpenStack compliant.
Data is not stored on Hadoop HDFS but on native cloud storage (on Amazon, S3 is used). All of it is encrypted and stored in the standard Apache Hive format, so it is portable to other platforms as well. Data can be imported from existing sources through JDBC or ODBC connectors or other specific connectors (there is also one for Cloudera).
The Qubole Data Service (QDS) provides several benefits:
- Better Utilization: An auto-scaling cluster, improved I/O optimization, faster queries and support for hybrid pricing help you realize significant cost savings (as much as 50%-60% in total) while accomplishing tasks faster.
- Instant Deployment: Your Hadoop cluster is ready within minutes of signup, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs and monetizing your big data.
- Easy to Use: Provide your data analysts with an easy-to-use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure: tools so intuitive that anyone can use them.
It is also fully integrated with several tools and services, starting with an intuitive GUI: QDS comes with a graphical user interface for scheduling jobs, a query editor, a visual query builder, and other tools to make your job easier and more productive. Its award-winning data pipeline creator simplifies setting up sophisticated data ingestion and workflows, including data dependencies.
The Qubole Data Workbench provides some interesting features, including:
- Web Accessible Interfaces that can be used by a variety of data users in the enterprise
- Data governance and sharing features to control who can and cannot access analyses
- Ability to create analytics templates for non-technical users
- Ability to monitor and create complex data transformations
In the final part of the session, Shrikanth Shankar (VP Engineering) gave us some interesting numbers: Qubole is already processing 86 PB per month on its clusters, and its biggest cluster has 1,800 nodes.
Pricing is publicly available on their site and may not suit every company, but if you use it as a true "Solution as a Service", you can save a lot of time and cost compared with an on-premises infrastructure. Moreover, it is not just Hadoop as a Service: it provides several other fully integrated services and tools.
For more information, see also these other posts/articles:
- Silicon Valley 2014 : Qubole, les bénéfices de Hadoop mais dans le cloud (French; "Qubole, the benefits of Hadoop, but in the cloud")
- Silicon Valley – QuBole, Big Data as a Service (French)
Disclaimer: I was invited to this event by Condor Consulting Group, which paid for accommodation and travel, but I am not compensated for my time and I am not obliged to blog. Furthermore, the content is not reviewed, approved or published by anyone other than me.