

During the last IT Press Tour (the 14th), held in Silicon Valley on December 1-5, 2014, we met several companies across different categories (Cloud, Storage and Big Data).

The second company we met on the fourth day was Elasticsearch, another company that works with the open source model, offering an interesting solution for Big Data analytics.

Shay Banon (Elasticsearch Creator, Co-Founder, and CTO), Gaurav Gupta (VP Product Management) and Aaron Katz (SVP of Worldwide Field Operations) explained the company's origins, its vision and its mission.

Elasticsearch is an open source, distributed, real-time search engine that can also be used for analytics on big data.

Elasticsearch is open source under the Apache 2 License. The code is hosted on GitHub, and contributions come both from the company and from a large community, whether bug reports, feature requests, documentation improvements or patches to Elasticsearch itself (see How to contribute to Elasticsearch). Elasticsearch is rapidly becoming the world’s most popular open source search solution.

The company was founded in 2012 by the people behind Elasticsearch and Apache Lucene; it was initially based in Amsterdam (the Netherlands) and built by a largely European team (see the board and the team). Following the open source model, the company provides reliable and scalable support and professional services: to give customers unmatched support, the same core engineering team that built and maintains the open source search engine also trains and consults customers during the various phases of the development and deployment cycle.

They provide the Elasticsearch ELK Stack: an end-to-end stack that delivers actionable insights in real time from almost any type of structured or unstructured data source. The stack is composed of the following products:

  • Elasticsearch: a flexible and powerful open source, distributed, real-time search and analytics engine. Architected from the ground up for use in distributed environments where reliability and scalability are must-haves, Elasticsearch gives you the ability to move easily beyond simple full-text search. Through its robust set of APIs and query DSLs, plus clients for the most popular programming languages, Elasticsearch delivers on the near-limitless promises of search technology.
  • Logstash: helps take logs and other time-based event data from any system and store them in a single place for additional transformation and processing. Logstash scrubs your logs and parses all data sources into an easy-to-read JSON format. The most popular open source logging solution on the market today, Logstash lets users get up and running in just minutes.
    There is also a Hadoop connector to retrieve data directly from a Hadoop environment.
  • Kibana: Elasticsearch’s data visualization engine, allowing you to natively interact with all your data in Elasticsearch via custom dashboards. Kibana’s dynamic dashboard panels can be saved, shared and exported, and they reflect changes to Elasticsearch queries in real time. You can perform data analysis in Kibana’s user interface using pre-designed dashboards, or update these dashboards in real time for on-the-fly data analysis.
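To make the Logstash step more concrete, here is a minimal Python sketch of what "parsing a log line into JSON" means. It is not Logstash itself, which is configured through its own pipeline language and filters (such as grok); the Apache-style log format, the regular expression and the field names below are assumptions chosen purely for illustration.

```python
import json
import re
from typing import Optional

# Hypothetical Apache-style access log format (an assumption for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a structured event, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Convert numeric fields so they can later be aggregated, not only searched as text.
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


if __name__ == "__main__":
    raw = '192.168.1.10 - - [03/Dec/2014:10:15:32 -0800] "GET /index.html HTTP/1.1" 200 5120'
    print(json.dumps(parse_line(raw)))
```

The design choice worth noticing is the type conversion: once `status` and `bytes` are numbers rather than strings, a search engine can compute statistics on them, which is what turns raw logs into something Kibana can chart.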

The company also provides non-open-source products, like Marvel, a monitoring and management plugin that instantly provides much-needed visibility into a deployment, both in real time and historically, making it simple to understand the root cause of any issue with your Elasticsearch clusters. Beyond support, the company also offers training and organizes events like elastic{on} (the 2015 edition will be on March 9-11 in San Francisco).

The primary use cases for this product are:

  • Internal and application search (like Facebook)
  • Logging Solution
  • Analytics

The customer list is quite long and includes interesting names, although, of course, some of those customers mainly use the free, open source edition.

But why is this product so interesting? First, the open source and community model is becoming a de facto standard, and Elasticsearch is already used inside several other products, solutions and services (for example, Azure Search is based on Elasticsearch). Second, it can be deployed on premises (probably the most common implementation) but also in the cloud (using Amazon or Azure as platforms). Third, it moves search beyond the simple need to search and retrieve data, toward a Business Intelligence model and approach.

So you get not only simple results (faster and more precise), but also aggregated or correlated data… or, using an engine based on an artificial intelligence model, the most unusual, curious or specific data. This can dramatically simplify the usual data analytics infrastructure, which is normally built from several systems (a search engine, a BI engine, a machine learning engine, …) with several batch jobs to feed all of them.
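To illustrate the difference between a plain search result and an aggregated one, here is a hedged Python sketch. The request body follows Elasticsearch's query DSL (a full-text `match` query plus a `terms` aggregation, with `size: 0` to return only the aggregation); in a real deployment it would be sent to an index's `_search` endpoint. The field names and sample documents are invented, and `emulate` is a hypothetical, heavily simplified local stand-in that only mimics the shape of the answer, not how Elasticsearch actually computes it.

```python
import json
from collections import Counter

# Sample documents (invented for illustration).
DOCS = [
    {"message": "disk error on node-1", "status": 500},
    {"message": "request served", "status": 200},
    {"message": "timeout error", "status": 504},
    {"message": "another disk error", "status": 500},
]

# Search body in Elasticsearch's query DSL: find documents mentioning "error"
# and bucket them by the value of their "status" field.
SEARCH_BODY = {
    "query": {"match": {"message": "error"}},
    "aggs": {"by_status": {"terms": {"field": "status"}}},
    "size": 0,
}


def emulate(body: dict, docs: list) -> dict:
    """Hypothetical local emulation: OR-style token match, then count hits per term."""
    field, text = next(iter(body["query"]["match"].items()))
    wanted = set(text.lower().split())
    hits = [d for d in docs if wanted & set(d[field].lower().split())]
    agg_name, agg = next(iter(body["aggs"].items()))
    counts = Counter(d[agg["terms"]["field"]] for d in hits)
    # Buckets ordered by descending document count, like a default terms aggregation.
    buckets = [{"key": k, "doc_count": c} for k, c in counts.most_common()]
    return {"hits": {"total": len(hits)}, "aggregations": {agg_name: {"buckets": buckets}}}


if __name__ == "__main__":
    print(json.dumps(SEARCH_BODY, indent=2))
    print(json.dumps(emulate(SEARCH_BODY, DOCS), indent=2))
```

The point of the example is that a single request answers both "which documents match?" and "how are the matches distributed?", which is the kind of question that would otherwise require a separate BI system.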

The interesting aspect is that the search engine keeps its target of answering in around 10-100 milliseconds, no matter how large the cluster or the data source is (even with 100 nodes or 100 PB of data)!

To reach those results, data must first be denormalized, which means creating more copies or at least more redundancy; in exchange, the data can be spread across more nodes.
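The denormalize-then-spread idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the data, the shard count, the node names, `zlib.crc32` as the hash and the round-robin replica placement. The routing rule `hash(routing) % number_of_primary_shards` (where the routing value defaults to the document id) is the one Elasticsearch documents, but Elasticsearch uses its own hash function and its own allocation logic.

```python
import zlib

# Normalized source data (invented): orders reference customers by id.
CUSTOMERS = {"c1": {"name": "Acme", "country": "IT"}, "c2": {"name": "Globex", "country": "US"}}
ORDERS = [
    {"id": "o1", "customer_id": "c1", "amount": 120},
    {"id": "o2", "customer_id": "c2", "amount": 80},
    {"id": "o3", "customer_id": "c1", "amount": 45},
]

NUM_PRIMARY_SHARDS = 3  # assumption for the example
NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def denormalize(orders, customers):
    """Copy customer fields into each order so every document is self-contained (no join at query time)."""
    return [{**o, "customer": dict(customers[o["customer_id"]])} for o in orders]


def shard_for(doc_id: str) -> int:
    """Route a document to a primary shard: hash(routing) % number of primary shards.
    crc32 is only a stand-in for Elasticsearch's own hash function."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_PRIMARY_SHARDS


def allocate(num_shards, nodes):
    """Simplified placement: each primary and its one replica land on two different nodes."""
    return {s: (nodes[s % len(nodes)], nodes[(s + 1) % len(nodes)]) for s in range(num_shards)}


def index_documents(docs):
    """Distribute documents across primary shards using the routing rule."""
    shards = {s: [] for s in range(NUM_PRIMARY_SHARDS)}
    for doc in docs:
        shards[shard_for(doc["id"])].append(doc)
    return shards


def search_country(shards, country):
    """Scatter the query to every shard (conceptually in parallel), then gather the hits."""
    hits = []
    for docs in shards.values():
        hits.extend(d for d in docs if d["customer"]["country"] == country)
    return sorted(hits, key=lambda d: d["id"])


if __name__ == "__main__":
    shards = index_documents(denormalize(ORDERS, CUSTOMERS))
    for shard, (primary, replica) in allocate(NUM_PRIMARY_SHARDS, NODES).items():
        print(f"shard {shard}: primary on {primary}, replica on {replica}, {len(shards[shard])} docs")
    print([d["id"] for d in search_country(shards, "IT")])
```

Because each order already carries a copy of its customer data, a query such as "all orders from Italian customers" can be evaluated independently on every shard without a cross-node join; the price is the duplicated customer fields, which is exactly the redundancy described above.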

Disclaimer: I was invited to this event by Condor Consulting Group, which paid for accommodation and travel, but I am not compensated for my time and I am not obliged to blog. Furthermore, the content is not reviewed, approved or published by anyone other than me.


Virtualization, Cloud and Storage Architect. Tech Field delegate. VMUG IT Co-Founder and board member. VMware VMTN Moderator and vExpert 2010-24. Dell TechCenter Rockstar 2014-15. Microsoft MVP 2014-16. Veeam Vanguard 2015-23. Nutanix NTC 2014-20. Several certifications including: VCDX-DCV, VCP-DCV/DT/Cloud, VCAP-DCA/DCD/CIA/CID/DTA/DTD, MCSA, MCSE, MCITP, CCA, NPP.