Shay Banon: ElasticSearch for Big Data and Analytics
Shay Banon is the author of ElasticSearch, an open-source, distributed search server based on Lucene. He gave the following talk at Berlin Buzzwords 2012 (the conference of High Scalability) on June 5, 2012.
You can read the outline of the talk here.
ElasticSearch basic concepts
People use ElasticSearch (ES) mostly for full-text search, but it can also store large amounts of data and run analytics over them. The question is always the following:
How does data flow?
Shay outlines the basic ES concepts we need to understand:
index: a logical namespace that maps to one or more shards and can have zero or more replicas. It is like a database in the RDBMS world, but much more.
shard: a Lucene instance, a low-level worker unit managed by ES.
replica: an exact copy of a primary shard. It may be used to load-balance queries or to increase failover capacity when a node fails.
node: a running instance of ES that belongs to a cluster. It may host multiple shards and/or replicas for multiple indices.
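To make the relationship between shards and replicas concrete, here is a minimal sketch (not taken from the talk) that builds the settings body used when creating an index and counts how many Lucene instances the cluster will host for it. The keys `number_of_shards` and `number_of_replicas` are the ES index settings; the helper names `index_settings` and `total_shard_copies` are hypothetical and exist only for illustration.

```python
import json


def index_settings(shards: int, replicas: int) -> dict:
    """Build a settings body for creating an index (illustrative helper)."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,      # primary shards
            "number_of_replicas": replicas,  # copies of each primary
        }
    }


def total_shard_copies(body: dict) -> int:
    """Primaries plus all their replicas, i.e. Lucene instances to host."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


body = index_settings(shards=5, replicas=1)
print(json.dumps(body))
print(total_shard_copies(body))  # 5 primaries + 5 replicas = 10
```

The count matters because every shard copy has a cost, so it is worth knowing this number before deciding how many indices to create.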
Because each shard has its cost, one needs to plan ahead and decide on the types and numbers of indices, shards and replicas to use. Fortunately, it is very easy to run capacity tests, measure the load and choose values that fit the observed conditions.
Data flow examples
There are different design patterns for different use cases, but each of them focuses on how we would like to move the data around.
- One index - sensible default if we are starting small.
- One index / user - suits user-centric searches, and can be extended with routing and aliasing.
- Time-based index - e.g. one index for each day, week or month. It is easy to set up a "last 3 months" alias, and old indices can be optimized, moved or deleted easily.
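The time-based pattern from the list above can be sketched as follows. This is an illustration under stated assumptions, not code from the talk: the `events-YYYY.MM.DD` naming scheme, the alias name `last_3_months`, the 90-day window and both helper functions are hypothetical. The only ES-specific part is the shape of the body, which follows the `add`/`remove` actions accepted by the `_aliases` endpoint.

```python
from datetime import date, timedelta
from typing import List


def daily_index_names(prefix: str, end: date, days: int) -> List[str]:
    """Names of the daily indices for the `days` days ending at `end`, oldest first."""
    return [
        f"{prefix}-{(end - timedelta(days=offset)).strftime('%Y.%m.%d')}"
        for offset in range(days - 1, -1, -1)
    ]


def alias_actions(alias: str, add: List[str], remove: List[str]) -> dict:
    """Build an alias-update body: drop expired indices, then add new ones."""
    actions = [{"remove": {"index": name, "alias": alias}} for name in remove]
    actions += [{"add": {"index": name, "alias": alias}} for name in add]
    return {"actions": actions}


today = date(2012, 6, 5)
window = daily_index_names("events", today, 90)

# The index that just fell out of the 90-day window.
expired = daily_index_names("events", today - timedelta(days=90), 1)[0]

# Roll the alias forward by one day: add today's index, drop the expired one.
body = alias_actions("last_3_months", add=[window[-1]], remove=[expired])
print(body)
```

Searches then target the alias rather than individual indices, so rolling the window forward requires no change on the query side, and the expired index can be optimized, moved or deleted on its own.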
Shay outlines an example of a time-based event log with a few components and
categories. He uses it to demonstrate the effortless queries to slice and dice
the data. One can use these aggregations to create tables, graphs, or