Just Elasticsearch: 1/n. Introducing Elasticsearch

This is the first in a series of articles explaining Elasticsearch as simply as possible, with practical examples. The articles are condensed versions of the topics from my upcoming book Just Elasticsearch. My aim is to create a simple, straight-to-the-point, example-driven book that one can read over a weekend to get started on Elasticsearch.

All the code snippets, examples, and datasets related to this series of articles are available on my GitHub Wiki.

Introduction

Elasticsearch is a Search and Analytics engine.

It is an ultra-fast, highly available search engine built on Apache Lucene, a popular full-text search library. It is an open-source server developed in Java with near real-time search capabilities.

Search and Analytics

Elasticsearch lets you search your data in whatever fashion you want. Full-text searching is undoubtedly where Elasticsearch stands tall. It can retrieve the relevant documents for a user’s search criteria at an awesome speed. You can search for exact values too, such as keywords, dates, or ranges, much like queries in the SQL world.

Elasticsearch is packed with top-notch features such as relevancy, did-you-mean suggestions, auto-completion, fuzzy searching, geospatial searching, and many more. 

While search is one side of the coin, there is another: analytics. Elasticsearch can aggregate data, run statistical calculations on it, find relationships within it, and create first-class visualizations and dashboards in no time. It can compute sums, averages, and other statistics, and it supports more complex analytics such as bucketing data into histograms, all at an awesome speed.

APM and Machine Learning

Finally, we can put Elasticsearch to use in the area of application and infrastructure observability and application performance monitoring (APM). APM lets you monitor your infrastructure as well as your production applications, alerting you when an unexpected event happens.

Elasticsearch can also run unsupervised machine learning algorithms on your data to detect anomalies or find outliers. It can even forecast events by watching over the data.

Popular Search Engine

Elasticsearch has become a de facto standard for search and analytics in enterprises. It is a very popular search engine amongst businesses, startups, open-source communities, and others. The list of organizations using Elasticsearch keeps growing and includes Netflix, GitHub, Stack Overflow, Sky, Adobe, Rabobank, eBay, Cisco, Lyft, and many more.

Getting Started 

I have always believed in learning by getting my hands dirty, so let’s do that in this first article. We will install the Elasticsearch server, index a few documents, and search through them. We will use Kibana, a UI tool, to interact with the server. We will go over the architecture and moving parts in the next article.

Installing Elasticsearch and Kibana

Working with Elasticsearch requires a few tools and frameworks, including the Elasticsearch server itself. While the server is the heart and soul of search and analytics, Kibana is a visual tool for exploring and visualizing the data. It is a kaleidoscope into Elasticsearch’s vast data.

Installing Elasticsearch is a breeze. Folks at Elastic have put in a lot of effort to let you install the products in whatever fashion you want: as a binary, through a package manager, with Docker, or even in the cloud.

The easiest way to set up the Elasticsearch server on your computer is simply by downloading the compressed artifact and executing the script present inside the bin folder after uncompressing it. Of course, using Docker images is another option.

To save space here, I have documented the details on my GitHub Wiki page, so head over there and follow the instructions to download both Elasticsearch and Kibana. Once you have the server and Kibana up and running, let the play begin!

Indexing documents

Our first task is to get some data inserted into Elasticsearch. All the data that gets persisted to Elasticsearch must be provided as JSON Documents. We can represent any information as a Document, for example, COVID infections, Customer Orders, Fast Cars, Flight Reservations, Movies, Payments, Employees, and so on.

For this example, we use a set of technical books. A book document with two fields, title and author, is represented in JSON format as shown in the following snippet:

{
    "title":"Effective Java",
    "author":"Joshua Bloch"
}

Of course, similar to a table in a relational database, we need a collection to keep these books. This collection is called an index in Elasticsearch. All our books will be held in the books index.

Go to the Kibana dashboard at http://localhost:5601 and click on the Dev Tools button on the left-hand menu. Write the following code in the left side panel:

PUT books/_doc/1
{
    "title":"Effective Java",
    "author":"Joshua Bloch"
}

We are using the HTTP verb PUT to insert a document with ID 1 into the books index. The _doc in the URL is the type of the document you are indexing (more about types in the coming chapters).
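For readers curious about what the Console command boils down to, here is a minimal Python sketch of the equivalent raw HTTP request. It assumes a local server listening on port 9200 with security disabled; the helper name build_index_request is made up for illustration. The sketch only builds the request, and the lines that would actually send it are left commented out.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch server on the default port, no security.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build (but do not send) the HTTP request behind `PUT <index>/_doc/<id>`."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "books", "1", {"title": "Effective Java", "author": "Joshua Bloch"}
)
print(request.get_method(), request.full_url)

# To send it against a running server, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Kibana's Dev Tools takes care of the host, port, and headers for us, which is why the Console version is so much shorter.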

Click on the green run button in the middle of the editor to execute the command. The response appears in the right-hand panel.

Yay! We’ve indexed our first document! The response indicates that the document with ID 1 was indexed into an index called “books” with a type of “_doc” and version 1.

You can follow the same process to index a few more documents:

// second document
PUT books/_doc/2
{
  "title": "Core Java Volume I - Fundamentals",
  "author": "Cay S. Horstmann"
}

// third document
PUT books/_doc/3
{
  "title": "Java: A Beginner’s Guide",
  "author": "Herbert Schildt"
}

Now that we have a set of sample documents in our data store, the next step is to execute search queries. After all, search is Elasticsearch’s bread and butter, right?

Retrieving a document by ID

Let’s retrieve one of the documents we inserted a minute ago. For this, we issue the following GET command, providing the ID of the document:

// GET a document given an ID
GET books/_doc/1

The output will appear in the right-hand side pane of the Kibana console, as shown below:

// Output of a GET document request
{
  "_index" : "books",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 9,
  "_primary_term" : 5,
  "found" : true,
  "_source" : {
    "title" : "Effective Java",
    "author" : "Joshua Bloch"
  }
}

Don’t worry about the other fields in the response for now. Just note that our original document is returned under the _source field.
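If you were consuming this response from a program rather than reading it in Kibana, pulling the document out might look like the following Python sketch. The response text is copied from the output above; the variable names are only illustrative.

```python
import json

# The GET response shown above, as it would arrive over HTTP.
response_text = """
{
  "_index": "books",
  "_type": "_doc",
  "_id": "1",
  "_version": 2,
  "_seq_no": 9,
  "_primary_term": 5,
  "found": true,
  "_source": {
    "title": "Effective Java",
    "author": "Joshua Bloch"
  }
}
"""

response = json.loads(response_text)

# "found" tells us whether a document with this ID exists;
# the document we indexed lives under "_source".
book = response["_source"] if response["found"] else None
print(book)
```

Everything outside _source is metadata that Elasticsearch keeps about the document.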

Retrieving All Documents

To fetch all the documents in our data store, we use the _search endpoint with a request body made up of a query object enclosing a match_all query:

// Fetch ALL documents from a given index
GET books/_search
{
  "query": {
    "match_all": {}
  }
}

The match_all query matches every document in the given index.
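The shape of the request body, a query object wrapping exactly one query clause, can be sketched in Python as below. The helper search_body is hypothetical and exists only to show the structure; it falls back to match_all when no clause is supplied.

```python
import json


def search_body(query_clause=None):
    """Build a _search request body; default to match_all when no clause is given."""
    if query_clause is None:
        query_clause = {"match_all": {}}
    return {"query": query_clause}


# Body for fetching every document in an index.
print(json.dumps(search_body(), indent=2))

# Body for a more specific search, using the same outer structure.
print(json.dumps(search_body({"match": {"author": "Joshua"}}), indent=2))
```

The outer "query" key stays the same for every search; only the clause inside it changes.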

Search Queries

We can also search for documents based on specific criteria. Elasticsearch is good at accepting complex queries and returning results in a flash! Let’s look at a few examples of how we can create search queries.

Searching on Fields

Say, for example, we wish to find all the titles authored by Joshua. We construct the query using the same _search API, with the match query searching on an author field, as demonstrated below:

// Search for author as Joshua
GET books/_search
{
  "query": {
    "match": {
      "author": "Joshua"
    }
  }
}

We can search on full-text fields too. For example, we can search the title field for the two words “Java” and “Guide”. By default, the match query matches documents that contain either word:

GET books/_search
{
  "query": {
    "match": {
      "title": "Java Guide"
    }
  }
}



// Among the hits is the book whose title contains both words
"_source" : {
  "title" : "Java: A Beginner’s Guide",
  "author" : "Herbert Schildt"
}
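To build some intuition for what a match query does, here is a deliberately simplified Python model of it. This is not how Elasticsearch is implemented: the real engine uses configurable analyzers, an inverted index, and a proper relevance-scoring algorithm. The sketch assumes the search text and the field are both lowercased and split into words, that a document matches if it shares at least one word with the query, and that documents sharing more words rank higher. The function names are made up for illustration.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(docs, field, query):
    """Return IDs of docs sharing at least one term with the query, best first."""
    query_terms = set(analyze(query))
    scored = []
    for doc_id, doc in docs.items():
        overlap = query_terms & set(analyze(doc.get(field, "")))
        if overlap:  # any shared term is enough to match
            scored.append((len(overlap), doc_id))
    # More shared terms first; ties broken by ID to keep the order stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


books = {
    1: {"title": "Effective Java", "author": "Joshua Bloch"},
    2: {"title": "Core Java Volume I - Fundamentals", "author": "Cay S. Horstmann"},
    3: {"title": "Java: A Beginner’s Guide", "author": "Herbert Schildt"},
}

print(match(books, "author", "Joshua"))
print(match(books, "title", "Java Guide"))
```

In this toy model, every title containing “Java” matches the second query, and the book containing both “Java” and “Guide” comes first because it shares the most terms.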

We have barely scratched the surface. There are numerous ways of searching innumerable combinations of data in Elasticsearch. In the next set of articles, we will learn about full-text searching, term, range, and fuzzy queries, highlighting, pagination, sorting, and much more.

Aggregation Examples

Before we wind up, let’s briefly look at an aggregation using updated data. I have added fields such as rating, release_date, edition, and price, as the sample below shows:

PUT books/_doc/1
{
  "title":"Effective Java",
  "author":"Joshua Bloch",
  "rating":4.5,
  "release_date":"2017-10-17",
  "edition":4,
  "price_gbp":30.99,
  "price_usd":29.99
}

The following snippet will get the average rating across all our books:

GET books/_search
{
  "aggs": {
    "avg_rating": {
      "avg": {
        "field": "rating"
      }
    }
  }
}


// The output
{
  // ...
  "aggregations" : {
    "avg_rating" : {
      "value" : 4.349999904632568
    }
  }
}
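The avg aggregation can be pictured with a small Python sketch. The ratings and the unrated title below are hypothetical, chosen only for illustration, and the helper names are made up. The sketch also assumes that the rating field is stored as a 32-bit float, which is one plausible reason an average can come back as something like 4.3499999 rather than a round 4.35.

```python
import struct


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def avg_aggregation(docs, field):
    """Average a numeric field, ignoring documents that do not have it."""
    values = [as_float32(doc[field]) for doc in docs if field in doc]
    return sum(values) / len(values) if values else None


# Hypothetical sample data: two rated books and one without a rating.
books = [
    {"title": "Effective Java", "rating": 4.5},
    {"title": "Core Java Volume I - Fundamentals", "rating": 4.2},
    {"title": "Unrated sample book"},
]

result = avg_aggregation(books, "rating")
print(result)
```

With these made-up ratings the result lands just below 4.35, because 4.2 cannot be represented exactly as a 32-bit float.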

Summary

In this article, we installed the Elasticsearch server and the Kibana client to start learning the ropes of Elasticsearch. We primed our data store with sample data, retrieved a document by its identifier, and ran search queries against the index. We also took a first look at aggregating the data.

All the code snippets and datasets are available on my GitHub Wiki.

In the next article, we will learn about the Elasticsearch architecture and its moving parts. Stay tuned!