Search fundamentals

In this article, let’s look at the search API and learn about ways to invoke the engine to carry out search queries.

Elasticsearch exposes a search endpoint through which we execute search queries. Let's look at this endpoint in detail.

Search endpoint

Elasticsearch provides RESTful APIs to query data, specifically the _search endpoint. We invoke this endpoint with GET or POST, passing the query either as URL parameters or in a request body. The first method is called the URI request method; the second uses Elasticsearch's domain-specific query language, known as Query DSL. Although both are useful in their own way, Query DSL is the more powerful and feature-rich of the two.

With Query DSL, the query, along with any attributes that make up or supplement it, travels in the request body. The search criteria are wrapped in a JSON body sent with the request to the server, and the result comes back as a JSON object as well. We can send a single query or combine multiple queries, depending on the requirement. Query DSL is also the mechanism for sending aggregation queries to the engine.
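To make the shape of such a request concrete, here is a minimal Python sketch that builds a Query DSL body combining a query with an aggregation and serializes it to JSON. The aggregation name movies_per_genre and the genre field are hypothetical, assumed only for illustration; nothing is sent to a cluster.

```python
import json


def build_search_body(title_word: str) -> dict:
    """Build a Query DSL body with a query part and an aggregation part.

    Hypothetical names: "movies_per_genre" and the "genre" field are assumed
    for illustration; "title" is the field used in the article's examples.
    """
    return {
        "query": {
            "bool": {
                # Further queries could be appended to this list to combine them.
                "must": [{"match": {"title": title_word}}],
            }
        },
        "aggs": {
            "movies_per_genre": {"terms": {"field": "genre"}},
        },
    }


body = build_search_body("Godfather")
payload = json.dumps(body)  # the JSON string that would form the request body
print(payload)
```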

The queries that we construct depend on what type of data we are searching. We’ll discuss structured and unstructured search queries in the next article. For now, there are two ways of accessing the search endpoint:

  • URI request: with this method, we pass the search query to the endpoint as URL parameters. For example, GET movies/_search?q=title:Godfather fetches all the movies whose title contains the word Godfather (the Godfather movies, indeed).
  • Query DSL: with this method, we use Elasticsearch's domain-specific language (DSL) for search. The criteria are passed as a JSON object in the request body. The same request, fetching all the movies with the word Godfather in the title field, looks like this:
GET movies/_search
{
  "query": {
    "match": {
      "title": "Godfather"
    }
  }
}
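The two styles can be compared side by side in a short Python sketch that builds, but does not send, both requests with the standard library. It assumes a cluster at http://localhost:9200 (Elasticsearch's default port) and uses POST for the Query DSL request, since the _search endpoint accepts both GET and POST.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local cluster on the default port. Nothing is sent here.
BASE_URL = "http://localhost:9200"

# 1) URI request: the query travels in the q= URL parameter.
uri_url = f"{BASE_URL}/movies/_search?" + urllib.parse.urlencode({"q": "title:Godfather"})
uri_request = urllib.request.Request(uri_url, method="GET")

# 2) Query DSL: the same criteria travel as a JSON request body.
dsl_body = {"query": {"match": {"title": "Godfather"}}}
dsl_request = urllib.request.Request(
    f"{BASE_URL}/movies/_search",
    data=json.dumps(dsl_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(uri_request.get_method(), uri_request.full_url)
print(dsl_request.get_method(), dsl_request.full_url)
```

Note that urlencode percent-encodes the colon, so the URL carries q=title%3AGodfather; the server decodes it back to title:Godfather.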

Query DSL is a first-class querying mechanism, and writing complex criteria is easier with it than with the URI request method. To search across multiple indices, we can list comma-separated index names, as in GET <index1>,<index2>,<index3>/_search, and the names can include wildcards.
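As a small illustration, the hypothetical helper below joins index names or patterns into a multi-index _search path in the same form as the listing above. The index names are made up, and the fnmatchcase call only mimics locally which names a pattern like movies* would select; it is not how Elasticsearch resolves patterns internally.

```python
from fnmatch import fnmatchcase


def multi_index_path(*indices: str) -> str:
    """Join index names (or wildcard patterns) into a _search path."""
    return ",".join(indices) + "/_search"


# Hypothetical index names, used only to show what a wildcard pattern selects.
existing = ["movies", "movies_archive", "tv_shows", "actors"]
pattern = "movies*"
selected = [name for name in existing if fnmatchcase(name, pattern)]

print(multi_index_path("movies", "tv_shows", "actors"))  # movies,tv_shows,actors/_search
print(multi_index_path(pattern))                          # movies*/_search
print(selected)                                           # ['movies', 'movies_archive']
```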

Although we will look at both methods in detail in the coming sections, we will work with Query DSL far more than with the URI request method, for advantages that will become clear as you start experimenting with both.

Query DSL is the Swiss Army knife for talking to Elasticsearch and the preferred option in most situations. Elasticsearch designed this domain-specific language to support both search and aggregations (analytics). Practically anything we want to ask Elasticsearch can be expressed in Query DSL.

Query vs filter context

There's another fundamental concept we should understand: the execution context. Elasticsearch runs every search in an execution context, which is either a query context or a filter context. We don't ask Elasticsearch for a particular context directly; instead, the way we write our query lets Elasticsearch decide which context to apply. Let's see the query context in action.

Query context

A match query searches for documents whose field values match the given keywords. The following listing is a simple match query that searches for the word Godfather in the title field.

GET movies/_search
{
  "query": {
    "match": {
      "title": "Godfather"
    }
  }
}

This query returns our two Godfather movies, as expected. Look closely at the results in the following snippet, though, and you'll see that each hit carries a relevancy score:

"hits" : [
  {
    ...
    "_score" : 2.879596,
    "_source" : {
      "title" : "The Godfather",
      ...
    }
  },
  {
    ...
    "_score" : 2.261362,
    "_source" : {
      "title" : "The Godfather: Part II",
      ...
    }
  }
]

This output shows that the query ran in a query context: it determined not only whether each document matched but also how well it matched. If you're wondering why the second result's score (2.261362) is slightly lower than the first's (2.879596), it's because the engine's relevancy algorithm ranks a match of the word Godfather in a two-word title ("the", "godfather") higher than a match in a four-word title ("the", "godfather", "part", "ii").
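The length effect can be reproduced with a worked example. Elasticsearch's default similarity is BM25 (with k1 = 1.2 and b = 0.75), and the sketch below computes the textbook BM25 score for a single term. The corpus statistics are hypothetical, and Lucene's implementation differs in details, so the numbers won't match the _score values above; the point is only that the shorter title scores higher when everything else is equal.

```python
import math

# BM25 parameters; these are Elasticsearch's defaults.
K1 = 1.2
B = 0.75


def bm25_term_score(freq: int, doc_len: int, avg_doc_len: float,
                    doc_count: int, docs_with_term: int) -> float:
    """Textbook BM25 score of one query term in one field.

    Lucene differs in details (for example, it stores field lengths in a
    lossy encoding), so this is an approximation, not Elasticsearch's code.
    """
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Longer-than-average fields get a larger normalizer, hence a lower score.
    length_norm = 1 - B + B * doc_len / avg_doc_len
    tf = freq * (K1 + 1) / (freq + K1 * length_norm)
    return idf * tf


# Hypothetical corpus statistics: the two titles differ only in length.
stats = {"avg_doc_len": 3.0, "doc_count": 10, "docs_with_term": 2}
short_title = bm25_term_score(freq=1, doc_len=2, **stats)  # "the godfather"
long_title = bm25_term_score(freq=1, doc_len=4, **stats)   # "the godfather part ii"
print(round(short_title, 3), round(long_title, 3))
```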

Queries on full-text fields run in a query context because each matched document is expected to carry a relevancy score.

Although fetching results with relevancy scores is fine in most cases, some use cases don't need to know how well a document matched; all we want to know is whether it matches at all. This is where the filter context comes into play.

Filter context

Let's rewrite the previous query, this time wrapping our match query in a bool query with a filter clause. The following listing shows the filter query in action.

GET movies/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "title": "Godfather"
          }
        }
      ]
    }
  }
}

This time the results carry no relevancy score (_score is set to 0.0), because our query told Elasticsearch to run it in a filter context. In other words, when we are not interested in scoring documents, we can have Elasticsearch run the query in a filter context by wrapping the query in a filter clause.
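The rewrite is mechanical, as this hypothetical Python helper shows: it takes any leaf query and wraps it in a bool query's filter clause, producing the same body as the filter listing.

```python
import copy


def to_filter_context(leaf_query: dict) -> dict:
    """Wrap a leaf query in a bool filter clause so it runs without scoring.

    Hypothetical helper; the leaf query is deep-copied so the caller's
    dict is not shared with the new body.
    """
    return {"query": {"bool": {"filter": [copy.deepcopy(leaf_query)]}}}


match_godfather = {"match": {"title": "Godfather"}}
filtered = to_filter_context(match_godfather)
print(filtered)
```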

The main benefit of running a query in this context is that Elasticsearch saves computing cycles, because it doesn't need to calculate scores for the returned results. And because a filter's outcome (match or no match) doesn't depend on scoring, Elasticsearch can cache frequently used filters for better performance.
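Why filters are cacheable can be shown with a toy Python model: a filter's outcome is just the set of matching document IDs, so repeating the same filter can be answered from a cache. The class, its crude tokenizer, and the sample titles are all hypothetical; Elasticsearch's real query cache works per segment, tends to cache only filters that are reused, and evicts old entries, none of which this sketch models.

```python
import json


class ToyFilterCache:
    """Toy cache mapping a filter's JSON to the set of matching doc IDs."""

    def __init__(self, docs: dict):
        self.docs = docs    # doc_id -> title (hypothetical sample data)
        self.cache = {}     # canonical filter JSON -> frozenset of doc IDs
        self.hits = 0
        self.misses = 0

    def run(self, term: str) -> frozenset:
        # Canonical key for a match filter on the title field.
        key = json.dumps({"match": {"title": term}}, sort_keys=True)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        needle = term.lower()
        # Crude tokenizer: lowercase, treat colons as spaces, split on whitespace.
        result = frozenset(
            doc_id for doc_id, title in self.docs.items()
            if needle in title.lower().replace(":", " ").split()
        )
        self.cache[key] = result
        return result


cache = ToyFilterCache({1: "The Godfather", 2: "The Godfather: Part II", 3: "Casablanca"})
first = cache.run("Godfather")   # computed, then stored
second = cache.run("Godfather")  # served from the cache
print(sorted(first), cache.hits, cache.misses)  # [1, 2] 1 1
```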

Besides the filter clause, the must_not clause of a bool query also runs in a filter context. Remember, though, that must_not has the opposite intent of must: it excludes the documents that match. Beyond the bool query, we can also use the filter context in a constant_score query (the name is a giveaway!).
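The inclusion logic of filter and must_not can be modeled in a few lines of Python. This is a toy in-memory evaluator, not Elasticsearch code: the tokenizer is a crude stand-in for the standard analyzer and the titles are sample data. Neither clause produces a score; each only decides whether a document is in or out.

```python
def tokens(title: str) -> set:
    """Crude stand-in for the standard analyzer: lowercase, split on spaces/colons."""
    return set(title.lower().replace(":", " ").split())


def bool_filter(docs: list, filter_terms: list, must_not_terms: list) -> list:
    """Keep docs containing every filter term and none of the must_not terms."""
    kept = []
    for title in docs:
        words = tokens(title)
        matches_filters = all(t.lower() in words for t in filter_terms)
        hits_must_not = any(t.lower() in words for t in must_not_terms)
        if matches_filters and not hits_must_not:
            kept.append(title)
    return kept


movies = ["The Godfather", "The Godfather: Part II", "Casablanca"]
print(bool_filter(movies, ["Godfather"], ["Part"]))  # ['The Godfather']
```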

Filters in compound queries

The bool query we used a moment ago is a compound query with four clauses (must, must_not, should, and filter) that wrap leaf queries. Of these, the filter and must_not clauses run in a filter context.
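The rule of thumb from this paragraph can be written as a lookup table. The helper below is hypothetical and only inspects the top-level clauses of a bool query: must and should contribute to the score (query context), while filter and must_not do not (filter context).

```python
CLAUSE_CONTEXT = {
    "must": "query",
    "should": "query",
    "filter": "filter",
    "must_not": "filter",
}


def contexts_used(bool_query: dict) -> dict:
    """Report the execution context of each clause present in a bool query.

    Simplification: only top-level clauses are inspected. A must clause
    nested inside a filter clause, for example, would also run unscored.
    """
    return {clause: CLAUSE_CONTEXT[clause] for clause in bool_query if clause in CLAUSE_CONTEXT}


query = {
    "bool": {
        "must": [{"match": {"title": "Godfather"}}],
        "must_not": [{"match": {"title": "Part"}}],
    }
}
print(contexts_used(query["bool"]))  # {'must': 'query', 'must_not': 'filter'}
```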

The constant_score query is another compound query, one that accepts a query in its filter clause. The following query shows this in action:

GET movies/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "match": {
          "title": "Godfather"
        }
      }
    }
  }
}
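Here is a hypothetical builder for the constant_score body, plus a toy model of its scoring. In Elasticsearch, constant_score gives every matching document a _score equal to its boost parameter, which defaults to 1.0; the toy function only mimics that outcome over sample titles with a crude tokenizer.

```python
def constant_score_query(leaf_query: dict, boost: float = 1.0) -> dict:
    """Build a constant_score request body with an explicit boost (hypothetical helper)."""
    return {"query": {"constant_score": {"filter": leaf_query, "boost": boost}}}


def toy_constant_scores(titles: list, word: str, boost: float = 1.0) -> dict:
    """Toy model: every matching title gets the same score, equal to boost."""
    needle = word.lower()
    return {t: boost for t in titles if needle in t.lower().replace(":", " ").split()}


body = constant_score_query({"match": {"title": "Godfather"}})
scores = toy_constant_scores(["The Godfather", "The Godfather: Part II", "Casablanca"], "Godfather")
print(body)
print(scores)  # {'The Godfather': 1.0, 'The Godfather: Part II': 1.0}
```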

Knowing the execution context gets you one step closer to understanding the inner workings of the Elasticsearch engine. It also helps you write performant queries: when relevancy scores aren't needed, running the query in a filter context skips the work of the relevancy algorithm.

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
