Terms and Terms Lookup Queries

As the name suggests, the terms query (note the plural) searches for multiple values in a single field. That is, we can pass in all the values of the field that we would like to match. Say we want to find all movies with any of several content ratings, such as PG-13, 15, 18, or R. We use the terms query for this purpose, as the following listing demonstrates.

GET movies/_search
{
  "query": {
    "terms": {
      "certificate": ["PG-13", "R"]
    }
  }
}

The terms query expects a list of search terms for a field, passed in as an array to the terms object. A document matches if its field contains at least one of these values. Each term is matched exactly, without analysis. In the above listing, we are searching for all the movies with a PG-13 or R rating in the certificate field. The results are the combination of all PG-13 movies and all R movies.
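To make the matching rule concrete, here is a minimal Python sketch that mimics a terms query in memory. It is not how Elasticsearch executes the query (that happens against the index); the terms_match helper and the sample documents are assumptions made purely for illustration. It shows two things: a document matches if its field equals any one of the listed values, and the comparison is exact, so pg-13 does not match PG-13.

```python
def terms_match(doc, field, values):
    """Return True if doc[field] equals any of the given values exactly."""
    stored = doc.get(field)
    # A field may hold a single value or an array of values.
    stored_values = stored if isinstance(stored, list) else [stored]
    return any(v in values for v in stored_values)


# Hypothetical sample documents, not taken from the movies index.
movies = [
    {"title": "Movie A", "certificate": "PG-13"},
    {"title": "Movie B", "certificate": "R"},
    {"title": "Movie C", "certificate": "15"},
]

# Matches either rating: the values in the array are OR-ed together.
hits = [m["title"] for m in movies if terms_match(m, "certificate", ["PG-13", "R"])]

# Exact matching: a lowercased value finds nothing.
no_hits = [m["title"] for m in movies if terms_match(m, "certificate", ["pg-13"])]

print(hits)
print(no_hits)
```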

There is a limit on how many terms we can set in that array: 65,536 by default.

If you need to raise or lower this limit, you can change the dynamic index setting index.max_terms_count. The request in the listing below sets max_terms_count to 10:

Listing for resetting the maximum terms count

PUT movies/_settings
{
  "index": {
    "max_terms_count": 10
  }
}

With this setting in place, a terms query on this index cannot carry more than 10 values in its terms array; a request that exceeds the limit is rejected with an error. Because this is a dynamic setting, you can change it on a live index whenever you need to.
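The rule behind the limit can be sketched in a few lines of Python. The check_terms_count helper below is hypothetical and runs on the client side; Elasticsearch performs its own check on the server. The sketch only illustrates the condition: the number of values in the terms array must not exceed index.max_terms_count.

```python
DEFAULT_MAX_TERMS_COUNT = 65536  # default value of index.max_terms_count


def check_terms_count(values, max_terms_count=DEFAULT_MAX_TERMS_COUNT):
    """Raise ValueError if the terms array is longer than the limit allows."""
    if len(values) > max_terms_count:
        raise ValueError(
            f"{len(values)} terms exceed the allowed maximum of {max_terms_count}"
        )
    return values


# With the limit lowered to 10, two values pass and eleven do not.
check_terms_count(["PG-13", "R"], max_terms_count=10)
try:
    check_terms_count([str(n) for n in range(11)], max_terms_count=10)
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```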

There’s a slightly different variant of the terms query: the terms lookup query. The idea is to build the terms array from an existing document’s values rather than spelling the terms out ourselves. The next section discusses it in detail with an example.

Terms lookup

So far we have provided the list of values as an array in the terms query. The terms lookup query is a variant of the terms query that sets the terms by reading an existing document’s field values.

It is best understood with an example.

Let’s create a classic_movies index with two properties: title and director, as the listing below shows:

PUT classic_movies
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "director": {
        "type": "keyword"
      }
    }
  }
}

There is nothing special about this index. The one notable point is that we define the director field as a keyword type, which keeps the example simple: the terms query matches exact values, and a keyword field stores the director’s name as a single unanalyzed value.

Now that we have the index prepared, let’s index a few movies, as the listing below shows:

PUT classic_movies/_doc/1
{
  "title": "Jaws",
  "director": "Steven Spielberg"
}

PUT classic_movies/_doc/2
{
  "title": "Jaws II",
  "director": "Jeannot Szwarc"
}

PUT classic_movies/_doc/3
{
  "title": "Ready Player One",
  "director": "Steven Spielberg"
}

The documents are self-explanatory.

Now that we have indexed these three documents, let’s go back to the terms lookup query. Say we wish to fetch all movies directed by Spielberg. However, we don’t want to construct a terms query with the terms up front; instead, we tell the query to pick up the terms from another document. That is, we let the terms query look up the criteria from the field values of a document rather than providing them directly. The listing below does exactly this:

Listing for terms lookup search

GET classic_movies/_search
{
  "query": {
    "terms": {
      "director": {
        "index": "classic_movies",
        "id": "3",
        "path": "director"
      }
    }
  }
}

The code listing requires a bit of explanation. We are creating a terms query on the director field, the field against which the search terms are matched. In a regular terms query, we would have provided an array with the list of names. Here, however, we ask the query to look up the values of the director field from another document: the document with ID 3.

The document with ID 3 is fetched from the classic_movies index, as the index field in the query specifies. The field to read the values from is director, given as path in the above listing. Document 3 has Steven Spielberg as its director, so running this query fetches the two documents directed by Spielberg (IDs 1 and 3).
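Conceptually, the lookup is a two-step process: read the field named in path from the referenced document, then run an ordinary terms query with those values. The Python sketch below imitates that with an in-memory dictionary standing in for the classic_movies index. The resolve_lookup and search helpers are hypothetical and only mirror the behaviour described above; they are not part of any Elasticsearch client.

```python
# In-memory stand-in for the classic_movies index, keyed by document ID.
classic_movies = {
    "1": {"title": "Jaws", "director": "Steven Spielberg"},
    "2": {"title": "Jaws II", "director": "Jeannot Szwarc"},
    "3": {"title": "Ready Player One", "director": "Steven Spielberg"},
}


def resolve_lookup(index, doc_id, path):
    """Step 1: read the terms from the field `path` of the lookup document."""
    value = index[doc_id][path]
    return value if isinstance(value, list) else [value]


def search(index, field, terms):
    """Step 2: return IDs of documents whose field exactly equals any term."""
    return [doc_id for doc_id, doc in index.items() if doc.get(field) in terms]


terms = resolve_lookup(classic_movies, "3", "director")
matching_ids = search(classic_movies, "director", terms)

print(terms)
print(matching_ids)
```

Pointing the lookup at document 2 instead would resolve the terms to Jeannot Szwarc and return only that document.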

The terms lookup query builds a query from values obtained from another document rather than from a set of values passed in the query. This gives us greater flexibility when constructing the query’s terms: we could easily swap the index for any other index to pick a document from.

For example, say we have an index called movie_search_terms_index with a handful of documents holding search terms (say, document 1 with director terms, document 2 with actor terms, and so on). We can then reference the document with director terms from movie_search_terms_index in our main query and fetch the results. This way, the main query stays constant while the lookup documents can be changed as and when required.
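As a rough sketch of that idea, the Python helper below builds the search body for a terms lookup so that only the lookup coordinates change between calls. The terms_lookup_query function is hypothetical, and so is the director_terms field name: the text does not say what the documents in movie_search_terms_index look like, so treat the path as an assumption.

```python
import json


def terms_lookup_query(field, index, doc_id, path):
    """Build a search body whose terms are read from another document."""
    return {
        "query": {
            "terms": {
                field: {"index": index, "id": doc_id, "path": path}
            }
        }
    }


body = terms_lookup_query(
    field="director",
    index="movie_search_terms_index",
    doc_id="1",
    path="director_terms",  # hypothetical field name in the lookup document
)

print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the terms lookup listing shown earlier and could be sent as the body of a _search request; swapping the lookup document only means changing the arguments.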

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
