Percolator Queries

In this article, we explore percolate queries.

Searching for a set of documents given an input is straightforward: we return the documents in the index that match the given criteria. This satisfies the requirement of searching by a user’s criteria, and it is what we’ve done so far when querying for results.

There’s another requirement that Elasticsearch satisfies: notifying the user when a search that currently returns no results would return results at a later date. Say, for example, that a user searches for the Python in Action book on an e-commerce bookseller’s site, but unfortunately, we do not have the book in stock. The dissatisfied customer leaves the site. A day or two later, however, new stock arrives, and the book is added to the inventory. Now that the book is available, we want to notify the user so they can purchase it.

Elasticsearch supports this sort of use case by providing a special query called the percolate query, which uses the percolator field type.

The percolate query is the inverse of our normal search mechanism: instead of running a query against documents, we search for stored queries that match a given document.

This is a bit of a strange concept to understand at first glance, but we’ll demystify it in this section. The following figure shows the difference between a normal query and a percolate query.

Figure 12.14 Normal vs. percolate query
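The contrast in the figure can be mimicked with a toy in-memory model. This is only a sketch in plain Python: the function names and the whole-word matching rule are assumptions made for illustration, and none of it reflects how Elasticsearch analyzes text or executes queries.

```python
# Toy in-memory model; it only mimics the direction of matching,
# not how Elasticsearch analyzes text or runs queries.

def name_matches(term, doc):
    """Crude stand-in for a match query on `name`: case-insensitive whole-word match."""
    return term.lower() in doc["name"].lower().split()

def search(docs, term):
    """Normal search: one query in, matching documents out."""
    return [doc for doc in docs if name_matches(term, doc)]

def percolate(stored_queries, doc):
    """Percolation: one document in, IDs of matching stored queries out."""
    return [qid for qid, term in stored_queries.items() if name_matches(term, doc)]

# Sample titles, assumed for this sketch
books = [
    {"name": "Effective Java"},
    {"name": "Elasticsearch crash course"},
    {"name": "Java Core Fundamentals"},
]
stored_queries = {"1": "Python"}  # query ID -> search term that found nothing

no_hits = search(books, "Python")
matched_queries = percolate(stored_queries, {"name": "Python in Action"})

print(no_hits)          # []
print(matched_queries)  # ['1']
```

The two functions loop over opposite collections: search iterates over documents with one query in hand, while percolate iterates over stored queries with one document in hand.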

Let’s see percolate queries in action by first indexing a set of documents. The following listing indexes three technical books into the tech_books index. Note that we do not include a Python book yet.

PUT tech_books/_doc/1
{
  "name": "Effective Java",
  "tags": ["Java", "Software engineering", "Programming"]
}

PUT tech_books/_doc/2
{
  "name": "Elasticsearch crash course",
  "tags": ["Elasticsearch", "Software engineering", "Programming"]
}

PUT tech_books/_doc/3
{
  "name": "Java Core Fundamentals",
  "tags": ["Java", "Software"]
}

Now that we’ve seeded our book inventory index with a few books, we can expect users to search for books using simple match or term queries. (I’m omitting these queries in this discussion because we’ve already mastered them.) However, not all user queries yield results; for example, someone searching for the Python in Action book won’t find it. The following listing demonstrates this.

GET tech_books/_search
{
  "query": {
    "match": {
      "name": "Python"
    }
  }
}

From the user’s perspective, the search ends with a disappointing result: no book is returned. We can take our queries to the next level by notifying the user when the out-of-stock Python book becomes available. This is exactly where we can put percolators to work.

Just as we index documents into an index, we index queries into a dedicated percolator index. We need to define a schema for this index. Let’s call it tech_books_percolator, as the following listing shows.

PUT tech_books_percolator
{
  "mappings": {
    "properties": {
      "query": {
        "type": "percolator"
      },
      "name": {
        "type": "text"
      },
      "tags": {
        "type": "text"
      }
    }
  }
}

This listing defines the index for holding the percolator queries. A few things to note:

  • It contains a query field to hold the users’ (failed) queries.
  • The data type of the query field must be percolator.

The rest of the schema is borrowed from the original tech_books index. These fields are needed because any field that a stored query refers to (name in our case) must be present in the percolator index’s mapping.

Just as we define fields with data types such as text, long, double, and keyword, Elasticsearch provides a percolator type too. The query field is defined as percolator in listing 12.30, and it expects a query as the field value, which we will see shortly.

Now that we have our percolator index (tech_books_percolator) mapping ready, the next step is to store queries. The queries are usually the ones that don’t return results to the users (like the Python example).

In the real world, a user’s query that doesn’t yield a result is indexed into this percolator index. Collecting the users’ failed queries into a percolator index can be done inside the search application, but discussing that process is out of scope here. Just imagine that the Python query shown earlier doesn’t yield a result and is now sent to the percolator index to be stored. The next listing provides the code to store the query.

PUT tech_books_percolator/_doc/1
{
  "query": {
    "match": {
      "name": "Python"
    }
  }
}

As you can see, the listing indexes a query, which is unlike indexing a normal document. If you remember the document indexing operations from the earlier chapters, any time we index a document, we use a JSON-formatted document with name-value pairs. This one, however, oddly has a match query.

This query is stored in the tech_books_percolator index with the document ID 1. As you can imagine, this index keeps growing with the failed searches. The notable thing is that each JSON document holds a query issued by a user that didn’t return any results.
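How the search application decides to store a query is not covered here, but the decision itself is small. The following is a minimal sketch in plain Python, assuming the search response carries its hit count under hits.total.value (the default response shape in recent Elasticsearch versions); the function name is hypothetical and no client library is involved.

```python
def percolator_doc_for_failed_search(search_body, search_response):
    """Return the document to store in the percolator index,
    or None if the search returned at least one hit."""
    # Assumption: the response reports its total as hits.total.value
    total = search_response["hits"]["total"]["value"]
    if total > 0:
        return None
    # The stored document holds the user's query under the percolator field
    return {"query": search_body["query"]}

search_body = {"query": {"match": {"name": "Python"}}}

# Hand-written response fragments, for illustration only
empty_response = {"hits": {"total": {"value": 0, "relation": "eq"}, "hits": []}}
found_response = {"hits": {"total": {"value": 2, "relation": "eq"}, "hits": []}}

doc_to_store = percolator_doc_for_failed_search(search_body, empty_response)
nothing_to_store = percolator_doc_for_failed_search(search_body, found_response)

print(doc_to_store)      # {'query': {'match': {'name': 'Python'}}}
print(nothing_to_store)  # None
```

The returned dictionary has the same shape as the body stored under document ID 1 above, so the application could send it to the tech_books_percolator index as is.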

The final piece of the puzzle is to search the tech_books_percolator index when our stock gets updated. As bookshop owners, we are expected to restock; let’s assume the Python book arrives with the next delivery. We can now index it into our tech_books index for users to search and buy, as this listing shows.

PUT tech_books/_doc/4
{
  "name": "Python in Action",
  "tags": ["Python", "Software Programming"]
}

Now that we have the Python book indexed, we need to rerun the user’s failed query. But this time, instead of running the query on the tech_books index, let’s run it against the tech_books_percolator index. The query against the percolator index has a special syntax. Let’s write it first and then discuss it.

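GET tech_books_percolator/_search
{
  "query": {
    "percolate": {          # the percolate query
      "field": "query",     # the percolator field defined in the mapping
      "document": {         # the document to match stored queries against
        "name": "Python in Action",
        "tags": ["Python", "Software Programming"]
      }
    }
  }
}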

As the listing shows, the percolate query expects two inputs: a field, whose value is query (the name of the percolator property defined in the mapping earlier), and a document, which is the same document we indexed into our tech_books index. All we need to do is check whether any stored query matches the given document. Fortunately, Python in Action has a match (as you may recall, we indexed a query into our percolator index earlier).
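An application that restocks several books at once would build one such request per new book. Here is a small sketch in plain Python that only constructs the request bodies; the helper name and the second book title are made up for illustration, and sending the bodies to Elasticsearch is left out.

```python
import json

def percolate_body(document, field="query"):
    """Build the body of a percolate search for a single document.
    `field` is the name of the percolator field in the mapping."""
    return {"query": {"percolate": {"field": field, "document": document}}}

# Newly arrived stock; the second title is a made-up example
new_stock = [
    {"name": "Python in Action", "tags": ["Python", "Software Programming"]},
    {"name": "Go Basics", "tags": ["Go", "Programming"]},
]

bodies = [percolate_body(book) for book in new_stock]

# Each body would be sent to the tech_books_percolator index's _search endpoint
print(json.dumps(bodies[0], indent=2))
```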

Now, given a document (the Python document defined in listing 12.32), we can retrieve the matching query from the tech_books_percolator index. This lets us inform the user that the book they were looking for is back in stock! Note that we could’ve extended the document that gets stored in the tech_books_percolator index with a specific user ID so that we know whom to notify.
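To make the user ID idea concrete, here is a minimal sketch in plain Python. The user_id field is hypothetical: it is not part of the mapping shown earlier, so the index would have to accept it, either declared explicitly or through dynamic mapping. The sample response is hand-written in the usual search-response shape rather than captured from a cluster.

```python
def stored_query_with_user(search_text, user_id):
    """Percolator document that remembers who issued the failed search."""
    return {
        "query": {"match": {"name": search_text}},
        "user_id": user_id,  # hypothetical field, not in the original mapping
    }

def users_to_notify(percolate_response):
    """Collect user IDs from the stored queries that matched the document."""
    hits = percolate_response["hits"]["hits"]
    return [hit["_source"]["user_id"] for hit in hits if "user_id" in hit["_source"]]

# Hand-written response fragment, for illustration only
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": stored_query_with_user("Python", "user-42")},
            {"_id": "2", "_source": {"query": {"match": {"name": "Python"}}}},
        ]
    }
}

notify = users_to_notify(sample_response)
print(notify)  # ['user-42']
```

Stored queries without a user_id are skipped, so older entries indexed without the extra field do not break the notification step.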

That’s all for percolators. They are a little difficult to understand because there are a couple of moving parts, but once you understand the use case, they are not that difficult to implement. Do keep in mind that there must always be an automated, semi-automated, or even manual process in place to keep the queries stored in the percolator index in sync with the searches users perform.

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
