Explanation of the Scores

Elasticsearch provides a mechanism for understanding the makeup of relevancy scores. It tells us exactly how the engine calculates a score, and we reach it either through an explain flag on the search endpoint or through the explain API. The explain API can also tell us why a document has or has not matched a query. In this article, we look at both methods to understand what they have in common and where they subtly differ.

Explain flag

You may have noticed a positive number (a relevancy score) in the results of some earlier queries. Although we know that the engine computed and set that value, we didn't know how it was computed. If we are curious about the calculation, we set the explain attribute to true in the query. Elasticsearch then returns the results along with the details of how it arrived at each score. In other words, it explains the logic and the calculations that the engine carried out behind the scenes.

The following listing shows a match query. Because we want the details of how the scores were calculated, we've set explain to true.

Listing: Asking the engine to explain the score

GET movies/_search
{
  "explain": true,
  "_source": false,
  "query": {
    "match": {
      "title": "Lord"
    }
  }
}

The explain attribute is set at the same level as the query object. The result of this query is interesting, as the figure below demonstrates.

Figure: The explanation of how Elasticsearch calculates the relevancy score

As the figure shows, the relevancy score is calculated by multiplying three components: the inverse document frequency (idf), the term frequency (tf), and the boost factor. Elasticsearch goes into detail about how it evaluates and measures each of these components. For example, the idf is computed as

log(1 + (N - n + 0.5) / (n + 0.5))

where

  • n is the number of documents containing the term (in the figure, three documents contain the word lord).
  • N is the total number of documents (the figure shows 25 documents in our index).

You can find out what the idf is made of by looking at the description field in the response (see the figure above).
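Each node of the explanation carries a value, a description, and a list of nested details, so the whole thing can be read as a tree. The following Python sketch flattens such a tree into indented lines. The sample dictionary is hand-written to resemble the idf part of a response (the function name, the rounded value, and the exact description strings are assumptions for illustration, not output copied from a cluster).

```python
def flatten_explanation(node, depth=0):
    """Return (depth, value, description) tuples for an explanation tree."""
    rows = [(depth, node["value"], node["description"])]
    # Each entry in "details" is itself an explanation node.
    for child in node.get("details", []):
        rows.extend(flatten_explanation(child, depth + 1))
    return rows


# Hand-written sample shaped like the idf part of an explain response.
# The numbers follow the article (n = 3, N = 25); the idf value is rounded.
sample_idf = {
    "value": 2.0053,
    "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
    "details": [
        {"value": 3, "description": "n, number of documents containing term", "details": []},
        {"value": 25, "description": "N, total number of documents with field", "details": []},
    ],
}

for depth, value, description in flatten_explanation(sample_idf):
    print("  " * depth + f"{value}: {description}")
```

Reading the indented output from the top down mirrors how the engine composes the score: the parent line names the formula, and the child lines list the inputs that went into it.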

Similarly, the term frequency (tf) is calculated using the formula

freq / (freq + k1 * (1 - b + b * dl / avgdl))

I recommend that you look through the explanation in the response to check how the engine applies these formulae to produce the score.
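To check the arithmetic by hand, the two formulae can be written as small Python functions. This is a sketch under stated assumptions rather than the engine's implementation: n = 3 and N = 25 come from the article, while the tf inputs (freq, dl, avgdl), the k1 = 1.2 and b = 0.75 parameters, and the boost of 2.2 are assumed values chosen for illustration. Here freq is the number of times the term occurs in the field, dl is the length of the field, and avgdl is the average field length across documents.

```python
import math


def idf(N, n):
    """Inverse document frequency: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def tf(freq, k1, b, dl, avgdl):
    """Term frequency: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def score(boost, idf_value, tf_value):
    """The score is the product of the three components."""
    return boost * idf_value * tf_value


# Values from the article: 3 of 25 documents contain the term.
idf_value = idf(N=25, n=3)

# Assumed inputs: the term occurs once in a field of average length,
# with k1 = 1.2 and b = 0.75.
tf_value = tf(freq=1, k1=1.2, b=0.75, dl=5, avgdl=5)

# Assumed boost; substitute the value reported in your own response.
total = score(boost=2.2, idf_value=idf_value, tf_value=tf_value)

print(f"idf = {idf_value:.4f}, tf = {tf_value:.4f}, score = {total:.4f}")
```

Substituting the values shown in your own response for the assumed ones should reproduce the score that the engine reports, up to rounding.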

Explain API

Although we can use the explain attribute to understand the mechanics of relevancy scoring, there's also an explain API that, in addition to providing the scoring calculations, gives insight into why a document matched (or did not). The query in the following listing demonstrates this approach by calling the _explain endpoint with a document ID as the parameter.

GET movies/_explain/14
{
  "query": {
    "match": {
      "title": "Lord"
    }
  }
}

This query is the same as the one in the explain flag listing earlier, but this time we invoke the _explain endpoint instead of setting the explain flag on the _search endpoint. The response explains how the score was computed and also reports whether the document matched.

Finally, let's search for Lords (instead of Lord) in the above listing and rerun the query. As the following response shows, Elasticsearch gives us a clue as to why we didn't get the same result:

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "14",
  "matched" : false,
  "explanation" : {
    "value" : 0.0,
    "description" : "no matching term",
    "details" : [ ]
  }
}

As the explanation object's description says, Lords does not match the indexed data. Understanding why a document matched (or did not) helps us troubleshoot queries. In this example, we learn that the term we searched for doesn't exist in the indexed title of that document.
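When troubleshooting several documents, it can help to reduce each _explain response to a single line. The Python sketch below does that for a response that has already been parsed into a dictionary. The function name summarize_explain is hypothetical, and the sample input simply repeats the non-matching response shown above.

```python
def summarize_explain(response):
    """Turn a parsed _explain response into a one-line summary."""
    doc_id = response["_id"]
    explanation = response["explanation"]
    if response["matched"]:
        return f"document {doc_id} matched with score {explanation['value']}"
    # For a non-match, the description carries the reason.
    return f"document {doc_id} did not match: {explanation['description']}"


# The non-matching response from the listing above, as a Python dict.
no_match = {
    "_index": "movies",
    "_type": "_doc",
    "_id": "14",
    "matched": False,
    "explanation": {
        "value": 0.0,
        "description": "no matching term",
        "details": [],
    },
}

print(summarize_explain(no_match))
```

Checking the matched field first keeps the two cases separate: a score is only meaningful for a match, while the description is the useful part of a non-match.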

A search query built using the explain flag on the _search API can produce a lot of results. Asking for an explanation of the scores for every document at the query level is, in my opinion, a waste of computing resources. Instead, pick one of the documents and ask for an explanation using the _explain API.

That's a short explanation of what the _explain API and the explain flag are all about!

Elasticsearch in Action