Elasticsearch provides a mechanism for understanding the makeup of relevancy scores: it tells us exactly how the engine calculated a score. We get this information either by setting an explain flag on a search request or by calling the explain API. The explain API also tells us why a document did or did not match a query. In this article, we look at both methods to understand their commonalities and subtle differences.
Explain flag
You may have noticed a positive number (the relevancy score) in the results of some queries earlier. Although we know the engine computed and set that value, we didn’t know how it was computed. To see the calculation, we set the explain attribute to true in the query. Elasticsearch then returns the results along with the details of how it arrived at each score. In other words, it explains the logic and the calculations that the engine carried out behind the scenes.
The following listing shows a match query. Because we want the details of how the scores were calculated, we’ve set explain to true.
Listing: Asking the engine to explain the score
GET movies/_search
{
  "explain": true,
  "_source": false,
  "query": {
    "match": {
      "title": "Lord"
    }
  }
}
The explain attribute is set at the same level as the query object. The result of this query is interesting, as the figure below demonstrates.
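An explanation comes back as a nested object with value, description, and details fields, attached to each hit under _explanation when the explain flag is set. The following is a minimal Python sketch of one way to flatten that tree for reading. The function name flatten_explanation and the sample values are hypothetical and invented for illustration; they are not taken from a real response.

```python
def flatten_explanation(explanation, depth=0):
    """Return (depth, value, description) tuples for an explanation tree."""
    rows = [(depth, explanation["value"], explanation["description"])]
    # Each entry in "details" is itself an explanation with the same shape.
    for child in explanation.get("details", []):
        rows.extend(flatten_explanation(child, depth + 1))
    return rows


# Hypothetical, heavily abbreviated explanation for one hit.
# The numbers are invented so that boost * idf * tf equals the top value.
sample = {
    "value": 2.0,
    "description": "score, computed as boost * idf * tf",
    "details": [
        {"value": 2.0, "description": "boost", "details": []},
        {"value": 2.0, "description": "idf", "details": []},
        {"value": 0.5, "description": "tf", "details": []},
    ],
}

# Print one line per node, indented by its depth in the tree.
for depth, value, description in flatten_explanation(sample):
    print("  " * depth + f"{value} <- {description}")
```

In a real search response, you would pass each hit's _explanation object to the function instead of the sample dictionary.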
As the figure shows, the relevancy score is calculated by multiplying three components: the inverse document frequency (idf), the term frequency (tf), and the boost factor. Elasticsearch goes into detail about how it evaluates and measures each of these components. For example, the idf is computed as
log(1 + (N - n + 0.5) / (n + 0.5))
where
- n is the number of documents containing the term (in the figure, 3 documents contain the word lord).
- N is the total number of documents (the figure shows 25 documents in our index).
You can see what the idf is made of by looking at the description field in the response (see the figure above).
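To make the idf formula concrete, here is a small Python sketch that plugs in the numbers quoted above (n = 3, N = 25). The function name bm25_idf is an illustrative choice, and the sketch assumes the natural logarithm.

```python
import math


def bm25_idf(n, N):
    """idf = log(1 + (N - n + 0.5) / (n + 0.5)), using the natural log."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


# n = 3 documents contain the term, N = 25 documents in total.
idf = bm25_idf(n=3, N=25)
print(round(idf, 4))

# A rarer term (fewer matching documents) gets a higher idf.
print(bm25_idf(n=1, N=25) > idf)
```

With these inputs the idf comes out at roughly 2.005, and the second line shows that a term found in fewer documents is weighted more heavily.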
Similarly, the term frequency (tf) is calculated using the formula
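freq / (freq + k1 * (1 - b + b * dl / avgdl))
where freq is the number of occurrences of the term in the document, k1 is the term saturation parameter, b is the length normalization parameter, dl is the length of the field in this document, and avgdl is the average length of the field across the index.
I recommend studying the explanation in the response to see how the engine applies these formulae to produce the score.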
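The tf formula and the final multiplication can be sketched in Python as follows. The defaults k1 = 1.2 and b = 0.75 are the standard BM25 defaults in Elasticsearch; the field lengths and the boost, idf, and tf values passed in below are hypothetical numbers chosen for illustration, not values from the figure.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def bm25_score(boost, idf, tf):
    """The score is the product of the three components."""
    return boost * idf * tf


# Hypothetical field lengths: one field of average length, one twice as long.
tf_avg = bm25_tf(freq=1, dl=3, avgdl=3)
tf_long = bm25_tf(freq=1, dl=6, avgdl=3)

# The same single occurrence counts for less in the longer field.
print(tf_avg > tf_long)

# Hypothetical component values, multiplied as the explanation describes.
print(bm25_score(boost=2.0, idf=2.0, tf=0.5))
```

The comparison illustrates the length normalization controlled by b: a term that appears once in a short title contributes more than the same term appearing once in a long one.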
Explain API
Although the explain attribute helps us understand the mechanics of relevancy scoring, there’s also an explain API that, in addition to the scoring calculations, provides insight into why a document did (or did not) match. The query in the following listing calls the _explain endpoint with a document ID as the parameter to demonstrate this approach.
GET movies/_explain/14
{
  "query": {
    "match": {
      "title": "Lord"
    }
  }
}
This is the same match query as in the earlier listing that set the explain flag, but this time we invoke the _explain endpoint instead of setting the explain flag on the _search endpoint. The response explains how the score for that one document was calculated.
Finally, let’s search for Lords (instead of Lord) in the previous listing and rerun the query. As the following snippet shows, Elasticsearch gives us a clue as to why we didn’t get the same result:
{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "14",
  "matched" : false,
  "explanation" : {
    "value" : 0.0,
    "description" : "no matching term",
    "details" : [ ]
  }
}
As the description in the explanation object says, Lords does not match the indexed data. Understanding why a document matched (or did not) helps us troubleshoot our queries; in this example, we learn that the matching term doesn’t exist in our index.
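When troubleshooting from a script, the matched flag and the top-level description are usually the first things to check. The following Python sketch reads them from a response shaped like the one above; the helper name summarize_explain is hypothetical.

```python
def summarize_explain(response):
    """Turn an _explain response into a one-line, human-readable summary."""
    explanation = response["explanation"]
    doc_id = response["_id"]
    if response["matched"]:
        return f"document {doc_id} matched with score {explanation['value']}"
    return f"document {doc_id} did not match: {explanation['description']}"


# The response shown above for the query on "Lords".
no_match = {
    "_index": "movies",
    "_type": "_doc",
    "_id": "14",
    "matched": False,
    "explanation": {
        "value": 0.0,
        "description": "no matching term",
        "details": [],
    },
}

print(summarize_explain(no_match))
# prints: document 14 did not match: no matching term
```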
A search query that sets the explain flag on the _search API can produce a lot of output. Asking for an explanation of the scores for every document at the query level is, in my opinion, a waste of computing resources. Instead, pick one of the documents and ask for an explanation using the _explain API.
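One way to follow this advice is to run the search without the explain flag and then build a single _explain request for the hit you care about. The Python sketch below only constructs the request line and body; it does not contact a cluster. The helper name explain_request_for_top_hit and the sample search response are hypothetical.

```python
import json


def explain_request_for_top_hit(index, query, search_response):
    """Build the _explain request line and body for the first search hit."""
    hits = search_response["hits"]["hits"]
    if not hits:
        return None  # Nothing matched, so there is no document to explain.
    doc_id = hits[0]["_id"]
    # Reuse the same query so the explanation refers to the same scoring.
    return f"GET {index}/_explain/{doc_id}", json.dumps({"query": query})


query = {"match": {"title": "Lord"}}

# Hypothetical, abbreviated search response obtained without the explain flag.
search_response = {"hits": {"hits": [{"_id": "14", "_score": 2.0}]}}

request_line, body = explain_request_for_top_hit("movies", query, search_response)
print(request_line)
print(body)
```

The same query body is sent to both endpoints, so the cost of the detailed explanation is paid for one document only.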
That’s a short explanation of what the _explain API and the explain flag are all about!