Elasticsearch has a handful of advanced queries dedicated to specialized functions: for example, finding similar-looking documents with the more_like_this query, or giving a few documents a bit more importance with the pinned query. We look at these two queries in this short article.
Pinned query
You may have seen a few sponsored search results appearing at the top of the result set when querying your favorite e-commerce website such as Amazon. Suppose we want to implement such functionality in our application using Elasticsearch. Well, fret not; the pinned query is at hand.
The pinned query adds chosen documents to the result set so that they appear at the top of the list. It does this by ranking them above every other hit, as if their relevance scores were higher. Let’s quickly look at an example query, given in the following listing, that demonstrates this functionality.
GET iphones/_search
{
  "query": {
    "pinned": {
      "ids": ["1", "3"],
      "organic": {
        "match": {
          "name": "iPhone 12"
        }
      }
    }
  }
}
The pinned query in the listing has two moving parts: the organic and ids blocks.
Let’s look at the organic block first. It is the block that houses the search query; in this case, we are searching for iPhone 12 in our iphones index. Ideally, this query should return two documents: iPhone 12 and iPhone 12 Mini.
However, when you run this query with the data (check out my GitHub repo for the queries and data), you receive the documents iPhone and iPhone 13 (these two are not iPhone 12!) in addition to iPhone 12 and iPhone 12 Mini.
The reason for this is the ids field. This field lists the additional documents that must be added to the results and shown at the top of the list (the sponsored results), in effect giving them synthetically higher relevance scores.
In short, the pinned query adds high-priority documents to the result set. These documents outrank the others in the list, which is how we get sponsored results.
You may be wondering if the pinned results have any scoring: can one or some of the pinned results be prioritized over the other(s)? Unfortunately, the answer is no. These documents are presented in the order in which we list their IDs in the query: "ids":["1", "3"], for example.
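To make the ordering rule concrete, here is a minimal Python sketch of how the final list is put together. It is only a model of the behavior described above, not Elasticsearch's implementation; the function name merge_pinned, the document IDs "2" and "4", and the scores are all made up for illustration.

```python
from typing import List, Tuple


def merge_pinned(pinned_ids: List[str],
                 organic_hits: List[Tuple[str, float]]) -> List[str]:
    """Simplified model: pinned IDs first, organic hits after them."""
    # Pinned documents keep exactly the order given in "ids";
    # no score is used to rank them against each other.
    ordered = list(pinned_ids)
    seen = set(ordered)
    # Organic hits follow, sorted by descending relevance score.
    # A document that is already pinned is not listed a second time.
    for doc_id, _score in sorted(organic_hits, key=lambda hit: hit[1], reverse=True):
        if doc_id not in seen:
            ordered.append(doc_id)
            seen.add(doc_id)
    return ordered


# Hypothetical organic hits (ID, score) for the "iPhone 12" match query.
organic = [("4", 0.9), ("2", 1.2)]
print(merge_pinned(["1", "3"], organic))  # ['1', '3', '2', '4']
```

Swapping the pinned list to ["3", "1"] swaps the first two results, no matter what the organic scores are.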
The other query we look at in this article is “more like this”, the topic of the next section.
Looking at the More Like This (more_like_this) query
You may have noticed Netflix or Amazon Prime Video (or one of your favorite streaming apps) showing you More Like This movies when you browse a title. For example, figure 12.13 shows the More Like This movies I get when I visit Paddington 2.
A common user requirement is to search for documents that are “similar to” or “like” something else. For example, researching papers similar to those on COVID and SARS, or querying movies like The Godfather. Let’s jump right in to an example to understand the use case better.
Let’s say that we are collecting a list of profiles about some people. To create a set of profiles, we index sample documents into the profiles index as the code in the following listing demonstrates.
PUT profiles/_doc/1
{
  "name": "John Smith",
  "profile": "John Smith is a capable carpenter"
}

PUT profiles/_doc/2
{
  "name": "John Smith Patterson",
  "profile": "John Smith Patterson is a pretty plumber"
}

PUT profiles/_doc/3
{
  "name": "Smith Sotherby",
  "profile": "Smith Sotherby is a gentle painter"
}

PUT profiles/_doc/4
{
  "name": "Frances Sotherby",
  "profile": "Frances Sotherby is a gentleman"
}
There’s nothing surprising about these documents; they’re just profiles of a bunch of ordinary people. Now that we have these documents indexed, let’s find out how we can ask Elasticsearch to fetch documents that are similar to the text gentle painter or capable carpenter, or even retrieve documents with a similar name, Sotherby.
That’s exactly what the more_like_this query helps us with. The next listing creates a query to search for profiles more like Sotherby.
GET profiles/_search
{
  "query": {
    "more_like_this": {
      "fields": ["name", "profile"],
      "like": "Sotherby",
      "min_term_freq": 1,
      "max_query_terms": 12,
      "min_doc_freq": 1
    }
  }
}
The more_like_this query accepts text in a like parameter; this input text is matched against the fields listed in the fields parameter. The query also accepts a few tuning parameters, such as the minimum term frequency and minimum document frequency (min_term_freq and min_doc_freq) and the maximum number of terms (max_query_terms) that the query should select. If we want to give the user a better experience when showing similar documents, the more_like_this query is the right choice.
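To get a feel for what the three tuning parameters do, here is a rough Python sketch of the term-selection step. It is a simplified model under stated assumptions, not the engine's actual code: the whitespace tokenizer stands in for a real analyzer, each profile is treated as a single text rather than as separate fields, the TF-IDF formula is one common variant, and the function names tokenize and select_terms are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Crude stand-in for an analyzer: lowercase and split on whitespace.
    return text.lower().split()


def select_terms(like_text: str, corpus: List[str], min_term_freq: int = 1,
                 min_doc_freq: int = 1, max_query_terms: int = 12) -> List[str]:
    """Pick the terms from like_text that would drive the similarity search."""
    term_freq = Counter(tokenize(like_text))
    doc_tokens = [set(tokenize(doc)) for doc in corpus]
    scored = []
    for term, tf in term_freq.items():
        # min_term_freq: ignore terms that are too rare in the input text.
        if tf < min_term_freq:
            continue
        # min_doc_freq: ignore terms that appear in too few indexed documents.
        df = sum(1 for tokens in doc_tokens if term in tokens)
        if df == 0 or df < min_doc_freq:
            continue
        # Assumed scoring: term frequency times a smoothed inverse document frequency.
        idf = math.log(1 + len(corpus) / df)
        scored.append((tf * idf, term))
    # max_query_terms: keep only the highest-scoring terms (ties broken alphabetically).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _score, term in scored[:max_query_terms]]


profiles = [
    "John Smith is a capable carpenter",
    "John Smith Patterson is a pretty plumber",
    "Smith Sotherby is a gentle painter",
    "Frances Sotherby is a gentleman",
]
print(select_terms("Sotherby", profiles))                          # ['sotherby']
print(select_terms("Sotherby", profiles, min_doc_freq=3))          # []
print(select_terms("smith painter", profiles, max_query_terms=1))  # ['painter']
```

In this model, raising min_doc_freq to 3 drops sotherby because it occurs in only two profiles, which is why the listing keeps both frequency thresholds at 1 for such a tiny index.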
That’s pretty much it for the pinned and more_like_this queries.
These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.