Pagination, Sorting and Highlighting Features

Elasticsearch lets us attach additional features, such as pagination, sorting, highlighting, and field manipulation, to queries and their results.

  • Elasticsearch lets us paginate results, for example returning a hundred documents per page instead of the ten documents it returns by default.
  • We can sort the documents based on one or more fields in addition to sorting on the document’s relevancy score.
  • We can also highlight the matched search terms in the results.

Let’s look at them in the next few sections.

Pagination

More often than not, a query yields a lot of results, possibly hundreds or even thousands. Sending all of them to the client in one go is asking for trouble, because both the server and the client need more memory and processing capacity to deal with the additional data load.

Elasticsearch sends the top ten results by default, but we can change this number by setting the size parameter on the query, up to a maximum of 10,000. This limit, which applies to the sum of from and size, is governed by the index.max_result_window index setting and can be changed. The query in the next listing sets size to 20, returning the top 20 results in one go.

GET movies/_search
{
  "size": 20,
  "query": {
    "match_all": {}
  }
}

If you have an index with one million documents, setting size to 10,000 retrieves 10,000 documents in a single response (ignore the performance considerations for a moment!).

In addition to the size parameter, which batches up the results, Elasticsearch has another parameter, called from, to offset the results.

Being an offset, the from setting skips a given number of results. For example, if from is set to 200, the first two hundred results are skipped, and results are returned starting from the 201st. The following listing shows how we can paginate the results by setting the size and from attributes.

GET movies/_search
{
  "size": 100,
  "from": 3,
  "query": {
    "match_all": {}
  }
}

In the listing, setting size to 100 fetches 100 documents per request, and setting from to 3 skips the first three documents, so the results start from the fourth document.
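To make the relationship between a page number and the from and size values concrete, here is a minimal Python sketch. The function names (page_params, paged_query) and the 1-based page numbering are assumptions made for illustration; they are not part of Elasticsearch. The sketch only builds the request body and does not talk to a cluster.

```python
def page_params(page: int, page_size: int = 100, max_window: int = 10_000) -> dict:
    """Translate a 1-based page number into from/size values (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size  # number of hits to skip
    if offset + page_size > max_window:
        # By default, Elasticsearch rejects requests where from + size
        # exceeds the index's result window (10,000).
        raise ValueError("page lies beyond the result window")
    return {"from": offset, "size": page_size}


def paged_query(page: int, page_size: int = 100) -> dict:
    """Build a match_all search body for the requested page."""
    body = {"query": {"match_all": {}}}
    body.update(page_params(page, page_size))
    return body
```

Calling paged_query(3) produces a body with from set to 200 and size set to 100, which is the third page of a hundred documents each.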

If the result set is too large (more than 10,000 documents), rather than paginating with the size and from attributes, we need to work with the search_after attribute. Now let’s look at another common search feature, highlighting.
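Before leaving pagination, here is a rough Python sketch of how search_after works: each request carries the sort values of the last hit from the previous page, so the query needs an explicit sort. The helper name next_page_body and the movie_id tiebreaker field are hypothetical; the sketch only builds request bodies and assumes each hit in the response carries a sort array.

```python
from typing import Optional


def next_page_body(base_body: dict, last_hit: Optional[dict]) -> dict:
    """Return the request body for the next page (hypothetical helper)."""
    if "sort" not in base_body:
        raise ValueError("search_after needs an explicit sort")
    # search_after replaces the offset, so drop any from value.
    body = {k: v for k, v in base_body.items() if k != "from"}
    if last_hit is not None:
        # Resume right after the last document of the previous page.
        body["search_after"] = last_hit["sort"]
    return body


base = {
    "size": 100,
    "query": {"match_all": {}},
    # rating comes from the movies examples; movie_id is an assumed unique
    # field used as a tiebreaker so the ordering is stable.
    "sort": [{"rating": "desc"}, {"movie_id": "asc"}],
}
first_page = next_page_body(base, None)
second_page = next_page_body(base, {"_id": "7", "sort": [9.2, "tt0068646"]})
```

The first request has no search_after; every later request copies the sort array of the last hit it received.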

Highlighting

When we search for a keyword on a web page in our browser using Ctrl-F, the matches are highlighted so they stand out. For example, see how the word dummy is highlighted in the figure below. Highlighting keywords in the results is engaging and visually appealing for your clients too.

Figure: An example of highlighted text.

In Query DSL, we can add a highlight object at the same level as the top-level query object, as demonstrated in this code snippet:

GET books/_search
{
  "query": { ... },
  "highlight": { ... }
}

The highlight object expects a fields block, which lists the individual fields that we want to emphasize in the results. For example:

GET books/_search
{
  "query": { ... },
  "highlight": {
    "fields": {
      "field1": {},
      "field2": {}
    }
  }
}

With its default settings, Elasticsearch highlights matches by enclosing the matched text in emphasis tags (<em>match</em>). The code in the next listing creates a highlight object that asks for matches in the title field of the results to be highlighted.

GET movies/_search
{
  "_source": false,
  "query": {
    "term": {
      "title": {
        "value": "godfather"
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

The following snippet shows the response, with the word Godfather wrapped in <em> (short for emphasis) tags. The source is suppressed in the results because we’ve set _source to false in the query.

{
  ...
  "highlight" : { "title" : ["The <em>Godfather</em>"] }
},
{
  ...
  "highlight" : { "title" : ["The <em>Godfather</em> II"] }
}
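On the client side, we usually need to pull the highlighted fragments out of each hit. The following Python sketch shows one way to do that, assuming a hit shaped like the response above and the default <em> tags; the function name highlighted_terms is hypothetical.

```python
import re

# Matches the text between the default <em> and </em> highlight tags.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(hit: dict, field: str) -> list:
    """Collect the highlighted terms for one field of a single hit."""
    # A hit has no highlight entry for fields without a match.
    fragments = hit.get("highlight", {}).get(field, [])
    terms = []
    for fragment in fragments:
        terms.extend(EM_TAG.findall(fragment))
    return terms


hit = {"highlight": {"title": ["The <em>Godfather</em> II"]}}
terms = highlighted_terms(hit, "title")
```

If the query sets custom tags, the regular expression needs to match those tags instead.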

In HTML, the <em> tags mark text as emphasized, which browsers typically render in italics. We can also use custom tags. For example, this code snippet uses a pair of curly braces ({{ and }}) as the tags

...
"highlight": {
  "pre_tags": "{{",
  "post_tags": "}}",
  "fields": {
    "title": {}
  }
}

which results in “The {{Godfather}}” (with curly braces as the highlights). Now that we know how to highlight our search results, let’s turn our attention to sorting the data.

Sorting

The results returned by the engine are sorted by default on the relevancy score (_score): the higher the score, the higher the document appears in the list. However, Elasticsearch lets us not only change the sort order of the relevancy score (from descending to ascending) but also sort on other fields, including multiple fields.

Sorting the results

To sort the results, we must provide a sort object at the same level as the query (see the following snippet). The sort object consists of an array of fields, where each field contains a few “tweakable” parameters.

GET movies/_search
{
  "query": {
    "match": {
      "genre": "crime"
    }
  },
  "sort": [
    { "rating": { "order": "desc" } }
  ]
}

Here, the results of the match query that searches for all the movies in the crime genre are sorted by the movie’s rating. The sort object defines the field (rating) and the order in which the results are expected to be sorted, descending order in this case.
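Because the sort object is an array, sorting on multiple fields is a matter of adding more entries; entries listed later act as tiebreakers for earlier ones. The following Python sketch builds such an array. The helper name sort_clause and the release_date field are assumptions for illustration; only rating and genre appear in the examples above.

```python
def sort_clause(*fields):
    """Build a sort array from (field, order) pairs, highest priority first."""
    clause = []
    for name, order in fields:
        if order not in ("asc", "desc"):
            raise ValueError(f"unknown sort order: {order}")
        clause.append({name: {"order": order}})
    return clause


body = {
    "query": {"match": {"genre": "crime"}},
    # Sort by rating first; release_date (assumed field) breaks ties.
    "sort": sort_clause(("rating", "desc"), ("release_date", "asc")),
}
```

Movies with the same rating are then ordered by their release date, oldest first.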

Read my other article to understand more about sorting the results on relevancy score.

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
