Manipulating Search Results

You may have noticed that search queries return the original document in the _source field of each hit. Occasionally, we may want to fetch only a subset of fields. For example, we may need just the title and the rating of a movie when a user searches for movies with a certain certificate, or we might not need the document in the response at all. Elasticsearch lets us manipulate the response, whether by fetching selected fields or by suppressing the whole document.

Suppress the full document

To suppress the document returned in the search response, we simply set _source to false in the request. The following listing returns a response with just the metadata.

GET movies/_search
{
  "_source": false,
  "query": {
    "match": {
      "certificate": "R"
    }
  }
}

The response, shown in the following snippet, contains no trace of the original document:

"hits" : [
  {
    "_index" : "movies",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.58394784
  },
  {
    "_index" : "movies",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.58394784
  },
  ...
]
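Client code that reads such a response should not assume that every hit carries a _source key. The following Python snippet is a minimal sketch of that idea: the hits list is copied from the response above, and summarize_hits is a hypothetical helper name, not part of any Elasticsearch client.

```python
# Hits copied from the response above; note there is no "_source" key.
hits = [
    {"_index": "movies", "_type": "_doc", "_id": "1", "_score": 0.58394784},
    {"_index": "movies", "_type": "_doc", "_id": "2", "_score": 0.58394784},
]


def summarize_hits(hits):
    """Return (id, score, has_source) tuples without assuming _source exists."""
    return [(hit["_id"], hit["_score"], "_source" in hit) for hit in hits]


summary = summarize_hits(hits)
print(summary)
```

Checking for the key, rather than indexing hit["_source"] directly, keeps the same code working whether or not the document is suppressed.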

If the intention is to fetch just a few selected fields rather than the whole document, we can ask for those fields explicitly. Let’s see how.

Elasticsearch provides a fields parameter that lists the fields we expect in the response. We name the fields explicitly in this list. For example, the query in the following code snippet fetches only the title and rating fields.

GET movies/_search
{
  "_source": false,
  "query": {
    "match": {
      "certificate": "R"
    }
  },
  "fields": [
    "title",
    "rating"
  ]
}

The following snippet displays the response. It shows the returned document with only the title and rating fields, as expected.

{
  "_index" : "movies",
  "_type" : "_doc",
  "_id" : "1",
  "_score" : 0.58394784,
  "fields" : {
    "rating" : [
      9.296875
    ],
    "title" : [
      "The Shawshank Redemption"
    ]
  }
}

Note that each field is returned as an array rather than as a single value. Elasticsearch has no dedicated array type, so any field can hold multiple values; hence, each field in the response is wrapped in an array.
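If the calling code prefers plain values, the arrays can be unwrapped on the client side. The following Python sketch works on the hit shown above; unwrap_fields is a hypothetical helper, and keeping multi-valued fields as lists is an assumption about what the caller wants.

```python
# Hit copied from the response above.
hit = {
    "_index": "movies",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.58394784,
    "fields": {
        "rating": [9.296875],
        "title": ["The Shawshank Redemption"],
    },
}


def unwrap_fields(hit):
    """Turn single-element arrays into scalars; leave multi-valued fields as lists."""
    fields = hit.get("fields", {})
    return {
        name: values[0] if len(values) == 1 else values
        for name, values in fields.items()
    }


flat = unwrap_fields(hit)
print(flat)
```

A field that genuinely holds several values still comes back as a list, so callers that unwrap this way need to handle both shapes.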

You can also use wildcards in the field names. For example, title* retrieves title, title.original, title_long_description, title_code, and any other field whose name starts with title. (Our mapping has only title and title.original, so you can add the others to the mapping to experiment with the wildcard setting.)
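To get a feel for which names a pattern such as title* selects, we can approximate the matching locally. This Python sketch uses the standard library's fnmatch module as a stand-in; Elasticsearch performs the real matching on the server, and the candidate field list here is illustrative rather than the actual movies mapping.

```python
from fnmatch import fnmatchcase

# Illustrative field names; only title and title.original exist in our mapping.
candidate_fields = ["title", "title.original", "title_code", "rating", "synopsis"]


def matching_fields(pattern, names):
    """Return the names that match a simple * wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


matched = matching_fields("title*", candidate_fields)
print(matched)
```

Note that fnmatch also gives special meaning to ? and [], so this approximation is only reasonable for plain * patterns.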

Scripted fields

At times, we may need to compute a field on the fly and add it to the response. Say, for example, we want to flag a movie as top rated if its rating is greater than 9. For that, we can use Elasticsearch's scripting features to add such ad hoc fields on demand.

To use the scripting feature, add a script_fields object to the request at the same level as the query object, giving it the name of the new dynamic field and the logic to populate it. The following listing demonstrates this by creating a new field, top_rated_movie, that is set based on the rating the movie has received.

GET movies/_search
{
  "_source": ["title*", "synopsis", "rating"],
  "query": {
    "match": {
      "certificate": "R"
    }
  },
  "script_fields": {
    "top_rated_movie": {
      "script": {
        "lang": "painless",
        "source": "if (doc['rating'].value > 9.0) 'true'; else 'false'"
      }
    }
  }
}

The script object holds a source element that defines the logic for populating the new field (top_rated_movie): we stamp the movie as top rated if its rating is greater than 9. Here is the output with the new top_rated_movie field:

"hits" : [{
  ...
  "_source" : {
    "rating" : "9.3",
    "synopsis" : "Two imprisoned men bond ...",
    "title" : "The Shawshank Redemption"
  },
  "fields" : {
    "top_rated_movie" : ["true"]
  }
}
...
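The threshold of 9 is hard-coded in the script above. As a variation, the following Python sketch builds an equivalent request body with the threshold passed through the script's params object, which Painless exposes as params.threshold. The function name build_top_rated_query, the threshold parameter name, and the use of a ternary expression in place of the if/else are choices made for this illustration, not the listing's original form.

```python
import json


def build_top_rated_query(certificate, threshold=9.0):
    """Build a search body whose script field flags ratings above the threshold."""
    return {
        "_source": ["title*", "synopsis", "rating"],
        "query": {"match": {"certificate": certificate}},
        "script_fields": {
            "top_rated_movie": {
                "script": {
                    "lang": "painless",
                    # Ternary variant of the if/else script, reading the
                    # threshold from params instead of hard-coding it.
                    "source": "doc['rating'].value > params.threshold ? 'true' : 'false'",
                    "params": {"threshold": threshold},
                }
            }
        },
    }


body = build_top_rated_query("R")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET movies/_search in the console. Passing values through params also lets Elasticsearch reuse the compiled script when only the threshold changes.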

Elasticsearch provides extensive support for writing our own scripts; please check the documentation for further details.

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
