At times we might want to query for words using a prefix, like Leo for Leonardo or Mar for Marlon Brando, Mark Hamill, or Martin Balsam. Elasticsearch provides a prefix query for fetching records that match the beginning of a word (a prefix). The prefix query in the following listing does exactly that:
GET movies/_search
{
  "query": {
    "prefix": {
      "actors.original": {
        "value": "Mar"
      }
    }
  }
}
The query above fetches three movies with the actors Marlon, Mark, and Martin when we search for the prefix Mar. Note that we are running the prefix query on the actors.original field, which is a keyword data type.
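Why the keyword data type matters can be illustrated with a small Python sketch. It is a conceptual model only, not how Elasticsearch is implemented: it assumes a hypothetical sorted list of unanalyzed terms (the actor names are sample data) and a hypothetical helper, prefix_match, that compares the prefix against each whole stored value, case-sensitively.

```python
from bisect import bisect_left

# Hypothetical keyword terms: stored verbatim (no analysis) and kept sorted.
TERMS = sorted([
    "Al Pacino",
    "Leonardo DiCaprio",
    "Mark Hamill",
    "Marlon Brando",
    "Martin Balsam",
])


def prefix_match(terms, prefix):
    """Return every term in the sorted list that starts with `prefix`."""
    # Jump to the first term that could match, then walk forward
    # until a term no longer starts with the prefix.
    start = bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


print(prefix_match(TERMS, "Mar"))     # the three Mar... actors
print(prefix_match(TERMS, "mar"))     # empty: the comparison is case-sensitive
print(prefix_match(TERMS, "Brando"))  # empty: the whole value is a single term
```

In this model, Mar matches three names, while mar and Brando match nothing, because a keyword value is compared as one unanalyzed term from its first character.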
Note: The prefix query is an expensive query that can destabilize the cluster at times.
We don’t need to add an object consisting of value at the field block level. Instead, we can use a shortened version, as the next listing demonstrates:
Listing : Shortened form of the prefix query
GET movies/_search
{
  "query": {
    "prefix": {
      "actors.original": "Leo"
    }
  }
}
To find out which fields matched, we can add a highlight block to the prefix query. It accentuates one or more matching fields in the results, as the following listing shows.
Listing : Prefix search with highlighting
GET movies/_search
{
  "_source": false,
  "query": {
    "prefix": {
      "actors.original": {
        "value": "Mar"
      }
    }
  },
  "highlight": {
    "fields": {
      "actors.original": {}
    }
  }
}
Because we set "_source": false, the response omits the source documents and returns only the highlights, which show where the prefix matched:
"hits" : [{
  ..
  "highlight" : {
    "actors.original" : ["<em>Marlon Brando</em>"]
  }
},
{
  ..
  "highlight" : {
    "actors.original" : ["<em>Martin Balsam</em>"]
  }
},
{
  ..
  "highlight" : {
    "actors.original" : ["<em>Mark Hamill</em>"]
  }
}]
As discussed earlier, prefix queries put additional computational strain on the cluster. Fortunately, there is a way to speed up these poorly performing queries, discussed in the next section.
Speeding up prefix queries
Prefix queries are slow to run because the engine has to derive the results from a prefix (any sequence of letters) at query time. There is, however, a mechanism to speed them up: the index_prefixes parameter on the field.
We can set the index_prefixes parameter on the field when developing the mapping schema. For example, the mapping definition in the following listing sets the title field (remember, the title field is a text data type) with an additional parameter, index_prefixes, on a new index, boxoffice_hit_movies, that we are creating for this exercise.
Listing : A new index with the index_prefixes parameter
PUT boxoffice_hit_movies
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "index_prefixes": {}
      }
    }
  }
}
As you can see from the code in the listing, the sole title property includes an additional property, index_prefixes. This tells the engine that, during the indexing process, it should create the field with prebuilt prefixes and store those values. For example, the following snippet indexes a new document:
PUT boxoffice_hit_movies/_doc/1
{
  "title": "Gladiator"
}
Because we set index_prefixes on the title field in the mapping listing above, Elasticsearch indexes the prefixes with a minimum character size of 2 and a maximum character size of 5 by default. This way, when we run the prefix query, it doesn’t need to calculate the prefixes. Instead, it picks them up from storage.
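The idea of prebuilt prefixes can be sketched in Python. This is a conceptual model under stated assumptions, not Elasticsearch's actual storage format: build_prefix_index is a hypothetical helper, lowercasing and splitting on whitespace stand in for the text analyzer, and the document set is sample data.

```python
from collections import defaultdict

# Default prefix lengths stated in the text above.
MIN_CHARS, MAX_CHARS = 2, 5


def build_prefix_index(docs):
    """Map each prebuilt prefix to the ids of documents containing a matching word."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        # Assumption: lowercase + whitespace split stands in for the analyzer.
        for word in title.lower().split():
            longest = min(MAX_CHARS, len(word))
            for n in range(MIN_CHARS, longest + 1):
                index[word[:n]].add(doc_id)
    return index


docs = {1: "Gladiator"}  # hypothetical document set
prefix_index = build_prefix_index(docs)

print(sorted(prefix_index))              # prefixes stored at index time
print(prefix_index.get("glad", set()))   # query time: a lookup, not a scan
```

With the defaults, the sketch stores gl, gla, glad, and gladi for Gladiator, so a prefix search for glad becomes a direct lookup rather than a walk over every term.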
Of course, we can change the default min and max sizes of the prefixes that Elasticsearch creates for us during indexing. This is done by tweaking the sizes in the index_prefixes object, as the following listing demonstrates.
Listing : Custom character length settings for index_prefixes
PUT boxoffice_hit_movies_custom_prefix_sizes
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "index_prefixes": {
          "min_chars": 4,
          "max_chars": 10
        }
      }
    }
  }
}
In the listing, we ask the engine to precreate prefixes with a minimum length of 4 and a maximum length of 10 characters. Note that min_chars must be greater than 0, and max_chars must be less than 20. This way, we can customize the prefixes that Elasticsearch creates during the indexing process.
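To see what these settings mean for a single word, here is a rough Python sketch. The function prefixes_for is a hypothetical helper, not part of any Elasticsearch client. It enforces only the two limits mentioned above and assumes the title has already been lowercased by the analyzer.

```python
def prefixes_for(term, min_chars, max_chars):
    """List the leading substrings of `term` between min_chars and max_chars long."""
    # The two limits described in the text above.
    if min_chars <= 0:
        raise ValueError("min_chars must be greater than 0")
    if max_chars >= 20:
        raise ValueError("max_chars must be less than 20")
    # A word shorter than max_chars can only contribute prefixes up to its own length.
    longest = min(max_chars, len(term))
    return [term[:n] for n in range(min_chars, longest + 1)]


# Assumption: "gladiator" is the analyzed (lowercased) form of the title.
print(prefixes_for("gladiator", min_chars=4, max_chars=10))
```

With min_chars set to 4 and max_chars set to 10, the nine-letter word yields six prefixes, from glad up to the full word gladiator. Larger ranges mean more stored prefixes per word, which is the trade-off to weigh when picking these values.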
These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.