Wildcard queries

The source code for this article (and other topics) is available on my GitHub repo. You can download the movies data and index it.

Wildcard queries let you search for words with missing characters, prefixes, or suffixes. For example, when searching for the Godfather movie, we might expect matches for titles containing a word that ends with father, starts with god, or is missing a single character, as in god?ather. This is where the wildcard query comes in.

The wildcard query in Elasticsearch accepts an asterisk (*) or a question mark (?) in the search word. The following list describes these characters.

  • * (asterisk) — searching for zero or more characters
  • ? (question mark) — searching for a single character

Let’s search for documents where the movie title starts with “god”. The listing below shows the query in action:

GET movies/_search
{
  "query": {
    "wildcard": {
      "title": {
        "value": "god*"
      }
    }
  }
}

We should see three movies (Godfather, Godfather II, and City of God) returned for this wildcard query. Any other indexed movie whose title contains a word starting with god (Godzilla, God’s Waiting List, and so on) would be returned as well, because the pattern matches every term with the prefix god.

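However, if you run a wildcard query against the title.original field and omit “The” from “The God*”, you won’t get any results. The reason is that title.original is a keyword field: its value is stored as-is at index time, without any text analysis, so the pattern has to match the whole title, including the leading “The”.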
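To see the difference, the same prefix search can be pointed at the keyword field. The request below is a sketch that assumes the movies mapping defines title.original as a keyword sub-field, as described above. The pattern includes the leading “The” because a keyword value is matched as one whole string:

```console
GET movies/_search
{
  "query": {
    "wildcard": {
      "title.original": {
        "value": "The God*"
      }
    }
  }
}
```

Changing the value to "God*" in this request is the case that comes back empty, since no stored title begins with “God”.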

We can tweak our queries by placing wildcards anywhere in the word. For example, the query “g*d” fetches two movies from our stash: “The Good, the Bad and the Ugly” and “City of God”. If you want to see which part of a returned document matched the query, you can use highlighting (we discussed highlighting in the last chapter). The following listing shows this approach:

GET movies/_search
{
  "_source": false,
  "query": {
    "wildcard": {
      "title": {
        "value": "g*d"
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

The response shows that two movies matched. The following snippet shows the highlighted titles:

"title": [ "The <em>Good</em>, the Bad and the Ugly" ]
"title": [ "City of <em>God</em>"]

We use the ? wildcard when we want to match exactly one character; for example, "value": "god?ather" matches all the words that have any single character at the wildcard’s position (the fourth character here). You can also combine multiple ? characters if you want; for example, “g???ather”.
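The ? wildcard can be tried out the same way as the earlier patterns. The sketch below reuses the highlighting request from above with a single-character wildcard in the value; which titles come back depends on the movies in your index:

```console
GET movies/_search
{
  "_source": false,
  "query": {
    "wildcard": {
      "title": {
        "value": "god?ather"
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}
```

Swapping the value for "g???ather" exercises the multiple-character variant: each ? stands for exactly one character, so that pattern only matches nine-letter terms that start with g and end with ather.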

Expensive queries

The wildcard query is an expensive query because of the way it is implemented. Other expensive queries include the range, prefix, fuzzy, regexp, and join queries, among others. Using one of these queries occasionally might not affect server performance, but overusing them can destabilize the cluster, leading to a bad user experience.

Elasticsearch allows us to execute these expensive queries on the cluster by default, leaving the discretion up to us. However, if we want to stop such queries from running on the cluster, we can set the search.allow_expensive_queries setting to false in the cluster settings, as the following listing shows:

PUT _cluster/settings
{
  "transient": {
    "search.allow_expensive_queries": "false"
  }
}
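If wildcard queries are needed again later, the setting can be reset. The following is a sketch that relies on the standard cluster settings behavior: assigning null to a transient setting removes the override and restores the default, which is true for search.allow_expensive_queries:

```console
PUT _cluster/settings
{
  "transient": {
    "search.allow_expensive_queries": null
  }
}
```

Because the setting above is transient, it also does not survive a full cluster restart; a persistent block would be needed for that.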

By switching off the allow_expensive_queries setting, we protect the cluster from overload. Do note, however, that if we set allow_expensive_queries to false, wildcard queries (like the other expensive queries) will not be executed. You’ll get the following error if you try to execute any of them:

"reason" : "[wildcard] queries cannot be executed when 'search.allow_expensive_queries' is set to false."

Wildcard queries fetch results for words with missing characters. At times, though, we only want to match words by their prefix. That’s where the prefix query comes into the picture; it is discussed in another article, so stay tuned!

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
