Exists Queries

Documents can have hundreds of fields: a tweet, for example, carries hundreds of fields, and so does a trade in a financial organization. Fetching all the fields in a response wastes bandwidth, so checking whether a field exists before attempting to fetch it is a useful precheck. The exists query does this: it returns only the documents that contain a value for the given field.

For example, the query in the following listing returns every document in the movies index that has a title field.

GET movies/_search
{
  "query": {
    "exists": {
      "field": "title"
    }
  }
}

If no document has the field, the response contains an empty hits array (hits[]). To see this, try the same query with a nonexistent field such as title2: the hits array comes back empty.
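For instance, the following request, a minimal sketch assuming the movies index has no field named title2, should return no documents:

GET movies/_search
{
  "query": {
    "exists": {
      "field": "title2"
    }
  }
}

In the response, hits.total.value should be 0 and hits.hits should be an empty array.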

Nonexistent field check

There's another, subtler use case for the exists query: retrieving all documents that don't have a particular field. For example, in the listing below, we fetch all the documents that aren't classified as confidential (assuming classified documents carry an additional field called confidential set to true).

PUT top_secret_files/_doc/1
{
  "code": "Flying Bird",
  "confidential": true
}

PUT top_secret_files/_doc/2
{
  "code": "Cold Rock"
}

GET top_secret_files/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "confidential"
          }
        }
      ]
    }
  }
}

As the listing shows, we add two documents to the top_secret_files index, one of which has an additional field called confidential. We then place an exists query in the must_not clause of a bool query to fetch all the documents that are not categorized as confidential. The exists query identifies the documents that have the confidential field, and must_not excludes them, so the results omit all the confidential documents and return only document 2 (Cold Rock).
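One subtlety worth knowing: exists tests whether a field has an indexed value, not whether the key appears in the JSON source. According to the Elasticsearch reference documentation, a field set to null or to an empty array ([]) counts as nonexistent, while a field holding false still exists. The two documents below are hypothetical additions (the IDs 3 and 4 and their code values are made up for illustration); the refresh=true parameter only makes them searchable immediately.

PUT top_secret_files/_doc/3?refresh=true
{
  "code": "Blue Moon",
  "confidential": null
}

PUT top_secret_files/_doc/4?refresh=true
{
  "code": "Red Sky",
  "confidential": false
}

Rerunning the must_not query from the previous listing should then return documents 2 and 3. Document 3 has no indexed value for confidential, so it counts as unclassified, whereas document 4 is excluded because it has the field, even though its value is false.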

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
