Documents can have hundreds of fields: a tweet, for example, consists of hundreds of fields, and so does a trade in a financial organization. Fetching all the fields in a response wastes bandwidth, so it’s better to check whether a field exists before attempting to fetch it. The exists query performs this check: it returns only the documents that contain a value for the given field.
For example, the query in the following listing returns the documents in the movies index that have a title field.
GET movies/_search
{
  "query": {
    "exists": {
      "field": "title"
    }
  }
}
If no document has the field, the response comes back with an empty hits array ("hits": []). If you are curious, run the same query against a nonexistent field such as title2, and you’ll see exactly that.
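The matching rule behind these two queries can be modeled in a few lines of Python. This is a toy in-memory sketch, not how Elasticsearch evaluates the query (Elasticsearch consults its index structures, and mapping settings can also affect the result). The helpers field_exists and exists_query and the sample movie documents are hypothetical. The sketch follows the documented rule that a field whose JSON value is null or an empty array [] counts as nonexistent, while other values, including an empty string, count as present.

```python
from typing import Any


# Toy model of the exists rule (illustration only, not Elasticsearch code):
# a field counts as missing if it is absent, null (None), or an empty list.
def field_exists(doc: dict[str, Any], field: str) -> bool:
    value = doc.get(field)
    return value is not None and value != []


def exists_query(docs: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Return the 'hits': every document in which the field exists."""
    return [doc for doc in docs if field_exists(doc, field)]


# Hypothetical sample documents standing in for the movies index
movies = [
    {"title": "The Godfather", "rating": 9.2},
    {"title": "Inception"},
    {"title": None},  # null value: treated as nonexistent
]

print(len(exists_query(movies, "title")))  # prints 2
print(exists_query(movies, "title2"))      # prints [] (an empty hits array)
```

The third document shows why "the field is in the JSON" and "the field exists" are not quite the same thing: an explicit null does not satisfy the exists query.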
Nonexistent field check
There’s another, subtler use case for the exists query: retrieving all documents that don’t have a particular field. For example, the following listing fetches all the documents that aren’t classified as confidential (assuming classified documents have an additional field called confidential set to true).
PUT top_secret_files/_doc/1
{
  "code": "Flying Bird",
  "confidential": true
}
PUT top_secret_files/_doc/2
{
  "code": "Cold Rock"
}
GET top_secret_files/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "confidential"
          }
        }
      ]
    }
  }
}
As the listing shows, we add two documents to the top_secret_files index, one of which has an additional field called confidential. We then place an exists query in the must_not clause of a bool query to fetch all the documents that are not categorized as confidential. The results omit every document that has the confidential field, so only the "Cold Rock" document is returned: the exists query identifies the confidential documents, and must_not excludes them. Note that exists checks only whether the field has a value, not what that value is, so a document with confidential set to false would be excluded as well.
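To make the must_not logic concrete, here is a minimal Python sketch that mimics wrapping exists in a bool query's must_not clause over the two documents from the listing. It is a toy model under the same assumption as before (a field is missing when absent, null, or an empty list), not the Elasticsearch implementation; the helpers field_exists and must_not_exists and the third document ("Open Door") are hypothetical, added only to show that a false value still counts as present.

```python
from typing import Any


# Toy model (illustration only): a field is treated as missing when it is
# absent, null (None), or an empty list; any other value, including False,
# counts as present.
def field_exists(doc: dict[str, Any], field: str) -> bool:
    value = doc.get(field)
    return value is not None and value != []


def must_not_exists(docs: dict[str, dict[str, Any]], field: str) -> list[str]:
    """Mimic bool.must_not[exists]: return IDs of docs where the field is missing."""
    return [doc_id for doc_id, doc in docs.items() if not field_exists(doc, field)]


# The two documents indexed into top_secret_files in the listing
top_secret_files = {
    "1": {"code": "Flying Bird", "confidential": True},
    "2": {"code": "Cold Rock"},
}

print(must_not_exists(top_secret_files, "confidential"))  # prints ['2']

# Hypothetical third document: presence, not value, is what matters,
# so confidential=False is still excluded by must_not + exists
top_secret_files["3"] = {"code": "Open Door", "confidential": False}
print(must_not_exists(top_secret_files, "confidential"))  # prints ['2']
```

If you actually need to filter on the flag's value rather than its presence, a term query on confidential is the more natural fit; exists answers only the "is there a value at all?" question.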
These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.