Synonym Filters

Synonyms are different words with the same meaning. For example, football and soccer (the name commonly used for the sport in America) should both point to a football game. The synonym filter lets us define such sets of words to produce a richer user experience while searching.

Elasticsearch expects us to provide a set of words and their synonyms by configuring the analyzer with a synonym token filter. We create the synonym filter in an index’s settings, as the listing below demonstrates:

PUT index_with_synonyms
{
  "settings": {
    "analysis": {
      "filter": {
        "synonyms_filter": {
          "type": "synonym",
          "synonyms": [ "soccer => football" ]
        }
      }
    }
  }
}

In the code example, we created a filter of type synonym with a single rule, soccer => football, which treats soccer as an alternate name for football. Once the index is configured with this filter, we can test it with the _analyze API:

POST index_with_synonyms/_analyze
{
  "text": "What's soccer?",
  "tokenizer": "standard",
  "filter": ["synonyms_filter"]
}

This produces two tokens: “What’s” and “football”. As the output shows, the word “soccer” is replaced with the word “football”.
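
The => rule is a one-way mapping: the matched word is replaced and the original token is dropped. For comparison, the same filter type also accepts a comma-separated rule, which treats the listed words as equivalent. The following is a minimal sketch of that variant; the index name index_with_equivalent_synonyms and the filter name equivalent_synonyms_filter are made up for illustration:

```
PUT index_with_equivalent_synonyms
{
  "settings": {
    "analysis": {
      "filter": {
        "equivalent_synonyms_filter": {
          "type": "synonym",
          "synonyms": [ "soccer, football" ]
        }
      }
    }
  }
}

POST index_with_equivalent_synonyms/_analyze
{
  "text": "What's soccer?",
  "tokenizer": "standard",
  "filter": ["equivalent_synonyms_filter"]
}
```

With the default settings, this request should return “What's”, “soccer” and “football”, with the last two sharing the same position, so a search for either word can match.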

Synonyms from a file

We can provide the synonyms via a file on the filesystem rather than hard-coding them as we did in the listing above. To do that, we give the file path in the synonyms_path setting, as the following listing demonstrates:

PUT index_with_synonyms_from_file_analyzer
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonyms_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["synonyms_from_file_filter"]
        }
      },
      "filter": {
        "synonyms_from_file_filter": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt"
        }
      }
    }
  }
}

Make sure a file called “synonyms.txt” exists under $ELASTICSEARCH_HOME/config with the following contents:

# file: synonyms.txt
important=>imperative
beautiful=>gorgeous

We can refer to the file using a relative or an absolute path. A relative path is resolved against the config directory of the Elasticsearch installation. We can test the filter by invoking the _analyze API with the following input:

POST index_with_synonyms_from_file_analyzer/_analyze
{
  "text": "important",
  "tokenizer": "standard",
  "filter": ["synonyms_from_file_filter"]
}

We should get the “imperative” token in the response, which shows that the synonyms are being picked up from the synonyms.txt file we dropped in the config folder.
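
The _analyze request above applies the tokenizer and the filter directly. To have a text field use the file-based synonyms, the filter can be wired into a custom analyzer that the field's mapping references. A minimal sketch, assuming the same synonyms.txt file is in place; the index name articles_with_synonyms, the analyzer name title_synonyms_analyzer and the title field are hypothetical:

```
PUT articles_with_synonyms
{
  "settings": {
    "analysis": {
      "filter": {
        "synonyms_from_file_filter": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt"
        }
      },
      "analyzer": {
        "title_synonyms_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "synonyms_from_file_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "title_synonyms_analyzer"
      }
    }
  }
}

POST articles_with_synonyms/_analyze
{
  "field": "title",
  "text": "An Important notice"
}
```

Because the lowercase filter runs before the synonym filter, this request should return “an”, “imperative” and “notice”, even though the input is capitalized.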

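You can add more values to this file while Elasticsearch is running. Keep in mind that the file is read when the index’s analyzers are built, so an edit is typically not visible until the analyzers are reloaded, for example by closing and reopening the index. Edit the synonyms.txt file (leave Elasticsearch running) and add values like those shown below: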

abundant=>ample
comprehend=>grasp

Once the filter has picked up the changes, you can check the new entries by re-running the _analyze request with one of them, as shown below:

POST index_with_synonyms_from_file_analyzer/_analyze
{
  "text": "abundant",
  "tokenizer": "standard",
  "filter": ["synonyms_from_file_filter"]
}

// Executing this request should output "ample"

In the real world, the synonyms.txt file could live on a mounted network drive and be maintained by another service if we want the synonyms to be updated dynamically.
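
One way to pick up file changes on a live index is to mark the synonym filter as updateable and refresh it with the reload search analyzers API. The sketch below assumes Elasticsearch 7.3 or later and that the updated synonyms.txt is present on every node; the index name index_with_reloadable_synonyms, the filter and analyzer names, and the body field are illustrative. An updateable filter can only be used in a search-time analyzer, so the field keeps the plain standard analyzer for indexing:

```
PUT index_with_reloadable_synonyms
{
  "settings": {
    "analysis": {
      "filter": {
        "reloadable_synonyms_filter": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt",
          "updateable": true
        }
      },
      "analyzer": {
        "reloadable_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "reloadable_synonyms_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "reloadable_search_analyzer"
      }
    }
  }
}

POST index_with_reloadable_synonyms/_reload_search_analyzers
```

After synonyms.txt is edited, calling the reload endpoint should make the new rules apply to subsequent searches. Since the synonyms are applied only at search time in this setup, existing documents do not need to be reindexed.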

That’s it, it’s a wrap! We looked at the synonym filters and their usage in this article.
