Synonyms are different words with the same meaning. For example, football and soccer (the latter being the name Americans use for football) should both point to a football game. The synonym filter lets us define sets of such words, producing a richer user experience while searching.
Elasticsearch expects us to provide a set of words and their synonyms by configuring the analyzer with a synonym token filter. We create the synonyms filter in an index’s settings, as the listing below demonstrates:
PUT index_with_synonyms
{
  "settings": {
    "analysis": {
      "filter": {
        "synonyms_filter": {
          "type": "synonym",
          "synonyms": ["soccer => football"]
        }
      }
    }
  }
}
In this example, we defined a filter of type synonym with a single rule: soccer is treated as an alternate name for football, so any soccer token is rewritten to football. Once the index is configured with this filter, we can test it on some text:
POST index_with_synonyms/_analyze
{
  "text": "What's soccer?",
  "tokenizer": "standard",
  "filter": ["synonyms_filter"]
}
This produces two tokens: “What’s” and “football”. As the output shows, the word “soccer” is replaced with the word “football”.
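The rule above is a one-way replacement. The synonym filter also accepts comma-separated equivalent synonyms, where the original token is kept and the alternatives are added next to it. The sketch below illustrates this form; the index name index_with_equivalent_synonyms and the filter name equivalent_synonyms_filter are made up for this example.

```console
# Hypothetical index: "soccer, football" declares the two words equivalent
PUT index_with_equivalent_synonyms
{
  "settings": {
    "analysis": {
      "filter": {
        "equivalent_synonyms_filter": {
          "type": "synonym",
          "synonyms": ["soccer, football"]
        }
      }
    }
  }
}

# Analyze the same text as before with the equivalent-synonyms filter
POST index_with_equivalent_synonyms/_analyze
{
  "text": "What's soccer?",
  "tokenizer": "standard",
  "filter": ["equivalent_synonyms_filter"]
}
```

With the filter's default settings, this request should return “soccer” together with “football” at the same position, rather than replacing one with the other.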
Synonyms from a file
We can provide the synonyms via a file on the filesystem rather than hardcoding them as we did in the listing above. To do that, we give the file path in the synonyms_path parameter, as the following listing demonstrates:
# synonyms_path is relative to the Elasticsearch config directory
PUT index_with_synonyms_from_file_analyzer
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonyms_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["synonyms_from_file_filter"]
        }
      },
      "filter": {
        "synonyms_from_file_filter": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt"
        }
      }
    }
  }
}
Make sure a file called “synonyms.txt” exists under $ELASTICSEARCH_HOME/config with the following contents:
The synonyms.txt file with a set of synonyms:
# file: synonyms.txt
important=>imperative
beautiful=>gorgeous
We can reference the file using a relative or an absolute path. A relative path is resolved against the config directory of the Elasticsearch installation. We can test the filter by invoking the _analyze API with the following input:
POST index_with_synonyms_from_file_analyzer/_analyze
{
  "text": "important",
  "tokenizer": "standard",
  "filter": ["synonyms_from_file_filter"]
}
We should get the “imperative” token in the response, showing that the synonyms were picked up from the synonyms.txt file we dropped in the config folder.
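To use file-based synonyms on real documents, the filter has to sit inside an analyzer that is assigned to a field. The following is a minimal sketch of that wiring, assuming the same synonyms.txt file in the config directory; the index name index_with_synonyms_field, the analyzer name file_synonyms_analyzer, and the field name description are hypothetical.

```console
# Hypothetical index: a custom analyzer that lowercases tokens and then
# applies the file-based synonyms, attached to a text field
PUT index_with_synonyms_field
{
  "settings": {
    "analysis": {
      "analyzer": {
        "file_synonyms_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "synonyms_from_file_filter"]
        }
      },
      "filter": {
        "synonyms_from_file_filter": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "file_synonyms_analyzer"
      }
    }
  }
}

# Test the analyzer by name instead of listing tokenizer and filters
POST index_with_synonyms_field/_analyze
{
  "analyzer": "file_synonyms_analyzer",
  "text": "Important"
}
```

Because the lowercase filter runs before the synonym filter, “Important” should be normalized to “important” and then mapped to “imperative”.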
You can also add more entries to this file while Elasticsearch is running. Edit the synonyms.txt file (leave Elasticsearch up and running) and add values like those shown below:
abundant=>ample
comprehend=>grasp
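Elasticsearch reads the synonyms file when it builds the index’s analyzers, so the new entries may not be visible until the analyzers are reloaded, for example by closing and reopening the index. You can then check the additions by re-running the _analyze request, as shown below: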
POST index_with_synonyms_from_file_analyzer/_analyze
{
  "text": "abundant",
  "tokenizer": "standard",
  "filter": ["synonyms_from_file_filter"]
}
// Running this request should output "ample"
In the real world, we could keep this synonyms.txt file on a network drive and have another service update it whenever the synonyms need to change.
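For synonyms that change over time, Elasticsearch offers an updateable flag on the synonym filter, combined with the reload search analyzers API. The sketch below shows one possible setup; the index name index_with_updateable_synonyms, the analyzer and filter names, and the title field are hypothetical. An updateable filter can only be used in a search-time analyzer, which is why the field is indexed with the standard analyzer and only searched with the synonym-aware one.

```console
# Hypothetical index: the synonym filter is marked updateable and is used
# only at search time through search_analyzer
PUT index_with_updateable_synonyms
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonyms_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["updateable_synonyms_filter"]
        }
      },
      "filter": {
        "updateable_synonyms_filter": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt",
          "updateable": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "synonyms_search_analyzer"
      }
    }
  }
}

# After synonyms.txt has been edited, ask Elasticsearch to re-read it
POST index_with_updateable_synonyms/_reload_search_analyzers
```

Since the synonyms are applied only when a query is analyzed, existing documents do not need to be reindexed after the file changes.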
That’s it, it’s a wrap! We looked at the synonym filters and their usage in this article.