Configuration Options

Elasticsearch ships with a large number of settings that can baffle even experienced engineers. Although it follows the convention-over-configuration paradigm and works with sensible defaults most of the time, it is imperative to customize the configuration before taking the application to production.

In this section, we will walk through some properties that fall under various categories and discuss their importance and how to adjust them. There are three configuration files that we can tweak:

  • elasticsearch.yml — This configuration file is the most commonly edited, where we can set cluster name, node information, data and log paths, as well as network and security settings.
  • log4j2.properties — Lets us set the logging levels of the Elasticsearch node.
  • jvm.options — Here we can set the heap memory of the running node.

These files are read by the Elasticsearch node from the config directory, a folder under Elasticsearch's installation directory. This directory defaults to $ES_HOME/config for the binary (zip or tar.gz) installations (the ES_HOME variable points to Elasticsearch's installation directory). This differs if you install via a package manager such as the Debian or RPM distributions, where the configuration directory defaults to /etc/elasticsearch.
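On a binary installation, for example, the config directory contains the three files discussed above (alongside a few other files, such as the keystore, depending on the version):

$ES_HOME/config
├── elasticsearch.yml
├── jvm.options
└── log4j2.properties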

If you want to keep your configuration files in a different directory, you can set and export an environment variable called ES_PATH_CONF that points to the new configuration file location. In the next few subsections, we’ll go over some settings that are important to understand, not only for administrators but for developers too.
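For example, on a binary installation we could export the variable before starting the node (the path here is just an illustration; substitute your own directory, which must contain the configuration files):

export ES_PATH_CONF=/opt/elastic/custom-config
./bin/elasticsearch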

Main configuration file

Although the folks at Elastic developed Elasticsearch to run with defaults (convention over configuration), it is highly unlikely we would rely on the defaults when taking a node into production. We should tweak the properties to set specific network information, data or log paths, security aspects, and so on. To do so, we can amend the elasticsearch.yml file to set most of the required properties for our running applications.

Elasticsearch exposes network-related properties as network.* and http.* attributes. We can set the host names and port numbers using these properties. For example, we can change Elasticsearch’s HTTP port to 9900 rather than holding on to the default port 9200 by setting http.port: 9900. You can set transport.port too if you want to change the port on which the nodes communicate with each other internally.
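A minimal elasticsearch.yml snippet along these lines, naming the cluster and node and changing both ports, might look like the following (all values are illustrative; pick names and ports that suit your environment):

# elasticsearch.yml
cluster.name: my-production-cluster   # name shared by all nodes in the cluster
node.name: node-1                     # human-readable name for this node
http.port: 9900                       # REST (HTTP) port; the default is 9200
transport.port: 9400                  # node-to-node transport port; the default range is 9300-9400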

There are many more properties you may need to alter, depending on your requirements. Refer to the official documentation if you want to learn about these properties in detail:

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html#common-network-settings

The logging options

Elasticsearch is developed in Java, and like many Java applications, it uses Log4j 2 as its logging library. A running node emits logging information at the INFO level, both to the console and to a file (using Log4j’s Console and RollingFile appenders, respectively).

The Log4j properties file (log4j2.properties) consists of some system variables (sys:es.logs.base_path, sys:es.logs.cluster_name, etc.), which are resolved during application run time. Because these properties are exposed by Elasticsearch, they are available to Log4j, which lets Log4j set up its log file directory location, log file pattern, and other properties. For example, sys:es.logs.base_path points to the path where Elasticsearch writes the logs, which resolves to the $ES_HOME/logs directory.
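For instance, the rolling file appender in the shipped log4j2.properties resolves its log file name from these variables along the following lines (paraphrased; the exact lines differ slightly between Elasticsearch versions):

appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log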

By default, most of Elasticsearch logs at the INFO level, but we can customize the setting for individual packages. For example, we can edit the log4j2.properties file and add a logger for the index package, as the following listing shows.

Listing : Setting the logging level for a specific package

# Register a logger named "index" that targets the org.elasticsearch.index package
logger.index.name = org.elasticsearch.index
# Emit log events from that package at the DEBUG level
logger.index.level = DEBUG

By doing this, we allow the index package to output logs at the DEBUG level. Rather than editing this file on a specific node and restarting that node (you’d need to repeat this on every node of a multi-node cluster), we can set the DEBUG log level for this package at the cluster level. The next listing demonstrates this setup:

Listing : Setting the transient log level globally

PUT _cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.index": "DEBUG"
  }
}

As the query shows, we set the logger level for the index package to DEBUG in the transient block. The transient block indicates that the property is not durable (it’s only in effect while the cluster is up and running). If we restart the cluster or it crashes, the setting is lost because it is not stored permanently on disk.

We set this property with a call to the cluster settings API (_cluster/settings), as the listing shows. Once the property is set, any further logging related to the org.elasticsearch.index package is output at the DEBUG level.
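To double-check that the setting took effect, we can read the cluster settings back (the flat_settings parameter simply renders the keys in dotted form rather than as nested objects):

GET _cluster/settings?flat_settings=true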

Elasticsearch provides a means of storing the cluster properties durably too. If we need to store the properties permanently, we can use the persistent block. The following listing replaces the transient block with a persistent block.

Listing : Setting the log level permanently

PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.index": "DEBUG",
    "logger.org.elasticsearch.http": "TRACE"
  }
}

This code sets the DEBUG level on the org.elasticsearch.index package and the TRACE level on the org.elasticsearch.http package. Because both are persistent properties, the logger writes detailed logs at these levels as set on the packages, which survive cluster restarts too.

Be mindful of properties set permanently using the persistent block. My suggestion is to enable the DEBUG or TRACE logging level only while troubleshooting or during a debugging episode. When you complete the “firefighting” episode in production, reset the level back to INFO to avoid writing reams of logs to disk.
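One way to reset is to set the property back to null, which removes the override and returns the logger to its default level (the same technique works for transient settings):

PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.index": null,
    "logger.org.elasticsearch.http": null
  }
}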

The JVM options

Because Elasticsearch is written in Java, a multitude of optimization tweaks can be made at the JVM level. For obvious reasons, a topic this large can’t be done justice in this article. However, if you are curious about the JVM’s inner workings or want to fine-tune performance at a lower level, refer to books like Optimizing Java (Benjamin Evans and James Gough) or Java Performance (Scott Oaks). I highly recommend them, as they provide not only the fundamentals but also operational tips and tricks.

Elasticsearch provides a jvm.options file in the config directory that holds the JVM settings. However, this file is meant for reference purposes only (to check the memory settings of the node, for example) and should never be edited. Heap memory is set automatically for the Elasticsearch server, based on the node’s available memory.

Warning: We should never edit the jvm.options file under any circumstances. Doing so may corrupt Elasticsearch’s internal workings.

If we want to change the heap size or any other JVM setting, we must create a new file with .options as the filename extension, add the appropriate tuning parameters, and place the file in a directory called jvm.options.d under the config folder for archive installations (tar or zip). We can give the custom file any name, as long as it carries the fixed .options extension.

For RPM/Debian package installations, this file should be placed under the /etc/elasticsearch/jvm.options.d/ directory. Similarly, mount the options file under the /usr/share/elasticsearch/config/jvm.options.d folder for Docker installations.
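As a sketch, a bind mount for a Docker container could look like the following (the image tag and file name are placeholders, and other flags you normally pass, such as port mappings, are omitted for brevity):

docker run -d --name es-node \
  -v "$PWD/jvm_custom.options:/usr/share/elasticsearch/config/jvm.options.d/jvm_custom.options" \
  docker.elastic.co/elasticsearch/elasticsearch:<version>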

We can edit the settings in this custom JVM options file. For example, to upgrade heap memory in a file called jvm_custom.options, we can use the code in the following listing.

Listing : Upgrading heap memory

# Setting the JVM heap memory in jvm_custom.options file
-Xms4g
-Xmx8g

The -Xms flag sets the initial heap size, whereas -Xmx sets the maximum heap size. The unwritten rule is not to let the -Xms and -Xmx settings exceed 50% of the node’s total RAM; Apache Lucene, running under the hood, uses the other half of the memory for its segments, caching, and other processes.


These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
