Scaling the Cluster

One powerful feature of Elasticsearch is its ability to scale out to handle petabytes of data. There is little complexity in setting this up beyond procuring additional nodes.

Scaling the cluster

Elasticsearch clusters can scale to any number of nodes, from a single node to hundreds of them, depending on the use case, data, and business requirements. Although we may have worked with a single-node cluster on our personal machines while learning Elasticsearch, we seldom run a single-node cluster in production.

One of the reasons we chose Elasticsearch is for its resilience and fault tolerance capabilities. We don’t want to lose data when a node crashes. Fortunately, Elasticsearch can cope with hardware failures and recover from them as soon as the hardware is back online.

Choosing the cluster size is an important IT/data governance decision for any organization, and multiple variables, factors, and inputs go into sizing Elasticsearch clusters for our data needs. Although we can add resources (memory or new nodes) to an existing cluster, it is important to forecast such demands.

We can add nodes to an existing cluster to increase read throughput or indexing performance. We can also downsize clusters by removing nodes, perhaps because the demand for indexing or read throughput has declined.

Adding nodes to the cluster

Each of the nodes in a cluster is essentially an instance of Elasticsearch running on a dedicated server. Note that it is advisable to deploy and run Elasticsearch on a dedicated machine with as much computing power as your requirements demand, rather than bundled with other applications, especially resource-hungry ones. That said, there's nothing stopping you from creating multiple nodes on a single server, but doing so defeats the purpose of data resiliency: if that server crashes, you lose all the nodes on it.

When you launch an Elasticsearch server for the first time, a cluster is formed, albeit with a single node. This single-node cluster is the typical setup in development environments for testing and trying out the product. The figure below shows a single-node cluster.

Figure : Single-node cluster

By bringing up more nodes (provided all nodes are christened with the same cluster name), they all join to make up a multi-node cluster. Let’s see how shards are created and distributed across the cluster as we keep adding nodes, from a single node to, say, three nodes.

Say we want to create an index called chats with one shard and one replica. For this, we define the number of shards and replicas during index creation by configuring the settings on the index, as the following listing shows.

Listing : Creating the chats index

PUT chats
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

This script creates the chats index with one primary shard on a single node. Elasticsearch won't create the replica of this index on the same node where the primary shard exists. Indeed, there's no point in keeping a backup drive in the same location as the main drive. The figure below shows this pictorially (a shard is created, a replica isn't).

Figure : Single-node cluster with no replica created
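One quick way to see this for ourselves is the _cat/shards API, which lists each shard copy and its state. The output below is illustrative: node-a is a placeholder node name, and the document count and store size will vary with your data.

```
GET _cat/shards/chats?v=true
```

A condensed, illustrative response on a single-node cluster:

```
index shard prirep state      docs store ip        node
chats 0     p      STARTED       0  225b 127.0.0.1 node-a
chats 0     r      UNASSIGNED
```

The replica (prirep column r) sits in the UNASSIGNED state because there's no second node to host it.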

If the replicas aren't created, the cluster isn't classified as healthy. We can use the cluster health API to get a high-level view of the cluster state. The GET _cluster/health call fetches the health of the cluster and outputs it in JSON format, detailing the unassigned shards, the state of the cluster, the number of nodes and data nodes, and so on. But what do we mean by cluster health?

Cluster health

Elasticsearch uses a simple traffic-light system to indicate the state of the cluster: green, yellow, and red. When we first create an index on a single-node server, its health will be yellow because the replica shards are not yet assigned. The exception is when we purposely set replicas to zero on all indices on the node, which is possible but is an anti-pattern.

Figure : Representing the health of a cluster as traffic lights
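To see the yellow status for ourselves, we can invoke the health API. The response below is condensed and illustrative; the cluster_name and exact counts depend on your setup.

```
GET _cluster/health
```

A condensed response for our single-node cluster with the chats index:

```
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1,
  "active_shards" : 1,
  "unassigned_shards" : 1,
  ...
}
```

The status field carries the traffic-light color, and unassigned_shards counts the replica that couldn't be placed.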

Understanding this, we can ask Elasticsearch to explain why the cluster is unhealthy (or why the shards are unassigned). We can query the server using the cluster allocation API for an explanation of why a shard is in its current state. The following query, for example, asks for an explanation about the chats index.

GET _cluster/allocation/explain
{
  "index": "chats",
  "shard": 0,
  "primary": false
}

This query returns a detailed explanation of the state of this index. We already know that the replica won't be created or assigned on a single-node cluster, right? To verify, let's ask the cluster if that's the case. The following snippet shows the condensed response from the server for the previous query:

{
  "index" : "chats",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "allocate_explanation" : "cannot allocate because
       allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [{
    ...
    "deciders" : [{
      "decider" : "same_shard",
      "explanation" : "a copy of this shard is already allocated to this node ..."
    }]
    ...
  }]
}

The current_state of the chats shard in the returned response is unassigned. The response also explains that the server couldn't allocate a copy of the shard because of the same_shard decider (check the value of the deciders array in the previous snippet).

Each node in a cluster takes on several roles by default; for example, master, ingest, data, ml, transform, and others. We can, however, restrict a node's roles by setting the node.roles property with the appropriate roles in the elasticsearch.yml file.
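As an illustration, a node dedicated purely to holding and serving data could be configured like this in its elasticsearch.yml (the role names are standard; which roles you assign depends on your topology):

```
# elasticsearch.yml: restrict this node to the data role only
node.roles: [ data ]
```

A node with an empty roles list, node.roles: [ ], acts as a coordinating-only node, accepting client requests and routing them to the rest of the cluster.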

We can index data into our chats index and perform search queries on this single-node instance. But because replicas are absent, we risk losing data as well as creating performance bottlenecks. To mitigate that risk, we can add one or more nodes to expand the cluster.

Adding a node is as simple as booting up Elasticsearch on a different machine on the same network with the same cluster.name (a property in the elasticsearch.yml file), provided that security is disabled.

Note: Since version 8.0, Elasticsearch installations enable security by default (xpack.security.enabled is true). When you bring up the Elasticsearch server for the first time, it generates the required keys and tokens and walks you through the steps needed to connect Kibana successfully. If you are experimenting with Elasticsearch on your local machine, you can disable security by setting the xpack.security.enabled property to false in the elasticsearch.yml file, but I strongly recommend not jumping into production with an unsecured setup. An unsecured setup is very dangerous; you are asking for trouble.
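For local experimentation only, a second node's elasticsearch.yml might look like the sketch below. The cluster and node names here are hypothetical placeholders; what matters is that cluster.name matches the first node's.

```
# elasticsearch.yml for a second local node (experiments only)
cluster.name: my-chat-cluster     # must match the first node's cluster name
node.name: node-b                 # hypothetical name for this node
xpack.security.enabled: false     # never do this in production
```

With this in place, starting Elasticsearch on the second machine makes it discover and join the existing cluster rather than form its own.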

Bringing up a second node lets Elasticsearch create the replica shard on that node. The figure below illustrates this process.

Figure : The replica shard gets created on a second node.

As the figure shows, Elasticsearch instantly creates replica 1, an exact copy of shard 1, when the second node starts and joins the cluster. The contents of shard 1 are synced to replica 1 immediately, and once they are in sync, any write operations on shard 1 are replicated to replica 1 going forward. The same applies to multiple shards and multiple replicas.

Should you add more nodes to the cluster, Elasticsearch scales elegantly, automatically redistributing the shards and replicas as nodes are added (or removed). Elasticsearch manages everything transparently behind the scenes, so we, ordinary users or administrators, needn't worry about the mechanics of communication between the nodes or the way the shards and their data are distributed, and so forth. The figure below illustrates how a shard (SHARD 2) gets moved to a newly joined second node (thus making a multi-node cluster), as well as replicas getting created.

Figure : Newly joined node gets new shards moved over from the single-node cluster.

Should Node A crash, Replica 1 on Node B is instantly promoted to a primary shard, and the cluster reverts to being a single-node cluster until the failed node is back online. We can keep adding new nodes to the cluster when necessary by simply bringing up the Elasticsearch server on each new node.

Increasing the read throughput

Increasing the number of replicas has an added performance benefit: it increases the read throughput, as reads (queries) can be served by the replicas while the primary shards handle the indexing operations. If your application is read intensive (more queries are issued than documents are indexed, as is typical of an e-commerce application), increasing the number of replicas alleviates the load on the application.

As each replica is an exact copy of its shard, we can split the application's shards into two categories: primary shards deal with indexing the data, and replica shards deal with the read side of the data. When a query comes in from a client, the coordinating node routes the request to the nodes holding replicas for the answer. Thus, adding more replicas to the index helps improve read throughput. Do note the memory implications, however: because each replica is an exact copy of the shard, you need to size the cluster accordingly.

To increase the replicas, and thus the read throughput, one strategy is to update the number_of_replicas setting on the live index. This is a dynamic index setting that we can tweak even when the index is live and in production. For example, let's say we add 10 nodes to our 5-node cluster with 5 shards to counteract the read query performance issues that are bringing the server to its knees. As the following listing shows, we can then increase the replica count on the live index.

PUT chats/_settings
{
  "number_of_replicas": 10
}

These additional replicas are created per shard on the newly added nodes, with data copied over to them. Now that we have additional replicas in our armory, read queries are handled efficiently by them. It is Elasticsearch's responsibility to route these client requests to the replicas, thus improving the search and query performance of the application.

Note: Just to recap primary shard resizing: once an index is created and live, you cannot change its number of primary shards because it is a static property of the index. If you must change this setting, you need to create a new index with the new shard count and re-index your data from the old index into the new one.
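As a sketch of that workflow, suppose we want to grow chats to three primary shards: we create a new index and copy the data across with the _reindex API. The name chats_v2 and the shard count of three are hypothetical choices for illustration.

```
PUT chats_v2
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

POST _reindex
{
  "source": { "index": "chats" },
  "dest":   { "index": "chats_v2" }
}
```

Once the re-indexing completes, you can point your application (or an index alias) at chats_v2 and retire the old index.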

Although increasing read replicas increases the read throughput, it puts a strain on the cluster's memory and disk space, because each replica consumes as many resources as its counterpart (the primary shard).

Stay tuned for the next set of articles on this subject.

You don’t need to follow me 🙂 to read these articles.

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
