Understanding Master: Master and Elections (1/2)

In this two-part article series, we discuss the master role in an Elasticsearch cluster:

  1. Understanding Master: Master Node and Elections
  2. Understanding Master: Quorum and Split Brain

Every node in a cluster can have multiple roles assigned to it: master, data, ingest, ml (machine learning), and others. The role we are interested in for this discussion is the master role. Assigning the master role to a node indicates that the node is master-eligible. Before we discuss master-eligibility, let’s understand the importance of a master node.
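As a sketch, roles are assigned per node via the node.roles setting in elasticsearch.yml (the node name and the exact role combination below are illustrative, not a recommendation):

```yaml
# elasticsearch.yml -- assigning roles to a node
# (node name and role list are illustrative)
node.name: node-1
node.roles: [ master, data, ingest ]
```

A node with master in its roles list is master-eligible; a node whose roles list omits master cannot be elected.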

Master node

The master node is responsible for cluster-wide operations such as allocating shards to nodes, managing indices, and other lightweight operations. The master node is a critical component, responsible for keeping the cluster healthy. It strives hard to keep the state of the cluster and the node community intact. Only one master node exists per cluster, and its sole job is to worry about cluster operations — nothing more, nothing less.

Master-eligible nodes are the set of nodes assigned the master role. Assigning the master role to a node doesn’t make that node the cluster’s master, but it puts the node one step closer to becoming one should the elected master crash. Remember, the other master-eligible nodes are also in line to become the master, given the opportunity.

What’s the use of a master-eligible node, you might ask! Every master-eligible node exercises its vote to select the cluster’s master. Behind the scenes, when we boot up nodes for the first time to form a cluster, or when the master dies, one of the first steps taken is to elect a master. Let’s learn all about master elections in the following section.

Master elections

The cluster’s master is chosen democratically by election! An election is held to choose a master when the cluster is formed for the first time or when the current master dies. As part of the election, the master-eligible nodes pick one of their own as the master. If the master later crashes for whatever reason, any of the master-eligible nodes can call for an election, and members cast their votes to elect a new master. Once elected, the master node takes over the duties of cluster management.

Not all days are happy days: there are circumstances beyond its control that can knock out a master node. The master-eligible nodes therefore communicate constantly with the master node to make sure it is alive, as well as notifying the master of their own status. When the master is gone, the immediate job of these nodes is to call an election to elect a new master.

A few properties, such as cluster.election.duration and cluster.election.initial_timeout, help us configure how often elections run and how long master-eligible nodes wait before calling one. The initial_timeout attribute, for example, is the amount of time a master-eligible node waits before calling for an election; by default, this value is set to 500 ms. Say master-eligible node A does not receive a heartbeat from the master within 500 ms: it then calls for an election because it believes the master is gone.
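As an illustration, these election settings are configured in elasticsearch.yml (the values shown simply mirror those discussed here; they are not tuning recommendations):

```yaml
# elasticsearch.yml -- election timing settings (illustrative values)
cluster.election.initial_timeout: 500ms   # upper bound on the wait before a node's first election attempt
cluster.election.duration: 500ms          # how long each election attempt is allowed to take
```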

In addition to electing a master, the master-eligible nodes have to work in tandem to keep cluster operations rolling. Although the master is the “king” of the cluster, it needs the support and buy-in of the other master-eligible nodes. The master’s main job is to maintain and manage the cluster state, so let’s look at that in the next section.

Cluster state

The cluster state consists of all the metadata about shards, replicas, schemas, mappings, field information, and more. These details are stored as global state on the cluster and are also written to each node. The master is the only node that can commit the cluster state, and it carries the added responsibility of keeping the cluster up to date. The master commits cluster data in phases (similar to a two-phase commit transaction in a distributed architecture):

  • The master computes the cluster changes, publishes them to the individual nodes, and waits for acknowledgements.
  • Each node receives the cluster updates but doesn’t apply them to its local state yet. On receipt, it sends an acknowledgement back to the master.
  • On receiving a quorum of acknowledgements from the master-eligible nodes (the master doesn’t need to wait for acknowledgements from every node, just the master-eligible ones), the master commits the changes to update the cluster state.
  • The master node, after successfully committing the cluster changes, broadcasts a final message to individual nodes instructing them to commit the previously received cluster changes.
  • The individual nodes commit the cluster updates.
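The steps above can be sketched as a toy simulation. This is illustrative Python, not Elasticsearch’s actual implementation: the Node class, the publish function, and the majority-quorum formula are my own simplifications of the flow described.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A cluster node: `state` is the committed cluster state,
    `pending` holds a received-but-uncommitted update."""
    name: str
    master_eligible: bool
    state: Optional[dict] = None
    pending: Optional[dict] = None

def publish(update: dict, nodes: list, responders: set) -> bool:
    """Two-phase cluster-state publication: publish the update, wait for a
    quorum of master-eligible acknowledgements, then commit."""
    # Phase 1: publish the update; only nodes in `responders` acknowledge.
    acks = []
    for node in nodes:
        if node.name in responders:
            node.pending = update        # received, but not applied yet
            acks.append(node)

    # Quorum check: only master-eligible acknowledgements count.
    eligible = [n for n in nodes if n.master_eligible]
    quorum = len(eligible) // 2 + 1
    if sum(1 for n in acks if n.master_eligible) < quorum:
        return False                     # publication fails; nothing is committed

    # Phase 2: instruct the acknowledging nodes to commit the pending update.
    for node in acks:
        node.state, node.pending = node.pending, None
    return True
```

With three master-eligible nodes, acknowledgements from any two of them are enough to commit, regardless of how many data-only nodes have responded.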

There’s a time limit, set by the cluster.publish.timeout attribute (30 s by default), to commit each batch of cluster updates successfully. This period starts when the master publishes the first cluster update messages to the nodes and ends when the cluster state is committed. If the global cluster updates are committed successfully before the default 30 s elapse, the master waits until this time passes before starting the next batch of cluster updates. However, the story doesn’t end here!

If the cluster updates are not committed within the given 30 s, the publication is considered failed: perhaps the master itself has failed. In this instance, the current master rejects the cluster updates and resigns from the post of master, and an election for a new master begins.

Even after the global cluster updates are committed, the master still waits for the nodes that haven’t sent back their acknowledgements; until it receives them, it can’t mark the cluster update a success. The master keeps track of these nodes and allows a grace period set by the cluster.follower_lag.timeout attribute, which defaults to 90 s. Should a node not respond within this grace period, it is marked as a failed node, and the master removes it from the cluster.
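Both timeouts mentioned in this section are configured in elasticsearch.yml; the values below simply restate the defaults described in the text:

```yaml
# elasticsearch.yml -- cluster-state publication timeouts (defaults, per the text)
cluster.publish.timeout: 30s         # time allowed to commit each batch of cluster updates
cluster.follower_lag.timeout: 90s    # grace period for nodes lagging behind a committed update
```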

As you may have gathered by now, there’s a lot going on under the hood in Elasticsearch. Cluster updates happen frequently, and the master has a ton of responsibility maintaining the moving parts. In the cluster updates scenario above, the master waits for acknowledgements from a group of master-eligible nodes called a quorum before committing the state, rather than waiting for the rest of the nodes. A quorum is the minimum number of master-eligible nodes required for the master to operate effectively, which is discussed in detail in the next article.


These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
