Understanding Index Lifecycle Management (ILM)

Overview

Indices are expected to grow in size as data pours in over time. Sometimes an index is written to so frequently that its underlying shards run out of memory; at other times, most of the shards may be sparsely filled. Wouldn’t it be ideal to automatically roll over the index in the former case and shrink it in the latter?

There is also time-series data to consider. Take the example of logs that are written to a file on a daily basis. These logs are then exported to indices suffixed with a time period, like my-app-2021-10-24.log. When one day rolls into the next, you’d expect the respective index to be rolled over too; for example, from my-app-2021-10-24.log to my-app-2021-10-25.log (the date is incremented by a day), as figure 1 shows:

Figure 1: Rolling over to a new index as the new day dawns

Of course, we can write a scheduled job that can do this for us, but fortunately, Elastic released a new feature relatively recently called index life-cycle management (ILM).

Introducing Index Lifecycle Management

As the name suggests, ILM is all about managing indices based on a life-cycle policy. A policy is a definition that declares rules, which the engine executes when their conditions are met. For example, we can define rules that roll over the current index to a new index when:

  • The index reaches a certain size (say 40 GB, for example)
  • The number of documents in the index crosses, say, 10,000
  • The day is rolled over

The Index Lifecycle Management (ILM) policy is a rule definition book executed by the engine on the indices when the conditions of the rules are met.

Before we start scripting the policy, let’s understand the life cycle of an index: the various phases an index can progress through, based on some criteria and conditions.

The index life cycle phases

An index has five life-cycle phases — hot, warm, cold, frozen, and delete — as demonstrated in figure 2 below:

Figure 2: Life cycle of an index

Let’s look at a brief description of each of these phases:

  • Hot: The index is in full operation mode. It is available for both read and write, so it serves both indexing and querying.
  • Warm: The index is in read-only mode. Indexing is switched off, but the index remains open for querying, so search and aggregation queries can still be served by it.
  • Cold: The index is in read-only mode. As in the warm phase, indexing is switched off and the index is open for querying, although queries are expected to be infrequent. In this phase, search queries might have slow response times.
  • Frozen: This is similar to the cold phase: indexing is switched off but querying is allowed. However, queries are even more infrequent, or rare. In this phase, users may notice longer response times for their queries.
  • Delete: This is the index’s final stage, where the index gets deleted permanently. As such, the data is erased and resources are freed. Usually, it’s expected that we take a snapshot of the index before deleting it so that the data can be restored from the snapshot at some point in the future.

Transitioning from the hot phase into every other phase is optional. That is, once created in the hot phase, the index can remain in that phase or transition to any of the other four phases. In the next sections, we will check out a few examples that set an index life-cycle policy so that the indices can be managed automatically by the system.

Managing life cycle manually

So far, we’ve created or deleted an index on demand when needed, that is, by intervening manually. But we were unable to delete, roll over, or shrink indices based on conditions such as the index size exceeding a certain threshold or a certain number of days passing. We’ll use ILM to help us set up this feature.

Elasticsearch exposes an API for working with index life-cycle policies through the _ilm endpoint. The format is _ilm/policy/<index_policy_name>. The process is split into two steps:

  • Defining a life cycle policy and
  • Associating that policy with an index for execution

This is detailed in the following sections.

Step 1: Defining a life-cycle policy

The first step is to define a life-cycle policy, where we provide the required phases and set the relevant actions on those phases. The code in the following listing defines such a policy.

Listing 1: Creating a policy with hot and delete phases

PUT _ilm/policy/hot_delete_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "1d",
        "actions": {
          "set_priority": {
            "priority": 250
          }
        }
      },
      "delete": {
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

The hot_delete_policy in listing 1 defines a policy with two phases: hot and delete. Here’s what the definition states:

  • Hot phase: The index is expected to live for at least one day ("min_age": "1d") before the actions are carried out. The "actions" object sets a priority (250 in this example). Indices with a higher priority are recovered first during node recovery.
  • Delete phase: The index is deleted once the hot phase completes all its actions. As there is no min_age on the delete phase, the delete action kicks in as soon as the hot phase finishes.
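
To double-check what was stored, we can read the policy back. This is a minimal sketch that assumes the policy from listing 1 was created under the same name; the response should contain the policy definition along with its version and last-modified date.

```console
# Retrieve the stored definition of the policy created in listing 1
GET _ilm/policy/hot_delete_policy
```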

The first step of declaring and defining the policy is complete. Now, how do we get this policy attached to an index of our choice? That’s exactly what we’ll discuss in the next section.

Step 2: Associating the policy with an index

Now that we have the policy defined, the next step is to associate an index with it. To see this in action, let’s create an index and attach the policy from listing 1 to it. Listing 2 creates the index.

Listing 2: Creating an index with an associated index life-cycle policy

PUT hot_delete_policy_index
{
  "settings": {
    "index.lifecycle.name": "hot_delete_policy"
  }
}

This script creates the hot_delete_policy_index index with one setting, index.lifecycle.name, which points to the policy we created in listing 1 (hot_delete_policy). The index is now associated with that life-cycle policy and undergoes phase transitions as the policy defines. That is, once the index is a day old (min_age=1d), it enters the hot phase and applies that phase’s actions (in this case, setting a priority on the index).

As soon as the hot phase completes (about one day after the index was created, as per the policy definition), the index transitions into the next stage, the delete phase in this case. This is a straightforward phase where the index gets deleted automatically.
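
To see where an index currently stands in its life cycle, the explain API can help. The sketch below assumes the index from listing 2 exists; the response should report, among other things, the policy managing the index and the phase and action it is currently in.

```console
# Show the current life-cycle state (policy, phase, action) of the index
GET hot_delete_policy_index/_ilm/explain
```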

Note

The hot_delete_policy policy defined in listing 1 deletes the index after 1 day as per the definition. Be aware that if you are using this policy in production, you may find no index available after the deletion phase (the delete phase purges everything).

To sum up, attaching an index life-cycle policy to an index transitions the index into given phases and executes certain actions defined in each of those respective phases. We can surely define an elaborate policy such as a hot phase for 45 days, then move to a warm phase, which will be alive for one month. We then could move it to the cold phase from the warm phase and keep it on cold for one year, and finally, delete the index after a year.
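
As a rough sketch of such an elaborate policy, the definition below moves an index through the hot, warm, cold, and delete phases. The policy name (hot_warm_cold_delete_policy), the actions in each phase, and the exact min_age values are assumptions chosen for illustration. Because min_age is measured from the index’s creation (or from rollover, if the index was rolled over), 45 days of hot, one month of warm, and one year of cold translate to 45d, 75d, and 440d.

```console
# Hypothetical policy: hot for 45 days, warm for 30 days, cold for 365 days, then delete
PUT _ilm/policy/hot_warm_cold_delete_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "45d",
        "actions": {
          "readonly": {},
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "75d",
        "actions": {
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "440d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```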

So far, we’ve learned how to define a policy with various phases and attach it to an index. However, how can we transition the indices based on conditions such as a time period or a particular size? Fortunately, Elasticsearch provides just that: automated and conditional index life-cycle rollovers, which is the topic of the next section.

Life cycle with rollover

In this section, we will set conditions on a time-series index so that it rolls over automatically when those conditions are met. Let’s say we want the index to be rolled over based on the following conditions:

  • On each new day.
  • When the maximum number of documents hits 10,000.
  • When the maximum index size reaches 10 GB.

We need to define a life-cycle policy that encompasses these conditions. The script in listing 3 defines a simple policy declaring a hot phase in which the index is expected to roll over when any of these conditions is met.

Listing 3: Simple policy definition for hot phase

PUT _ilm/policy/hot_simple_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_docs": 10000,
            "max_size": "10gb"
          }
        }
      }
    }
  }
}

In our policy, we declared one phase, the hot phase, with rollover as the action to be performed when any of the conditions declared in the rollover action is met. For example, if the number of documents reaches 10,000, the size of the index exceeds 10 GB, or the index is one day old, the index rolls over. As we declared the minimum age (min_age) to be 0ms, the index moves into the hot phase as soon as it is created and is rolled over once one of these conditions is met.

The next step is to create an index template and attach the life-cycle policy to it. The script in listing 4 declares an index template with the index pattern mysql-*.

Listing 4: Attaching a life-cycle policy to a template

PUT _index_template/mysql_logs_template
{
  "index_patterns": [
    "mysql-*"
  ],
  "template": {
    "settings": {
      "index.lifecycle.name": "hot_simple_policy",
      "index.lifecycle.rollover_alias": "mysql-logs-alias"
    }
  }
}

There are a couple of things to note in the index template script. We must associate our previously defined policy with this index template by setting index.lifecycle.name. Also, as the policy has a hot phase with rollover defined, we must provide the index.lifecycle.rollover_alias name when creating this index template.

The final step is to create an index that matches the index pattern defined in the index template, with a number as a suffix so the rollover indices are generated correctly. Another thing to note is that we must define the alias and declare the current index writable by setting is_write_index to true, as listing 5 shows.
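
Before creating any index, we can check that the template would apply to it. The following sketch uses the simulate-index API with an index name that matches the mysql-* pattern (mysql-index-000001); the response should list the settings a new index of that name would receive, including index.lifecycle.name and index.lifecycle.rollover_alias.

```console
# Preview the settings that matching templates would give to a new index with this name
POST _index_template/_simulate_index/mysql-index-000001
```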

Listing 5: Setting the index as writable for the alias

PUT mysql-index-000001
{
  "aliases": {
    "mysql-logs-alias": {
      "is_write_index": true
    }
  }
}

Once we create the index, the policy kicks in. In our example, the index enters the hot phase immediately, as min_age is set to 0 milliseconds, and then moves into the phase’s rollover action. The index stays in this phase until one of the conditions (the age or size of the index, or the number of documents) is met. As soon as a condition is met, the rollover action is executed and a new index, mysql-index-000002, is created (note the index suffix). The alias is remapped to point to this new index automatically. Then mysql-index-000002 is rolled over to mysql-index-000003 (again, when one of the conditions is met), and the cycle continues.
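
To watch the rollover happen, we can inspect the alias and the life-cycle state of the matching indices. This is a sketch that assumes the names used in listings 4 and 5; the first request shows which indices sit behind the alias (and which one is the write index), and the second reports each index’s current phase, action, and step.

```console
# Which indices sit behind the alias, and which one is the write index?
GET _alias/mysql-logs-alias

# Current life-cycle state of every index in the rollover series
GET mysql-index-*/_ilm/explain
```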

Policy scan interval

By default, policies are scanned every 10 minutes. You need to update the cluster settings using the _cluster endpoint if you want to alter this scan period.

Usually, when we try out life-cycle policies in development, a common issue is that none of the phases seem to execute. For example, although you may have set the phases’ times (min_age, max_age) in milliseconds, you may notice that none of your phases are executing. Unaware of the scan interval, we may think the life-cycle policies are not being invoked, when in fact the policies are simply waiting to be scanned.

We can reset this scan period by invoking the _cluster/settings endpoint with the appropriate time period. For example, the following snippet resets the poll interval to 10 milliseconds:

PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "10ms"
  }
}
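
A poll interval this short is meant for experimentation only. Once you are done, the override can be removed by assigning null to the setting, as in this sketch, so that the interval falls back to its default:

```console
# Remove the override so the poll interval falls back to its default (10 minutes)
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": null
  }
}
```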

Now that we understand index rollover using life-cycle policies, let’s script another policy with multiple phases. Listing 6 shows how to create this.

Listing 6: Creating an advanced life-cycle policy

PUT _ilm/policy/hot_warm_delete_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "1d",
        "actions": {
          "rollover": {
            "max_size": "40gb",
            "max_age": "6d"
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

This policy consists of hot, warm, and delete phases. Let’s look at what happens and what actions are executed in these phases:

  • Hot phase: The index enters this phase after one day because the min_age attribute is set to 1d. It then moves into the rollover action and waits for one of the conditions to be satisfied: the size reaches 40 GB ("max_size": "40gb") or the index is older than 6 days ("max_age": "6d"). Once either condition is met, the index rolls over and becomes eligible to transition from the hot phase to the warm phase.
  • Warm phase: The index enters the warm phase once the one-week minimum age has passed ("min_age": "7d"). It is then shrunk to a single shard ("number_of_shards": 1), after which it waits for the delete phase.
  • Delete phase: The index enters this phase once the 30-day minimum age has passed ("min_age": "30d") and is then deleted permanently. Be wary of this stage as the delete operation is irreversible! My advice is to take a backup of the data before you delete it permanently.
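
One way to act on that advice is the delete phase’s wait_for_snapshot action, which holds back deletion until a snapshot life-cycle management (SLM) policy has run. The sketch below is illustrative only: the ILM policy name (delete_after_snapshot_policy) is made up, and nightly-snapshots is a hypothetical SLM policy that is assumed to exist already.

```console
# Hypothetical policy: delete only after the (assumed) SLM policy has taken a snapshot
PUT _ilm/policy/delete_after_snapshot_policy
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": {
          "wait_for_snapshot": {
            "policy": "nightly-snapshots"
          },
          "delete": {}
        }
      }
    }
  }
}
```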

That’s a wrap.

Hope you liked this short article on index life cycle management!

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.