Automating Snapshots (3/3)

Mini-series of Snapshotting Feature

  1. Introducing and Registering the Snapshots
  2. Creating and Restoring Snapshots
  3. Automating snapshots

Automating snapshots

We’ve just seen the mechanism for creating snapshots, but those are ad hoc snapshots: you create them on demand (for example, when you are migrating data, rolling out a production hotfix release, and so on). However, we can automate this process so that snapshots are taken regularly using the snapshot lifecycle management (SLM) feature. Elasticsearch provides the _slm API to manage the lifecycle of snapshots; it helps us create lifecycle policies that get executed on a predefined schedule.

To do this, we need to create a policy using the _slm API. This policy contains information such as the indices it should back up, the schedule (a cron-style job), the retention period, and so on. We must have a registered repository, because it is a prerequisite for snapshot lifecycle management.
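If you haven’t registered one yet, you can do so with the _snapshot API, as discussed earlier in this mini-series. Here’s a minimal sketch, assuming a shared filesystem repository; the location path is an illustrative value and must be listed under path.repo in elasticsearch.yml:

PUT _snapshot/es_cluster_snapshot_repository
{
  "type": "fs", # a shared filesystem ("fs") repository
  "settings": {
    "location": "/mnt/snapshots" # illustrative path; must appear in path.repo
  }
}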

Let’s say we want to back up all our movie- and review-related indices at midnight every night to a given repository, and we want to keep these snapshots for a week. We can write a policy and create the automation using the _slm API, as this listing demonstrates.

Listing : Creating a policy for scheduled snapshots

PUT _slm/policy/prod_cluster_daily_backups #A
{
  "name": "<prod_daily_backups-{now/d}>", #B
  "schedule": "0 0 0 * * ?", #C
  "repository": "es_cluster_snapshot_repository", #D
  "config": {
    "indices": ["*movies*", "*reviews*"], #E
    "include_global_state": false #F
  },
  "retention": {
    "expire_after": "7d" #G
  }
}

The _slm API creates a policy that is stored in the cluster and executed when the schedule kicks in. It has three parts that we must provide: a unique name, the schedule, and the repository we registered earlier to store the snapshots. Let’s understand these bits in detail.

The unique name (<prod_daily_backups-{now/d}> in the above listing) is a name constructed with date math. Because {now/d} resolves to the current date, <prod_daily_backups-{now/d}> is parsed to prod_daily_backups-2022.10.05 if the policy runs on October 5, 2022. Every time the schedule kicks in, a new unique name is generated with the current date: prod_daily_backups-2022.10.06, prod_daily_backups-2022.10.07, and so on. Because we are using date math in the name, we must enclose it in angle brackets (< and >) so the parser resolves it without an issue. Consult the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/api-conventions.html#api-date-math-index-names) for further details on date math in names.

As the listing demonstrates, we provided the schedule in the form of a cron expression: "schedule": "0 0 0 * * ?". This expression states that the job executes precisely at midnight, so we can expect our snapshot process to begin at midnight every night.
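Elasticsearch’s cron syntax takes seconds, minutes, hours, day of month, month, and day of week, in that order. For illustration, here are a few alternative schedules you could drop into the policy instead:

"schedule": "0 0 0 * * ?"   # every night at midnight (as in the listing)
"schedule": "0 30 1 * * ?"  # every day at 1:30 a.m.
"schedule": "0 0 0 ? * SUN" # every Sunday at midnight
"schedule": "0 0 */6 * * ?" # every six hours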

The config block in the listing specifies the indices and the cluster state that we want to back up (in this example, all movies- and reviews-related indices). If you omit the config block, all indices and data streams are included in the snapshot by default. The include_global_state attribute indicates whether we want to include the cluster state in the snapshot; in the listing, we exclude the cluster state from the snapshot (include_global_state is set to false).

The final piece is the retention information ("retention"), which defines how long we want to keep the snapshots in the repository. We set the snapshot’s lifetime to one week by setting the expire_after attribute to 7d.

Once you execute this query, the automatic snapshot facility remains in place until the policy is deleted, and snapshots are taken as per the schedule. This is the easiest and preferred way of getting the whole cluster backed up without any manual intervention.
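Should you need to inspect or retire the automation, the _slm API provides endpoints for that too. For example, the following calls fetch the policy (including its last run and next scheduled execution) and delete it; note that deleting a policy stops future scheduled snapshots but does not remove snapshots already taken:

GET _slm/policy/prod_cluster_daily_backups # view the policy, its last run, and its next execution
DELETE _slm/policy/prod_cluster_daily_backups # stop the automation; existing snapshots remain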

SLM using Kibana

We can also create the snapshot lifecycle management policy using Kibana. Let me briefly show you how.

Head over to Kibana and navigate to Management > Snapshot and Restore. Click the Policies tab and invoke the creation of a new policy by clicking the Create Policy button. Fill in the details on this page as in the figure below.

Figure : Creating an SLM policy using Kibana’s console

After you complete this initial settings page, navigate to the next page by clicking the Next button. Here, you fill in the details related to the config block of the query we looked at in the previous listing: specific indices and data streams (or all of them), whether to include the global cluster state, and so on. The figure below shows the Snapshot Settings configurations.

Figure : Configuring the snapshot’s settings

Once we’ve picked the configuration settings, the final step is to fill in the retention details. There are three (all optional) pieces of information that we can provide so the snapshots get cleaned up as per the retention policy. The figure below shows the settings that we can select when creating this retention policy.

Figure : Configuring the snapshot’s retention settings

In the figure, we ask the snapshot manager to delete each snapshot after a week (7d). We also state that we must have at least three snapshots available in our repository so that we never clear out the whole lot; this minimum count setting ensures those three snapshots are never deleted, even if they are older than one week. Similarly, the maximum count setting ensures that no more than the given number of snapshot copies (6, in this case) is kept, even if they are younger than seven days. Finally, review the options and then create your SLM policy.
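For reference, these three settings map onto the retention block of the _slm API policy we created earlier. Here’s a sketch with the values from the figure:

"retention": {
  "expire_after": "7d", # snapshots older than a week become eligible for deletion
  "min_count": 3,       # always keep at least three snapshots, even expired ones
  "max_count": 6        # never keep more than six snapshots, even unexpired ones
}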

Manual execution of SLM

You don’t need to wait until the policy’s scheduled time to kick-start the snapshot action. For example, if the policy schedules weekly snapshots but you need a backup right away for a production hotfix, you can run the policy manually. The following listing shows how to execute the SLM policy by invoking the _execute endpoint on the API.

Listing : Manually executing scheduled snapshots

POST _slm/policy/prod_cluster_daily_backups/_execute

Running this command starts the previously created prod_cluster_daily_backups policy instantly. We don’t need to wait for its scheduled time.
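The _execute call responds with the name of the snapshot being taken. To verify that the run succeeded, we can also consult the cluster-wide SLM statistics:

GET _slm/stats # reports counts such as snapshots taken, failed, and deleted by retention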

Searchable snapshots (Enterprise edition)

Elasticsearch introduced a brand-new feature called searchable snapshots in version 7.12, which helps run search queries against snapshots. The idea is to use the backups themselves as indices for certain queries. Because snapshots can be written to low-cost archival stores, using them not just to restore data but also mounting them, effectively as indices, to run search queries is a big win.

We know that adding replicas to the cluster is one of the ways to improve read performance, but there is a cost associated with this: replicas cost time and money due to the extra space they demand. By making the snapshots mountable (using the _mount API), we can make them available for searches, effectively replacing replicas and, thus, bringing down our costs by almost half.
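As a hedged sketch of what this looks like, assuming an Enterprise license, the repository registered earlier, and one of the daily snapshots created by our policy (the snapshot, index, and target names are illustrative):

POST _snapshot/es_cluster_snapshot_repository/prod_daily_backups-2022.10.05/_mount?wait_for_completion=true
{
  "index": "movies", # the index inside the snapshot to mount
  "renamed_index": "movies_mounted" # the name under which it becomes searchable
}

Once mounted, movies_mounted can be queried like any regular index.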

The searchable snapshots feature is available only with an Enterprise license; it does not come for free with the basic license, which is why we won’t cover it in this book. If you’re interested, check out the documentation for details on how to implement searchable snapshots:

https://www.elastic.co/guide/en/elasticsearch/reference/8.4/searchable-snapshots.html

Elasticsearch mostly works by convention over configuration, leaving us with fewer decisions to make when setting up, running, and operating the system. However, running the system on defaults is asking for trouble. We must strive to tweak the configuration when needed, for example, to provide additional memory or to improve performance.

