Mini-series on the Snapshotting Feature
Automating snapshots
We've just seen the mechanism for creating snapshots, but those were ad hoc snapshots: you create them on demand (for example, when migrating data or rolling out a production hotfix). However, we can automate this process so that snapshots are taken regularly, using the snapshot lifecycle management (SLM) feature. Elasticsearch provides the _slm API to manage the lifecycle of snapshots and to create lifecycle policies that execute on a predefined schedule.
To do this, we need to create a policy using the _slm API. This policy contains information such as the indices to back up, the schedule (a cron-style job), the retention period, and so on. We must have already registered a repository, as that is a prerequisite for snapshot lifecycle management.
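As a quick refresher, registering a repository is a one-time step. The following is a minimal sketch of registering a shared file system repository under the name used in this chapter; the "fs" type and the location path are illustrative assumptions, and the path must be listed under path.repo in elasticsearch.yml.
PUT _snapshot/es_cluster_snapshot_repository
{
  "type": "fs",  # shared file system repository
  "settings": {
    "location": "/mnt/es_snapshots"  # illustrative path; must appear under path.repo
  }
}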
Let's say we want to back up all our movie indices at midnight every night to a given repository, and we want to keep these snapshots for a week. We can write a policy and create the automation using the _slm API, as the following listing demonstrates.
Listing: Creating a policy for scheduled snapshots
PUT _slm/policy/prod_cluster_daily_backups  #A
{
  "name": "<prod_daily_backups-{now/d}>",  #B
  "schedule": "0 0 0 * * ?",  #C
  "repository": "es_cluster_snapshot_repository",  #D
  "config": {
    "indices": ["*movies*", "*reviews*"],  #E
    "include_global_state": false  #F
  },
  "retention": {
    "expire_after": "7d"  #G
  }
}
The _slm API creates a policy that is stored in the cluster and executed when the schedule kicks in. The policy has three parts that we must provide: a unique name, the schedule, and the repository we registered earlier to store the snapshots. Let's look at these in detail.
The unique name (<prod_daily_backups-{now/d}> in the listing) is constructed with date math in it. The <prod_daily_backups-{now/d}> is parsed to prod_daily_backups-2022.10.05 if the policy runs on October 5, 2022, because {now/d} resolves to the current date (in the default yyyy.MM.dd format). Every time the schedule kicks in, a new unique name is generated with the current date: prod_daily_backups-2022.10.06, prod_daily_backups-2022.10.07, and so on. Because we are using date math in the name, we must enclose the name in angle brackets (< and >) so the parser handles it correctly. Consult the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/api-conventions.html#api-date-math-index-names) for further details on date math in names.
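To see date math resolution in action outside SLM, here is a hedged sketch using a hypothetical logs index. When a date-math name appears in a request path (rather than in a request body, as in our policy), the special characters must be URL-encoded.
# <logs-{now/d}> resolves to logs-2022.10.05 if run on October 5, 2022
GET /%3Clogs-%7Bnow%2Fd%7D%3E/_search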
As the listing demonstrates, we provided a schedule in the form of a cron expression: "schedule": "0 0 0 * * ?". This expression states that the job executes at midnight, so we can expect our snapshot process to begin at midnight every night.
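Elasticsearch cron expressions have six mandatory fields (seconds, minutes, hours, day of month, month, day of week) plus an optional year. A few illustrative schedules, sketched here as comments:
# "0 0 0 * * ?"    every day at midnight (as in our policy)
# "0 30 1 * * ?"   every day at 1:30 AM
# "0 0 0 ? * SUN"  every Sunday at midnight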
The config block in the listing specifies the indices and the cluster state that we want to back up (in this example, all movies- and reviews-related indices). If you omit the config block, all indices and data streams are included in the snapshot by default. The include_global_state attribute indicates whether we want to include the cluster state in the snapshot; in the listing, we exclude it by setting include_global_state to false.
The final piece is the retention information ("retention"), which specifies how long we want to keep the snapshots in the repository. Here we set the snapshot's lifetime to one week by setting the expire_after attribute to 7d.
Once you execute this query, the automatic snapshot facility remains in place until the policy is deleted, and the policy runs as per its schedule. This is the easiest and preferred way to get the whole cluster backed up without any manual intervention.
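To inspect or remove the policy later, the _slm API exposes the corresponding GET and DELETE endpoints:
GET _slm/policy/prod_cluster_daily_backups  # view the policy, including its next execution time

DELETE _slm/policy/prod_cluster_daily_backups  # stop the automation by removing the policy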
SLM using Kibana
We can create the snapshot lifecycle management policy using Kibana too. Let me show you briefly how we can do this.
Head over to Kibana and navigate to the Management > Snapshot and Restore feature link. Click the Policies tab and create a new policy by clicking the Create Policy button. Fill in the details on this page as shown in the figure below.
After you complete this initial settings page, navigate to the next page by clicking the Next button. Here, you fill in the details related to the config block of the query we looked at in the previous listing: any specific indices and data streams (or all of them), whether to include the global cluster state, and so on. The figure below shows the Snapshot Settings configurations.
Once we’ve picked the configuration settings, the final step is to fill in the retention details. There are three (all optional) pieces of information that we can provide so the snapshots get cleaned up as per the retention policy. The figure below shows the settings that we can select when creating this retention policy.
In the figure, we ask the snapshot manager to delete each snapshot after a week (7d). We also specify that at least three snapshots must remain in the repository so that we never clear out the whole lot; this minimum count setting ensures those three snapshots are never deleted, even if they are older than one week. Similarly, the maximum count setting ensures that no more than the given number of snapshot copies (six, in this case) are kept, even if they are younger than seven days. Finally, review the options and create your SLM policy.
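For reference, the same retention rules can be expressed in the policy itself; in the _slm API, the Kibana minimum and maximum counts map to the min_count and max_count attributes. A sketch updating our earlier policy:
PUT _slm/policy/prod_cluster_daily_backups
{
  "name": "<prod_daily_backups-{now/d}>",
  "schedule": "0 0 0 * * ?",
  "repository": "es_cluster_snapshot_repository",
  "config": {
    "indices": ["*movies*", "*reviews*"],
    "include_global_state": false
  },
  "retention": {
    "expire_after": "7d",  # delete snapshots older than a week...
    "min_count": 3,        # ...but always keep at least three
    "max_count": 6         # and never keep more than six
  }
}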
Manual execution of SLM
You don't need to wait until the policy's scheduled time to kick off the snapshot action. For example, if the policy schedules weekly snapshots but you need a backup right away because of a production hotfix, you can start the policy manually. The following listing shows how to execute the SLM policy manually by invoking the _execute endpoint on the API.
Listing: Manually executing scheduled snapshots
POST _slm/policy/prod_cluster_daily_backups/_execute
Running this command starts the previously created prod_cluster_daily_backups policy instantly; we don't need to wait for its scheduled time.
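The _execute call responds with the name of the snapshot being taken. To verify the run afterward, you can query the SLM stats or the policy itself:
GET _slm/stats  # cluster-wide counts of snapshots taken and deleted by SLM

GET _slm/policy/prod_cluster_daily_backups  # shows last_success and last_failure details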
Searchable snapshots (Enterprise edition)
Elasticsearch introduced a feature called searchable snapshots in version 7.12 that lets you run search queries against snapshots. The idea is to use the backups as indices for certain queries. Because snapshots can be written to low-cost archival stores, using them not just to restore data but also mounting them as indices to run search queries is a big win.
We know that adding replicas to the cluster is one way to improve read performance, but there is a cost associated with this: replicas cost time and money due to the extra space they demand. By making snapshots mountable (using the _mount API), we can make them available for searches, effectively replacing replicas and thus bringing down our costs by almost half.
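As an illustration only (this requires an Enterprise license), mounting a snapshot as a searchable index might look like the following sketch; the snapshot name and the index names here are hypothetical.
POST _snapshot/es_cluster_snapshot_repository/prod_daily_backups-2022.10.05/_mount
{
  "index": "movies",                 # the index inside the snapshot
  "renamed_index": "movies_mounted"  # hypothetical name to mount it under
}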
The searchable snapshots feature is available only with an Enterprise license and is not free with the basic license, which is why we won't cover it in this book. If you're interested, check out the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/searchable-snapshots.html) for details on how to implement searchable snapshots.
Elasticsearch largely works by convention over configuration, leaving us with fewer decisions to make when setting up, running, and operating the system. However, running the system on defaults is asking for trouble. We must strive to tweak the configuration when needed, whether to provide additional memory, to improve performance, and so on.