Introducing and Registering Snapshots (1/3)

Mini-series of Snapshotting Feature

  1. Introducing and Registering the Snapshots
  2. Creating and Restoring Snapshots
  3. Automating Snapshots

Introduction to Snapshotting

Running applications in production without backup and restore functionality is risky. Data in our clusters should be copied to durable storage somewhere off the cluster. Fortunately, Elasticsearch provides easy snapshot and restore functionality for backing up our data and restoring it when needed.

Snapshots let us store incremental backups on a regular basis. We can store the snapshots in a repository, usually mounted on a local filesystem or hosted on a cloud-based service such as AWS S3, Microsoft Azure, or Google Cloud Platform. As the figure below shows, administrators snapshot clusters to a storage medium on a regular schedule and then restore from it on demand.

Figure: Mechanics of snapshot and restore on a cluster

Taking regular cluster snapshots is an administrative task that ideally should be automated with in-house scripts or handy tools. Before we start taking snapshots, however, we must choose a repository type and register the repository. In this mini-series, we’ll discuss the mechanics of setting up a repository and how to snapshot and restore the data from the repository to a cluster.

Getting started

A few steps are required before we can use the snapshot and restore functionality. Broadly speaking, we need to perform the following three activities:

  • Register a snapshot repository: Snapshots are stored on durable storage such as a shared filesystem, the Hadoop Distributed File System (HDFS), or cloud storage such as Amazon S3 buckets.
  • Snapshot the data: Once we have a repository registered with the cluster, we can snapshot the data to back it up.
  • Restore from the store: When we need to restore the data, we pick the index, set of indices, or full cluster that needs to be restored and start the restore operation from the previously registered snapshot repository.

By default, a snapshot backs up all indices, all data streams, and the whole cluster state. Note that once the first snapshot is taken, later snapshots are incremental: they copy only the data that changed since the previous snapshot, not a full copy. There are two ways you can work with snapshots:

  • Snapshot and restore RESTful APIs
  • The snapshot and restore feature of Kibana

Before we begin, the first step is to choose a repository type and register it. In the next section, we’ll register a snapshot repository both ways.

Registering a snapshot repository

To keep things simple, let’s pick the filesystem as our repository type: we want to store our snapshots on a disk mounted on a shared file system. The first step is to mount a filesystem with enough available disk space on all the master and data nodes in the cluster. Once the servers have this filesystem mounted, we need to let Elasticsearch know its location by specifying it in the configuration file.

Edit the elasticsearch.yml configuration file on each node to set the path.repo property, pointing it to the location of the mount. For example, if the mount path is /volumes/es_snapshots, the setting would look like this:

path.repo: /volumes/es_snapshots

Once we have added the mount path, we need to restart each node for the setting to take effect.
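Elasticsearch rejects a filesystem repository whose location does not fall under one of the configured path.repo entries, so it can be handy to check a candidate location before registering it. The following is a minimal Python sketch of such a check. The helper name is_under_repo_path is hypothetical (it is not part of Elasticsearch or any client library), and it only compares path components: it does not read elasticsearch.yml, touch the disk, or resolve ".." segments and symlinks.

```python
from pathlib import PurePosixPath


def is_under_repo_path(location, repo_paths):
    """Return True if `location` equals or sits below one of `repo_paths`.

    Hypothetical helper for illustration only: a pure path comparison,
    with no filesystem or Elasticsearch access.
    """
    candidate = PurePosixPath(location)
    for repo_path in repo_paths:
        try:
            # relative_to raises ValueError when candidate is not under repo_path
            candidate.relative_to(PurePosixPath(repo_path))
            return True
        except ValueError:
            continue
    return False


# Mirrors the example setting: path.repo: /volumes/es_snapshots
configured = ["/volumes/es_snapshots"]
print(is_under_repo_path("/volumes/es_snapshots", configured))
print(is_under_repo_path("/volumes/es_snapshots/nightly", configured))
print(is_under_repo_path("/tmp/other", configured))
```

Because the comparison works on whole path components, a sibling directory such as /volumes/es_snapshots_old is not treated as being inside /volumes/es_snapshots.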

Registering the repository using snapshot APIs

When the nodes are back online after the restart, the final step is to register the repository by invoking the snapshot repository API. The following listing shows the request.

Listing: Registering a filesystem-based snapshot repository

PUT _snapshot/es_cluster_snapshot_repository
{
  "type": "fs",
  "settings": {
    "location": "/volumes/es_snapshots"
  }
}

Elasticsearch provides the _snapshot API for carrying out actions related to snapshots and restores. In the listing above, we create a snapshot repository called es_cluster_snapshot_repository. The body of the request expects the type of the repository we are creating and the settings required for that type. In our example, we set fs (for filesystem) as our repository type and provide the filesystem path as the location in the settings object.

Because we’ve already added the mount point to the configuration file and restarted the nodes, the request in the listing should execute successfully and register our first repository. Issuing a GET _snapshot command returns the registered repository, as the following snippet shows:

{
  "es_cluster_snapshot_repository" : {
    "type" : "fs",
    "settings" : {
      "location" : "/volumes/es_snapshots"
    }
  }
}

The response shows that we have one snapshot repository registered and available for our snapshots.
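If you prefer to script these calls rather than run them from Dev Tools, the same requests can be built with Python’s standard library. This is a minimal sketch, not a definitive client: it assumes an unsecured node at http://localhost:9200 (recent Elasticsearch versions enable TLS and authentication by default, so you would need to adjust the URL and add credentials), and the function names are hypothetical. It also builds a request for the repository verification endpoint (POST _snapshot/<repository>/_verify), which checks that the nodes can access the repository.

```python
import json
from urllib import request

# Assumption: a local, unsecured node. Adjust the URL and add auth for your cluster.
ES_URL = "http://localhost:9200"
REPO = "es_cluster_snapshot_repository"


def register_fs_repo_request(repo_name, location):
    """Build the PUT _snapshot/<repo> request from the listing."""
    body = {"type": "fs", "settings": {"location": location}}
    return request.Request(
        url=f"{ES_URL}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_repo_request(repo_name):
    """Build the GET _snapshot/<repo> request that returns the repository definition."""
    return request.Request(url=f"{ES_URL}/_snapshot/{repo_name}", method="GET")


def verify_repo_request(repo_name):
    """Build the POST _snapshot/<repo>/_verify request."""
    return request.Request(url=f"{ES_URL}/_snapshot/{repo_name}/_verify", method="POST")


def send(req):
    """Send a prepared request and decode the JSON reply (needs a running cluster)."""
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building a request does not contact the cluster, so this part runs anywhere.
put_req = register_fs_repo_request(REPO, "/volumes/es_snapshots")
print(put_req.get_method(), put_req.full_url)
print(put_req.data.decode("utf-8"))
```

With a cluster running, calling send(put_req) and then send(get_repo_request(REPO)) should return the repository definition shown in the response above.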

Note: If you are running Elasticsearch on your local machine, you can set a temp folder as your repository location. For example, you can use /tmp/es_snapshots for *nix-based operating systems or c:/temp/es_snapshots for Windows. The folder must still be listed under path.repo in elasticsearch.yml.

As mentioned, we can also use Kibana to work with the snapshot and restore feature, including registering a repository. Although delving into all of the details of working with Kibana is out of the scope of this article, I’ll provide a few pointers so you can work with the snapshot and restore functionality in Kibana.

Registering a snapshot repository on Kibana

Kibana has extensive support for working with the snapshot and restore feature: from registering repositories to taking snapshots and restoring them. Let’s first check out how we can register the repository in Kibana.

Open the top-left menu in Kibana and expand the Management section, where you’ll see a Stack Management navigation link alongside the familiar Dev Tools. The figure below shows this link.

Figure: Accessing the Stack Management page for snapshot functionality

Once you navigate to the Stack Management page, you’ll find a Snapshot and Restore menu item under Data. Clicking it opens a page that lists the current repositories, snapshots, and their states. Go to the second tab, Repositories, and click Register a Repository. You’ll see a page similar to the one in the figure below.

Figure: Naming the repository and choosing its type

Naming the repository is mandatory, as is choosing a type: shared file system (fs), AWS S3, Azure Blob Storage, and so on. Sticking with our example of a filesystem repository, let’s click Shared file system to navigate to the next page.

The next page asks us to input the filesystem location as well as a few optional properties (maximum bytes per second for snapshots and for restores, chunk size, etc.). Click the Register button at the bottom of the page to create this repository. That’s pretty much it for creating the filesystem-based repository.

In the next article, we will learn about creating and restoring snapshots.
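The optional properties on this page can also be supplied through the API, inside the settings object of the registration request. Here is a minimal Python sketch that assembles such a body. The helper name fs_repo_body is hypothetical; the setting names max_snapshot_bytes_per_sec, max_restore_bytes_per_sec, and chunk_size are the filesystem repository settings as I understand them, so check the reference documentation for your version. The values are arbitrary examples, not recommendations.

```python
import json


def fs_repo_body(location, max_snapshot_bytes_per_sec=None,
                 max_restore_bytes_per_sec=None, chunk_size=None):
    """Build a filesystem repository body, including only the settings that are set."""
    settings = {"location": location}
    optional = {
        "max_snapshot_bytes_per_sec": max_snapshot_bytes_per_sec,
        "max_restore_bytes_per_sec": max_restore_bytes_per_sec,
        "chunk_size": chunk_size,
    }
    # Leave out anything not provided so Elasticsearch applies its own defaults.
    settings.update({key: value for key, value in optional.items() if value is not None})
    return {"type": "fs", "settings": settings}


# Throttle and chunk values below are illustrative only.
body = fs_repo_body(
    "/volumes/es_snapshots",
    max_snapshot_bytes_per_sec="40mb",
    max_restore_bytes_per_sec="40mb",
    chunk_size="1gb",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a PUT _snapshot/<repository name> request in Dev Tools.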

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.
