For those who have gone through the pain ...
11/26/2019 in Technology


Boris Schaa

Senior Software Developer


How to recover data with Restic

In the first part of this article series, we described how you can quickly and easily make backups of containers with Restic. However, backing up data is not an end in itself. The point is to recover data if the backed-up system fails. This article is dedicated to that aspect: restoring, which with Docker is just as simple as creating the backup was. Finally, we will also cover how to delete backups if you run out of storage space.

This article assumes that Restic has already been installed and that snapshots have already been created with it. You can find installation instructions in the first article of this series.

Restic commands that are covered in this article

In this article, we are interested in the following commands in particular:

  • snapshots to view existing backups
  • restore to restore files and directories
  • forget to delete backups

As a reminder: Restic shows which arguments are accepted at each command level via the --help argument, e.g., restic --help or restic backup --help. This article assumes that you work with root privileges (via sudo su) in order to avoid any problems with file permissions.

Restoring from a backup

Once backups have been created in a Restic repository, they can serve their main purpose: restoring the backed-up data. If the container is still working, the content can be restored gracefully via the volume. To illustrate how this works, we delete the content of the Docker volume in order to artificially induce a fault.

rm -r /var/lib/docker/volumes/nginxData/_data/*

If the page is accessed in the browser at http://localhost:8080, then you should see an error message. Now we can restore the data. However, the rules for open files when performing a restore are analogous to those that are applied to backups. This means that the logical or technical dependencies between containers must also be followed in order to perform a restore:

  1. Stop the container
  2. Restore the volume(s)
  3. Start the container

Before restoring, make sure that the previously used Minio server has also been started, since it will hold the backups from the first part of this series for testing purposes:

docker start minio

The most reliable way to select a backup to restore is to pass the desired snapshot ID. You can find it via restic snapshots or in the output of restic backup. Since the saved path is an absolute path (i.e., from / downwards), the --target option must be set to /. If you were to pass the complete path /var/lib/docker/volumes/nginxData/_data/ as the target instead, the entire directory tree would be recreated inside the _data/ directory. First, let us check which snapshots are available:

restic snapshots
repository 00d7d2bb opened successfully, password is correct
ID        Time                 Host          Tags                  Paths
----------------------------------------------------------------------------------------------------------
aed06d2f  2019-04-05 13:00:01  MY-HOST-1337  Complete backup KW15  /var/lib/docker/volumes/nginxData/_data
d7e6092d  2019-04-12 15:52:32  MY-HOST-1337  Complete backup KW16  /var/lib/docker/volumes/nginxData/_data
----------------------------------------------------------------------------------------------------------

ID d7e6092d from the listing is now used for the specific restore. The command to restore the volume contents looks like this:

docker stop prod-nginx
prod-nginx
restic restore d7e6092d --target /
docker start prod-nginx
prod-nginx

Alternatively, you can use the command restic restore latest, which restores from the most recent snapshot. However, this is not recommended, because it has the same weakness as container images with the latest tag: if the repository contains backups of different volumes, it is not obvious which data the most recent snapshot actually holds. In principle, you can narrow the selection by specifying a path filter, but this runs contrary to the principle of ease of use.
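As an illustration of the path filter mentioned above, here is a minimal sketch. It assumes the --path option of restic restore, which restricts latest to snapshots that contain the given absolute path. The function name restore_latest_for_path is hypothetical and only serves this example.

```shell
# Hypothetical helper: restore the newest snapshot that contains a given path.
# Usage: restore_latest_for_path <absolute-path>
restore_latest_for_path() {
    if [ -z "$1" ]; then
        echo "usage: restore_latest_for_path <absolute-path>" >&2
        return 2
    fi
    # --path limits "latest" to snapshots that include this path;
    # --target / is needed because the snapshot stores absolute paths.
    restic restore latest --target / --path "$1"
}

# Example call with the volume path used in this article:
# restore_latest_for_path /var/lib/docker/volumes/nginxData/_data
```

Even with the filter, passing an explicit snapshot ID remains the less ambiguous choice.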

Since caution is usually required with production data, it is advisable to include the --verify argument when performing a restore:

restic restore d7e6092d --target / --verify

This is an additional safety measure: after the restore, Restic checks the content of the restored files against the data in the backup repository.
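The stop, restore, start sequence can be bundled into a small shell function. The following is only a sketch under stated assumptions: the function name restore_volume is hypothetical, and leaving the container stopped after a failed restore is a design choice of this example, not something Restic or Docker prescribes.

```shell
# Hypothetical helper: restore one snapshot while the container is stopped.
# Usage: restore_volume <container> <snapshot-id>
restore_volume() {
    container="$1"
    snapshot="$2"

    # 1. Stop the container so that no files are open during the restore.
    docker stop "$container" || return 1

    # 2. Restore to / because the snapshot stores absolute paths;
    #    --verify checks the restored files against the repository.
    if ! restic restore "$snapshot" --target / --verify; then
        echo "restore of $snapshot failed; $container stays stopped" >&2
        return 1
    fi

    # 3. Start the container again only after a successful restore.
    docker start "$container"
}

# Example call with the names used in this article:
# restore_volume prod-nginx d7e6092d
```

Keeping the container stopped on failure avoids serving a half-restored volume; if availability matters more, the start step could be moved before the error check.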

Deleting backups

With this knowledge, you are already well equipped for backup and recovery. However, if you back up data to your own disks, experience shows that storage space runs out rather quickly. If, on the other hand, you back up data to the cloud, then depending on the storage approach and your contracts, you may have what essentially amounts to unlimited storage space. Even then, it can be desirable to reduce the list of snapshots to a manageable size.

For these cases, Restic provides a dedicated command, namely restic forget. Before looking into its specifics, it is a good idea to take a look at the way Restic works. In order to be fast, Restic relies heavily on references and hashes in addition to encryption. Before each transfer, the hash of the part being backed up is calculated. If the hash shows that this part already exists in the repository, it is not transferred again but only referenced. This deduplication saves both time and storage space. If you are curious, the restic stats --mode raw-data command shows how much storage space the backup repository actually uses.

If you merely remove snapshots from the backup repository, they disappear from the overview, but their data still takes up space on the hard disk. This is because finding unreferenced data takes time. Restic offers two ways to actually free up the space: a separate restic prune command, or the --prune parameter of restic forget.

The easiest way to remove backups is to use snapshot IDs, because this approach is explicit and will not produce surprises later on. For example, this command removes the three specified snapshots and frees up their data on the hard disk.

restic forget 40dc1520 79766175 590c8fc8 --prune
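To make the difference between forgetting and pruning visible, the two steps can be run separately with a size check in between. This is a minimal sketch; the function name forget_then_prune is hypothetical, and the snapshot IDs are simply whatever you pass in.

```shell
# Hypothetical helper: forget snapshots, then prune, printing the
# repository size after each step.
# Usage: forget_then_prune <snapshot-id> [<snapshot-id> ...]
forget_then_prune() {
    # Removes only the snapshot entries; the data stays in the repository.
    restic forget "$@" || return 1

    # The raw repository size should be unchanged at this point.
    restic stats --mode raw-data || return 1

    # Deletes the data that is no longer referenced by any snapshot.
    restic prune || return 1

    # Now the freed space should show up in the raw size.
    restic stats --mode raw-data
}

# Example call (replace the ID with one from "restic snapshots"):
# forget_then_prune aed06d2f
```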

Policies

If you have set up an automated backup, it is common practice to automate the rotation of old backups as well, e.g., to keep only a certain number of backups per time interval. In that case, using snapshot IDs is impractical. The alternative is so-called policies, which let you select, based on criteria, the snapshots that should not be removed.

In practice, it is helpful to first run restic forget with the --dry-run parameter to see the effect without fear of data loss. Restic goes a long way to avoid accidental data loss. If a policy combination would result in all snapshots being deleted, Restic does not apply the policy and deletes no snapshots at all, as in the following example.

restic forget --keep-last 0 --prune
repository 8460094c opened successfully, password is correct
no policy was specified, no snapshots will be removed

A simple policy is provided by the --keep-last parameter, which keeps the given number of most recent snapshots. This example retains the last three snapshots of each path:

restic forget --keep-last 3 --prune

In addition, there are a number of alternatives to narrow down the selection of snapshots that will be retained. For example, --keep-hourly retains the given number of hourly snapshots of the same file path. There are equivalents at the daily, weekly, monthly and yearly level.

Two other policy parameters differ from these time-based parameters. While --keep-tag retains the snapshots with a given tag, --keep-within {duration} retains all snapshots within a defined period counted back from the latest snapshot. This example retains all snapshots taken in the 2 years, 5 months, 7 days, and 3 hours before the latest snapshot:

restic forget --keep-within 2y5m7d3h --prune
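Before a --keep-last policy is applied for real, the same selection can be previewed with --dry-run. The sketch below wraps both variants in one function that previews by default; the name forget_keep_last and the preview/apply switch are assumptions of this example, not Restic features.

```shell
# Hypothetical helper: preview a --keep-last policy by default,
# and only delete when "apply" is passed explicitly.
# Usage: forget_keep_last <count> [apply]
forget_keep_last() {
    count="$1"
    mode="${2:-preview}"

    # Reject empty, non-numeric, and zero counts up front.
    case "$count" in
        ''|*[!0-9]*) echo "count must be a positive integer" >&2; return 2 ;;
    esac
    if [ "$count" -eq 0 ]; then
        echo "count must be greater than 0" >&2
        return 2
    fi

    if [ "$mode" = "apply" ]; then
        # Removes the snapshots and frees the unreferenced data.
        restic forget --keep-last "$count" --prune
    else
        # Only lists what would be kept and removed.
        restic forget --keep-last "$count" --dry-run
    fi
}

# Example calls:
# forget_keep_last 3          # preview only
# forget_keep_last 3 apply    # actually remove and prune
```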

Policy modules

All Restic policies can be easily combined by specifying several policy parameters in the same command. If, for example, you want Restic to keep one snapshot per month, week, and day, you can specify this very elegantly using the following parameters:

restic forget --keep-daily 1 --keep-weekly 1 --keep-monthly 1 --prune

It is worth taking a look at the documentation, especially if the policy is augmented with tag lists, which we are not able to discuss in further detail here.
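For an automated rotation, for example from a cron job, a combined policy can be wrapped in a small function whose retention counts come from variables. This is a sketch only: the function name rotate_backups, the variable names, and the default values 7, 4 and 12 are hypothetical and should be adapted to your own retention requirements.

```shell
# Hypothetical retention settings; override them via environment variables.
KEEP_DAILY="${KEEP_DAILY:-7}"
KEEP_WEEKLY="${KEEP_WEEKLY:-4}"
KEEP_MONTHLY="${KEEP_MONTHLY:-12}"

# Hypothetical helper: apply the combined policy and prune in one run.
rotate_backups() {
    restic forget \
        --keep-daily "$KEEP_DAILY" \
        --keep-weekly "$KEEP_WEEKLY" \
        --keep-monthly "$KEEP_MONTHLY" \
        --prune
}

# Example call, e.g. at the end of a nightly backup script:
# rotate_backups
```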

Summary

Restic is a powerful tool that covers many important aspects of backup and restore, especially since it combines ease of use with speed and security. Deleting snapshots with policies can be a bit of a hassle in more complex cases, but there is always the method of using snapshot IDs. You can rely on this method as a backup, so to speak.
