Block-based backup and restore utility for virtual machine images

Overview

Backy is a block-based backup and restore utility for virtual machine images.

Backy is intended to be:

  • space-, time-, and network-efficient

  • trivial to restore

  • reliable.

To achieve this, we rely on:

  • using a copy-on-write filesystem (btrfs, ZFS) as the target filesystem to achieve space-efficiency,

  • using a snapshot-capable main storage for our volumes (e.g. Ceph, LVM, …) that allows easy extraction of changes between snapshots,

  • leveraging proven, existing low-level tools,

  • keeping the code base small, simple, and well-tested.
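As a rough illustration of why a copy-on-write target combined with change extraction saves space, the sketch below builds a new revision from the previous one and overwrites only the changed extents. It is not backy's actual code: `apply_changes` and the `(offset, data)` change format are hypothetical, and `shutil.copyfile` stands in for a reflink copy (for example `cp --reflink` on btrfs), which would share the unchanged blocks instead of duplicating them.

```python
import shutil


def apply_changes(previous_image, new_image, changes):
    """Create new_image from previous_image and overwrite changed extents.

    `changes` is a hypothetical iterable of (offset, data) pairs, e.g. as
    derived from the difference between two snapshots of the source volume.
    """
    # Plain copy for illustration only. On a CoW filesystem a reflink copy
    # would be used so that unchanged blocks are shared between revisions.
    shutil.copyfile(previous_image, new_image)
    with open(new_image, "r+b") as target:
        for offset, data in changes:
            target.seek(offset)
            target.write(data)
```

Because every revision ends up as a complete image file, restoring stays a plain copy regardless of how many incremental runs preceded it.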

Operations

Full restore

The most important question is: I screwed up – how do I get my data back?

Here’s the quick answer for a full restore of the most recent backup:

$ cd /srv/backy/my-virtual-machine
$ dd if=last of=/srv/kvm/my-virtual-machine.img bs=4096k
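If a restore has to be scripted rather than typed, the dd call above amounts to a block-wise copy. A minimal Python sketch under that assumption follows; `restore_image` is a hypothetical helper, not part of backy, and it is only meant for regular image files.

```python
def restore_image(source, target, block_size=4 * 1024 * 1024):
    """Copy source to target in fixed-size blocks and return bytes copied.

    The default block size of 4 MiB corresponds to dd's bs=4096k.
    """
    copied = 0
    with open(source, "rb") as src, open(target, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                # End of the source image reached.
                break
            dst.write(block)
            copied += len(block)
    return copied
```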

In case Ceph is used for image storage, just import the image directly:

$ rbd import last rbd/my-volume

If you would like to pick a specific version, it’s only a little more effort:

$ backy status
+---------------------+------------+------------+---------+--------------+
| Date                | ID         |       Size |   Durat | Tags         |
+---------------------+------------+------------+---------+--------------+
| 2015-11-04 11:09:26 | UT7PkENubw |  60.00 GiB | 845.0 s | weekly,daily |
| 2015-11-05 10:32:03 | fPnbSvEHHy | 264.85 MiB |  88.1 s | daily        |
| 2015-11-06 10:32:03 | cErS5GJ5sL | 172.34 MiB |  84.5 s | daily        |
+---------------------+------------+------------+---------+--------------+
3 revisions containing 60.43 GiB data (estimated)
$ dd if=fPnbSvEHHymfztN9FuegLQ of=/srv/kvm/my-virtual-machine.img bs=4096k

Or try backy find to locate the latest backup for a specific tag:

$ rbd import $(backy find -r weekly) rbd/my-volume
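The selection that backy find performs for a tag can be pictured as "newest revision carrying that tag". The sketch below illustrates this with the three revisions from the status table above. The list-of-dicts layout and `find_latest` are assumptions made for this example and do not reflect backy's internal data model.

```python
from datetime import datetime

# Revisions taken from the `backy status` output shown above.
REVISIONS = [
    {"id": "UT7PkENubw", "date": datetime(2015, 11, 4, 11, 9, 26),
     "tags": {"weekly", "daily"}},
    {"id": "fPnbSvEHHy", "date": datetime(2015, 11, 5, 10, 32, 3),
     "tags": {"daily"}},
    {"id": "cErS5GJ5sL", "date": datetime(2015, 11, 6, 10, 32, 3),
     "tags": {"daily"}},
]


def find_latest(revisions, tag):
    """Return the most recent revision that carries the given tag."""
    matching = [rev for rev in revisions if tag in rev["tags"]]
    if not matching:
        raise LookupError("no revision tagged {!r}".format(tag))
    return max(matching, key=lambda rev: rev["date"])
```

With this data, the `weekly` tag resolves to the full backup from 2015-11-04, while `daily` resolves to the newest revision.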

Restoring individual files

The image files are exact copies of the data from the virtual disks. You can use regular Linux tools to interact with them:

$ kpartx -av last
add map loop0p1 (253:9): 0 41934815 linear /dev/loop0 8192
$ mkdir /mnt/restore
$ mount -o ro /dev/mapper/loop0p1 /mnt/restore
$ cd /mnt/restore
$ ls
bin  boot  dev  etc  home  lib  lost+found  media  mnt  opt  proc  root  run
sbin  srv  sys  tmp  usr  var

To clean up:

$ umount /mnt/restore
$ kpartx -d last

Setting up backy

  1. Create a sufficiently large backup partition using a COW-capable filesystem like btrfs and mount it under /srv/backy.

  2. Create a configuration file at /etc/backy.conf. See the man page for details.

  3. Start the scheduler with your favourite init system:

    backy -l /var/log/backy.log scheduler -c /path/to/backy.conf

    The scheduler runs in the foreground until it is terminated with SIGTERM.

  4. Set up monitoring using backy check.

  5. Set up log rotation for /var/log/backy.log and /srv/backy/*/backy.log.

The file paths given above match the built-in defaults, but paths are fully configurable.

Features

Telnet shell

Telnet into localhost port 6023 to get an interactive console. The console can currently be used to inspect the scheduler’s live status.

Self-check

Backy includes a self-checking facility. Invoke backy check to see if there is a recent revision present for all configured backup jobs:

$ backy check
OK: 9 jobs within SLA

Both output and exit code are suited for processing with Nagios-compatible monitoring systems.
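To make the Nagios convention concrete (exit code 0 for OK, 2 for CRITICAL, plus a one-line status message), here is a small sketch of an SLA check. It is an illustration only: `check_jobs`, the shape of the `jobs` mapping, and the wording of the CRITICAL message are assumptions. Only the OK line follows the output shown above.

```python
from datetime import datetime, timedelta

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_jobs(jobs, now):
    """Return (exit_code, message) for a mapping of job name to
    (time of last successful backup or None, SLA as timedelta)."""
    failed = sorted(
        name for name, (last, sla) in jobs.items()
        if last is None or now - last > sla
    )
    if failed:
        # Hypothetical message format; lists the failed jobs by name.
        return CRITICAL, "CRITICAL: {} jobs not within SLA: {}".format(
            len(failed), ", ".join(failed))
    return OK, "OK: {} jobs within SLA".format(len(jobs))
```

A wrapper script would print the message and pass the code to `sys.exit()`, which is all a Nagios-compatible system needs.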

Pluggable backup sources

Backy comes with a number of plug-ins which define block-file-like sources:

  • file extracts data from simple image files living on a regular file system.

  • ceph-rbd pulls data from RBD images using Ceph features like snapshots.

  • flyingcircus is an extension to the ceph-rbd source which we use internally on the Flying Circus hosting platform. It uses advanced features like Consul integration.

It should be easy to write plug-ins for additional sources.

Adaptive verification

Backy always verifies freshly created backups. The extent of verification depends on the source type: file-based sources are verified in full, while Ceph-based sources are verified using random samples to keep the runtime manageable.
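Sample-based verification can be sketched as comparing a random subset of blocks between source and backup. The function below is a simplified illustration and not backy's implementation: `verify_sample`, its parameters, and the default block size are assumptions, and it operates on plain files rather than RBD images.

```python
import random


def verify_sample(source, backup, block_size=4 * 1024 * 1024,
                  samples=10, rng=None):
    """Compare up to `samples` randomly chosen blocks of two images.

    Returns False on a size mismatch or on the first differing block.
    """
    rng = rng or random.Random()
    with open(source, "rb") as src, open(backup, "rb") as bak:
        size = src.seek(0, 2)  # seek to the end to learn the size
        if bak.seek(0, 2) != size:
            return False
        blocks = (size + block_size - 1) // block_size
        # Pick distinct block indices; check every block if there are few.
        for index in rng.sample(range(blocks), min(samples, blocks)):
            offset = index * block_size
            src.seek(offset)
            bak.seek(offset)
            if src.read(block_size) != bak.read(block_size):
                return False
    return True
```

Setting `samples` to at least the number of blocks turns this into a full verification, which mirrors the difference between the two source types described above.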

Zero-configuration scheduling

The backy scheduler is intended to run continuously. It spreads jobs over the day according to their configured run intervals. After resuming from an interruption, it reschedules missed jobs so that SLAs are still met where possible.

Backup jobs can be triggered at specific times as well: just invoke backy backup manually.
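How the scheduler spreads jobs is not spelled out here. One common technique, shown below purely as an assumption, is to derive a stable per-job offset from a hash of the job name and run each job at that offset within every interval. The names `job_offset` and `next_run` are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta


def job_offset(job_name, interval):
    """Deterministic per-job offset in the range [0, interval)."""
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    seconds = int.from_bytes(digest[:8], "big") % int(interval.total_seconds())
    return timedelta(seconds=seconds)


def next_run(job_name, interval, now, epoch=datetime(1970, 1, 1)):
    """Return the first slot strictly after `now` on the job's grid.

    Slots lie at epoch + offset + n * interval for integer n.
    """
    offset = job_offset(job_name, interval)
    periods = (now - epoch - offset) // interval + 1
    return epoch + offset + periods * interval
```

Because the offset depends only on the job name, a restarted scheduler computes the same slots again, and different jobs with the same interval tend to land at different times of day.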

Authors

License

GPLv3

Changelog

2.1.5 (2016-07-01)

  • Bugfix release: fix data corruption bug in the new full-always mode. (FC #21963)

2.1.4 (2016-06-20)

  • Add “full-always” flag to Ceph and Flyingcircus sources. (FC #21960)

  • Rewrite full backup code to make use of shallow copies to conserve disk space. (FC #21960)

2.1.3 (2016-06-09)

  • Fix new timeout to be 5 minutes by default, not 5 days.

  • Do not sort blocks any longer: we gain little from seeking over volumes with random blocks anyway, and unsorted blocks give a more even distribution over multiple runs with the new timeout.

2.1.2 (2016-06-09)

  • Fix backup of images containing holes (#33).

  • Introduce a (short) timeout for partial image verification. Very large images and images that are backed up frequently in particular do not benefit from verification running for hours and blocking further backups. (FC #21879)

2.1.1 (2016-01-15)

  • Fix logging bugs.

  • Shut down daemon loop cleanly on signal reception.

2.1 (2016-01-08)

  • Add optional regex filter to the jobs command in the telnet shell.

  • Provide list of failed jobs in check output, not only the total number.

  • Add status-interval, telnet-addrs, and telnet-port configuration options.

  • Automatically recover from missing/damaged last or last.rev symlinks (#19532).

  • Use {BASE_DIR}/.lock as daemon lock file instead of the status file.

  • Usability improvements: count jobs, more informative log output.

  • Support restoring to block special files like LVM volumes (#31).

2.0 (2015-11-06)

  • backy now accepts a -l option to specify a log file. If no such option is given, it logs to stdout.

  • Add backy find -r REVISION subcommand to query image paths from shell scripts.

  • Fix monitoring bug where partially written images made the check go green (#30).

  • Greatly improve error handling and detection of failed jobs.

  • Performance improvement: turn off line buffering in bulk file operations (#20).

  • The scheduler reports child failures (exit status > 0) now in the main log.

  • Fix fallocate() behaviour on 32 bit systems.

  • The flyingcircus source type now requires 3 arguments: vm, pool, image.

2.0b3 (2015-10-02)

  • Improve telnet console.

  • Provide Nix build script.

  • Generate requirements.txt automatically from buildout’s versions.cfg.

2.0b2 (2015-09-15)

  • Introduce scheduler and rework the main backup command. The backy command is now only responsible for dealing with individual backups.

    It no longer cares about scheduling.

    A new daemon and a central configuration file are responsible for that now. However, the daemon simply calls out to the existing backy command, so we can still interact with the system manually even if we do not use the daemon.

  • Add consul integration for backing up Flying Circus root disk images with clean snapshots (by asking fc.qemu to use fs-freeze before preparing a Ceph snapshot).

  • Switch to shorter UUIDs. Existing files with old UUIDs are compatible.

  • Turn the configuration format into YAML. Old files are still compatible. New configs will be generated as YAML.

  • Performance: defrag all new files automatically to avoid btrfs degrading extent performance. It appears this doesn’t completely duplicate all CoW data. Will have to monitor this in the future.

2.0b1 (2014-07-07)

  • Clean up docs.

  • Add classifiers in setup.py.

  • More or less complete rewrite expecting a copy-on-write filesystem as the target.

  • Flexible backup scheduling using free-form tags.

  • Compatible with Python 3.2-3.4.

  • Initial open source import as provided by Daniel Kraft (D9T).

