Backup and restore for block devices.
Backy is a block-based backup utility for virtual machines (i.e. volume files).
Backy is intended to be:
- space-, time-, and network-efficient
- trivial to restore
To achieve this, we:
- use a copy-on-write filesystem (btrfs, ZFS) as the target filesystem to achieve space efficiency,
- use a snapshot-capable main storage for our volumes (e.g. Ceph, LVM, …) that allows easy extraction of changes between snapshots,
- leverage proven, existing low-level tools,
- keep the code base small, simple, and well-tested.
The most important question is: I screwed up – how do I get my data back?
Here is the quick way to do a full restore of the most recent backup:
$ cd /srv/backy/my-virtual-machine
$ dd if=last of=/srv/kvm/my-virtual-machine.img bs=4096k
If you would like to pick a specific version, it takes only a little more effort:
$ cd /srv/backy/my-virtual-machine
$ backy status
+---------------------+------------+------------+---------+--------------+
| Date                | ID         | Size       |   Durat | Tags         |
+---------------------+------------+------------+---------+--------------+
| 2015-11-04 11:09:26 | UT7PkENubw | 60.00 GiB  | 845.0 s | weekly,daily |
| 2015-11-05 10:32:03 | fPnbSvEHHy | 264.85 MiB |  88.1 s | daily        |
| 2015-11-06 10:32:03 | cErS5GJ5sL | 172.34 MiB |  84.5 s | daily        |
+---------------------+------------+------------+---------+--------------+
== Summary
3 revisions
60.43 GiB data (estimated)
$ dd if=fPnbSvEHHymfztN9FuegLQ of=/srv/kvm/my-virtual-machine bs=4096k
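It can be worth verifying a restore before the virtual machine is started again. The following is a minimal sketch that wraps the dd call from the examples above and compares the result with cmp. `restore_and_verify` is a hypothetical helper name, not a backy subcommand, and the script assumes GNU coreutils and diffutils (`stat -c`, `dd conv=fsync`, `cmp -n`).

```shell
#!/bin/sh
# Hypothetical helper (not part of backy): copy an image to a target and
# verify the copy byte for byte. Assumes GNU stat, dd and cmp.
restore_and_verify() {
    src=$1
    dest=$2
    # -L follows the "last" symlink so we measure the image, not the link.
    size=$(stat -L -c %s "$src") || return 1
    # conv=fsync flushes the data to the target before dd exits.
    dd if="$src" of="$dest" bs=4096k conv=fsync || return 1
    # Compare only the first $size bytes: a block device target may be
    # larger than the image it receives.
    cmp -n "$size" "$src" "$dest"
}
```

Called as `restore_and_verify last /srv/kvm/my-virtual-machine.img` from the backup directory, it returns a non-zero status if the copy fails or the contents differ.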
Restoring individual files
The image files are exact copies of the data from the virtual disks. You can use regular Linux tools to interact with them:
$ cd /srv/backy/my-virtual-machine
$ ls -l last
lrwxrwxrwx 1 root root 36 Apr 25 10:13 last -> cErS5GJ5sLdsk9L6oCs4ia
$ kpartx -av cErS5GJ5sLdsk9L6oCs4ia
add map loop0p1 (253:9): 0 41934815 linear /dev/loop0 8192
$ mkdir /root/restore
$ mount -o ro /dev/mapper/loop0p1 /root/restore
$ cd /root/restore
$ ls
bin boot dev etc home lib lost+found media mnt opt proc root run sbin srv sys tmp usr var
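To clean up, leave the mount point first (otherwise umount reports the target as busy) and return to the backup directory:
$ cd /srv/backy/my-virtual-machine
$ umount /root/restore
$ kpartx -d cErS5GJ5sLdsk9L6oCs4ia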
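When the steps above are scripted, the device name has to be read from the kpartx output instead of being typed by hand. The sketch below extracts it with awk. It assumes the `add map NAME ...` line format shown in the session above, and `mapper_devices` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "kpartx -av" output on stdin and print the
# /dev/mapper path of every partition mapping it reports. Expected input:
#   add map loop0p1 (253:9): 0 41934815 linear /dev/loop0 8192
mapper_devices() {
    awk '$1 == "add" && $2 == "map" { print "/dev/mapper/" $3 }'
}
```

A script could then run `kpartx -av "$image" | mapper_devices` and mount the partitions it needs read-only.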
Create a configuration file (see man page for details). Spawn the scheduler with your favourite init system:
backy scheduler -c /path/to/backy.conf
The scheduler runs in the foreground until it is terminated by SIGTERM. When restarted, the scheduler re-runs missed backup jobs to some degree.
Log output goes to backy.log in the current directory by default.
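Because the log file lands in the current directory, a launcher that changes directory before starting the scheduler gives control over where backy.log ends up. The following is a sketch under assumptions: `start_scheduler`, the `BACKY` variable, and the example paths are hypothetical. Only the `backy scheduler -c` invocation comes from the text above.

```shell
#!/bin/sh
# Hypothetical launcher: start the scheduler from a dedicated directory
# so that backy.log is written there. BACKY may point at an alternative
# executable; it defaults to "backy" on the PATH.
start_scheduler() {
    log_dir=$1
    conf=$2
    mkdir -p "$log_dir" && cd "$log_dir" || return 1
    # exec replaces the shell, so a SIGTERM sent by the init system
    # reaches the scheduler process directly.
    exec "${BACKY:-backy}" scheduler -c "$conf"
}
```

An init script would end with a call such as `start_scheduler /var/log/backy /path/to/backy.conf`, where both paths are placeholders.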
Telnet into localhost port 6023 to get an interactive console. The console can currently be used to inspect the scheduler’s live status.
Backy includes a self-checking facility. Invoke backy check to see if there is a recent revision present for all configured backup jobs:
$ backy check
OK: 9 jobs within SLA
Both output and exit code are suited for processing with Nagios-compatible monitoring systems.
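For setups that wrap the check in a script, the sketch below passes the output through and logs the state derived from the exit code. It assumes that backy check follows the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN). `nagios_state` and `run_check` are hypothetical helper names.

```shell
#!/bin/sh
# Map an exit code to its conventional Nagios state name.
nagios_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run the given check command, print its output unchanged on stdout,
# log the derived state on stderr, and return the original exit code.
run_check() {
    if output=$("$@"); then status=0; else status=$?; fi
    printf '%s\n' "$output"
    printf 'state=%s exit=%d\n' "$(nagios_state "$status")" "$status" >&2
    return "$status"
}
```

A monitoring agent could call `run_check backy check` in place of the bare command. The exit code and stdout stay as the check produced them.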
Pluggable backup sources
Backy comes with a number of plug-ins which define block-file-like sources:
- file extracts data from simple image files living on a regular file system.
- ceph-rbd pulls data from RBD images using Ceph features like snapshots.
- flyingcircus is an extension to the ceph-rbd source which we use internally on the Flying Circus hosting platform. It uses advanced features like Consul integration.
It should be easy to write plug-ins for additional sources.
Changelog
- backy now accepts a -l option to specify a log file. If no such option is given, it logs to stdout.
- Add backy find -r REVISION subcommand to query image paths from shell scripts.
- Fix monitoring bug where partially written images made the check go green (#30).
- Greatly improve error handling and detection of failed jobs.
- Performance improvement: turn off line buffering in bulk file operations (#20).
- The scheduler now reports child failures (exit status > 0) in the main log.
- Fix fallocate() behaviour on 32-bit systems.
- The flyingcircus source type now requires 3 arguments: vm, pool, image.
- Improve telnet console.
- Provide Nix build script.
- Generate requirements.txt automatically from buildout’s versions.cfg.
Introduce a scheduler and rework the main backup command. The backy command is now only responsible for dealing with individual backups.
It no longer cares about scheduling.
A new daemon and a central configuration file are responsible for that now. However, the daemon simply calls out to the existing backy command, so we can still interact with the system manually even if we do not use the daemon.
Add Consul integration for backing up Flying Circus root disk images with clean snapshots (by asking fc.qemu to use fs-freeze before preparing a Ceph snapshot).
Switch to shorter UUIDs. Existing files with old UUIDs are compatible.
Turn the configuration format into YAML. Old files are still compatible. New configs will be generated as YAML.
Performance: defrag all new files automatically to avoid btrfs degrading extent performance. It appears this doesn’t completely duplicate all CoW data. Will have to monitor this in the future.
- Clean up docs.
- Add classifiers in setup.py.
- More or less complete rewrite expecting a copy-on-write filesystem as the target.
- Flexible backup scheduling using free-form tags.
- Compatible with Python 3.2-3.4.
- Initial open source import as provided by Daniel Kraft (D9T).