
A backup manager which can back up a large set of files to a number of offline disks


Distbackup

Distbackup is a tool that can back up a large set of files to a number of offline disks. Only one backup disk needs to be connected at a time for distbackup to operate on it. Distbackup will attempt to distribute redundant copies of files across the backup disks, as long as there is room.

Installation

Distbackup can be installed using pip:

python3 -m pip install distbackup

If the install fails, make sure Python, pip, and the OpenSSL development libraries are installed. On Ubuntu, run:

sudo apt install python3-pip python3-dev python3-wheel libssl-dev

Distbackup uses argcomplete to assist with tab completion in your shell. To enable it (for bash), run:

mkdir -p ~/.local/share/bash-completion/completions/
register-python-argcomplete --shell bash dsb > ~/.local/share/bash-completion/completions/dsb

Argcomplete also supports tcsh and fish. Run register-python-argcomplete --help for more info.

How it works

Distbackup maintains an SQLite database which contains metadata about every file in the source set, including its name, last modified date, and a SHA-256 hash of its contents. The hash uniquely identifies an "object", which is copied to one or more backup disks, along with a copy of the database itself. Files on a backup disk are stored only by their hash, so a restore requires reading the database to reconstruct the original file structure.
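As a rough illustration of how a file's contents become an object identifier, here is a minimal sketch using Python's standard hashlib. The function name sha256_of_file and the chunk size are assumptions made for this example, not distbackup's actual code.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical contents produce the same digest, so under this scheme they would refer to the same object.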

The file structure on the backup disk looks like this:

├── distbackup-disk.json
├── distbackup.sqlite
├── distbackup-objects
│   ├── 000
│   │   ├── 0005e3d2a9216a465148b424de67297ad5ce65b95289294f3ef53c856ca55088
│   │   └── 000c00bad31d126b054c6ec7f3e02b27c0f9a4d579f987d3c4f879cee1bacb81
│   ├── 004
│   │   ├── 0046066f500854ebc1eb5d679a7164235de42efdf4dfbacff70d9bdb5a2d65db
│   │   └── 004cf775fda2783974afc1599c33b77228f04f7c053760f4a9552927207a064e
│   ├── 007
│   │   ├── 00702164a628a9e65266f4aafec2e1faebc42f0cc2145408a74c3feae39bef6d
│   │   └── 0077c553ae28326ef59c06e3743a6ddf5e046d9482eb9becfa8e06ff5bd37e2e
│   ├── 008
│   │   └── 0083cc2e1d1d989795d02aa47d4dd42b9f90b644d025cece0ab3c953b3a4fa09
.
.
.
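In the listing above, each object appears to sit in a directory named after the first three hex characters of its hash. A small sketch of that mapping follows. The three-character prefix is inferred from the listing, and object_path is a hypothetical helper, not part of distbackup's API.

```python
from pathlib import PurePosixPath

OBJECTS_DIR = "distbackup-objects"
PREFIX_LEN = 3  # assumption: inferred from the directory names in the listing


def object_path(disk_root: str, sha256_hex: str) -> PurePosixPath:
    """Build the on-disk path for an object from its SHA-256 hex digest."""
    return PurePosixPath(disk_root) / OBJECTS_DIR / sha256_hex[:PREFIX_LEN] / sha256_hex
```

Spreading objects over prefix directories keeps any single directory from growing to hundreds of thousands of entries.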

Since objects are identified by their hash, their contents are immutable. This means that if a source file changes, it will have a new hash and therefore refer to a new object.

Distbackup works with one backup disk at a time. First, it deletes any orphaned objects (i.e. objects whose source files have been deleted or changed). Then it copies any new objects to the disk. Finally, if it still has room, it tries to make redundant copies of objects that have already been copied to other disks. It may delete redundant objects from the disk to make room, as long as the overall redundancy is not reduced. For example, if there is an object which already has one copy on another disk, it may decide to delete an object that has two copies on other disks to make room for a second copy of the first object.
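The trade-off in that example can be expressed as a simple comparison. This is only a toy sketch of the idea, assuming a rule of "evict a resident object only if it has more copies elsewhere than the incoming object does". The function name and the rule itself are assumptions, and distbackup's real selection logic may differ.

```python
def may_evict_for(incoming_other_copies: int, resident_other_copies: int) -> bool:
    """Decide whether a resident object may be deleted to make room (toy rule).

    incoming_other_copies: copies of the incoming object on other disks.
    resident_other_copies: copies of the resident object on other disks,
        not counting the copy on the disk being updated.
    """
    # After the swap the resident object keeps resident_other_copies copies and
    # the incoming object has incoming_other_copies + 1, so the least-protected
    # object ends up no worse off only when the resident has strictly more.
    return resident_other_copies > incoming_other_copies
```

With the numbers from the example, may_evict_for(1, 2) is true: both objects end up with two copies each.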

Getting started

All distbackup commands are accessed via dsb. You can run dsb --help at any time to get a list of commands.

Distbackup keeps its data in an SQLite database stored in ~/.config/distbackup/. If you want to use a different path, you can set the DISTBACKUP_PATH environment variable or override it with -d. The first time you run distbackup, it will create the database automatically.
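The lookup order described above (the -d option first, then DISTBACKUP_PATH, then the default) might look like the following sketch. The function resolve_data_dir is hypothetical, and treating the value as a directory rather than a database file is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DIR = Path("~/.config/distbackup")


def resolve_data_dir(cli_path: Optional[str] = None,
                     env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the data directory: -d value, then DISTBACKUP_PATH, then default."""
    env = os.environ if env is None else env
    if cli_path:
        # A path given on the command line overrides everything else.
        return Path(cli_path).expanduser()
    env_path = env.get("DISTBACKUP_PATH")
    if env_path:
        return Path(env_path).expanduser()
    return DEFAULT_DIR.expanduser()
```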

First, you need to decide what files you want to back up. You can back up a single folder or multiple folders spread out across different drives. You can "mount" a folder as a specific path under the virtual tree with the dsb source map command:

# Make /media/photos/DCIM appear as /photos in the backup set
dsb source map /photos /media/photos/DCIM

# Another path, from your home folder
dsb source map /videos ~/Videos/recorded/

# You can "mount" directories within other virtual directories.
dsb source map /videos/stream-archive ~/livestream-archive
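To show how nested mounts like these could resolve, here is a minimal sketch that maps a virtual path back to a real one by trying the deepest mount point first. The resolve_virtual function and the plain dictionary of mounts are assumptions for illustration and do not reflect how distbackup stores its source map.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def resolve_virtual(mounts: Dict[str, str], virtual_path: str) -> Optional[PurePosixPath]:
    """Translate a virtual path to a real path using the deepest matching mount."""
    target = PurePosixPath(virtual_path)
    # Sort by depth, deepest first, so /videos/stream-archive wins over /videos.
    by_depth = sorted(mounts, key=lambda m: len(PurePosixPath(m).parts), reverse=True)
    for mount in by_depth:
        mount_path = PurePosixPath(mount)
        if target == mount_path or mount_path in target.parents:
            return PurePosixPath(mounts[mount]) / target.relative_to(mount_path)
    return None  # No mount covers this virtual path.


mounts = {
    "/photos": "/media/photos/DCIM",
    "/videos": "/home/user/Videos/recorded",
    "/videos/stream-archive": "/home/user/livestream-archive",
}
print(resolve_virtual(mounts, "/videos/stream-archive/ep1.mkv"))
```

The home directory /home/user and the file name ep1.mkv are placeholders.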

Once you have your source map set up the way you want it, run dsb update to scan the source folders for files and record their metadata in the database. The first time you run it, it will have to read every file to generate a hash. You can let this process run in the background while you continue setup. If the hashing process is interrupted, it will pick up where it left off the next time you run dsb update and won't have to rehash any files unless their contents have changed.
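One common way to avoid rehashing unchanged files is to compare stored metadata before reading the contents. The sketch below assumes that a matching size and modification time mean "unchanged". The text above only says the database records the last modified date, so the exact check distbackup performs is an assumption here, as are the names update_hashes and Cache.

```python
import hashlib
import os
from typing import Dict, Iterable, Tuple

# path -> (size in bytes, mtime in nanoseconds, SHA-256 hex digest)
Cache = Dict[str, Tuple[int, int, str]]


def update_hashes(paths: Iterable[str], cache: Cache) -> int:
    """Hash only files whose size or mtime differ from the cached entry.

    Returns the number of files that were (re)hashed.
    """
    rehashed = 0
    for path in paths:
        stat = os.stat(path)
        cached = cache.get(path)
        if cached is not None and cached[:2] == (stat.st_size, stat.st_mtime_ns):
            continue  # Metadata unchanged, so keep the stored hash.
        with open(path, "rb") as handle:
            # Whole-file read keeps the sketch short; real code would read in chunks.
            digest = hashlib.sha256(handle.read()).hexdigest()
        cache[path] = (stat.st_size, stat.st_mtime_ns, digest)
        rehashed += 1
    return rehashed
```

Because the cache is updated after each file, an interrupted run loses at most the file that was being hashed at the time.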

Next, you need to find some disks to use as a backup. Each disk needs to have a unique name, even if it's just distbackup-01, distbackup-02, etc. I highly recommend physically labelling each disk with its name so it's easy to find. I also recommend setting the volume label on the disk to match, though distbackup does not require that.

Note: If your disks are formatted as ext4, you should set the "reserved blocks" to zero. By default, ext4 reserves 5% for the root user, which for a 6TB disk is 300GB, an insane amount for a data disk. You can use tune2fs -m 0, followed by the device name, to clear it.
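The 300GB figure follows directly from the percentage. A quick check, assuming decimal units (1 TB = 10^12 bytes):

```python
DISK_BYTES = 6 * 10**12   # a 6TB disk, in decimal units
RESERVED_PERCENT = 5      # ext4's default reservation for the root user

reserved_bytes = DISK_BYTES * RESERVED_PERCENT // 100
print(reserved_bytes // 10**9, "GB")  # prints: 300 GB
```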

Once you have your disk connected, formatted, and mounted, it's time to make distbackup aware of it:

ktpanda@desktop:~$ dsb disk add distbackup-01 /media/distbackup-01/distbackup/
Added disk distbackup-01:
  UUID: db74b831-bd09-434f-ac9b-bc427dfc5628
  Nexus index: 0
  Size: 7,927,384,932,352
  Mount point (current): /media/distbackup-01
  Relative path from mountpoint: distbackup

The size, if not specified, defaults to 10GB less than the total size of the disk. If you want to set a specific size (e.g. to reserve more space for other files), you can specify --size when adding the disk or with the dsb disk set command:

dsb disk set distbackup-01 --size 7.4T
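For reference, the default of "10GB less than the total size" can be approximated with the standard library. This sketch assumes decimal gigabytes and uses shutil.disk_usage to read the filesystem total. Whether distbackup uses decimal or binary units, and how it measures the disk, is not stated above, so treat both helper functions as hypothetical.

```python
import shutil

RESERVE_BYTES = 10 * 10**9  # assumption: "10GB" means 10 * 10^9 bytes


def default_backup_size(total_bytes: int, reserve: int = RESERVE_BYTES) -> int:
    """Return the total size minus the reserve, never going below zero."""
    return max(total_bytes - reserve, 0)


def default_size_for_mount(mount_point: str) -> int:
    """Compute the default size for the filesystem mounted at mount_point."""
    return default_backup_size(shutil.disk_usage(mount_point).total)
```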

Once you have all your disks set up, and dsb update completes, you're ready to start backing up! Just run dsb backup distbackup-01, and it will start copying files until it hits the size limit of the disk, or it runs out of files to copy.
