Replicat
Configurable and lightweight backup utility with deduplication and encryption.
Reasoning
For various reasons, I wasn't 100% happy with any of the similar projects that I've tried. It's likely that I will never be 100% happy with Replicat either, but at least it will be easier for me to fix problems or add new features.
Highlights/goals
- efficient, concise, easily auditable implementation
- extendable and configurable
- few external dependencies
- well-documented behaviour
- unified repository layout
- API that exists
This project borrows a few ideas from those other projects, but not enough to be considered a copycat.
Introduction
You can use Replicat to back up files from your machine to a remote location called a repository, located on a backend like local (a local path) or b2 (Backblaze B2). Files are stored in an optionally encrypted and chunked form, and references to chunks (i.e. their digests) are stored in snapshots along with file names and metadata.
Replicat supports two types of repositories: encrypted (the default) and unencrypted.
Chunks, snapshots, and all other pieces of data inside unencrypted repositories are stored unencrypted. The storage names for chunks and snapshots are simply the hash digests of their contents.
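To make the unencrypted layout more concrete, here is a minimal sketch of content-addressed chunk storage. It is not Replicat's code: the fixed-size splitter, the choice of BLAKE2b, the tiny chunk size, and the names split_fixed and store_file are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; tiny so that the example is easy to follow


def split_fixed(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Splits data into fixed-size chunks (a stand-in for a real chunker)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_file(storage: dict[str, bytes], name: str, data: bytes) -> dict:
    """Stores chunks under their digests and returns a snapshot-like entry."""
    references = []
    for chunk in split_fixed(data):
        # The storage name is the hash digest of the contents (hash is assumed)
        digest = hashlib.blake2b(chunk).hexdigest()
        storage.setdefault(digest, chunk)  # identical chunks are stored once
        references.append(digest)
    return {"name": name, "metadata": {"size": len(data)}, "chunks": references}


storage: dict[str, bytes] = {}
first = store_file(storage, "a.txt", b"abcdabcdxyz")
second = store_file(storage, "b.txt", b"abcd1234")

# Restoring a file is a matter of concatenating the referenced chunks
restored = b"".join(storage[digest] for digest in first["chunks"])
```

Both files contain the chunk abcd, so it is stored only once, which is the deduplication the description refers to.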
Currently, the only supported type of encryption is symmetric encryption. To use symmetric encryption you will need a key and the password associated with that key. A key contains parameters for the KDF and an encrypted section, which can only be decrypted by the owner of the key using the matching password. That section contains secrets for the cryptographic primitives that control how the data is split into chunks, visibility of chunks of data, and more.
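As a rough illustration of the role the KDF parameters play, the sketch below derives a symmetric key from a password with scrypt from the Python standard library. The parameter values, the dictionary layout, and the name derive_user_key are assumptions; Replicat's actual key format is not shown here. In this picture, the derived key is what would decrypt the encrypted section of the key.

```python
import hashlib
import os

# Hypothetical KDF parameters of the kind a key could carry in plaintext.
# The cost parameter n is deliberately small to keep the example fast.
kdf_params = {"salt": os.urandom(16), "n": 2**12, "r": 8, "p": 1, "length": 32}


def derive_user_key(password: bytes, params: dict) -> bytes:
    """Derives a symmetric key from the password using scrypt."""
    return hashlib.scrypt(
        password,
        salt=params["salt"],
        n=params["n"],
        r=params["r"],
        p=params["p"],
        dklen=params["length"],
    )


user_key = derive_user_key(b"password string", kdf_params)
```

The same password and parameters always produce the same derived key, while a wrong password produces a different one, so the encrypted section stays unreadable without the matching password.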
You can create multiple keys with different passwords and settings. When adding a new key to a repository with symmetric encryption, you'll have to unlock it with one of the existing keys. You have a choice to either share secrets with the other key OR generate new secrets. Owners of keys with shared secrets ("shared" keys) can use deduplication features together, i.e., chunks of data that were uploaded by the owner of one such key can be accessed and decrypted by the owner of the other key. Assume that they will also be able to check whether you have a specific piece of data. To avoid such risk, you can create a key with new secrets (an "independent" key). That way, Replicat will isolate your data and make it inaccessible to the owners of other keys. Of course, if you use your key to create yet another (new) key, you will also have the ability to share your secrets with others, even if they were originally copied from some other key. This creates a web of trust of sorts.
In contrast with unencrypted repositories, the storage name for the chunk is derived from the hash digest of its contents and one of the aforementioned secrets, in order to reduce the chance of successful "confirmation of file" attacks. The chunk itself is encrypted with the combination of the hash digest of its contents and another one of those secrets, since the usual convergent encryption is vulnerable to that same "confirmation of file" attack. The table of chunk references inside a snapshot is encrypted similarly, but the list of files that reference those chunks is encrypted using the key and the password that were used to unlock the repository, and therefore can only be decrypted by the owner of that key (even in the case of shared secrets). A snapshot created using an independent key will not be visible to the owners of other keys. A snapshot created using a shared key will be visible, but there will be no available information about it beyond its storage name and the table of chunk references.
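The idea of mixing a secret into both the storage name and the chunk encryption key can be sketched with keyed BLAKE2b from the standard library. The construction, the secret sizes, and the names storage_name and chunk_key are assumptions made for illustration, not a description of the primitives Replicat actually uses.

```python
import hashlib
import os


def chunk_digest(chunk: bytes) -> bytes:
    """Plain (unkeyed) digest of the chunk contents."""
    return hashlib.blake2b(chunk).digest()


def storage_name(chunk: bytes, name_secret: bytes) -> str:
    """Keyed hash of the digest; cannot be recomputed without the secret."""
    return hashlib.blake2b(chunk_digest(chunk), key=name_secret).hexdigest()


def chunk_key(chunk: bytes, encryption_secret: bytes) -> bytes:
    """Per-chunk encryption key derived from the digest and another secret."""
    return hashlib.blake2b(
        chunk_digest(chunk), key=encryption_secret, digest_size=32
    ).digest()


# Hypothetical secrets stored in the encrypted section of each key
original = {"name": os.urandom(32), "encryption": os.urandom(32)}
shared = dict(original)  # a shared key copies the secrets
independent = {"name": os.urandom(32), "encryption": os.urandom(32)}

chunk = b"some chunk of data"
```

With shared secrets, both owners compute the same storage name and the same chunk key, so they can deduplicate against (and decrypt) each other's chunks. With independent secrets the names differ, and someone who only knows the plaintext cannot confirm that the chunk is present in the repository.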
Command line interface
The installer will create the replicat command (same as python -m replicat).
There are several available subcommands:
- init -- initializes the repository using the provided settings
- snapshot -- creates a new snapshot in the repository
- list-snapshots/ls -- lists snapshots
- list-files/lf -- lists files across snapshots
- restore -- restores files from snapshots
- add-key -- creates a new key for the encrypted repository
- delete -- deletes snapshots by their names
- clean -- performs garbage collection
- upload -- uploads files to the backend (no chunking, no encryption, keeping original names)
⚠️ WARNING: actions that read from or upload to the repository can safely be run concurrently; however, there are presently no guards in place that would make it safe for you to run destructive actions (delete, clean) concurrently with those actions unless you use independent keys (see the explanation above). I do plan to implement them soon-ish, but in the meantime DO NOT use shared keys (or, naturally, the same key) to snapshot and clean at the same time, for example.
As far as the upcoming implementation of such guards, it'll be based on locks. I'm familiar with the lock-free deduplication strategy (like in Duplicacy), but I don't like it much.
There are several command line arguments that are common to all subcommands:
- -r/--repository -- used to specify the type and location of the repository backend. The format is <backend>:<connection string>, where <backend> is the name of a module in the replicat.backends package. For example: b2:bucket-name (B2 backend). The <backend>: part can be omitted for the local destinations (local backend). The <connection string> part is passed directly to the replicat.backends.<backend>.Client class constructor. If replicat.backends.<backend>.Client expects additional backend-specific arguments, they will appear in the --help output. replicat.backends is a namespace package, making it possible to add custom backends without changing replicat source code.
- -q/--hide-progress -- suppresses progress indication for commands that support it
- -c/--concurrent -- the number of concurrent connections to the backend
- -v/--verbose -- specifies the logging verbosity. The default verbosity is WARNING, -v means INFO, -vv means DEBUG.
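The <backend>:<connection string> format can be illustrated with a small parser. This is only a sketch under the rules stated above: parse_repository and backend_module_name are hypothetical names, and edge cases (for example, local paths that themselves contain a colon) are deliberately ignored.

```python
def parse_repository(value: str) -> tuple[str, str]:
    """Splits '<backend>:<connection string>' into its two parts."""
    backend, separator, connection_string = value.partition(":")
    if not separator:
        # No '<backend>:' prefix, so treat the value as a local destination
        return "local", value
    return backend, connection_string


def backend_module_name(backend: str) -> str:
    """Name of the module that is expected to provide the Client class."""
    return f"replicat.backends.{backend}"


backend, connection_string = parse_repository("b2:bucket-name")
module_name = backend_module_name(backend)
```

A module name built this way could then be loaded with importlib.import_module, and the connection string handed to its Client class, as described above.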
Encrypted repositories require a key and a matching password for every operation:
- -K/--key-file -- the path to the key file
- -p/--password -- the password in plaintext. However, it's more secure to provide the password in a file via the -P/--password-file argument, or as an environment variable REPLICAT_PASSWORD.
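If you drive replicat from a script, passing the password through the environment keeps it out of the argument list, which on many systems is visible to other local users. The helper below is a hypothetical sketch (replicat_invocation is not part of Replicat); it only uses the -r and -K arguments and the REPLICAT_PASSWORD variable described above.

```python
import os


def replicat_invocation(subcommand, repository, key_file, password):
    """Returns (argv, env) with the password in the environment, not in argv."""
    argv = ["replicat", subcommand, "-r", repository, "-K", key_file]
    env = {**os.environ, "REPLICAT_PASSWORD": password}
    return argv, env


argv, env = replicat_invocation(
    "ls", "some/directory", "path/to/key/file", "password string"
)
# To actually run the command:
# import subprocess; subprocess.run(argv, env=env, check=True)
```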
init examples
# Unencrypted repository in some/directory. The --encryption none flag disables encryption
$ replicat init -r some/directory --encryption none
# Encrypted repository with initial password taken from string.
# The new key will be printed to stdout
$ replicat init -r some/directory -p 'password string'
# Encrypted repository with initial password taken from a file.
# The new key will be written to path/to/key/file
$ replicat init -r some/directory -P path/to/password/file -o path/to/key/file
# Specifies the cipher
$ replicat init -r some/directory -p '...' --encryption.cipher.name chacha20_poly1305
# Specifies the cipher name and parameters
$ replicat init -r some/directory \
-p '...' \
--encryption.cipher.name aes_gcm \
--encryption.cipher.key_bits 128
# Specifies the KDF name and parameters (for the key)
$ replicat init -r some/directory \
-p '...' \
--encryption.kdf.name scrypt \
--encryption.kdf.n 1048576
# Specifies the chunking parameters
$ replicat init -r some/directory \
-p '...' \
--chunking.min-length 128_000 \
--chunking.max-length 2_048_000
# Equivalent (dashes in argument names are converted to underscores)
$ replicat init -r some/directory \
-p '...' \
--chunking.min_length 128_000 \
--chunking.max_length 2_048_000
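The dotted argument names and the dash-to-underscore equivalence can be pictured as a mapping onto nested settings. The following is a sketch of that idea only; parse_settings is a hypothetical helper, and treating values as integers whenever possible is an assumption, not a description of how Replicat parses its arguments.

```python
def parse_settings(arguments: list[str]) -> dict:
    """Turns ['--a.b-c', '1', ...] flag/value pairs into {'a': {'b_c': 1}}."""
    settings: dict = {}
    for flag, raw in zip(arguments[::2], arguments[1::2]):
        # Dashes in argument names are converted to underscores
        *parents, leaf = flag.lstrip("-").replace("-", "_").split(".")
        node = settings
        for part in parents:
            node = node.setdefault(part, {})
        try:
            node[leaf] = int(raw)  # int() accepts underscores, e.g. '128_000'
        except ValueError:
            node[leaf] = raw
    return settings


with_dashes = parse_settings(
    ["--chunking.min-length", "128_000", "--chunking.max-length", "2_048_000"]
)
with_underscores = parse_settings(
    ["--chunking.min_length", "128_000", "--chunking.max_length", "2_048_000"]
)
```

Both spellings end up as the same nested structure, which is why the last two init examples are equivalent.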
snapshot examples
# Unlocks the repository, uploads provided files in encrypted chunks,
# using no more than 10 concurrent connections, creating a snapshot
$ replicat snapshot -r some/directory \
-P path/to/password/file \
-K path/to/key/file \
-c 10 \
-n 'A note (optional)' \
image.jpg some-directory another-directory and/more.text
list-snapshots/ls examples
# Unlocks the repository and lists all of the snapshots
$ replicat list-snapshots -r some/directory -P path/to/password/file -K path/to/key/file
# Equivalent
$ replicat ls -r some/directory -P path/to/password/file -K path/to/key/file
list-files/lf examples
# Unlocks the repository and lists all versions of all the files
$ replicat list-files -r some/directory -P path/to/password/file -K path/to/key/file
# Equivalent
$ replicat lf -r some/directory -P path/to/password/file -K path/to/key/file
# Only lists files with paths matching the -F regex
$ replicat lf -r some/directory \
-P path/to/password/file \
-K path/to/key/file \
-F '\.(jpg|text)$'
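To preview which paths a -F regex would select, you can try the pattern in Python first. This sketch assumes the regex is searched for anywhere in the path (re.search semantics), which is an assumption about the matching mode; filter_paths is a hypothetical helper.

```python
import re


def filter_paths(paths, pattern):
    """Keeps the paths in which the regex matches somewhere."""
    regex = re.compile(pattern)
    return [path for path in paths if regex.search(path)]


paths = ["image.jpg", "and/more.text", "notes.txt", "jpg/readme.md"]
selected = filter_paths(paths, r"\.(jpg|text)$")
```

Because of the trailing $, the pattern only selects paths that end in .jpg or .text, so notes.txt and jpg/readme.md are left out.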
restore examples
# Unlocks the repository and restores the latest versions of all files to target-directory
$ replicat restore -r some/directory \
-P path/to/password/file \
-K path/to/key/file \
target-directory
# Unlocks the repository and restores the latest versions of files with paths matching the
# -F regex in snapshots matching the -S regex to 'target-directory'
$ replicat restore -r some/directory \
-P path/to/password/file \
-K path/to/key/file \
-F '\.(jpg|text)$' \
-S 'abcdef' \
target-directory
add-key examples
# Unlocks the repository and creates an independent key, which will be printed to stdout
$ replicat add-key -r some/directory -P path/to/password/file -K path/to/key/file
# Unlocks the repository and creates a shared key (i.e. with shared secrets)
$ replicat add-key -r some/directory -P path/to/password/file -K path/to/key/file --shared
# Unlocks the repository and creates an independent key, which will be written
# to path/to/new/key/file
$ replicat add-key -r some/directory \
-P path/to/password/file \
-K path/to/key/file \
-o path/to/new/key/file
# Unlocks the repository and creates an independent key with some custom settings
# (cipher params as well as chunking and hashing settings are repository-wide)
$ replicat add-key -r some/directory \
-P path/to/password/file \
-K path/to/key/file \
--encryption.kdf.name scrypt \
--encryption.kdf.n 1048576
delete examples
# Unlocks the repository and deletes snapshots by name (as returned by ls/list-snapshots).
# Chunks that aren't referenced by any other snapshot will be deleted automatically
$ replicat delete -r some/directory \
-P path/to/password/file \
-K path/to/key/file \
NAME1 NAME2 NAME3 ...
clean examples
# Unlocks the repository and deletes all chunks that are not referenced by any snapshot
$ replicat clean -r some/directory -P path/to/password/file -K path/to/key/file
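Conceptually, the garbage collection performed by clean (and the automatic cleanup after delete) is a set difference between the stored chunks and the chunks that snapshots still reference. The sketch below illustrates that idea with plain Python sets; the data layout and the name unreferenced_chunks are assumptions, not Replicat's implementation.

```python
def unreferenced_chunks(stored, snapshots):
    """Returns names of stored chunks that no snapshot references."""
    referenced = set()
    for references in snapshots.values():
        referenced.update(references)
    return set(stored) - referenced


stored = {"chunk-a", "chunk-b", "chunk-c", "chunk-d"}
snapshots = {
    "NAME1": ["chunk-a", "chunk-b"],
    "NAME2": ["chunk-b", "chunk-c"],
}

garbage_before = unreferenced_chunks(stored, snapshots)

# Deleting NAME1 leaves chunk-a unreferenced, while chunk-b is still in use
del snapshots["NAME1"]
garbage_after = unreferenced_chunks(stored, snapshots)
```

This also shows why running clean alongside snapshot with shared keys is risky: a chunk that a snapshot in progress is about to reference can look unreferenced at the moment the difference is computed.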
upload examples
# Uploads files directly to the backend without any additional processing.
# File path -> resulting name:
# /working/directory/some/file -> some/file
# /working/directory/another/file -> another/file
# /working/directory/another/directory/another-file -> another/directory/another-file
# /absolute/directory/path/with-file -> absolute/directory/path/with-file
# /absolute/file -> absolute/file
/working/directory$ replicat upload -r some:repository \
some/file \
/working/directory/another/directory \
/absolute/directory/path \
/absolute/file
# Uploads files that do not yet exist in the repository (only checks the file names)
$ replicat upload -r some:repository --skip-existing some/file some/directory
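The path-to-name mapping listed in the first upload example can be reproduced with pathlib. This is a sketch inferred from that listing only: upload_name is a hypothetical helper, it works on POSIX-style file paths, and it does not walk directories.

```python
from pathlib import PurePosixPath


def upload_name(path: str, working_directory: str) -> str:
    """Maps a file path to the name it would get in the repository."""
    # Joining is a no-op for absolute paths: the absolute part wins
    absolute = PurePosixPath(working_directory) / path
    try:
        relative = absolute.relative_to(working_directory)
    except ValueError:
        # Outside the working directory: keep the full path, minus the root
        relative = absolute.relative_to(absolute.anchor)
    return str(relative)


cwd = "/working/directory"
names = [
    upload_name("some/file", cwd),
    upload_name("/working/directory/another/directory/another-file", cwd),
    upload_name("/absolute/directory/path/with-file", cwd),
    upload_name("/absolute/file", cwd),
]
```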
Security
If you believe you've found a security issue with replicat, please report it to flwaultah@gmail.com (or DM me on Twitter or Telegram).