A flexible, Unix-philosophy backup utility saving to AWS S3 WORM storage

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

s3-sysbackup Package

A very flexible, Unix-style facility for backing content up to AWS S3 with time/space folding, using the S3 Object Lock system for write-once-read-many (WORM) storage, which can help provide additional security or compliance around backup requirements.

S3 charges not only for storage, but also for transfer of data into S3. s3-sysbackup works to minimize both kinds of charges, including a limited ability to amortize backup content storage over multiple machines.

Configuration

s3-sysbackup configuration lives in a directory structure on disk, with the default location being /etc/backup. Within the directory there must be a conf file, which is in Python configparser format. Backup strategies, as detailed below, each have their own subdirectory of the configuration directory.

The sections of the conf file are detailed in the subsections below.

`conf` File: `[aws]` Section

This section gives configuration information for access to the AWS API. The available settings are:

Setting	Value Description
`profile`	Sets the profile to use -- may be from the standard profile system or from a custom credentials file specified with the `credentials_file` setting
`credentials_file`	Path to an AWS credentials file to use
`role_arn`	Used only in conjunction with the `credentials_file` setting, gives the ARN of an IAM Role to assume for authorizing AWS actions

Please note that, if credentials_file is not used, profile can be used to identify a standardly configured AWS profile that uses an assumed IAM Role, which is why the role_arn setting only works if a credentials_file is given.

`conf` File: `[s3]` Section

This section configures usage of S3. The available settings are:

Setting	Value Description
`bucket`	Explicitly sets the name of the bucket into which the backup should be made; if not given (or explicitly set on the `Saver` before `load_config` is called), CloudFormation is used to identify the bucket

`conf` File: `[cloudformation]` Section

When an S3 bucket is not explicitly configured either on the Saver object or through the bucket setting in the [s3] section, the Saver will attempt to query CloudFormation for an export – named backup-bucket, unless otherwise specified as indicated below – giving the name of the destination bucket.

Setting	Value Description
`region`	AWS region in which the CloudFormation stack to be queried resides; if the profile in use does not configure a default region, this is necessary for use of CloudFormation
`export`	The name of the export to read for the bucket name

`conf` File: `[content]` Section

Setting	Value Description
`lock_timeout`	The number of seconds to wait for an advisory read lock on a "picked" file
`days_retained`	The number of days for which content should be retained via S3's Object Lock

Backup Strategies

s3-sysbackup employs two different strategies for backup: a file-based strategy called quilting and a dynamic data strategy called snapshotting. Information from both strategies is included in the manifest file uploaded to S3 for each backup run on each machine.

Quilting

Quilting, like the Git version control system, is content based: it keys the objects in which it stores content on S3 by the content found within the file. The keys for these objects are then "patched together" (i.e. quilted) by the information this strategy incorporates into the manifest file uploaded for each backup. When a file's content hasn't changed since the last backup, that file content need not be uploaded again. And when multiple machines are backing up copies of the same file, the file content only needs to be uploaded by the first machine to back it up.

The manifest, like a recursive version of Git's trees, has entries referencing a file's content, path on the system, and permissions, but also its ownership. It also contains information about symbolic links and directories, so as much as possible of the backed-up system can be restored. Where possible, ownership uses symbolic names and permissions are captured as ACLs, but fallbacks to numeric representations can occur.

Selection of files for the quilting strategy is done by executables placed (or symlinked into) in the pickers.d subdirectory of the configuration directory. Each executable file in that directory is run and the lines it prints to STDOUT are interpreted as file paths; one typical usage is to put one or more executable shell scripts that call find with the desired parameters.

To save on transfer and storage costs with S3, each file selected by a picker will be evaluated as to whether gzipping will improve storage efficiency. If the test indicates that the file compresses well, the GZip algorithm will be applied before the file is transferred to S3; the application of GZip will be noted on the S3 object's metadata.

Snapshotting

Not all data to be backed up resides conveniently in a file that can be assumed unchanging or reasonably readable at a given moment, with databases as a good example. For these data, it is typically necessary to invoke a tool both to retrieve the backup data and also to restore it. s3-sysbackup accommodates this need through its snapshotting strategy.

Executable files in the snapshots.d subdirectory of the configuration directory provide both the snapshot content and commentary about the snapshot. Each of these executables will be invoked with a single argument: a decimal string giving the file descriptor to which it should write the snapshot data. Text sent to STDOUT is captured as a description of the snapshot included in the S3 object's metadata. A line sent to STDOUT starting with ::restore-through:: is interpreted as giving a (shell formatted) command for restoring the snapshot data (piped to STDIN) onto a system, which is additionally saved in a separate metadata entry.

With each run of the backup system, each snapshot will be taken and uploaded to S3. It is the responsibility of the snapshotter executables to implement any beneficial compression. The object keys of all uploaded snapshot content are collected within the uploaded manifest file for the backup run.

As a note, if writing a snapshotter executable as a bash script, the construct >&${1:-1} can be useful: it redirects output to the file descriptor passed as the first argument, defaulting to STDOUT (file descriptor 1) if not specified. This will not, of course, work within a function, though it could be adapted by copying the target descriptor number into a less general variable name:

# At top level of script
CONTENT_FD=${1:-1}

# Defining a function
function some_func() {
  # ...
  some-pipeline-producing-content >&$CONTENT_FD
  # ...
}

Project details

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

0.2.0

Aug 12, 2021

This version

0.1.0

Mar 22, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

s3-sysbackup-0.1.0.tar.gz (35.2 kB view hashes)

Uploaded Mar 22, 2021 Source

Built Distribution

s3_sysbackup-0.1.0-py3-none-any.whl (55.3 kB view hashes)

Uploaded Mar 22, 2021 Python 3

Hashes for s3-sysbackup-0.1.0.tar.gz

Hashes for s3-sysbackup-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f2273b698e7c53fe8314ed59e24024f8551b9f97394d82fd4709a8bb9bc4ab28`
MD5	`cb06f07ca95aa40151cc495992dc7a16`
BLAKE2b-256	`613106ea4138ea128c6a427f7240d8e3bb125298efb3a8d1d4e0153fd0222283`

Hashes for s3_sysbackup-0.1.0-py3-none-any.whl

Hashes for s3_sysbackup-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f261491dfb404c1d199607034f10fed960e2c5815716608b969a7f262ede3a31`
MD5	`f0e97cffb1dd8796606cc6640192efd0`
BLAKE2b-256	`516e6ce84a1062d904808880198bff52b6e17297ce0e356343cd2465da1d305e`

s3-sysbackup 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Project description

s3-sysbackup Package