
HardLink/Deduplication Backups with Python


pyhardlinkbackup


  • Backups should be saved as normal files in the filesystem:

    • accessible without any extra software or extra meta files

    • non-proprietary format

  • Create backups with versioning

    • every backup run creates a complete filesystem snapshot tree

    • every snapshot tree can be deleted, without affecting the other snapshots

  • Deduplication with hardlinks:

    • Store only changed files; all others are stored as hardlinks

    • find duplicate files everywhere (even renamed or moved files)

  • usable under Windows and Linux
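The hardlink deduplication idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not phlb's actual code: `backup_file` and the in-memory `seen` dict are hypothetical, and SHA-256 is an arbitrary choice of content hash. A file whose content is already stored gets hardlinked with `os.link` instead of copied, which is why renamed or moved files are deduplicated as well.

```python
import hashlib
import os
import shutil
from pathlib import Path
from typing import Dict


def file_hash(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_file(src: Path, dst: Path, seen: Dict[str, Path]) -> bool:
    """Hypothetical helper: store src at dst.

    If a file with identical content was stored before, dst becomes a
    hardlink to it; otherwise src is copied. Returns True for a hardlink.
    """
    digest = file_hash(src)
    dst.parent.mkdir(parents=True, exist_ok=True)
    existing = seen.get(digest)
    if existing is not None:
        os.link(str(existing), str(dst))  # same content: share the stored data
        return True
    shutil.copy2(str(src), str(dst))  # new content: real copy, keeps mtime
    seen[digest] = dst
    return False
```

One consequence of this design: all hardlinks to a file share one inode, so they also share permissions and modification time. A real tool has to decide what to do with duplicates whose metadata differ.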

Requirement: Python 3.6 or newer.

Please: try, fork and contribute! ;)

Build Status on github

github.com/jedie/pyhardlinkbackup/actions

Build Status on travis-ci.org

travis-ci.org/jedie/pyhardlinkbackup

Build Status on appveyor.com

ci.appveyor.com/project/jedie/pyhardlinkbackup

Coverage Status on coveralls.io

coveralls.io/r/jedie/pyhardlinkbackup

Requirements Status on requires.io

requires.io/github/jedie/pyhardlinkbackup/requirements/

Example

$ phlb backup ~/my/important/documents
...start backup, some time later...
$ phlb backup ~/my/important/documents
...

This will create deduplicated backups like this:

~/pyhardlinkbackups
  └── documents
      ├── 2016-01-07-085247
      │   ├── phlb_config.ini
      │   ├── spreadsheet.ods
      │   ├── brief.odt
      │   └── important_files.ext
      └── 2016-01-07-102310
          ├── phlb_config.ini
          ├── spreadsheet.ods
          ├── brief.odt
          └── important_files.ext
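Unchanged files in the newer snapshot are hardlinks to the same data as in the older one. That is why either snapshot can be deleted without affecting the other: the data is only freed when its last link is removed. Here is a small sketch for checking this, assuming two snapshot directories like the ones above. `shared_files` is a hypothetical helper, not part of phlb.

```python
import os
from pathlib import Path
from typing import List


def shared_files(old_snapshot: Path, new_snapshot: Path) -> List[str]:
    """Return relative paths that exist in both snapshots and point to the
    same file on disk (same device and inode), i.e. are hardlinked."""
    shared = []
    for new_file in sorted(p for p in new_snapshot.rglob("*") if p.is_file()):
        rel = new_file.relative_to(new_snapshot)
        old_file = old_snapshot / rel
        # os.path.samefile compares st_dev and st_ino of both paths
        if old_file.is_file() and os.path.samefile(str(old_file), str(new_file)):
            shared.append(rel.as_posix())
    return shared
```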

Installation

Windows

  1. install Python 3: https://www.python.org/downloads/

  2. Download the file boot_pyhardlinkbackup.cmd

  3. call boot_pyhardlinkbackup.cmd as admin (right-click it and choose Run as administrator)

If everything works fine, you will get a venv here: %ProgramFiles%\PyHardLinkBackup

After the venv is created, call these scripts to finalize the setup:

  1. %ProgramFiles%\PyHardLinkBackup\phlb_edit_config.cmd - create a config .ini file

  2. %ProgramFiles%\PyHardLinkBackup\phlb_migrate_database.cmd - create database tables

To upgrade pyhardlinkbackup, call:

  1. %ProgramFiles%\PyHardLinkBackup\phlb_upgrade_pyhardlinkbackup.cmd

To start the Django webserver, call:

  1. %ProgramFiles%\PyHardLinkBackup\phlb_run_django_webserver.cmd

Linux

  1. Download the file boot_pyhardlinkbackup.sh

  2. call boot_pyhardlinkbackup.sh

If everything works fine, you will get a venv here: ~/PyHardLinkBackup

After the venv is created, call these scripts to finalize the setup:

  • ~/PyHardLinkBackup/phlb_edit_config.sh - create a config .ini file

  • ~/PyHardLinkBackup/phlb_migrate_database.sh - create database tables

To upgrade pyhardlinkbackup, call:

  • ~/PyHardLinkBackup/phlb_upgrade_pyhardlinkbackup.sh

To start the Django webserver, call:

  • ~/PyHardLinkBackup/phlb_run_django_webserver.sh

Starting a backup run

To start a backup run, use this helper script:

  • Windows batch: %ProgramFiles%\PyHardLinkBackup\pyhardlinkbackup_this_directory.cmd

  • Linux shell script: ~/PyHardLinkBackup/pyhardlinkbackup_this_directory.sh

Copy this file into the directory that should be backed up and just call it there to run a backup.

Verifying an existing backup

$ cd ~/PyHardLinkBackup/
~/PyHardLinkBackup $ source bin/activate

(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb verify --fast ~/PyHardLinkBackups/documents/2016-01-07-102310

With --fast, the files’ contents will not be checked. Without it, the hashes of the files’ contents are calculated and compared, so every file must be read completely from the filesystem, which takes some time.

A verify run does:

  • Check that all files in the backup exist

  • Compare file sizes

  • Compare hashes from the hash file

  • Compare files’ modification timestamps

  • Calculate hashes from files’ contents and compare them (skipped if --fast is used)
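The checks above can be sketched like this. The sketch assumes a hypothetical `Record` holding the size, modification time (in nanoseconds) and SHA-256 digest saved at backup time. It folds the stored hash into that record instead of reading a separate hash file, and phlb's own storage and hash algorithm may differ. Passing `fast=True` skips the content hash, mirroring --fast. For brevity, each file is read into memory in one go.

```python
import hashlib
from pathlib import Path
from typing import List, NamedTuple


class Record(NamedTuple):
    """Hypothetical metadata stored for one backed-up file."""
    path: Path
    size: int
    mtime_ns: int
    sha256: str


def verify(records: List[Record], fast: bool = False) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for rec in records:
        if not rec.path.is_file():
            problems.append("missing: %s" % rec.path)
            continue
        st = rec.path.stat()
        if st.st_size != rec.size:
            problems.append("size mismatch: %s" % rec.path)
        if st.st_mtime_ns != rec.mtime_ns:
            problems.append("mtime mismatch: %s" % rec.path)
        if not fast:
            # Full mode: read the whole file and recompute its hash.
            digest = hashlib.sha256(rec.path.read_bytes()).hexdigest()
            if digest != rec.sha256:
                problems.append("hash mismatch: %s" % rec.path)
    return problems
```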

Configuration

phlb will use a configuration file named: PyHardLinkBackup.ini

Search order is:

  1. the current directory and each parent directory, up to the root

  2. user directory

E.g. if the current working directory is /foo/bar/my_files/ then the search path will be:

  • /foo/bar/my_files/PyHardLinkBackup.ini

  • /foo/bar/PyHardLinkBackup.ini

  • /foo/PyHardLinkBackup.ini

  • /PyHardLinkBackup.ini

  • ~/PyHardLinkBackup.ini (the user's home directory under Windows/Linux)
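The documented search order maps directly onto pathlib. This is a sketch of that order, not phlb's own lookup code; `candidate_paths` and `find_config` are hypothetical names.

```python
from pathlib import Path
from typing import List, Optional

CONFIG_NAME = "PyHardLinkBackup.ini"


def candidate_paths(cwd: Path, home: Path) -> List[Path]:
    """Config locations in search order: cwd, each parent up to the
    filesystem root, then the user's home directory."""
    dirs = [cwd] + list(cwd.parents)
    return [d / CONFIG_NAME for d in dirs] + [home / CONFIG_NAME]


def find_config(cwd: Path, home: Path) -> Optional[Path]:
    """Return the first config file that exists, or None."""
    for path in candidate_paths(cwd, home):
        if path.is_file():
            return path
    return None
```

Typical usage would be `find_config(Path.cwd(), Path.home())`.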

Create / edit default .ini

To open the .ini file in the user directory with your editor, just run:

(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb config

The defaults are stored here: /phlb/config_defaults.ini

Excluding files/folders from backup

There are two ways to exclude files/folders from your backup. Use the following settings in your PyHardLinkBackup.ini:

# Directory names that will be recursively excluded from backups (comma separated list!)
SKIP_DIRS= __pycache__, temp

# glob-style patterns to exclude files/folders from backups (used with Path.match(), Comma separated list!)
SKIP_PATTERNS= *.pyc, *.tmp, *.cache

The filesystem scan is divided into two steps:

  1. Just scan the filesystem tree

  2. Filter and load meta data for every directory item

SKIP_DIRS is used in the first step; SKIP_PATTERNS is used in the second step.
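Here is a minimal sketch of the two-step scan. It assumes the comma-separated values are split on commas and stripped of whitespace, and the helper names are hypothetical. Directories named in SKIP_DIRS are pruned while walking, so their contents are never visited. SKIP_PATTERNS are then matched against each remaining path with `Path.match()`.

```python
import os
from pathlib import Path
from typing import Iterable, List


def parse_list(value: str) -> List[str]:
    """Split a comma separated setting such as ' __pycache__, temp'."""
    return [item.strip() for item in value.split(",") if item.strip()]


def scan(top: Path, skip_dirs: Iterable[str]) -> List[Path]:
    """Step 1: walk the tree, skipping excluded directory names."""
    skip = set(skip_dirs)
    found = []
    for dirpath, dirnames, filenames in os.walk(str(top)):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        found.extend(Path(dirpath, name) for name in filenames)
    return found


def filter_paths(paths: Iterable[Path], skip_patterns: Iterable[str]) -> List[Path]:
    """Step 2: drop every path that matches one of the glob patterns."""
    patterns = list(skip_patterns)
    return [p for p in paths if not any(p.match(pat) for pat in patterns)]
```

Pruning in step 1 is cheap because excluded trees are never read at all; the pattern matching in step 2 only touches paths that survived step 1.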

Upgrading pyhardlinkbackup

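To upgrade to a new version, just start this helper script:

  • Windows: %ProgramFiles%\PyHardLinkBackup\phlb_upgrade_pyhardlinkbackup.cmd

  • Linux: ~/PyHardLinkBackup/phlb_upgrade_pyhardlinkbackup.sh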

Some notes

What are ‘phlb’ and ‘manage’?!?

phlb is the pyhardlinkbackup command-line interface (CLI).

manage is similar to a normal Django manage.py, but it always uses the pyhardlinkbackup settings.

Why in hell do you use Django?!?

  • Well, just because of the great database ORM and the Admin Site. ;)

How do I get into the Django admin?

Just start:

  • Windows: phlb_run_django_webserver.cmd

  • Linux: phlb_run_django_webserver.sh

Then open ‘localhost’ in your browser. (Note: --noreload is needed on Windows with a venv!)

Running the unit tests

Just start: phlb_run_tests.cmd / phlb_run_tests.sh or do this:

$ cd ~/PyHardLinkBackup/
~/PyHardLinkBackup $ source bin/activate
(PyHardLinkBackup) ~/PyHardLinkBackup $ manage test

Using the CLI

$ cd ~/PyHardLinkBackup/
~/PyHardLinkBackup $ source bin/activate
(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb --help
Usage: phlb [OPTIONS] COMMAND [ARGS]...

  pyhardlinkbackup

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  add     Scan all existing backup and add missing ones...
  backup  Start a Backup run
  config  Create/edit .ini config file
  helper  link helper files to given path
  verify  Verify a existing backup

Add missing backups to the database

phlb add can be used in different scenarios:

  • recreate the database

  • add a backup manually

phlb add does this:

  • scan the complete file tree under BACKUP_PATH (default: ~/PyHardLinkBackups)

  • recreate all hash files

  • add all files to database

  • deduplicate with hardlinks, if possible

So it’s possible to recreate the complete database:

  • delete the current .sqlite file

  • run phlb add to recreate
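What `phlb add` does to the database can be roughly approximated as follows. phlb itself uses Django's ORM. This sketch instead uses plain `sqlite3` with a hypothetical `files` table. It rebuilds the index from scratch, then reports content that is stored in more than one inode, i.e. identical files that could still be deduplicated with hardlinks.

```python
import hashlib
import sqlite3
from pathlib import Path
from typing import List


def rebuild_index(backup_path: Path, db: sqlite3.Connection) -> int:
    """Drop and recreate the hypothetical files table, then index every
    file below backup_path. Returns the number of files indexed."""
    db.execute("DROP TABLE IF EXISTS files")
    # inode is stored as text so very large inode numbers always fit
    db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, sha256 TEXT, inode TEXT)")
    count = 0
    for path in sorted(backup_path.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        db.execute(
            "INSERT INTO files VALUES (?, ?, ?)",
            (path.relative_to(backup_path).as_posix(), digest, str(path.stat().st_ino)),
        )
        count += 1
    db.commit()
    return count


def dedup_candidates(db: sqlite3.Connection) -> List[str]:
    """Hashes whose content exists in more than one inode, i.e. identical
    files that are not yet hardlinked to each other."""
    rows = db.execute(
        "SELECT sha256 FROM files GROUP BY sha256 HAVING COUNT(DISTINCT inode) > 1"
    )
    return [digest for (digest,) in rows]
```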

Another example scenario:

  • DSLR images are stored on a network drive.

  • You already have a copy of all files locally.

  • You would like to add the local copy to pyhardlinkbackup.

Do the following steps:

  • move the local files to a subdirectory below BACKUP_PATH

  • e.g.: ~/PyHardLinkBackups/pictures/2015-12-29-000015/

  • Note: the date format in the subdirectory name must match the SUB_DIR_FORMATTER in your config

  • run: phlb add
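You can check a subdirectory name before running phlb add by parsing it with `datetime.strptime`. The format string `%Y-%m-%d-%H%M%S` is an assumption inferred from example names such as 2015-12-29-000015. Compare it with the actual SUB_DIR_FORMATTER value in your config before relying on it.

```python
from datetime import datetime

# Assumption: inferred from names like "2016-01-07-085247";
# check SUB_DIR_FORMATTER in your own PyHardLinkBackup.ini.
ASSUMED_SUB_DIR_FORMAT = "%Y-%m-%d-%H%M%S"


def snapshot_name(when: datetime, fmt: str = ASSUMED_SUB_DIR_FORMAT) -> str:
    """Build a snapshot subdirectory name from a timestamp."""
    return when.strftime(fmt)


def is_valid_snapshot_name(name: str, fmt: str = ASSUMED_SUB_DIR_FORMAT) -> bool:
    """Return True if name parses completely with the given format."""
    try:
        datetime.strptime(name, fmt)
    except ValueError:
        return False
    return True
```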

Now you can run phlb backup from your network drive to make a new, up-to-date backup.

Windows Development

Some notes about setting up a development environment on Windows: /dev/WindowsDevelopment.creole

Alternative solutions

See also: https://github.com/restic/others#list-of-backup-software



Note: this file is generated from README.creole 2020-03-06 09:14:46 with "python-creole"



Download files


Source Distribution

pyhardlinkbackup-0.12.2.tar.gz (64.2 kB)

Built Distribution

pyhardlinkbackup-0.12.2-py3-none-any.whl (79.5 kB)

File details

Details for the file pyhardlinkbackup-0.12.2.tar.gz.

File metadata

  • Download URL: pyhardlinkbackup-0.12.2.tar.gz
  • Size: 64.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.9

File hashes

Hashes for pyhardlinkbackup-0.12.2.tar.gz

  Algorithm     Hash digest
  SHA256        ca35f01cfa7e439b06d8a7133aba9df83a59c328335a5638ad95013cb8a79f71
  MD5           0b85319835bab61915d6af8e1f0f4639
  BLAKE2b-256   9923911a0f78bf02dc00aff2c50b97f3f4c3c8c6d4947cadf58265bec7434edd


File details

Details for the file pyhardlinkbackup-0.12.2-py3-none-any.whl.

File metadata

  • Download URL: pyhardlinkbackup-0.12.2-py3-none-any.whl
  • Size: 79.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.9

File hashes

Hashes for pyhardlinkbackup-0.12.2-py3-none-any.whl

  Algorithm     Hash digest
  SHA256        941baff949eba1100c759f0765789ddb7988505414d7812ce080b23500430f92
  MD5           0e71db953180897f0807d795a59b8b51
  BLAKE2b-256   7c4c885ce18ca2675ff50edebcffff9242b9f2c633141058894e3439d59262f1
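A downloaded file can be checked against the SHA256 digests listed above with Python's hashlib. MD5 is listed too, but it is not collision-resistant, so SHA256 is the better choice. `sha256_of` and `check_download` are illustrative names.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published above for release 0.12.2.
EXPECTED = {
    "pyhardlinkbackup-0.12.2.tar.gz":
        "ca35f01cfa7e439b06d8a7133aba9df83a59c328335a5638ad95013cb8a79f71",
    "pyhardlinkbackup-0.12.2-py3-none-any.whl":
        "941baff949eba1100c759f0765789ddb7988505414d7812ce080b23500430f92",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_download(path: Path) -> bool:
    """True if the file matches its published digest (KeyError if unknown)."""
    return sha256_of(path) == EXPECTED[path.name]
```

pip can enforce the same check in hash-checking mode: a requirements line such as `pyhardlinkbackup==0.12.2 --hash=sha256:<digest>` installed with `pip install --require-hashes -r requirements.txt`. In that mode, every dependency then needs a hash as well.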

