HardLink/Deduplication Backups with Python

PyHardLinkBackup

Hardlink/Deduplication Backups with Python.

  • Backups should be saved as normal files in filesystem:

    • accessible without any extra software or extra meta files

    • non-proprietary format

  • Create backups with versioning

    • every backup run creates a complete filesystem snapshot tree

    • every snapshot tree can be deleted, without affecting the other snapshots

  • Deduplication with hardlinks:

    • Store only changed files, all other via hardlinks

    • find duplicate files everywhere (even if renamed or moved files)

  • useable under Windows and Linux
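The "Deduplication with hardlinks" idea from the list above can be illustrated with a minimal sketch. This is not the actual PyHardLinkBackup implementation: the names `file_hash` and `store_file`, the in-memory `seen` dict and the choice of SHA-512 are assumptions made for illustration only.

```python
import hashlib
import os
import shutil
from pathlib import Path


def file_hash(path, chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks (SHA-512 is an assumption)."""
    digest = hashlib.sha512()
    with open(str(path), "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_file(src, dst, seen):
    """Copy src to dst, or hardlink dst to an already stored file with equal content.

    `seen` (hypothetical helper structure) maps content hash -> stored file path.
    Returns True if a hardlink was created, False if the file was copied.
    """
    src, dst = Path(src), Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    key = file_hash(src)
    if key in seen:
        # Same content was stored before, even under another name or location.
        os.link(str(seen[key]), str(dst))
        return True
    shutil.copy2(str(src), str(dst))  # new content: store a real copy
    seen[key] = dst
    return False
```

Because the lookup key is the content hash and not the file name, a renamed or moved file is still detected as a duplicate in this sketch.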

current state:

  • Python 3.4 or newer only

  • Beta state

Please, try, fork and contribute! ;)

Build Status on travis-ci.org

travis-ci.org/jedie/PyHardLinkBackup

Build Status on appveyor.com

ci.appveyor.com/project/jedie/pyhardlinkbackup

Coverage Status on coveralls.io

coveralls.io/r/jedie/PyHardLinkBackup

Requirements Status on requires.io

requires.io/github/jedie/PyHardLinkBackup/requirements/

Example

$ phlb backup ~/my/important/documents
...start backup, some time later...
$ phlb backup ~/my/important/documents
...

This will create deduplication backups like this:

~/PyHardLinkBackups
  └── documents
      ├── 2016-01-07-085247
      │   ├── phlb_config.ini
      │   ├── spreadsheet.ods
      │   ├── brief.odt
      │   └── important_files.ext
      └── 2016-01-07-102310
          ├── phlb_config.ini
          ├── spreadsheet.ods
          ├── brief.odt
          └── important_files.ext
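To see which files two snapshot trees like the ones above actually share on disk, a small check along these lines may help. It is only a sketch: the function name `shared_files` is hypothetical and not part of phlb.

```python
import os
from pathlib import Path


def shared_files(snapshot_a, snapshot_b):
    """Return relative paths that refer to the same file on disk in both trees."""
    snapshot_a, snapshot_b = Path(snapshot_a), Path(snapshot_b)
    shared = []
    for path_a in sorted(snapshot_a.rglob("*")):
        if not path_a.is_file():
            continue
        rel = path_a.relative_to(snapshot_a)
        path_b = snapshot_b / rel
        # samefile() compares the device and inode number of both paths
        if path_b.is_file() and os.path.samefile(str(path_a), str(path_b)):
            shared.append(rel.as_posix())
    return shared
```

Called with the two snapshot directories from the example, it would list the files that were unchanged between the two runs and are therefore stored only once.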

Install

Windows

  1. install Python 3: https://www.python.org/downloads/

  2. Download the file boot_pyhardlinkbackup.cmd

  3. run boot_pyhardlinkbackup.cmd

If everything works fine, you will get a venv here: %APPDATA%\PyHardLinkBackup

After the venv is created, call these scripts to finalize the setup:

  1. %APPDATA%\PyHardLinkBackup\phlb_edit_config.cmd - Create a config .ini file

  2. %APPDATA%\PyHardLinkBackup\phlb_migrate_database.cmd - Create Database tables

To upgrade PyHardLinkBackup, call:

  1. %APPDATA%\PyHardLinkBackup\phlb_upgrade_PyHardLinkBackup.cmd

To start the django webserver, call:

  1. %APPDATA%\PyHardLinkBackup\phlb_run_django_webserver.cmd

Linux

  1. Download the file boot_pyhardlinkbackup.sh

  2. call boot_pyhardlinkbackup.sh

Note: If you do not use Python 3.5+, you must install ‘scandir’, e.g.:

~ $ cd PyHardLinkBackup
~/PyHardLinkBackup $ source bin/activate
(PyHardLinkBackup) ~/PyHardLinkBackup $ pip install scandir

(You need the python3-dev package installed)

If everything works fine, you will get a venv here: ~/PyHardLinkBackup

After the venv is created, call these scripts to finalize the setup:

  • ~/PyHardLinkBackup/phlb_edit_config.sh - Create a config .ini file

  • ~/PyHardLinkBackup/phlb_migrate_database.sh - Create Database tables

To upgrade PyHardLinkBackup, call:

  • ~/PyHardLinkBackup/phlb_upgrade_PyHardLinkBackup.sh

To start the django webserver, call:

  • ~/PyHardLinkBackup/phlb_run_django_webserver.sh

start backup run

To start a backup run, use this helper script:

  • Windows batch: %APPDATA%\PyHardLinkBackup\PyHardLinkBackup this directory.cmd

  • Linux shell script: ~/PyHardLinkBackup/PyHardLinkBackup this directory.sh

Copy this file into a directory that should be backed up, then just call it to run a backup.

Verify an existing backup

$ cd PyHardLinkBackup/
~/PyHardLinkBackup $ source bin/activate

(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb verify --fast ~/PyHardLinkBackups/documents/2016-01-07-102310

With --fast, the file content is not checked. Without it, the hash of every file's content is calculated and compared, so every file must be read completely from the filesystem, which takes some time.

A verify run checks:

  • Do all files exist in the backup?

  • Compare the file size

  • Compare the hash from the hash-file

  • Compare the file modification timestamp

  • Calculate the hash from the file content and compare it (skipped if --fast is used)
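The difference between a fast and a full check can be sketched as follows. This is an illustration only, not the code behind `phlb verify`: the function `verify_file` is hypothetical, the expected size and hash are simply passed in by the caller, SHA-512 is an assumed algorithm, and the timestamp and hash-file comparisons are left out for brevity.

```python
import hashlib
from pathlib import Path


def verify_file(path, expected_size, expected_hash, fast=False):
    """Return a list of problems found for one backup file (empty list = OK)."""
    path = Path(path)
    if not path.is_file():
        return ["missing"]
    problems = []
    if path.stat().st_size != expected_size:
        problems.append("size mismatch")
    if not fast:
        # The slow part: every byte of the file has to be read again.
        digest = hashlib.sha512()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected_hash:
            problems.append("hash mismatch")
    return problems
```

With `fast=True` only cheap metadata is inspected, which is why silent content corruption can only be detected by the full run.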

configuration

phlb uses a configuration file named: PyHardLinkBackup.ini

Search order is:

  1. the current directory and each of its parent directories, up to the root

  2. user directory

e.g.: If the current working directory is /foo/bar/my_files/, then the search path will be:

  • /foo/bar/my_files/PyHardLinkBackup.ini

  • /foo/bar/PyHardLinkBackup.ini

  • /foo/PyHardLinkBackup.ini

  • /PyHardLinkBackup.ini

  • ~/PyHardLinkBackup.ini (the user home directory under Windows/Linux)
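The search order above can be reproduced with a few lines of pathlib. This is a sketch of the described behaviour, not the code phlb uses; the function name `ini_search_paths` is hypothetical.

```python
from pathlib import Path

INI_NAME = "PyHardLinkBackup.ini"


def ini_search_paths(cwd, home):
    """Return candidate .ini paths: cwd, each parent up to the root, then home."""
    cwd = Path(cwd)
    directories = [cwd] + list(cwd.parents)  # cwd first, filesystem root last
    candidates = [directory / INI_NAME for directory in directories]
    candidates.append(Path(home) / INI_NAME)
    return candidates
```

For example, `ini_search_paths(Path.cwd(), Path.home())` yields the candidates in the order listed above; picking the first one that exists would be one plausible way to resolve the configuration.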

Create / edit default .ini

To open the .ini file from the user directory in an editor, just run:

(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb config

The defaults are stored here: /phlb/config_defaults.ini

Exclude files/folders from backup:

There are two ways to exclude files/folders from your backup. Use the following settings in your PyHardLinkBackup.ini

# Directory names that will be recursively excluded from backups (comma-separated list!)
SKIP_DIRS= __pycache__, temp

# glob-style patterns to exclude files/folders from backups, used with Path.match() (comma-separated list!)
SKIP_PATTERNS= *.pyc, *.tmp, *.cache

The filesystem scan is divided into two steps:

  1. Just scan the filesystem tree

  2. Filter and load metadata for every directory item

SKIP_DIRS is used in the first step. SKIP_PATTERNS is used in the second step.
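A minimal sketch of such a two-step scan, using the example values from the settings above. It is an assumption-laden illustration: the function names `scan` and `filter_and_stat` are hypothetical, and the sketch uses `os.walk` from the standard library rather than whatever scanner phlb uses internally.

```python
import os
from pathlib import Path

SKIP_DIRS = ("__pycache__", "temp")
SKIP_PATTERNS = ("*.pyc", "*.tmp", "*.cache")


def scan(root):
    """Step 1: walk the tree and prune directories named in SKIP_DIRS."""
    for dirpath, dirnames, filenames in os.walk(str(root)):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            yield Path(dirpath, name)


def filter_and_stat(paths):
    """Step 2: drop paths matching SKIP_PATTERNS, load metadata for the rest."""
    for path in paths:
        if any(path.match(pattern) for pattern in SKIP_PATTERNS):
            continue
        yield path, path.stat()
```

Pruning directories in the first step means their contents are never visited at all, while the patterns in the second step are tested against every remaining path.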

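upgrade PyHardLinkBackup

To upgrade to a new version, just start the matching helper script:

  • Windows: %APPDATA%\PyHardLinkBackup\phlb_upgrade_PyHardLinkBackup.cmd

  • Linux: ~/PyHardLinkBackup/phlb_upgrade_PyHardLinkBackup.sh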

some notes

What is ‘phlb’ and ‘manage’ ?!?

phlb is the executable CLI.

manage is similar to a normal django manage.py, but it always uses the PyHardLinkBackup settings.

Why in hell do you use django?!?

  • Well, just because of the great database ORM and the Admin Site ;)

How to go into the django admin?

Just start:

  • windows: phlb_run_django_webserver.cmd

  • linux: phlb_run_django_webserver.sh

Then just open ‘localhost’ in your browser. (Note: --noreload is needed under Windows with a venv!)

run unittests

Just start: phlb_run_tests.cmd / phlb_run_tests.sh or do this:

$ cd PyHardLinkBackup/
~/PyHardLinkBackup $ source bin/activate
(PyHardLinkBackup) ~/PyHardLinkBackup $ manage test

the cli

$ cd PyHardLinkBackup/
~/PyHardLinkBackup $ source bin/activate
(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb --help
Usage: phlb [OPTIONS] COMMAND [ARGS]...

  PyHardLinkBackup

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  backup  Start a Backup run
  config  Create/edit .ini config file
  helper  link helper files to given path
  verify  Verify a existing backup

Windows Development

For some notes about setting up a development environment under Windows, please look at: /dev/WindowsDevelopment.creole

alternative solutions

History

  • 03.02.2016 - v0.7.0 - compare v0.6.4…v0.7.0

    • New: verify an existing backup

    • IMPORTANT: run database migration is needed!

  • 01.02.2016 - v0.6.4 - compare v0.6.2…v0.6.4

    • Windows: Bugfix temp rename error, because of the Windows API limitation, see: #13

    • Linux: Bugfix scanner if symlink is broken

    • Display local variables on low level errors

  • 29.01.2016 - v0.6.3 - compare v0.6.2…v0.6.3

    • Less verbose and better information about SKIP_DIRS/SKIP_PATTERNS hits

  • 28.01.2016 - v0.6.2 - compare v0.6.1…v0.6.2

    • Handle unexpected errors and continue backup with the next file

    • Better handle interrupt key during execution

  • 28.01.2016 - v0.6.1 - compare v0.6.0…v0.6.1

    • Bugfix #13 by using a better temp rename routine

  • 28.01.2016 - v0.6.0 - compare v0.5.1…v0.6.0

    • New: faster backup by comparing only mtime/size if old backup files exist

  • 27.01.2016 - v0.5.1 - compare v0.5.0…v0.5.1

    • IMPORTANT: run database migration is needed!

    • New .ini setting: LANGUAGE_CODE to change the translation

    • Mark whether a backup run finished completely

    • Display information of last backup run

    • Add more information into summary file

  • 27.01.2016 - v0.5.0 - compare v0.4.2…v0.5.0

    • Refactor the source tree scan: split it into two passes.

    • CHANGE SKIP_FILES in .ini config to: SKIP_PATTERNS

    • Backup from newest files to oldest files.

    • Fix #10:

      • New --name cli option (optional) to force a backup name.

      • Display an error message if the backup name can’t be found (e.g.: backup of a root folder)

  • 22.01.2016 - v0.4.2 - compare v0.4.1…v0.4.2

  • 22.01.2016 - v0.4.1 - compare v0.4.0…v0.4.1

    • Skip files that can’t be read/written (and try to back up the remaining files)

  • 21.01.2016 - v0.4.0 - compare v0.3.1…v0.4.0

    • Search for PyHardLinkBackup.ini file in every parent directory from the current working dir

    • increase default chunk size to 20MB

    • save summary and log file for every backup run

  • 15.01.2016 - v0.3.1 - compare v0.3.0…v0.3.1

    • fix unittest run under windows

  • 15.01.2016 - v0.3.0 - compare v0.2.0…v0.3.0

    • database migration needed

    • Add ‘no_link_source’ to database (e.g. Skip source, if 1024 links created under windows)

  • 14.01.2016 - v0.2.0 - compare v0.1.8…v0.2.0

    • good unittests coverage that covers the backup process

  • 08.01.2016 - v0.1.8 - compare v0.1.0alpha0…v0.1.8

    • installable and runnable under Windows

  • 06.01.2016 - v0.1.0alpha0 - d42a5c5

    • first Release on PyPi

  • 29.12.2015 - commit 2ce43

    • commit ‘Proof of concept’

