PyHardLinkBackup

Hardlink/Deduplication Backups with Python.

  • Backups should be saved as normal files in the filesystem:

    • accessible without any extra software or extra meta files

    • non-proprietary format

  • Create backups with versioning

    • every backup run creates a complete filesystem snapshot tree

    • every snapshot tree can be deleted, without affecting the other snapshots

  • Deduplication with hardlinks:

    • Store only changed files, all others via hardlinks

    • find duplicate files everywhere (even renamed or moved files)

  • usable under Windows and Linux
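
Finding duplicates "everywhere" works independently of file names and paths if files are grouped by a digest of their content. The sketch below only illustrates that idea and is not PyHardLinkBackup's own code; the choice of SHA-512 and the helper names file_hash() and find_duplicates() are assumptions.

```python
import hashlib
import os
from collections import defaultdict


def file_hash(path, chunk_size=64 * 1024):
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group all files below *root* by content hash.

    Returns a dict mapping every hash that occurs more than once to the
    sorted list of paths with that content, regardless of name or folder.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[file_hash(path)].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Each group returned by find_duplicates() is a set of files whose content only needs to be stored once; the other names can be hardlinks.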

current state:

  • Python 3 only

  • Beta state

Please try, fork and contribute! ;)

Example

$ phlb backup ~/my/important/documents
...start backup, some time later...
$ phlb backup ~/my/important/documents
...

This will create deduplication backups like this:

~/PyHardLinkBackups
  └── documents
      ├── 2016-01-07-085247
      │   ├── spreadsheet.ods
      │   ├── brief.odt
      │   └── important_files.ext
      └── 2016-01-07-102310
          ├── spreadsheet.ods
          ├── brief.odt
          └── important_files.ext
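
The snapshot layout above relies on hardlinks: an unchanged file in a new snapshot is just another directory entry pointing at the same data as in the previous snapshot. Deleting one snapshot only removes names; the data stays on disk as long as another snapshot still links to it. A minimal sketch of that mechanism follows. It assumes a hypothetical snapshot() helper that compares each file only with the same relative path in the previous snapshot, which is simpler than the "find duplicates everywhere" approach described above.

```python
import filecmp
import os
import shutil


def snapshot(source, backup_root, name, previous=None):
    """Copy *source* into backup_root/name as a new snapshot tree.

    A file that is byte-identical to the same relative path in the
    *previous* snapshot is hardlinked instead of copied, so it uses no
    additional disk space.
    """
    target_root = os.path.join(backup_root, name)
    for dirpath, _dirnames, filenames in os.walk(source):
        rel_dir = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(target_root, rel_dir)), exist_ok=True)
        for filename in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, filename))
            src = os.path.join(source, rel)
            dst = os.path.join(target_root, rel)
            old = os.path.join(previous, rel) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: add a new name for the existing data (hardlink).
                os.link(old, dst)
            else:
                # New or changed: store a real copy, keeping timestamps.
                shutil.copy2(src, dst)
    return target_root
```

For example, after snapshot(src, root, "2016-01-07-102310", previous=first), os.stat() on an unchanged file in both snapshots reports the same st_ino and an st_nlink of 2. Removing the first snapshot with shutil.rmtree() leaves the second one complete.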

Try out:

on Windows:

  1. install Python 3: https://www.python.org/downloads/

  2. Download the file boot_pyhardlinkbackup.cmd

  3. run boot_pyhardlinkbackup.cmd

This creates a virtual environment in this path: %APPDATA%\PyHardLinkBackup

Then call these batch files:

  1. %APPDATA%\PyHardLinkBackup\phlb_edit_config.cmd

  2. %APPDATA%\PyHardLinkBackup\phlb_migrate_database.cmd

There is also a helper batch file:

  • %APPDATA%\PyHardLinkBackup\PyHardLinkBackup this directory.cmd

Copy this file into a directory that should be backed up and call it to run a backup.

on Linux, follow these steps:

1. Create a virtual env and install:

~$ virtualenv -p python3 PyHardLinkBackupEnv
~$ cd PyHardLinkBackupEnv/
~/PyHardLinkBackupEnv $ source bin/activate
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ pip install -U pip
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ pip install -e git+https://github.com/jedie/PyHardLinkBackup.git#egg=PyHardLinkBackup

Note: If you use a Python version older than 3.5, the ‘scandir’ package will be installed, so you need the python3-dev package.
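
The note refers to os.scandir(), which is part of the standard library since Python 3.5; older interpreters need the ‘scandir’ backport from PyPI. A common import fallback for this is sketched below. Whether PyHardLinkBackup uses exactly this pattern is an assumption, and iter_files() is a hypothetical helper.

```python
try:
    from os import scandir  # standard library since Python 3.5
except ImportError:  # older Python 3: third-party backport from PyPI
    from scandir import scandir


def iter_files(path):
    """Recursively yield (path, size) for all regular files below *path*.

    is_dir()/is_file() can often be answered from the directory listing
    itself, without an extra stat() call per entry.
    """
    for entry in scandir(path):
        if entry.is_dir(follow_symlinks=False):
            yield from iter_files(entry.path)
        elif entry.is_file(follow_symlinks=False):
            yield entry.path, entry.stat(follow_symlinks=False).st_size
```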

2. setup

create a .ini config file and edit it:

(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb config

Initialize the database:

(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb migrate

3. start a backup run

~$ ./PyHardLinkBackupEnv/bin/phlb backup ~/Photo

or:

~$ source ./PyHardLinkBackupEnv/bin/activate
(PyHardLinkBackupEnv) ~$ phlb backup ~/documents

configuration

phlb uses a configuration file named: PyHardLinkBackup.ini

Search order is:

  1. current directory

  2. user directory
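
The search order can be expressed as a small lookup function. This sketch assumes that "user directory" means the home directory returned by os.path.expanduser("~"); find_config() is a hypothetical name, not part of phlb.

```python
import os

CONFIG_FILENAME = "PyHardLinkBackup.ini"


def find_config(cwd=None, home=None):
    """Return the path of the first PyHardLinkBackup.ini found, or None.

    Search order: current directory first, then the user directory.
    """
    candidates = [
        cwd if cwd is not None else os.getcwd(),
        home if home is not None else os.path.expanduser("~"),
    ]
    for directory in candidates:
        path = os.path.join(directory, CONFIG_FILENAME)
        if os.path.isfile(path):
            return path
    return None
```

Because the current directory is checked first, a PyHardLinkBackup.ini placed next to the data overrides the one in the user directory.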

To open the .ini file from the user directory in an editor, just run:

(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb config

run unittests

$ cd PyHardLinkBackupEnv/
~/PyHardLinkBackupEnv $ source bin/activate
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb test

some notes

What is ‘phlb’ ?!?

The phlb executable is similar to Django's manage.py, but it always uses the PyHardLinkBackup settings.
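
A manage.py-style entry point that always uses one settings module could look roughly like this. The settings module path and the function names are hypothetical; only execute_from_command_line() is Django's real API. Unlike Django's generated manage.py, which uses os.environ.setdefault(), this sketch overwrites DJANGO_SETTINGS_MODULE, so a value already set in the environment cannot redirect it.

```python
import os
import sys

# Hypothetical module path; the real settings module name may differ.
SETTINGS_MODULE = "PyHardLinkBackup.django_project.settings"


def configure_settings(environ=None):
    """Force the settings module (manage.py would only use setdefault())."""
    if environ is None:
        environ = os.environ
    environ["DJANGO_SETTINGS_MODULE"] = SETTINGS_MODULE
    return environ


def main(argv=None):
    configure_settings()
    # Imported here so that merely importing this module does not need Django.
    from django.core.management import execute_from_command_line
    execute_from_command_line(argv if argv is not None else sys.argv)
```

With such an entry point, ‘phlb migrate’ corresponds to ‘python manage.py migrate’ with the settings fixed.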

Why in hell do you use Django?!?

  • Well, just because of the great database ORM and the admin site ;)

How do I get into the Django admin?

$ cd PyHardLinkBackupEnv/
~/PyHardLinkBackupEnv $ source bin/activate
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb runserver

And then just open ‘localhost:8000’ (Django's default runserver address) in a browser. Note: under Windows with a virtualenv, the --noreload option is needed: phlb runserver --noreload

TODO

  • copy file metadata such as owner, mode, etc.

  • handle symlinks

  • Quick Backup: don’t check the content, just compare file size and modification date

  • create a boot_pyhardlinkbackup.sh script for Linux

  • write docs

  • write more tests

  • activate CI

  • Far future: Add a GUI
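
For the "Quick Backup" TODO item, a check based only on file size and modification date might look like this. quick_unchanged() is a hypothetical helper; comparing st_mtime_ns assumes that backup copies keep the source's modification time, which shutil.copy2() does.

```python
import os


def quick_unchanged(path, previous):
    """Guess whether *path* is unchanged compared to *previous*.

    Only size and modification time are compared; the file content is
    not read. This is fast, but an edit that keeps both values unchanged
    goes unnoticed.
    """
    try:
        current = os.stat(path)
        old = os.stat(previous)
    except FileNotFoundError:
        return False
    return current.st_size == old.st_size and current.st_mtime_ns == old.st_mtime_ns
```

Compared with hashing every file, this avoids reading the data at all, which is the main cost of a full content check.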

Download files

Source Distribution

PyHardLinkBackup-0.1.9.tar.gz (19.8 kB)

Uploaded Source

Built Distributions

PyHardLinkBackup-0.1.9-py3.4.egg (23.7 kB)

Uploaded Source

PyHardLinkBackup-0.1.9-py3-none-any.whl (27.6 kB)

Uploaded Python 3

File hashes

PyHardLinkBackup-0.1.9.tar.gz
  SHA256       a60cd94c65aaecdb5c3bb43efecdeb390c58a06954f28a0dd4f5517a62a42d8c
  MD5          ab96d550292dd777674092172c1d5699
  BLAKE2b-256  521bbd827f95d0716d5d85bca65794675d4c81c6ffd428b9211f4ed182f1750e

PyHardLinkBackup-0.1.9-py3.4.egg
  SHA256       e565ebc31bd9d78b5d93bd99e664ca21a8d3275f1e98c2b7b30e375ad54321e9
  MD5          cd589ca57b94aaef78e0d445e22f3fbc
  BLAKE2b-256  1801e6369e9b253c999953e15b974b2ab4205eb089766641a11c09910e183ffe

PyHardLinkBackup-0.1.9-py3-none-any.whl
  SHA256       5f98be283d622b2b6c28f79487363c043cef86c643b3221c9899307b884e42cf
  MD5          92048001e4bf9d5cc7448a60de7fcece
  BLAKE2b-256  45dd78d90a0a7db454520d6446630c132c69765262c2e29a183ca6bb5127037e
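
To check a downloaded file against the SHA256 values listed above, the digest can be computed with Python's hashlib. verify_sha256() is an illustrative helper, not part of PyHardLinkBackup.

```python
import hashlib


def verify_sha256(path, expected_hex, chunk_size=64 * 1024):
    """Return True if the SHA256 digest of *path* matches *expected_hex*."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

Usage: verify_sha256("PyHardLinkBackup-0.1.9.tar.gz", "a60cd94c65aaecdb5c3bb43efecdeb390c58a06954f28a0dd4f5517a62a42d8c")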
