HardLink/Deduplication Backups with Python

PyHardLinkBackup

Hardlink/Deduplication Backups with Python.

  • Backups should be saved as normal files in the filesystem:

    • accessible without any extra software or extra meta files

    • non-proprietary format

  • Create backups with versioning

    • every backup run creates a complete filesystem snapshot tree

    • every snapshot tree can be deleted, without affecting the other snapshots

  • Deduplication with hardlinks:

    • Store only changed files; all other files are hardlinked

    • find duplicate files everywhere (even if they were renamed or moved)

  • usable under Windows and Linux
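
The deduplication idea from the list above can be sketched in a few lines of Python. This is only an illustration, not PyHardLinkBackup's actual implementation: the function names, the choice of SHA-512 and the in-memory `seen` dictionary are assumptions (the real tool keeps its state in a database).

```python
import hashlib
import os
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-512 hex digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(src_root, snapshot_root, seen):
    """Copy src_root into snapshot_root, hardlinking already known content.

    `seen` maps a content digest to the path of a stored copy.
    It is a hypothetical in-memory index used only for this sketch.
    """
    src_root, snapshot_root = Path(src_root), Path(snapshot_root)
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = snapshot_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        digest = file_digest(src)
        if digest in seen:
            # Same content was stored before: add a hardlink, no new data.
            os.link(seen[digest], dst)
        else:
            # New content: store a real copy and remember where it lives.
            shutil.copy2(src, dst)
            seen[digest] = dst
```

Because files are matched by content and not by name, a renamed or moved file is still hardlinked to its earlier copy. Hardlinks only work within one filesystem, so all snapshots have to live on the same volume.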

current state:

  • Python 3 only

  • Alpha state

Please, try, fork and contribute! ;)

Example

$ phlb backup ~/my/important/documents
...start backup, some time later...
$ phlb backup ~/my/important/documents
...

This creates deduplicated backups like this:

~/PyHardLinkBackups
  └── documents
      ├── 2016-01-07-085247
      │   ├── spreadsheet.ods
      │   ├── brief.odt
      │   └── important_files.ext
      └── 2016-01-07-102310
          ├── spreadsheet.ods
          ├── brief.odt
          └── important_files.ext
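
The snapshot directory names in the tree above look like a timestamp in the form YYYY-MM-DD-HHMMSS. A minimal sketch of how such a path could be built, assuming that naming scheme (the helper `snapshot_dir` is hypothetical and not part of phlb):

```python
from datetime import datetime
from pathlib import Path


def snapshot_dir(backup_root, source_path, now=None):
    """Build <backup_root>/<source dir name>/<YYYY-MM-DD-HHMMSS>."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d-%H%M%S")
    return Path(backup_root) / Path(source_path).name / stamp


# Example: a backup of ".../documents" started on 2016-01-07 at 08:52:47
example = snapshot_dir(
    "PyHardLinkBackups",
    "my/important/documents",
    now=datetime(2016, 1, 7, 8, 52, 47),
)
```

Since every run gets its own timestamped directory, a snapshot can be deleted as a whole. Files shared via hardlinks stay available in the other snapshots until their last link is removed.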

Try out:

on Windows:

  1. install Python 3: https://www.python.org/downloads/

  2. Download the file boot_pyhardlinkbackup.cmd

  3. run boot_pyhardlinkbackup.cmd

There will be a virtual env in this path: %APPDATA%\PyHardLinkBackup

call these batch files:

  1. %APPDATA%\PyHardLinkBackup\pyhlb config.cmd

  2. %APPDATA%\PyHardLinkBackup\pyhlb migrate.cmd

on Linux, follow these steps:

1. Create a virtual env and install:

~$ virtualenv -p python3 PyHardLinkBackupEnv
$ cd PyHardLinkBackupEnv/
~/PyHardLinkBackupEnv $ source bin/activate
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ pip install -U pip
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ pip install -e git+https://github.com/jedie/PyHardLinkBackup.git#egg=PyHardLinkBackup

Note: If you do not use Python 3.5+, the ‘scandir’ package will be installed, so you need the python3-dev package…

2. setup

Create a .ini config file and edit it:

(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb config

Initialize the database:

(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb migrate

3. start a backup run

~$ ./PyHardLinkBackupEnv/bin/phlb backup ~/Photo

or:

~$ source ./PyHardLinkBackupEnv/bin/activate
(PyHardLinkBackupEnv) ~$ phlb backup ~/documents

configuration

phlb uses a configuration file named: PyHardLinkBackup.ini

Search order is:

  1. current directory

  2. user directory
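
The search order above can be sketched as follows. This is an assumption about the lookup logic, not the project's actual code: `find_ini` is a hypothetical helper, and "user directory" is taken to mean the home directory.

```python
from pathlib import Path

INI_NAME = "PyHardLinkBackup.ini"


def find_ini(cwd=None, home=None):
    """Return the first existing .ini file: current dir first, then user dir."""
    candidates = [
        Path(cwd or Path.cwd()) / INI_NAME,
        Path(home or Path.home()) / INI_NAME,
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # no configuration file found in either place
```

With this order, a PyHardLinkBackup.ini in the current directory takes precedence over the one in the user directory.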

To open the .ini file from the user directory in an editor, run:

(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb config

run unittests

$ cd PyHardLinkBackupEnv/
~/PyHardLinkBackupEnv $ source bin/activate
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb test

some notes

What is ‘phlb’ ?!?

The phlb executable is similar to Django’s manage.py, but it always uses the PyHardLinkBackup settings.

Why in hell do you use django?!?

  • Well, just because of the great database ORM and the Admin Site ;)

How do I get into the Django admin?

$ cd PyHardLinkBackupEnv/
~/PyHardLinkBackupEnv $ source bin/activate
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb runserver

Then open ‘localhost’ in a browser.

TODO

  • copy file meta data like owner, mode etc.

  • handle symlinks

  • Quick Backup: Don’t check the content, just compare file size + modification date

  • use: https://github.com/jedie/bootstrap_env (to make installation under Windows easier)

  • Add some helper files to start a backup (.sh / .cmd scripts)

  • write docs

  • write more tests

  • activate CI

  • Far future: Add a GUI
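
The "Quick Backup" item in the list above is not implemented yet. A minimal sketch of what such a check could look like, assuming the size and modification time from the previous backup run are available (the function name and parameters are hypothetical):

```python
import os


def probably_unchanged(src_path, previous_size, previous_mtime_ns):
    """Quick check: same size and same modification time as in the last backup.

    The file content is not read, so a change that keeps both size and
    modification time would go unnoticed.
    """
    stat = os.stat(src_path)
    return (
        stat.st_size == previous_size
        and stat.st_mtime_ns == previous_mtime_ns
    )
```

This trades certainty for speed: a full content comparison is safer, but it has to read every file on every run.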
