HardLink/Deduplication Backups with Python

PyHardLinkBackup

Hardlink/Deduplication Backups with Python.

  • Backups should be saved as normal files in filesystem:

    • accessible without any extra software or extra meta files

    • non-proprietary format

  • Create backups with versioning

    • every backup run creates a complete filesystem snapshot tree

    • every snapshot tree can be deleted, without affecting the other snapshots

  • Deduplication with hardlinks:

    • Store only changed files; all others are stored via hardlinks

    • find duplicate files everywhere (even renamed or moved files)

  • usable under Windows and Linux
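
The hardlink deduplication described above can be sketched with the standard library alone. This is a minimal illustration under assumptions, not PyHardLinkBackup's code. It identifies files by a SHA-256 hash of their content (the hash algorithm is an assumption) and keeps the hash index in a plain dict, whereas the project itself keeps its bookkeeping in a database. `file_hash()` and `backup_file()` are hypothetical names.

```python
import hashlib
import os
import shutil
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # read files in 64 KiB blocks


def file_hash(path):
    """Return the hex SHA-256 digest of a file's content (hash choice is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(src, dst, index):
    """Store src at dst, hardlinking to already stored content when possible.

    `index` maps content hash -> path of a file already in the backup area.
    Returns True if a hardlink was created, False if the file was copied.
    """
    src, dst = Path(src), Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    content_hash = file_hash(src)
    existing = index.get(content_hash)
    if existing is not None and Path(existing).exists():
        # Same content is already stored: add a new name for it, no new data.
        os.link(existing, dst)
        return True
    # Content not seen yet: store a real copy and remember where it is.
    shutil.copy2(src, dst)
    index[content_hash] = dst
    return False
```

Because the lookup key is the content hash and not the path, a renamed or moved file still ends up as a hardlink. Deleting one snapshot only removes its directory entries; the data stays on disk as long as another snapshot still links to it, which is why each snapshot tree can be deleted on its own. Note that `os.link()` only works within one filesystem, so the backup area has to live on a single volume.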

Current state:

  • Python 3 only

  • Alpha state

Please, try, fork and contribute! ;)

Example

$ phlb backup ~/my/important/documents
...start backup, some time later...
$ phlb backup ~/my/important/documents
...

This will create deduplication backups like this:

~/PyHardLinkBackups
  └── documents
      ├── 2016-01-07-085247
      │   ├── spreadsheet.ods
      │   ├── brief.odt
      │   └── important_files.ext
      └── 2016-01-07-102310
          ├── spreadsheet.ods
          ├── brief.odt
          └── important_files.ext
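
The snapshot folder names in this tree look like a timestamp of the form year-month-day-hourminutesecond. Assuming that reading of the example (the project's actual naming code may differ), the target directory for one run could be built like this; `snapshot_dir()` is a hypothetical helper:

```python
import datetime
from pathlib import Path

# Format inferred from names like "2016-01-07-085247" (an assumption).
SNAPSHOT_FORMAT = "%Y-%m-%d-%H%M%S"


def snapshot_dir(backup_root, source, when=None):
    """Return <backup_root>/<name of source folder>/<timestamp> for one backup run."""
    if when is None:
        when = datetime.datetime.now()
    root = Path(backup_root).expanduser()
    return root / Path(source).name / when.strftime(SNAPSHOT_FORMAT)
```

With this zero-padded format, sorting the folder names as plain strings also sorts them chronologically.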

Try out:

1. Create a virtual env and install:

~$ virtualenv -p python3 PyHardLinkBackupEnv
$ cd PyHardLinkBackupEnv/
~/PyHardLinkBackupEnv $ source bin/activate
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ pip install -U pip
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ pip install -e git+https://github.com/jedie/PyHardLinkBackup.git#egg=PyHardLinkBackup

Note: If you are not using Python 3.5+, the ‘scandir’ package will be installed, so you need the python3-dev package…
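
For context: `os.scandir()` is part of the standard library since Python 3.5, and on older versions the `scandir` package from PyPI provides the same function (it compiles a C extension, which is where the python3-dev requirement comes from). A common fallback import looks like this; it is a sketch, not necessarily how PyHardLinkBackup handles it, shown together with a small hypothetical `iter_files()` walker:

```python
try:
    from os import scandir  # standard library since Python 3.5
except ImportError:  # older Python 3: backport package from PyPI
    from scandir import scandir


def iter_files(top):
    """Recursively yield the paths of regular files below `top` (symlinks are not followed)."""
    for entry in scandir(top):
        if entry.is_dir(follow_symlinks=False):
            yield from iter_files(entry.path)
        elif entry.is_file(follow_symlinks=False):
            yield entry.path
```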

2. Setup

Create a .ini config file and edit it:

(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb config

Initialize the database:

(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb migrate

3. Start a backup run

~$ ./PyHardLinkBackupEnv/bin/phlb backup ~/Photo

or:

~$ source ./PyHardLinkBackupEnv/bin/activate
(PyHardLinkBackupEnv) ~$ phlb backup ~/documents

configuration

phlb uses a configuration file named PyHardLinkBackup.ini.

Search order is:

  1. current directory

  2. user directory
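
The search order is a simple "first match wins" lookup. A minimal sketch with `pathlib`, assuming that "user directory" means the home directory returned by `Path.home()`; `find_config()` is a hypothetical helper, not part of phlb:

```python
from pathlib import Path

CONFIG_NAME = "PyHardLinkBackup.ini"


def find_config(cwd=None, home=None):
    """Return the first existing config file: current directory first, then user directory.

    Returns None if neither location contains the file.
    """
    candidates = [
        Path(cwd) if cwd is not None else Path.cwd(),
        Path(home) if home is not None else Path.home(),
    ]
    for directory in candidates:
        path = directory / CONFIG_NAME
        if path.is_file():
            return path
    return None
```

The returned path could then be parsed with the standard `configparser` module, e.g. `configparser.ConfigParser().read(path)`.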

To open the user directory .ini file in your editor, just run:

(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb config

run unittests

$ cd PyHardLinkBackupEnv/
~/PyHardLinkBackupEnv $ source bin/activate
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb test

some notes

What is ‘phlb’ ?!?

The phlb executable is similar to Django’s manage.py, but it always uses the PyHardLinkBackup settings.
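
A Django `manage.py` essentially sets `DJANGO_SETTINGS_MODULE` and passes the command line on to Django. To "always" use one settings module, such a wrapper would assign the variable instead of calling `os.environ.setdefault()` as the generated `manage.py` does. A sketch of that idea; the settings path below is a placeholder, not taken from the project:

```python
import os
import sys

# Placeholder (hypothetical): the real project's settings module path may differ.
SETTINGS_MODULE = "PyHardLinkBackup.django_project.settings"


def main(argv=None):
    """Run a Django management command with a fixed settings module."""
    # Plain assignment instead of setdefault(): an existing value is overridden.
    os.environ["DJANGO_SETTINGS_MODULE"] = SETTINGS_MODULE
    from django.core.management import execute_from_command_line
    execute_from_command_line(argv if argv is not None else sys.argv)
```

In a package, a console-script entry point could map the `phlb` command to such a `main()` function.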

Why in hell do you use Django?!?

  • Well, just because of the great database ORM and the Admin Site ;)

How do I get into the Django admin?

$ cd PyHardLinkBackupEnv/
~/PyHardLinkBackupEnv $ source bin/activate
(PyHardLinkBackupEnv) ~/PyHardLinkBackupEnv $ phlb runserver

And then just open ‘localhost’ in your browser (Django’s runserver uses port 8000 by default).

TODO

  • copy file metadata like owner, mode, etc.

  • handle symlinks

  • Quick Backup: Don’t check the content, just compare file size + modification date

  • use https://github.com/jedie/bootstrap_env (to make installation under Windows easier)

  • Add some helper files to start a backup (.sh / .cmd scripts)

  • write docs

  • write more tests

  • activate CI

  • Far future: Add a GUI
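
The "Quick Backup" idea from the list above can be sketched with `os.stat()`. This assumes the size and modification time from the previous run are available somehow (here they are simply passed in; where phlb would store them is not specified); `looks_unchanged()` is a hypothetical name. `st_mtime_ns` is used to avoid float rounding in `st_mtime`.

```python
import os


def looks_unchanged(path, old_size, old_mtime_ns):
    """Quick check: treat a file as unchanged if both size and mtime match the old values.

    The file content is not read at all.
    """
    st = os.stat(path)
    return st.st_size == old_size and st.st_mtime_ns == old_mtime_ns
```

The trade-off: since the content is never read, an edit that keeps both size and modification time unchanged would be missed.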

Download files

Source Distribution

PyHardLinkBackup-0.1.1.tar.gz (16.0 kB)

Built Distributions

PyHardLinkBackup-0.1.1-py3.4.egg (19.5 kB)

PyHardLinkBackup-0.1.1-py3-none-any.whl (22.9 kB)

File details

Details for the file PyHardLinkBackup-0.1.1.tar.gz.

File hashes

Hashes for PyHardLinkBackup-0.1.1.tar.gz

  Algorithm     Hash digest
  SHA256        f3a60f3ecac81456b2b3cd68d0db9170e2b636c387774ca01280ddce3dcecb08
  MD5           c8ecb0bb4afbe0a7abcd77e7a66105c9
  BLAKE2b-256   329c1ffcd548d2066e259972f25c51c1b3f233cde5e9ff4f5f715fa0f6069750

File details

Details for the file PyHardLinkBackup-0.1.1-py3.4.egg.

File hashes

Hashes for PyHardLinkBackup-0.1.1-py3.4.egg

  Algorithm     Hash digest
  SHA256        34ef445b1fc26374708d2513a2b8a73a7f5fc773231e45a23cb8d578e1a95689
  MD5           6bc139e6f0ffd7e5d614351870a4e4ec
  BLAKE2b-256   48c7b0ec8d4a1b5f7d55d2bfd5dba8371455a8ed4ec707044d0793d903a29df0

File details

Details for the file PyHardLinkBackup-0.1.1-py3-none-any.whl.

File hashes

Hashes for PyHardLinkBackup-0.1.1-py3-none-any.whl

  Algorithm     Hash digest
  SHA256        81ba893136d3febea423e0b7d6d67d1eaa49aa67bbc42c6ade62d8cd1ca3295f
  MD5           410e6a018fded8f469a32d69bc742328
  BLAKE2b-256   1f195029c17c8f147149d27892e4a180afd90b6a16b282285ef14cdfc6535bb9
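
To check a downloaded file against one of the digests listed above, a small `hashlib` sketch can be used (`sha256_of()` and `verify()` are hypothetical helpers; the default expected value is the SHA256 digest of the .tar.gz):

```python
import hashlib

EXPECTED_SHA256 = "f3a60f3ecac81456b2b3cd68d0db9170e2b636c387774ca01280ddce3dcecb08"


def sha256_of(path, chunk_size=64 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA-256 matches `expected` (case-insensitive hex)."""
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check itself in hash-checking mode, via `--hash=sha256:<digest>` entries in a requirements file.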
