hashget deduplication and compression tool

hashget

Deduplication tool for archiving (backing up) Debian virtual machines.

For example, it is very useful for backing up LXC containers before uploading them to Amazon Glacier.

Installation

Pip (recommended):

pip3 install hashget

or clone from git:

git clone https://gitlab.com/yaroslaff/hashget.git

QuickStart

You can test on any VM you have. Here we create a Debian LXC container for the test. The rest of this example uses the 'mydebvm' container in the default LXC location.

host# lxc-create -n mydebvm -t download -- --dist=debian --release=stretch --arch=amd64
host# lxc-attach -n mydebvm
mydebvm# apt install wget apache2 mysql-server vim

Now, the magic: create a .tar.gz without the files that can be downloaded from the Internet.

# hashget -zf /tmp/mydebvm.tar.gz --pack /var/lib/lxc/mydebvm/rootfs/ --exclude var/cache/apt var/lib/apt/lists 
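
The idea behind this packing step can be sketched in a few lines of Python. This is only an illustration of hash-based deduplication under stated assumptions, not hashget's actual implementation: `known_hashes` (a set of SHA-256 digests of files assumed to be downloadable again later) and the helper names `sha256_of` and `split_files` are hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_files(root, known_hashes):
    """Walk `root` and split regular files into (to_archive, skipped).

    `known_hashes` is a hypothetical set of digests of files that could be
    downloaded again later; matching files are left out of the archive.
    """
    to_archive, skipped = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only regular files are hashed in this sketch
            rel = os.path.relpath(path, root)
            if sha256_of(path) in known_hashes:
                skipped.append(rel)
            else:
                to_archive.append(rel)
    return sorted(to_archive), sorted(skipped)
```

Only the paths in `to_archive` would need to go into the tarball; the skipped ones can be fetched again at restore time, which is where the size reduction comes from.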

Now let's compare the result with ordinary tarring:


# du -sh /var/lib/lxc/mydebvm/rootfs/
321M	/var/lib/lxc/mydebvm/rootfs/

# tar -czf /tmp/mydebvm-orig.tar.gz --exclude='var/lib/apt/lists' -C /var/lib/lxc/mydebvm/rootfs .

# ls -lh /tmp/mydebvm.tar.gz /tmp/mydebvm-orig.tar.gz 
-rw-r--r-- 1 root root 99M Mar  4 22:01 /tmp/mydebvm-orig.tar.gz
-rw-r--r-- 1 root root 29M Mar  4 21:59 /tmp/mydebvm.tar.gz

The optimized backup is 70 MB smaller: just 29 MB instead of 99 MB, a saving of about 70%. If you pay for backup storage (e.g. on Amazon Glacier), your bill shrinks in proportion: where you paid $99 before, you would now pay just $29.

After this step, you have a very small archive (just 29 MB for a 300+ MB generic Debian 9 LXC rootfs).
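
The arithmetic behind that figure, as a small sketch (the `savings` helper is hypothetical; the sizes are the ones from the `ls -lh` listing above):

```python
def savings(original_mb, optimized_mb):
    """Return (MB saved, percent saved) for two archive sizes."""
    saved_mb = original_mb - optimized_mb
    saved_pct = 100.0 * saved_mb / original_mb
    return saved_mb, saved_pct


saved_mb, saved_pct = savings(99, 29)
print(f"saved {saved_mb} MB ({saved_pct:.1f}%)")  # saved 70 MB (70.7%)
```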

Untarring:

# tar -xzf mydebvm.tar.gz -C rootfs

Just unpack it into any directory, as you would a normal .tar.gz file.

# du -sh rootfs/
80M	rootfs/

At this stage we have just 80 MB of the 300+ MB total; the rest is recovered in the restore step.

Restoring

After unpacking, you can restore the missing files into the new rootfs:

# hashget -u rootfs
recovered rootfs/usr/bin/vim.basic
recovered rootfs/lib/i386-linux-gnu/libdns-export.so.162.1.3
...
recovered rootfs/usr/share/doc/systemd/changelog.Debian.gz
recovered rootfs/usr/share/doc/systemd/copyright
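
To check that the restore brought the tree back to roughly its original size, one could total the file sizes before and after running it. A minimal sketch: `tree_size` is a hypothetical helper, not part of hashget, and it sums apparent file sizes, so its result will differ somewhat from `du -sh`, which reports disk usage.

```python
import os


def tree_size(root):
    """Sum the apparent sizes (in bytes) of all files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat: count a symlink as the link itself, do not follow it
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total
```

For example, `tree_size("rootfs") / 2**20` gives the size in MiB, which should be noticeably larger after the restore than right after untarring.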

Documentation

For more detailed documentation, see the Wiki.
