
hashget deduplication and compression tool


hashget

Deduplication tool for archiving (backing up) Debian virtual machines.

For example, it is very useful for backing up LXC containers before uploading them to Amazon Glacier.

Installation

Pip (recommended):

pip3 install hashget

Or clone from git:

git clone https://gitlab.com/yaroslaff/hashget.git

QuickStart

Create a Debian machine (optional). The rest of this example uses the 'mydebvm' container in the default LXC location.

lxc-create -n mydebvm -t download -- --dist=debian --release=stretch --arch=amd64

Update the local and network hashdb with packages from this VM (optional, but strongly recommended for maximum efficiency).

hashget --debcrawl /var/lib/lxc/mydebvm/rootfs/ 

Now the main work: prepare the rootfs for archiving.

# bin/hashget -p /var/lib/lxc/mydebvm/rootfs/

This creates a .hashget-restore file in the rootfs and (by default) a gethash-exclude file (used by the tar command later) in the current user's home directory.
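To make the idea of an exclude list more concrete, here is a minimal conceptual sketch. It is not hashget's actual code and does not reproduce its file formats: the throwaway directory, the `known-hashes.txt` file (a stand-in for a hash database) and `exclude.txt` are all hypothetical names chosen for illustration. It only shows the general technique: a file whose digest is already known elsewhere does not need to go into the archive.

```shell
#!/bin/sh
# Conceptual sketch only: NOT hashget's real implementation or file formats.
demo=$(mktemp -d)
mkdir -p "$demo/rootfs/usr/bin" "$demo/rootfs/etc"
printf 'packaged binary\n'      > "$demo/rootfs/usr/bin/tool"
printf 'site-specific config\n' > "$demo/rootfs/etc/local.conf"

# Hypothetical stand-in for a hash database: SHA256 digests of files that
# could be obtained again from somewhere else (for example, from packages).
printf 'packaged binary\n' | sha256sum | cut -d ' ' -f 1 > "$demo/known-hashes.txt"

# Files whose digest is known go on the exclude list; everything else
# would still have to be stored in the archive.
( cd "$demo/rootfs" && find . -type f -exec sha256sum {} + ) |
while read -r digest path; do
    if grep -qxF "$digest" "$demo/known-hashes.txt"; then
        printf '%s\n' "${path#./}"
    fi
done > "$demo/exclude.txt"

cat "$demo/exclude.txt"
```

In this toy tree only usr/bin/tool ends up on the exclude list, because its digest is the only one present in the stand-in database.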

Tarring

# tar -czf /tmp/rootfs.tar.gz -X ~/gethash-exclude --exclude='var/lib/apt/lists' -C ~/delme/rootfs/ .

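This is an efficient tar command: it excludes large directories that are not needed for the backup, as well as duplicate files.

-X - file with a list of patterns to exclude (here, the gethash-exclude file created in the previous step)

--exclude - files to exclude (relative to the start of the directory)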

After this step, you have a very small archive (just 29 MB for a generic Debian 9 LXC machine rootfs of 300+ MB).
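The effect of the -X option can be tried in isolation, without hashget. The sketch below assumes GNU tar; the temporary directory, the two sample files and the `exclude.txt` name are made up for the demonstration, and the exclude file here is written by hand rather than generated.

```shell
#!/bin/sh
# Self-contained demo of tar's -X (exclude-from) option on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/rootfs/usr/bin" "$demo/rootfs/etc"
echo "binary from a package" > "$demo/rootfs/usr/bin/tool"
echo "local config"          > "$demo/rootfs/etc/local.conf"

# Hand-written exclude file: one pattern per line, relative to the archived directory.
printf '%s\n' 'usr/bin/tool' > "$demo/exclude.txt"

# Same shape as the command above: exclude list via -X, tree root via -C.
tar -czf "$demo/rootfs.tar.gz" -X "$demo/exclude.txt" -C "$demo/rootfs" .

# List the archive members to see what was actually stored.
archived=$(tar -tzf "$demo/rootfs.tar.gz")
echo "$archived"
```

The listing should contain etc/local.conf but not usr/bin/tool, which is the behaviour the backup command relies on.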

Untarring

# tar -xzf rootfs.tar.gz -C rootfs

Just unpack it to any directory, like any ordinary tar.gz file.

root@braconnier:/tmp# du -sh rootfs/
80M	rootfs/

At this stage we have just 80 MB out of the 300+ MB total.

Restoring

After unpacking, you can restore the missing files into the new rootfs:

# hashget -u rootfs
recovered rootfs/usr/bin/vim.basic
recovered rootfs/lib/i386-linux-gnu/libdns-export.so.162.1.3
...
recovered rootfs/usr/share/doc/systemd/changelog.Debian.gz
recovered rootfs/usr/share/doc/systemd/copyright
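If the original rootfs is still available, one way to check a restore is to compare checksums of both trees. This is a minimal sketch and not part of hashget: the `manifest` and `same_tree` helpers and the demo directories are hypothetical, and the comparison covers only the contents of regular files (not permissions, ownership or symlinks).

```shell
#!/bin/sh
# manifest DIR: print "sha256  ./relative/path" for every regular file under DIR,
# sorted by path so that two manifests can be compared directly.
manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# same_tree A B: succeed only if both trees hold the same files with the same contents.
same_tree() {
    a=$(manifest "$1") && b=$(manifest "$2") && [ "$a" = "$b" ]
}

# Throwaway demo trees standing in for the original and the restored rootfs.
demo=$(mktemp -d)
mkdir -p "$demo/orig/etc" "$demo/restored/etc"
echo "hello" > "$demo/orig/etc/motd"
echo "hello" > "$demo/restored/etc/motd"

if same_tree "$demo/orig" "$demo/restored"; then
    result=identical
else
    result=different
fi
echo "$result"
```

For a real check, the two demo paths would be replaced with the original rootfs and the directory passed to hashget -u.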

Documentation

For more detailed documentation, see the Wiki.

This version

0.120
