
A tool that de-duplicates files in a filesystem by checksum.

Project description

farmfs

Tool for creating / distributing / maintaining symlink farms.

Warning

FarmFS is still very early-stage software.

Please do not keep anything in it which you are not willing to lose.

Installation

To use FarmFS:

pip install git+https://github.com/andrewguy9/farmfs.git@master

To hack on FarmFS:

git clone https://github.com/andrewguy9/farmfs.git
cd farmfs
python setup.py install

Usage:

FarmFS

Usage:
  farmfs mkfs
  farmfs (status|freeze|thaw) [<path>...]
  farmfs snap (make|list|read|delete|restore) <snap>
  farmfs fsck
  farmfs count
  farmfs similarity
  farmfs gc
  farmfs checksum <path>...
  farmfs remote add <remote> <root>
  farmfs remote remove <remote>
  farmfs remote list
  farmfs pull <remote> [<snap>]


Options:

What is FarmFS

FarmFS is a git-style interface to non-text, usually immutable, sometimes large files. It puts your files into an immutable blob store, then builds symlinks from the file names into the store.
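
The blob names under .farmfs/userdata in the example output further down are 32 hex digits split into 3/3/3/rest directories, which is consistent with a content-addressed store keyed by an MD5 digest. The following Python sketch illustrates that idea. The MD5 choice and the function names file_checksum and blob_path are assumptions inferred from the output, not FarmFS's actual API.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 16):
    """Hash a file's contents in chunks so large files never need to fit in memory."""
    h = hashlib.md5()  # assumption: MD5, inferred from the 32-hex-digit blob names
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def blob_path(userdata_root, csum):
    """Map a checksum to its blob location using the 3/3/3/rest layout seen in the output."""
    return os.path.join(userdata_root, csum[0:3], csum[3:6], csum[6:9], csum[9:])
```

For example, blob_path(".farmfs/userdata", "238851a9177b60af767ca431ed521e55") yields ".farmfs/userdata/238/851/a91/77b60af767ca431ed521e55" on a POSIX system, matching the link target printed by freeze. Splitting the digest into short directory levels keeps any single directory from holding an enormous number of entries.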

Why would you do that?

  • You can snapshot your directory structure in O(num_files).
  • You can diff two different FarmFS stores in O(num_files) rather than O(sum(file_sizes)).
  • You can identify corruption of your files, because every entry in the blob store is checksummed.
  • If the same file contents appear in multiple places, they only need to be stored in the blob store once (deduplication).
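
Because a blob's path encodes its checksum, corruption can be detected by re-hashing each blob and comparing the result with its path. The Python sketch below shows the kind of check that farmfs fsck could perform; the real command's behavior may differ. It assumes MD5 and the 3/3/3/rest directory layout seen in the example output, and verify_blobs is a hypothetical name.

```python
import hashlib
import os


def verify_blobs(userdata_root):
    """Yield blob paths whose contents no longer match the checksum encoded in their path."""
    for dirpath, _dirnames, filenames in os.walk(userdata_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Rebuild the checksum from the path, e.g. 238/851/a91/77b6... -> 238851a9177b6...
            expected = os.path.relpath(path, userdata_root).replace(os.sep, "")
            h = hashlib.md5()  # assumption: MD5
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 16), b""):
                    h.update(chunk)
            if h.hexdigest() != expected:
                yield path
```

A check like this costs O(sum(file_sizes)), since every blob must be read, unlike snapshots and diffs, which only look at link metadata.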

Getting Started

Create a Farmfs store

mkdir myfarm
cd myfarm
farmfs mkfs

Make some files

mkdir -p 1/2/3/4/5
mkdir -p a/b/c/d/e
echo "value1" > 1/2/3/4/5/v1
echo "value1" > a/b/c/d/e/v1

Status can show us unmanaged files.

farmfs status
/Users/andrewguy9/Downloads/readme/1/2/3/4/5/v1
/Users/andrewguy9/Downloads/readme/a/b/c/d/e/v1
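
Conceptually, an unmanaged file is any regular file in the farm that is not a symlink into the store. The Python sketch below performs such a scan, assuming the store lives in .farmfs at the farm root (as the paths above show). unmanaged_files is a hypothetical name, and the real status command may report more than this (for example, broken links).

```python
import os


def unmanaged_files(farm_root):
    """Yield regular files (not symlinks) under farm_root, skipping the .farmfs metadata dir."""
    for dirpath, dirnames, filenames in os.walk(farm_root):
        # Prune the store itself so blobs are not reported as user files.
        dirnames[:] = [d for d in dirnames if d != ".farmfs"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                yield path
```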

Add the untracked files to the blob store. Notice it only needs to store "value1" once.

farmfs freeze
Processing /Users/andrewguy9/Downloads/readme/1/2/3/4/5/v1 with csum /Users/andrewguy9/Downloads/readme/.farmfs/userdata
Putting link at /Users/andrewguy9/Downloads/readme/.farmfs/userdata/238/851/a91/77b60af767ca431ed521e55
Processing /Users/andrewguy9/Downloads/readme/a/b/c/d/e/v1 with csum /Users/andrewguy9/Downloads/readme/.farmfs/userdata
Found a copy of file already in userdata, skipping copy
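
The freeze step above can be sketched as: hash the file, move it into the store only if no blob with that checksum exists yet, then replace the original path with a symlink to the blob. That is why the second v1 reports that a copy already exists. The Python function freeze below is hypothetical and assumes MD5 and the 3/3/3/rest layout; the real implementation may differ, for example in how it writes links or protects blobs.

```python
import hashlib
import os
import shutil


def freeze(path, userdata_root):
    """Store a file's contents once per checksum and replace the file with a symlink to the blob."""
    h = hashlib.md5()  # assumption: MD5
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    csum = h.hexdigest()
    blob = os.path.join(userdata_root, csum[0:3], csum[3:6], csum[6:9], csum[9:])
    if os.path.exists(blob):
        os.remove(path)  # duplicate content: reuse the existing blob, skip the copy
    else:
        os.makedirs(os.path.dirname(blob), exist_ok=True)
        shutil.move(path, blob)  # first copy of this content becomes the blob
    os.symlink(os.path.abspath(blob), path)
    return csum
```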

Edit a file. First we need to thaw it, then we can change it.

farmfs thaw 1/2/3/4/5/v1

farmfs status
/Users/andrewguy9/Downloads/readme/1/2/3/4/5/v1

echo "value2" > 1/2/3/4/5/v1

farmfs freeze 1/2/3/4/5/v1
Processing /Users/andrewguy9/Downloads/readme/1/2/3/4/5/v1 with csum /Users/andrewguy9/Downloads/readme/.farmfs/userdata
Putting link at /Users/andrewguy9/Downloads/readme/.farmfs/userdata/4ca/8c5/ae5/e759e237bfb80c51940de7a

farmfs status
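
Thawing matters because, after the freeze above, both v1 files point at the same blob; writing through the symlink would modify the shared blob and with it a/b/c/d/e/v1. The Python sketch below swaps the link for a private, writable copy. thaw and the .thawing temporary suffix are hypothetical, not FarmFS's actual implementation.

```python
import os
import shutil


def thaw(path):
    """Replace a symlink into the blob store with a writable copy of the blob's contents."""
    blob = os.path.realpath(path)  # resolve the link to the blob it points at
    tmp = path + ".thawing"  # hypothetical temp name so the swap is a single rename
    shutil.copyfile(blob, tmp)  # copy contents only; the blob itself is left untouched
    os.replace(tmp, path)  # rename over the symlink, replacing the link itself
```

Copying to a temporary name and renaming means the path is never missing, even briefly.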

We don't want to lose our progress, so let's make a snapshot.

farmfs snap make mysnap
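
A snapshot only needs each link's path and target, so it can be built from link metadata without reading any file contents; this is where the O(num_files) cost comes from. The Python sketch below records a {relative path: checksum} map. make_snapshot is hypothetical, and how FarmFS actually stores snapshots is not shown in this README.

```python
import os


def make_snapshot(farm_root):
    """Record every link in the farm as {relative path: checksum}, reading link targets only."""
    snap = {}
    for dirpath, dirnames, filenames in os.walk(farm_root):
        dirnames[:] = [d for d in dirnames if d != ".farmfs"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Assumption: targets end in .../xxx/xxx/xxx/rest, so the last four
                # components joined together give back the checksum.
                parts = os.readlink(path).split(os.sep)[-4:]
                snap[os.path.relpath(path, farm_root)] = "".join(parts)
    return snap
```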

Now create some more files.

echo "oops" > mistake.txt

farmfs freeze mistake.txt
Processing /Users/andrewguy9/Downloads/readme/mistake.txt with csum /Users/andrewguy9/Downloads/readme/.farmfs/userdata
Putting link at /Users/andrewguy9/Downloads/readme/.farmfs/userdata/38a/f5c/549/26b620264ab1501150cf189

Well, that was a mistake. Let's roll back to the old snapshot.

farmfs snap restore mysnap
Removing /mistake.txt
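
Restoring can be viewed as diffing the farm's current {path: checksum} map against the snapshot's: paths absent from the snapshot are removed, and paths whose checksum differs or is missing are (re)linked. The hypothetical Python sketch below computes that plan; it is an illustration, not FarmFS's actual restore code.

```python
def diff_snapshots(current, target):
    """Compare two {path: checksum} maps; cost scales with the number of paths, not file sizes."""
    to_remove = sorted(p for p in current if p not in target)
    to_link = sorted(p for p, csum in target.items() if current.get(p) != csum)
    return to_remove, to_link
```

With the farm above, current holds the two v1 links plus mistake.txt, while mysnap holds only the two v1 links, so the plan is (["mistake.txt"], []), which matches the single "Removing /mistake.txt" line. Applying the plan means deleting each to_remove link and creating a symlink for each to_link path.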

Now that we have our files built, let's build another depot.

cd ..
mkdir copy
cd copy
farmfs mkfs

We want to add our prior depot as a remote.

farmfs remote add origin ../myfarm

Now let's copy our work from before.

farmfs pull origin
mkdir /1
mkdir /1/2
mkdir /1/2/3
mkdir /1/2/3/4
mkdir /1/2/3/4/5
mklink /1/2/3/4/5/v1 -> /4ca/8c5/ae5/e759e237bfb80c51940de7a
Blob missing from local, copying
*** /Users/andrewguy9/Downloads/copy/.farmfs/userdata/4ca/8c5/ae5/e759e237bfb80c51940de7a /Users/andrewguy9/Downloads/myfarm/.farmfs/userdata/4ca/8c5/ae5/e759e237bfb80c51940de7a
mkdir /a
mkdir /a/b
mkdir /a/b/c
mkdir /a/b/c/d
mkdir /a/b/c/d/e
mklink /a/b/c/d/e/v1 -> /238/851/a91/77b60af767ca431ed521e55
Blob missing from local, copying
*** /Users/andrewguy9/Downloads/copy/.farmfs/userdata/238/851/a91/77b60af767ca431ed521e55 /Users/andrewguy9/Downloads/myfarm/.farmfs/userdata/238/851/a91/77b60af767ca431ed521e55
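
The pull output suggests this procedure: for each path in the remote, create the parent directories, create the link, and copy the blob only if the local store does not already have it. The Python sketch below follows that procedure. pull here is a hypothetical function that takes the remote's {path: checksum} map; it assumes MD5-style names in the 3/3/3/rest layout and a destination farm that starts empty, as in the example.

```python
import os
import shutil


def pull(local_root, remote_root, remote_snap):
    """Recreate the remote's links locally, copying only blobs the local store lacks."""
    local_store = os.path.join(local_root, ".farmfs", "userdata")
    remote_store = os.path.join(remote_root, ".farmfs", "userdata")
    copied = []
    for rel, csum in sorted(remote_snap.items()):
        blob_rel = os.path.join(csum[0:3], csum[3:6], csum[6:9], csum[9:])
        local_blob = os.path.join(local_store, blob_rel)
        if not os.path.exists(local_blob):  # "Blob missing from local, copying"
            os.makedirs(os.path.dirname(local_blob), exist_ok=True)
            shutil.copyfile(os.path.join(remote_store, blob_rel), local_blob)
            copied.append(csum)
        link = os.path.join(local_root, rel)
        os.makedirs(os.path.dirname(link), exist_ok=True)  # the "mkdir" lines
        os.symlink(os.path.abspath(local_blob), link)  # the "mklink" lines
    return copied
```

Because blobs are named by checksum, a blob that is already present locally never needs to be transferred again.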

Let's see what's in our new depot:

find *
1
1/2
1/2/3
1/2/3/4
1/2/3/4/5
1/2/3/4/5/v1
a
a/b
a/b/c
a/b/c/d
a/b/c/d/e
a/b/c/d/e/v1

Development:

Testing:

Regression Testing:

Regression tests can be run with pytest. Tests are kept in the tests directory, which pytest detects automatically.

Performance Optimization:

Performance test cases are stored under the perf directory. They are useful for making development decisions, but are not generally useful as ongoing tests.

To run a particular trial, run:

pytest -s perf/your_test -k case_pattern

The -s flag is required to see the printed results, because it disables pytest's output capture. For example:

pytest -s perf/transducer.py -k transducers

Debugging

FarmFS comes with a useful debugging tool, farmdbg.

farmdbg
Usage:
  farmdbg reverse <csum>
  farmdbg key read <key>
  farmdbg key write <key> <value>
  farmdbg key delete <key>
  farmdbg key list [<key>]
  farmdbg walk (keys|userdata|root|snap <snapshot>)
  farmdbg checksum <path>...
  farmdbg fix link <file> <target>
  farmdbg rewrite-links <target>

farmdbg can be used to dump parts of the keystore or blobstore, as well as walk and repair links.
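
The name farmdbg reverse <csum> suggests a lookup from a blob back to the files that link to it. The Python sketch below performs such a lookup by scanning link targets. reverse is a hypothetical function, it assumes the 3/3/3/rest blob layout, and farmdbg's actual implementation may differ.

```python
import os


def reverse(farm_root, csum):
    """Return farm paths (relative, sorted) whose symlink target is the blob for csum."""
    suffix = os.sep + os.path.join(csum[0:3], csum[3:6], csum[6:9], csum[9:])
    hits = []
    for dirpath, dirnames, filenames in os.walk(farm_root):
        dirnames[:] = [d for d in dirnames if d != ".farmfs"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and os.readlink(path).endswith(suffix):
                hits.append(os.path.relpath(path, farm_root))
    return sorted(hits)
```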

Download files

Source Distribution

farmfs-0.8.2.tar.gz (23.5 kB)

Uploaded Source

Built Distributions

farmfs-0.8.2-py3.9.egg (70.2 kB)

Uploaded Source

farmfs-0.8.2-py3.7.egg (67.0 kB)

Uploaded Source

farmfs-0.8.2-py3-none-any.whl (25.6 kB)

Uploaded Python 3

farmfs-0.8.2-py2.7.egg (60.0 kB)

Uploaded Source

File details

Details for the file farmfs-0.8.2.tar.gz.

File metadata

  • Download URL: farmfs-0.8.2.tar.gz
  • Upload date:
  • Size: 23.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.7.3

File hashes

Hashes for farmfs-0.8.2.tar.gz
Algorithm Hash digest
SHA256 5a0301aa58f37020f2f70bd7f0a521cd2c60aa7a1df751ebf6a95e98e298cebc
MD5 1f446f43e0294531d595e007d760a22a
BLAKE2b-256 279489523f2bedb23b40935aff85ee5e89f74c6c474fcf4f11ef59475af7f369

File details

Details for the file farmfs-0.8.2-py3.9.egg.

File metadata

  • Download URL: farmfs-0.8.2-py3.9.egg
  • Upload date:
  • Size: 70.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.7.3

File hashes

Hashes for farmfs-0.8.2-py3.9.egg
Algorithm Hash digest
SHA256 b259084102e250e2317051553eff6bfd4d035796173fee6a9c2b867844a8a810
MD5 9cfcf58b16dc506691b1cbca0c2da34e
BLAKE2b-256 10b8253b5c8817396a8fb929f0e5aab9a556e3a9837174144c0568a4e2730000

File details

Details for the file farmfs-0.8.2-py3.7.egg.

File metadata

  • Download URL: farmfs-0.8.2-py3.7.egg
  • Upload date:
  • Size: 67.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.7.3

File hashes

Hashes for farmfs-0.8.2-py3.7.egg
Algorithm Hash digest
SHA256 9ce29cb1255dd5516b09e79030ab3b34ca9f2d880392ec8011f94ff283ead85e
MD5 66ff89a6b3fca5c0b70bba072e8ef15d
BLAKE2b-256 891ba08c368862416b92540d34cd0cc74242bb31f27afa4c000768e9db572946

File details

Details for the file farmfs-0.8.2-py3-none-any.whl.

File metadata

  • Download URL: farmfs-0.8.2-py3-none-any.whl
  • Upload date:
  • Size: 25.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.7.3

File hashes

Hashes for farmfs-0.8.2-py3-none-any.whl
Algorithm Hash digest
SHA256 981c797a4ff1458bc96e35d7c1abd3316026c0fedfcb43e78ca2820212630e6e
MD5 b637a36576322c40c69cd6eccf4e7091
BLAKE2b-256 05ee714bbe74bde157b90eb96d6a62f2bd33160d7c6999b1c90ff4c581bf392c

File details

Details for the file farmfs-0.8.2-py2.7.egg.

File metadata

  • Download URL: farmfs-0.8.2-py2.7.egg
  • Upload date:
  • Size: 60.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.7.3

File hashes

Hashes for farmfs-0.8.2-py2.7.egg
Algorithm Hash digest
SHA256 4cfb98c952dfcb86ce4f1bcac403d520f7645ce93b91bec53292618732c58b30
MD5 a5d55c3dad2f6ce884545435ca692127
BLAKE2b-256 07b02ab0e034862f53626940fa76dcae1a6965455098e639a0ac86e9fae6c6d5
