
A small manager for versioned data



Flexible version control for files and folders.

Install

The simplest way is to install it from PyPI:

pip install bev

Cheatsheet

Adding new files

ls
# image.png ids.json some-folder
bev add image.png
ls
# image.png.hash ids.json some-folder
bev add ids.json some-folder
ls
# image.png.hash ids.json.hash some-folder.hash

git add image.png.hash ids.json.hash some-folder.hash
git commit -m "added new files"

Restoring the hashed files and folders

ls
# image.png.hash ids.json.hash some-folder.hash
bev pull image.png.hash --mode copy
ls
# image.png ids.json.hash some-folder.hash
bev pull some-folder.hash --mode copy
ls
# image.png ids.json.hash some-folder
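
The add/restore round trip in the two recipes above can be pictured as content-addressed storage: the file's content moves into a storage folder under its hash, and a small `.hash` placeholder stays in the working tree. The following is only a rough sketch of that idea under stated assumptions, not bev's implementation: the helper names `file_digest`, `add_file` and `pull_copy` are hypothetical, and bev's real storage layout and `.hash` file format may differ.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_file(path: Path, storage: Path) -> Path:
    """Move `path` into `storage` under its digest and leave `<name>.hash` behind.

    Hypothetical stand-in for the idea behind `bev add <file>`.
    """
    key = file_digest(path)
    storage.mkdir(parents=True, exist_ok=True)
    target = storage / key
    if target.exists():
        path.unlink()  # identical content is already stored
    else:
        shutil.move(str(path), str(target))
    hash_path = path.with_name(path.name + ".hash")
    hash_path.write_text(key)
    return hash_path


def pull_copy(hash_path: Path, storage: Path) -> Path:
    """Copy the stored content back and remove the `.hash` placeholder.

    Hypothetical stand-in for the idea behind `bev pull <file>.hash --mode copy`.
    """
    key = hash_path.read_text().strip()
    restored = hash_path.with_name(hash_path.name[: -len(".hash")])
    shutil.copyfile(storage / key, restored)
    hash_path.unlink()
    return restored
```

Because only the placeholder is committed to git, the repository stays small while the commit still pins the exact content.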

Browsing a hashed folder

In this recipe we "expand" the hashed folder and fill it with the hashes of the files it contains. This is much faster than copying back the entire folder.

ls
# image.png.hash ids.json.hash some-folder.hash
bev pull some-folder.hash --mode hash
ls
# image.png.hash ids.json.hash some-folder
ls some-folder
# photo.jpg.hash some-text-file.txt.hash nested-folder

Afterwards you can add the folder back:

bev add some-folder
ls
# image.png.hash ids.json.hash some-folder.hash
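
To see why expanding a folder into hashes is cheap, here is a minimal sketch that mirrors a directory tree and writes a tiny `<name>.hash` text file in place of every real file. It is an illustration only: the function name `expand_to_hashes` is hypothetical, the original folder stands in for whatever bev keeps in its storage, and SHA-256 is assumed to match the `hash: sha256` setting used in this README.

```python
import hashlib
from pathlib import Path


def expand_to_hashes(source: Path, target: Path) -> None:
    """Mirror `source` into `target`, writing `<name>.hash` instead of each file."""
    target.mkdir(parents=True, exist_ok=True)
    for path in sorted(source.rglob("*")):
        relative = path.relative_to(source)
        if path.is_dir():
            # folders are kept as real folders so they can be browsed
            (target / relative).mkdir(parents=True, exist_ok=True)
        else:
            placeholder = target / relative.with_name(relative.name + ".hash")
            placeholder.parent.mkdir(parents=True, exist_ok=True)
            # only the digest is written, never the file's content
            placeholder.write_text(hashlib.sha256(path.read_bytes()).hexdigest())
```

The mirrored tree contains only short text files, so its size depends on the number of files rather than on how large they are.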

Getting started

  1. Choose a folder for your repository and create a basic config (.bev.yml):
main:
  storage: /path/to/storage/folder

meta:
  hash: sha256
  2. Run init
bev init
  3. Add files to bev
bev add /path/to/some/file.json
# you can also provide several paths
bev add /path/to/some/folder/ /path/to/some/image.png
  4. ... and to git
git add file.json.hash folder.hash image.png.hash
git commit -m "added files"
  5. Access the files from python
import imageio
from bev import Repository

# `version` can be a commit hash or a git tag
repo = Repository('/path/to/repo', version='8a7fe6')
image = imageio.imread(repo.resolve('image.png'))
  6. Or from cli
# replace the folder's hash by the hashes of its files
bev pull folder.hash --mode hash
# entirely restore the folder (inverse of `bev add folder`)
bev pull folder.hash --mode copy
# same for files
bev pull image.png.hash --mode copy
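
Since `Repository` takes a `version`, the Python access step above extends naturally to reading the same file from several versions side by side. The helper below is a small sketch rather than part of bev: `resolve_versions` is a hypothetical name, it relies only on the `Repository(path, version=...)` constructor and the `resolve` method shown above, and it assumes `resolve` returns something path-like.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def resolve_versions(
    make_repo: Callable, root: str, versions: Iterable[str], name: str
) -> Dict[str, Path]:
    """Map each version to the path that `resolve` returns for `name`.

    `make_repo` is expected to behave like `bev.Repository`: it is called as
    `make_repo(root, version=...)` and the result must offer `.resolve(name)`.
    """
    return {
        version: Path(make_repo(root, version=version).resolve(name))
        for version in versions
    }


# Intended usage (the tag name 'v1.0' is a made-up example):
#   from bev import Repository
#   paths = resolve_versions(Repository, '/path/to/repo', ['8a7fe6', 'v1.0'], 'image.png')
```

Passing the repository class as an argument keeps the helper importable without bev installed.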

Advanced usage

Here are some tutorials that cover more advanced configuration, including multiple storage locations and machines:

  1. Create a repository - needed only for first-time setup
  2. Adding files
  3. Accessing files

Why not DVC?

DVC is a great project, and we took inspiration from it while designing bev. However, our lab has several requirements that DVC doesn't meet:

  1. Our data caches are spread across multiple HDDs - we need support for multiple cache locations
  2. We have multiple machines, and each of them has a different storage configuration: locations, number of HDDs, their volumes - we need a flexible way of choosing the right config depending on the machine
  3. We often run experiments on different versions of the same data simultaneously - we need easy access to multiple versions of the same data
  4. Having to run dvc checkout after git checkout is error-prone, because the data can end up inconsistent with the current commit - we need a tighter coupling between data and git

bev supports all four out of the box!

However, if these requirements are not essential to your project, you may want to stick with DVC - its community is much larger and its test coverage is much broader.
