
Simple, file-based data version control


danvers


Danvers is a Python data version management tool, which helps you to maintain and reference current and previous versions of data files.

This lets code and models run consistently even when the datasets they reference change. It also keeps older copies of data files that were used in decision making, so the logic and code behind those decisions can be rerun.

A new version is created only if the new file differs from the previous versions. This makes infrequently updated datasets easy to maintain and version: if a file is new, Danvers keeps a copy; if it hasn't changed, Danvers ignores it and keeps the current version.
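
How Danvers compares files is not described here. As a rough illustration of the idea, the sketch below assumes content is compared by SHA-256 digest; `file_digest` and `is_new_version` are hypothetical helpers, not part of the Danvers API.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_new_version(known_digests, path):
    """Hypothetical helper: record the file and return True only if its
    content differs from every version already recorded."""
    digest = file_digest(path)
    if digest in known_digests:
        return False  # unchanged content, keep the current version
    known_digests.append(digest)
    return True


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "phase_1.csv")
    first.write_text("title\nIron Man\n")
    copy = Path(tmp, "phase_1_copy.csv")
    copy.write_text("title\nIron Man\n")  # same content, different name
    updated = Path(tmp, "phase_2.csv")
    updated.write_text("title\nIron Man\nIron Man 3\n")

    digests = []
    results = [is_new_version(digests, p) for p in (first, copy, updated)]
```

Comparing digests rather than file names or timestamps means a re-downloaded but identical file does not create a new version.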

Features

Danvers is simple, rather than feature-rich, but here are some of the things it can do:

  • Access previous versions of data files
  • Maintain a fixed number of versions (or all versions)
  • Different trimming strategies are available (first-in-first-out, last-used-first-out)
  • Automatic duplicate check for data files against all known versions
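
The trimming strategies listed above could work roughly as follows. This is a sketch under stated assumptions, not Danvers's implementation: the `Version` record, the `trim` function, and the strategy names are hypothetical, and "last-used-first-out" is read here as discarding the version that was used least recently.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Version:
    number: int      # order in which the version was created
    last_used: int   # hypothetical counter: higher means used more recently


def trim(versions, keep, strategy="fifo"):
    """Return the versions that survive trimming to at most `keep` entries.

    keep=None keeps every version.
    """
    if keep is None:
        return list(versions)
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if len(versions) <= keep:
        return list(versions)
    if strategy == "fifo":
        order = attrgetter("number")      # oldest created is dropped first
    elif strategy == "least_recently_used":
        order = attrgetter("last_used")   # assumed reading of the feature
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    survivors = sorted(versions, key=order)[-keep:]
    return sorted(survivors, key=attrgetter("number"))


history = [
    Version(1, last_used=9),
    Version(2, last_used=3),
    Version(3, last_used=5),
]
fifo_kept = [v.number for v in trim(history, keep=2, strategy="fifo")]
lru_kept = [v.number for v in trim(history, keep=2, strategy="least_recently_used")]
all_kept = [v.number for v in trim(history, keep=None)]
```

With this history, first-in-first-out drops version 1 because it is the oldest, while the usage-based strategy drops version 2 because it was used least recently.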

Dependencies

Danvers has no third-party dependencies; it needs only what comes with Python.

Example Usage

from danvers import Danvers

# instantiate with the location the data is stored
vers = Danvers(r'data')

# create the dataset if it doesn't exist already
if 'marvel_movies' not in vers.read_datasets():
    vers.create_dataset('marvel_movies')

# add the first data file, should return version 1
version = vers.create_data_file('marvel_movies', r'test_data\movies_phase_1.csv')
print(version)

# adding a new data file should return version 2
version = vers.create_data_file('marvel_movies', r'test_data\movies_phase_1+2.csv')
print(version)

# get the filename for the latest version of the data
filename = vers.get_data_file('marvel_movies')
print(filename)

# get the filename for version 1 of the data
filename = vers.get_data_file('marvel_movies', 1)
print(filename)
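
To rerun an analysis against a fixed version, the filename returned by `get_data_file` can be passed to whatever reads the data. The sketch below assumes the returned filename can be opened directly and that the data is CSV; `read_rows` is a hypothetical helper, and `StandInStore` only mimics the `get_data_file` calls shown above so the example runs without a Danvers store.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(vers, dataset, version=None):
    """Hypothetical helper: load one version of a dataset as a list of dicts."""
    if version is None:
        filename = vers.get_data_file(dataset)           # latest version
    else:
        filename = vers.get_data_file(dataset, version)  # pinned version
    with open(filename, newline="") as handle:
        return list(csv.DictReader(handle))


class StandInStore:
    """Stand-in for a Danvers instance so this sketch runs on its own."""

    def __init__(self, files):
        self.files = files  # maps version number -> file path

    def get_data_file(self, dataset, version=None):
        if version is None:
            version = max(self.files)
        return self.files[version]


with tempfile.TemporaryDirectory() as tmp:
    v1 = Path(tmp, "v1.csv")
    v1.write_text("title\nIron Man\n")
    v2 = Path(tmp, "v2.csv")
    v2.write_text("title\nIron Man\nIron Man 3\n")
    store = StandInStore({1: str(v1), 2: str(v2)})

    pinned = read_rows(store, "marvel_movies", 1)
    latest = read_rows(store, "marvel_movies")
```

Passing an explicit version number keeps results reproducible even after newer data files are added; omitting it follows the latest version.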

License

Apache 2.0

