
A light, decentralized provenance tracking framework using the W3C PROV-O vocabulary

Project description


provit is a data provenance annotation and documentation tool. It provides various features for creating and retrieving provenance information for data stored in files. Tracking sources, modifications and merges lets the user keep a log of every modification a dataset has undergone. This is especially useful for datasets that are accessed intermittently or are part of a long-running workflow (e.g. for a scientific thesis). Furthermore, provenance data stored next to the data in an archive can help others assess the quality, value and timeliness of the data.
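Recording sources, modifications and merges is what makes a dataset's history recoverable later. The following standard-library sketch illustrates the idea: given a record of which file was directly derived from which, it walks back through the chain to list every upstream file. The dictionary layout and the function name `upstream_sources` are hypothetical illustrations, not part of provit's API.

```python
from __future__ import annotations


def upstream_sources(target: str, derived_from: dict[str, list[str]]) -> list[str]:
    """Return every file that `target` was (transitively) derived from.

    Hypothetical helper: `derived_from` maps each file to the files it was
    directly derived from; a merge simply has more than one entry. Files are
    listed depth-first, each at most once, and the walk terminates on cycles.
    """
    order: list[str] = []
    visited: set[str] = {target}  # never report the target as its own source
    # reversed() so that sources are visited in the order they were listed
    stack = list(reversed(derived_from.get(target, [])))
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        order.append(current)
        stack.extend(reversed(derived_from.get(current, [])))
    return order


if __name__ == "__main__":
    # Two raw files are cleaned separately and then merged.
    lineage = {
        "merged.csv": ["clean_a.csv", "clean_b.csv"],
        "clean_a.csv": ["raw_a.csv"],
        "clean_b.csv": ["raw_b.csv"],
    }
    print(upstream_sources("merged.csv", lineage))
    # ['clean_a.csv', 'raw_a.csv', 'clean_b.csv', 'raw_b.csv']
```

A shared ancestor (for example one raw file feeding two cleaned files) is reported only once, because every visited file is remembered.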

provit does not require any external infrastructure. All information is stored as a JSON-LD graph in .prov files right next to the data files. This makes it well suited to small teams and individual researchers.
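To illustrate what a sidecar provenance file can look like, here is a minimal standard-library sketch that writes a tiny JSON-LD graph next to a data file. It is not provit's actual .prov format: the node layout, the example.org identifiers, the naming convention `data.csv -> data.csv.prov` and the helper name `write_sidecar_prov` are assumptions. Only the PROV-O namespace URI and the terms `prov:Entity`, `prov:Activity`, `prov:wasGeneratedBy` and `prov:wasDerivedFrom` come from the W3C vocabulary.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

PROV_NS = "http://www.w3.org/ns/prov#"  # official W3C PROV-O namespace


def write_sidecar_prov(data_file: Path, sources: list[str], activity_id: str) -> Path:
    """Hypothetical helper: store a tiny PROV-O graph as JSON-LD next to data_file.

    The node layout is illustrative only; it is not provit's actual .prov format.
    """
    entity_id = data_file.resolve().as_uri()  # identify the data file by its file:// URI
    doc = {
        "@context": {"prov": PROV_NS},
        "@graph": [
            {
                "@id": entity_id,
                "@type": "prov:Entity",
                "prov:wasGeneratedBy": {"@id": activity_id},
                "prov:wasDerivedFrom": [{"@id": s} for s in sources],
            },
            {"@id": activity_id, "@type": "prov:Activity"},
        ],
    }
    # assumed naming convention: data.csv -> data.csv.prov in the same directory
    prov_file = data_file.with_name(data_file.name + ".prov")
    prov_file.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    return prov_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"
        data.write_text("a,b\n1,2\n", encoding="utf-8")
        out = write_sidecar_prov(
            data, ["http://example.org/raw.csv"], "http://example.org/activity/clean"
        )
        print(out.name)  # data.csv.prov
```

Keeping the graph in a plain file beside the data means it travels with the data when the directory is copied or archived, with no database or server involved.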

To allow interoperability, a small subset of the W3C PROV-O vocabulary is implemented. As a result, the provenance information can easily be merged into a linked data graph at a later stage of the project, if necessary.
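As a rough illustration of such a merge, the sketch below combines several JSON-LD documents that share one `@context` by grouping nodes on their `@id`. This is a deliberately naive, standard-library approach chosen for illustration (the function name `merge_jsonld_graphs` is hypothetical); a real linked data workflow would normally load the files with a JSON-LD processor or an RDF library such as rdflib, which merge at the level of triples.

```python
from __future__ import annotations


def merge_jsonld_graphs(docs: list[dict]) -> dict:
    """Naively merge JSON-LD documents that share one @context.

    Nodes with the same @id are combined into one node. List values (e.g.
    several prov:wasDerivedFrom links) are unioned; for any other key the
    first value seen is kept. Input documents are not modified.
    """
    if not docs:
        raise ValueError("need at least one document")
    context = docs[0]["@context"]
    nodes: dict[str, dict] = {}  # ordered by first appearance of each @id
    for doc in docs:
        if doc["@context"] != context:
            raise ValueError("documents use different @context values")
        for node in doc["@graph"]:
            merged = nodes.setdefault(node["@id"], {})
            for key, value in node.items():
                if key not in merged:
                    merged[key] = value
                elif isinstance(merged[key], list) and isinstance(value, list):
                    # build a new list so the input documents stay untouched
                    merged[key] = merged[key] + [v for v in value if v not in merged[key]]
    return {"@context": context, "@graph": list(nodes.values())}
```

Requiring an identical `@context` keeps the sketch honest: without it, the same compact key could expand to different IRIs in different files, and a key-by-key merge would silently mix them up.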

provit aims to provide an easy-to-use interface for users who have never worked with provenance tracking before. You can operate the tool using the command line interface, the graphical user interface (provit browser), or the Python package.

If you feel limited by provit, have a look at more extensive implementations, e.g. prov.

Full documentation is available at provit.readthedocs.io.


Quick Installation

provit is available via the Python Package Index (PyPI) and can be installed using pip. Simply create a virtual environment with your preferred method (the example below uses virtualenvwrapper's mkvirtualenv) and run the pip install command:

$ mkvirtualenv provit
$ pip install provit

Quickstart

provit provides three modes of interaction:

  • command line interface

  • graphical user interface

  • python package

All three let you track provenance; the graphical user interface, provit browser, additionally lets you explore provenance you have already tracked.

provit browser

You can start provit browser directly from your terminal:

$ provit browser

provit cli

Simply cd to the directory where your data is located and create a provenance file (or append to an existing one):

$ provit add FILEPATH [OPTIONS]

The --help option shows the full list of available options and arguments.

$ provit --help

provit package

Using provit in your ETL pipeline is easy: simply import the Provenance class and start using it, e.g. as shown below.

from provit import Provenance

# load existing provenance for a data file, or create new provenance for it
prov = Provenance("data/example.csv")  # example path; replace with your own data file

# add provenance metadata
prov.add(agents=[ "agent" ], activity="activity", description="...")
prov.add_primary_source("primary_source")
prov.add_sources([ "filepath1", "filepath2" ])

# return provenance as json tree
prov_dict = prov.tree()

# save provenance metadata into "<filename>.prov" file
prov.save()
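Because the saved provenance is plain JSON-LD, it can also be inspected without provit, for example in a script that only needs to read it. The sketch below assumes that the sidecar keeps the full file name including its extension (data.csv -> data.csv.prov), which is one reading of the "<filename>.prov" comment above; `load_prov_sidecar` is a hypothetical helper, not part of provit.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def load_prov_sidecar(data_file: str | Path) -> dict | list:
    """Hypothetical helper: parse the .prov file stored next to data_file.

    Assumes the sidecar keeps the full file name, e.g. data.csv -> data.csv.prov.
    """
    data_path = Path(data_file)
    prov_path = data_path.with_name(data_path.name + ".prov")
    if not prov_path.is_file():
        raise FileNotFoundError(f"no provenance file found at {prov_path}")
    with prov_path.open(encoding="utf-8") as fh:
        return json.load(fh)  # JSON-LD is ordinary JSON, so the stdlib parser suffices


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"
        data.write_text("a,b\n1,2\n", encoding="utf-8")
        Path(tmp, "data.csv.prov").write_text('{"@graph": []}', encoding="utf-8")
        print(load_prov_sidecar(data))  # {'@graph': []}
```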

Roadmap

Our roadmap is small, and we share it openly here:

  • Increase test coverage (currently 81%)

  • Windows support (all devs are on Linux)

  • Agent management in provit browser

Overview

Authors:

P. Mühleder muehleder@ub.uni-leipzig.de, F. Rämisch raemisch@ub.uni-leipzig.de

License:

MIT

Copyright:

2018-2019, Peter Mühleder and Universitätsbibliothek Leipzig



Download files


Source Distribution

provit-1.1.1.tar.gz (398.5 kB)


Built Distribution


provit-1.1.1-py3-none-any.whl (410.1 kB)


File details

Details for the file provit-1.1.1.tar.gz.

File metadata

  • Download URL: provit-1.1.1.tar.gz
  • Size: 398.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.1

File hashes

Hashes for provit-1.1.1.tar.gz
Algorithm Hash digest
SHA256 c133f2502bebb856b92dbafffe1c7cfa4af79bbad3f380049ee3513e9a3c5bb3
MD5 9a9cffe011f6d164e21c6f2ef2c6dd81
BLAKE2b-256 f69a6f93a98067243fdc8c57e83e94c27b10bb2658c54c35b2392251fed6a392


File details

Details for the file provit-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: provit-1.1.1-py3-none-any.whl
  • Size: 410.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.1

File hashes

Hashes for provit-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7a0cbdf8bfcf48c760d024eb621aae41ed1e21fc52ecd27b3192efd8c23e9714
MD5 d73e8b7af6874321afcce10215dd4711
BLAKE2b-256 95bf89a01e1a915d61ba9345806ef70ce2581dfc32a284f5ceeaa0a0373e17b7

