
A light, decentralized provenance tracking framework using the W3C PROV-O vocabulary



PROVIT is a light, decentralized provenance tracking framework. It allows the user to track workflows and modifications of data and files. A small subset of the W3C PROV-O vocabulary is implemented. Its aim is to provide an easy-to-use interface for users who have never worked with provenance tracking before. If you feel limited by PROVIT, you should have a look at the more extensive implementation prov.

Full documentation is available under:


This software was tested with Python 3.5 and 3.6.


Installation via pip is recommended for end users. We strongly encourage end users to make use of a virtualenv.


Clone the repository and create a virtualenv.

$ git clone
$ mkvirtualenv provit

Install it with pip


git / Development

Clone the repository and create a virtualenv.

$ git clone
$ mkvirtualenv provit

Install it with pip in editable mode



Provenance Integration Tools provide a command line client that can be used out of the box to enrich any file-based data with provenance information. Furthermore, the provenance class and vocabulary shipped with PIT can be used within other applications.

Command Line Client




--add Add provenance information layer to file
-a AGENT, --agent AGENT
 Provenance information: agent
--activity ACTIVITY
 Provenance information: activity
 Provenance information: Description of the data manipulation process
-o ORIGIN, --origin ORIGIN
 Provenance information: Data origin
-s SOURCES, --sources SOURCES
 Provenance information: Source files
-b, --browser Provenance browser
 Provenance Namespace, default:
--help Show this message and exit.

Provenance Class

from pit.prov import Provenance

# load prov data for a file, or create new prov for file
prov = Provenance("<filepath>")  # replace "<filepath>" with the path to your file

# add provenance metadata
prov.add(agent="agent", activity="activity", description="...")
prov.add_primary_source("primary_source", url="http://...", comment="...")
prov.add_sources(["filepath1", "filepath2"])

# return provenance as json tree
prov_dict = prov.tree()

# save provenance metadata into "<filename>.prov" file


General Roadmap containing features we’d like to realize in the project

  • Add Persons to Agent, to allow more granular activity tracking

Feature Wishlist

A more detailed list of specific (smaller) features and functionality.

Notify user if a referenced source file changes

Provenance files contain the version of each file they reference. If an older version (i.e. not the current version) of a file is referenced, a warning should be displayed.
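
One way such a check could work is sketched below. It assumes, hypothetically, that the version of a referenced file is recorded as a SHA-256 digest of its contents. The names file_digest and stale_sources and the layout of the recorded mapping are illustrative only and are not part of PROVIT.

```python
import hashlib
import warnings
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's current contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_sources(recorded):
    """Return the paths whose current digest differs from the recorded one.

    `recorded` maps a source file path to the digest stored when the
    reference was created (a hypothetical layout, not PROVIT's format).
    A source that no longer exists is reported as stale as well.
    """
    stale = []
    for path, digest in recorded.items():
        if not Path(path).exists() or file_digest(path) != digest:
            stale.append(path)
            warnings.warn(f"referenced source changed: {path}")
    return stale
```

Comparing content digests rather than modification times keeps the check independent of file system metadata, at the cost of reading every referenced file.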

File Browser

A file browser showing e.g. all files with missing provenance.
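
The core lookup behind such a browser might look like the following minimal sketch. It assumes that provenance for a file is stored next to it in a "<filename>.prov" file, as described for the Provenance class above. The function files_missing_provenance is a hypothetical helper, not an existing PROVIT function.

```python
from pathlib import Path


def files_missing_provenance(root):
    """Return sorted paths of files under `root` without a "<filename>.prov" sibling."""
    missing = []
    for path in Path(root).rglob("*"):
        # Skip directories and the provenance files themselves.
        if not path.is_file() or path.suffix == ".prov":
            continue
        # Assumption: provenance for "data.csv" is stored in "data.csv.prov".
        prov_file = path.with_name(path.name + ".prov")
        if not prov_file.exists():
            missing.append(path)
    return sorted(missing)
```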

Reference Clustering

Inspect the files in your research folder and display all references between them to identify clusters. This could help to restructure a messy research directory without breaking scripts, or at least show which scripts may need to be updated.
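
As a rough illustration, clusters can be treated as the connected components of a graph whose edges are file references. The sketch below assumes the references have already been collected into a mapping from each file to the files it references. How that mapping is obtained is left open, and reference_clusters is a hypothetical name.

```python
from collections import defaultdict


def reference_clusters(references):
    """Group files into clusters of directly or indirectly linked files.

    `references` maps a file to the list of files it references
    (hypothetical input; collecting the references is not shown).
    """
    # Build an undirected adjacency map so a reference links both files.
    neighbours = defaultdict(set)
    for source, targets in references.items():
        neighbours[source].update(targets)
        for target in targets:
            neighbours[target].add(source)

    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters
```

Files that neither reference nor are referenced by anything end up in a cluster of their own, which makes isolated files easy to spot.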

FAQ / Paradigms

Can I add multiple agents to an activity?

No. If you can distinguish the activities or the impact of individual agents, you have multiple agents, each with their own activity. For example, if you let three students help you proofread a file and you get back one revised version, the three students are one agent, because you cannot distinguish between their results. If you get back three versions, you have three agents and three activities.
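
The proofreading example can be written down as a small, purely illustrative sketch. It does not use the PROVIT API. The function record_proofreading and the tuple layout are assumptions made only for this example.

```python
def record_proofreading(students, versions):
    """Return (agent, activity, result) tuples, one per distinguishable result."""
    if len(versions) == 1:
        # The results cannot be told apart: the group acts as a single agent.
        return [(" + ".join(students), "proofreading", versions[0])]
    if len(versions) != len(students):
        raise ValueError("expected one version in total or one per student")
    # One version per student: each student is an agent with an own activity.
    return [(s, "proofreading", v) for s, v in zip(students, versions)]
```

With three students and one returned version this yields a single record. With three returned versions it yields three records, one agent and one activity each.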


Authors: P. Mühleder, F. Rämisch
Copyright: 2018, Peter Mühleder and Universitätsbibliothek Leipzig

