A lightweight, decentralized provenance tracking framework using the W3C PROV-O vocabulary
PROVIT is a lightweight, decentralized provenance tracking framework. It allows the user to track workflows and modifications of data and files. A small subset of the W3C PROV-O vocabulary is implemented. Its aim is to provide an easy-to-use interface for users who have never worked with provenance tracking before. If you feel limited by PROVIT, you should have a look at the more extensive implementation prov.
Full documentation is available at: provit.readthedocs.io.
Requirements
This software was tested with Python 3.5 and 3.6.
Installation
Installation via pip is recommended for end users. We strongly encourage end users to make use of a virtualenv.
pip
Clone the repository and create a virtualenv.
$ git clone https://github.com/diggr/pit
$ mkvirtualenv provit
Install it with pip
$ pip install PATH_TO_PROVIT_REPOSITORY
git / Development
Clone the repository and create a virtualenv.
$ git clone https://github.com/diggr/pit
$ mkvirtualenv provit
Install it with pip in editable mode
$ pip install -e PATH_TO_PROVIT_REPOSITORY
Usage
Provenance Integration Tools (PIT) provide a command line client which can be used out of the box to enrich any file-based data with provenance information. Furthermore, the provenance class and vocabulary shipped with PIT can be used within other applications.
Command Line Client
Usage:
$ pit [OPTIONS] FILEPATH
Options:
- --add
Add provenance information layer to file
- -a AGENT, --agent AGENT
Provenance information: agent
- --activity ACTIVITY
Provenance information: activity
- -d DESCRIPTION, --desc DESCRIPTION
Provenance information: Description of the data manipulation process
- -o ORIGIN, --origin ORIGIN
Provenance information: Data origin
- -s SOURCES, --sources SOURCES
Provenance information: Source files
- -b, --browser
Provenance browser
- -n NAMESPACE, --namespace NAMESPACE
Provenance Namespace, default: http://provit.diggr.link/
- --help
Show this message and exit.
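As a rough illustration of how the options above combine, the following sketch assembles a pit command line in Python. The helper build_pit_command is hypothetical and not part of PIT; it only uses the flags listed above and assumes that --add is combined with the metadata options in a single call. The file name and metadata values are made up.

```python
import shlex


def build_pit_command(filepath, agent=None, activity=None, desc=None, origin=None):
    """Assemble an argument list for the pit client (hypothetical helper).

    Assumption: --add is passed together with the metadata options.
    """
    cmd = ["pit", "--add"]
    options = (
        ("--agent", agent),
        ("--activity", activity),
        ("--desc", desc),
        ("--origin", origin),
    )
    for flag, value in options:
        if value is not None:
            cmd.extend([flag, value])
    cmd.append(filepath)
    return cmd


# Example values are invented for illustration.
cmd = build_pit_command(
    "data.csv",
    agent="alice",
    activity="cleaning",
    desc="Removed duplicate rows",
)

# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list could be passed to subprocess.run(cmd) once pit is installed in the active virtualenv.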
Provenance Class
from pit.prov import Provenance
# path to the file whose provenance is tracked (example value)
filepath = "data.csv"
# load prov data for a file, or create new prov for file
prov = Provenance(filepath)
# add provenance metadata
prov.add(agent="agent", activity="activity", description="...")
prov.add_primary_source("primary_source", url="http://...", comment="...")
prov.add_sources(["filepath1", "filepath2"])
# return provenance as json tree
prov_dict = prov.tree()
# save provenance metadata into "<filename>.prov" file
prov.save()
Roadmap
General Roadmap containing features we’d like to realize in the project
Add Persons to Agent, to allow more granular activity tracking
Feature Wishlist
A more detailed list of specific (smaller) features and functionality.
Notify user if a referenced source file changes
Provenance files contain the version of each referenced file. If an older version (i.e. not the current version) of a file is referenced, a warning should be displayed.
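One possible way to detect such a change is sketched below. This is not how PROVIT stores versions; it assumes, purely for illustration, that the "version" of a referenced file is a SHA-256 digest recorded at reference time. The functions file_digest and stale_sources are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_sources(recorded):
    """Return the paths whose current digest differs from the recorded one.

    recorded maps a file path to the digest stored when it was referenced
    (an assumed storage format, not PROVIT's).
    """
    return [path for path, digest in sorted(recorded.items())
            if file_digest(path) != digest]


# Small demonstration with a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "raw.csv")
    with open(source, "w") as handle:
        handle.write("a,b\n1,2\n")
    recorded = {source: file_digest(source)}

    unchanged = stale_sources(recorded)  # nothing has changed yet

    with open(source, "a") as handle:
        handle.write("3,4\n")
    changed = stale_sources(recorded)  # the source was modified

    for path in changed:
        print("Warning: referenced file changed:", os.path.basename(path))
```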
File Browser
A file browser showing e.g. all files with missing provenance.
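A minimal sketch of the underlying check is shown here. It assumes that the provenance of a file lives next to it under the same name with ".prov" appended (for example "data.csv.prov"), which is one reading of the "<filename>.prov" convention mentioned above. The function files_missing_provenance is hypothetical and not part of PIT.

```python
import os
import tempfile


def files_missing_provenance(root):
    """List files below root that have no sibling "<filename>.prov" file."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in names:
            if name.endswith(".prov"):
                continue  # provenance files themselves are not checked
            if name + ".prov" not in names:
                missing.append(os.path.join(dirpath, name))
    return sorted(missing)


# Small demonstration with a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("tracked.csv", "tracked.csv.prov", "untracked.csv"):
        open(os.path.join(tmp, name), "w").close()
    missing = [os.path.relpath(path, tmp)
               for path in files_missing_provenance(tmp)]

print(missing)
```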
Reference Clustering
Inspect the files in your research folder and display all references between them in order to identify clusters. This could help with structuring a messy research directory without breaking scripts, or at least with knowing which scripts may need to be updated.
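One way to think about clusters is as connected components of the reference graph. The sketch below assumes the references have already been collected into a plain dictionary; how they would be extracted from provenance files is left open. The function reference_clusters and the file names are hypothetical.

```python
def reference_clusters(references):
    """Group files into clusters of directly or indirectly linked files.

    references maps a file name to the list of files it references.
    Direction is ignored: two files belong to the same cluster if a chain
    of references connects them.
    """
    neighbours = {}
    for source, targets in references.items():
        neighbours.setdefault(source, set())
        for target in targets:
            neighbours.setdefault(target, set())
            neighbours[source].add(target)
            neighbours[target].add(source)

    seen = set()
    clusters = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        cluster = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters


# Invented example: a small pipeline and one unrelated file.
refs = {
    "plot.py": ["clean.csv"],
    "clean.csv": ["raw.csv"],
    "notes.txt": [],
}
clusters = reference_clusters(refs)
print(clusters)
```

Files in the same cluster would need to be moved or renamed together; a file in a cluster of its own can be relocated without affecting others.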
FAQ / Paradigms
Can I add multiple agents to an activity?
No. The reason is: if you can distinguish the activities or impact of each agent, then you have multiple agents with multiple activities. For example, if you let three students help you proofread a file and you get back one revised version, then the three students are one agent, as you cannot distinguish between their results. If you get back three versions, you have three agents and three activities.
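The proofreading example can be written down as plain records to make the rule concrete. This sketch does not use the PIT API; the Activity tuple and all names in it are invented for illustration.

```python
from collections import namedtuple

# One record per activity, with exactly one agent each.
Activity = namedtuple("Activity", ["agent", "activity", "result"])

# Case 1: three students return one jointly revised file.
# Their contributions cannot be told apart, so they form one agent.
joint = [
    Activity(agent="student_group", activity="proofreading",
             result="thesis_revised.txt"),
]

# Case 2: each student returns their own revised version.
# The results are distinguishable, so there are three agents and
# three activities.
separate = [
    Activity(agent="student_%d" % i, activity="proofreading",
             result="thesis_revised_%d.txt" % i)
    for i in (1, 2, 3)
]

print(len(joint), "activity for the joint revision")
print(len(separate), "activities for the separate revisions")
```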
Overview
- Authors:
P. Mühleder muehleder@ub.uni-leipzig.de, F. Rämisch raemisch@ub.uni-leipzig.de
- License:
MIT
- Copyright:
2018, Peter Mühleder and Universitätsbibliothek Leipzig