A light, decentralized provenance tracking framework using the W3C PROV-O vocabulary

PROVIT is a light, decentralized provenance tracking framework. It lets users track workflows and modifications of data and files. It implements a small subset of the W3C PROV-O vocabulary. Its aim is to provide an easy-to-use interface for users who have never worked with provenance tracking before. If you feel limited by PROVIT, have a look at the more extensive implementation prov.

Full documentation is available at provit.readthedocs.io.

Requirements

This software was tested with Python 3.5 and 3.6.

Installation

Installation via pip is recommended for end users. We strongly encourage end users to make use of a virtualenv.

pip

Clone the repository and create a virtualenv.

$ git clone https://github.com/diggr/pit
$ mkvirtualenv provit

Install it with pip

$ pip install PATH_TO_PROVIT_REPOSITORY

git / Development

Clone the repository and create a virtualenv.

$ git clone https://github.com/diggr/pit
$ mkvirtualenv provit

Install it with pip in editable mode

$ pip install -e PATH_TO_PROVIT_REPOSITORY

Usage

Provenance Integration Tools provide a command line client that can be used out of the box to enrich any file-based data with provenance information. Furthermore, the provenance class and vocabulary shipped with PIT can be used within other applications.

Command Line Client

Usage:

$ pit [OPTIONS] FILEPATH

Options:

  --add                               Add provenance information layer to file
  -a AGENT, --agent AGENT             Provenance information: agent
  --activity ACTIVITY                 Provenance information: activity
  -d DESCRIPTION, --desc DESCRIPTION  Provenance information: Description of the data manipulation process
  -o ORIGIN, --origin ORIGIN          Provenance information: Data origin
  -s SOURCES, --sources SOURCES       Provenance information: Source files
  -b, --browser                       Provenance browser
  -n NAMESPACE, --namespace NAMESPACE Provenance Namespace, default: http://provit.diggr.link/
  --help                              Show this message and exit.
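
When the client is driven from a script, the options above can be assembled into an argument list. The following is a minimal sketch, assuming the `pit` executable is on the PATH and that `--add` can be combined with the other flags as the usage line suggests. The helper name `build_pit_command` is hypothetical and not part of PIT.

```python
def build_pit_command(filepath, agent=None, activity=None, description=None):
    """Assemble a `pit --add` call from flags listed in the help text.

    Hypothetical helper, not part of PIT.
    """
    cmd = ["pit", "--add"]
    if agent is not None:
        cmd += ["--agent", agent]
    if activity is not None:
        cmd += ["--activity", activity]
    if description is not None:
        cmd += ["--desc", description]
    # The usage line puts FILEPATH after the options.
    cmd.append(filepath)
    return cmd


if __name__ == "__main__":
    command = build_pit_command("data.csv", agent="alice", activity="cleaning")
    print(" ".join(command))
    # To actually run the client (requires PIT to be installed):
    # import subprocess
    # subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when descriptions contain spaces.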

Provenance Class

from pit.prov import Provenance

# load prov data for a file, or create new prov for file
prov = Provenance("<filepath>")  # replace "<filepath>" with the path to your file

# add provenance metadata
prov.add(agent="agent", activity="activity", description="...")
prov.add_primary_source("primary_source", url="http://...", comment="...")
prov.add_sources(["filepath1", "filepath2"])

# return provenance as json tree
prov_dict = prov.tree()

# save provenance metadata into "<filename>.prov" file
prov.save()

Roadmap

General roadmap of features we would like to realize in the project:

  • Add Persons to Agent, to allow more granular activity tracking

Feature Wishlist

A more detailed list of specific (smaller) features and functionality.

Notify user if a referenced source file changes

Provenance files record the version of each referenced file. If an older version (i.e. not the current version) of a file is referenced, a warning should be displayed.
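
One way such a check could work is by comparing content digests. The sketch below is only an illustration: how PROVIT records file versions is not described here, so the use of a SHA-256 digest and the function names `file_digest` and `is_outdated` are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file's current content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_outdated(path, recorded_digest):
    """Return True if the file no longer matches the digest recorded earlier.

    `recorded_digest` stands in for whatever version marker a provenance
    file stores (assumption: a SHA-256 hex digest).
    """
    return file_digest(path) != recorded_digest
```

A caller would store `file_digest(source)` when the reference is created and print a warning later whenever `is_outdated(source, stored_digest)` returns True.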

File Browser

A file browser showing e.g. all files with missing provenance.
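
The core of such a browser could be a directory scan. This is a minimal sketch, assuming provenance is stored next to each file as "<filename>.prov" (as `prov.save()` does above); the function name `files_missing_provenance` is hypothetical.

```python
from pathlib import Path


def files_missing_provenance(root):
    """List files under `root` that have no sibling "<filename>.prov" file."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and the provenance files themselves.
        if not path.is_file() or path.suffix == ".prov":
            continue
        prov_file = path.with_name(path.name + ".prov")
        if not prov_file.exists():
            missing.append(path)
    return missing
```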

Reference Clustering

Inspect the files in your research folder and display all references between them to identify clusters. This could help restructure a messy research directory without breaking scripts, or at least show which scripts may need to be updated.
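
Clusters of this kind can be found as connected components of the reference graph. The sketch below assumes the references have already been collected into a plain dictionary mapping each file name to the file names it references; that input format and the function name `reference_clusters` are illustrative assumptions, not part of PROVIT.

```python
def reference_clusters(references):
    """Group files that are directly or indirectly linked by references.

    `references` maps a file name to the list of file names it references.
    """
    # Build an undirected adjacency map; the direction of a reference
    # does not matter for clustering.
    neighbours = {}
    for source, targets in references.items():
        neighbours.setdefault(source, set())
        for target in targets:
            neighbours[source].add(target)
            neighbours.setdefault(target, set()).add(source)

    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters
```

Files that end up in the same cluster should be moved or renamed together; a file in a cluster of its own is not referenced by anything and references nothing.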

FAQ / Paradigms

Can I add multiple agents to an activity?

No. The reasoning is: if you can distinguish the activities or the impact of each agent, then you have multiple agents with multiple activities. For example, if three students help you proofread a file and you get back one revised version, the three students count as one agent, because you cannot distinguish between their results. If you get back three versions, you have three agents and three activities.

Overview

Authors: P. Mühleder <muehleder@ub.uni-leipzig.de>, F. Rämisch <raemisch@ub.uni-leipzig.de>
License: MIT
Copyright: 2018, Peter Mühleder and Universitätsbibliothek Leipzig

Download files

Filename                       Size     File type  Python version  Upload date
provit-0.2.2-py3-none-any.whl  15.0 kB  Wheel      py3             Apr 25, 2018
provit-0.2.2.tar.gz            12.2 kB  Source     None            Apr 25, 2018
