A provenance tracking library for simple Python workflows
makeprov: Pythonic Provenance Tracking
This library provides a way to track file provenance in Python workflows using PROV (W3C Provenance) semantics. It supports defining input/output files via decorators and automatically generates provenance datasets.
Features
- Use decorators to define rules for workflows.
- Automatically generate RDF-based provenance metadata.
- Handle input and output streams.
- Integrates with Python's type hints for easy configuration.
- Output provenance data in TriG format if rdflib is installed; otherwise output JSON-LD.
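To make the PROV vocabulary concrete, the sketch below hand-builds the kind of statement a provenance record typically contains: an activity that used one file and generated another. It relies only on the standard library and the W3C PROV-O terms `prov:Activity`, `prov:Entity`, `prov:used`, and `prov:wasGeneratedBy`. The `describe_run` helper, the `ex:` IRIs, and the document layout are illustrative assumptions, not makeprov's actual output.

```python
import json


# Illustrative only: describe_run, the ex: namespace and the record layout
# are assumptions, not the dataset that makeprov writes.
def describe_run(activity, inputs, outputs, base="http://example.org/"):
    """Return a small JSON-LD document linking one activity to its files."""
    graph = [{
        "@id": f"ex:{activity}",
        "@type": "prov:Activity",
        "prov:used": [{"@id": f"ex:{name}"} for name in inputs],
    }]
    # Every input file is an entity that the activity used.
    for name in inputs:
        graph.append({"@id": f"ex:{name}", "@type": "prov:Entity"})
    # Every output file is an entity generated by the activity.
    for name in outputs:
        graph.append({
            "@id": f"ex:{name}",
            "@type": "prov:Entity",
            "prov:wasGeneratedBy": {"@id": f"ex:{activity}"},
        })
    return {
        "@context": {"prov": "http://www.w3.org/ns/prov#", "ex": base},
        "@graph": graph,
    }


record = describe_run("process_data", ["input.txt"], ["output.txt"])
print(json.dumps(record, indent=2))
```

Reading the result from the bottom up answers the usual provenance question: which activity produced `output.txt`, and which files did that activity read?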
Installation
You can install the module directly from PyPI:
pip install makeprov
Usage
Here’s an example of how to use this package in your Python scripts:
from makeprov import rule, InPath, OutPath, build

@rule()
def process_data(
    input_file: InPath = InPath('input.txt'),
    output_file: OutPath = OutPath('output.txt')
):
    # Copy the input to the output in upper case
    with input_file.open('r') as infile, output_file.open('w') as outfile:
        data = infile.read()
        outfile.write(data.upper())

if __name__ == '__main__':
    # Call the rule directly
    process_data()

    # or as a command line interface
    import defopt
    defopt.run(process_data)

    # or as a workflow graph that automatically (re)generates all dependencies
    build('output.txt')
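The idea of declaring inputs and outputs as annotated default arguments, then resolving a target through its dependencies, can be sketched with the standard library alone. Everything below is a hypothetical stand-in written for illustration: `In`, `Out`, `RULES`, this bare `rule` decorator, and `plan` are not makeprov's API and do not describe how the library is implemented.

```python
import inspect


# Hypothetical marker types; NOT makeprov's InPath/OutPath.
class In:
    def __init__(self, path):
        self.path = path


class Out:
    def __init__(self, path):
        self.path = path


RULES = {}  # output path -> (function, list of input paths)


def rule(func):
    """Register func under every output path found in its default arguments."""
    params = list(inspect.signature(func).parameters.values())
    inputs = [p.default.path for p in params if isinstance(p.default, In)]
    outputs = [p.default.path for p in params if isinstance(p.default, Out)]
    for out in outputs:
        RULES[out] = (func, inputs)
    return func


def plan(target, order=None):
    """Return rule names in the order needed to produce target (no cycle check)."""
    order = [] if order is None else order
    if target in RULES:
        func, inputs = RULES[target]
        for dep in inputs:  # resolve dependencies first, depth first
            plan(dep, order)
        if func.__name__ not in order:
            order.append(func.__name__)
    return order


@rule
def make_raw(raw: Out = Out("raw.txt")):
    ...


@rule
def make_upper(raw: In = In("raw.txt"), result: Out = Out("upper.txt")):
    ...


print(plan("upper.txt"))  # ['make_raw', 'make_upper']
```

A target that no rule produces, such as a hand-written source file, yields an empty plan in this sketch, which matches the usual make-style treatment of leaf inputs.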
You can execute example.py via the CLI like so:
python example.py build-all
# Or set configuration through the CLI
python example.py build-all --conf='{"base_iri": "http://mybaseiri.org/", "prov_dir": "my_prov_directory"}' --force --input_file input.txt --output_file final_output.txt
# Or set configuration through a TOML file
python example.py build-all --conf=@my_config.toml
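The inline `--conf` value shown above is a JSON object, and hand-quoting JSON for a shell is easy to get wrong. As a small sketch using only the standard library, the snippet below builds the same argument with `json.dumps` and lets `shlex.join` handle the quoting. The script name, subcommand, and flags are copied from the commands above; nothing here calls makeprov itself.

```python
import json
import shlex

# Option names taken from the command line example above.
conf = {"base_iri": "http://mybaseiri.org/", "prov_dir": "my_prov_directory"}

argv = [
    "python", "example.py", "build-all",
    f"--conf={json.dumps(conf)}",
    "--force",
]

# shlex.join quotes each argument so a POSIX shell passes it through unchanged.
command = shlex.join(argv)
print(command)
```

When launching the script from Python rather than from a shell, passing `argv` as a list to `subprocess.run` sidesteps quoting altogether.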
Complex CSV-to-RDF Workflow
For a more involved scenario, see complex_example.py. It creates multiple CSV files, aggregates their contents, and emits an RDF graph. The graph is serialized to disk and, because the function returns an rdflib.Graph, is also embedded in the provenance dataset.
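import csv

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

from makeprov import rule, InPath, OutPath

# Placeholder namespace for illustration only; complex_example.py defines its own SALES.
SALES = Namespace("http://example.org/sales/")

@rule()
def export_totals_graph(
    totals_csv: InPath = InPath("data/region_totals.csv"),
    graph_ttl: OutPath = OutPath("data/region_totals.ttl"),
) -> Graph:
    graph = Graph()
    graph.bind("sales", SALES)

    # Emit one RegionTotal resource per CSV row
    with totals_csv.open("r", newline="") as handle:
        for row in csv.DictReader(handle):
            region_key = row["region"].lower().replace(" ", "-")
            subject = SALES[f"region/{region_key}"]
            graph.add((subject, RDF.type, SALES.RegionTotal))
            graph.add((subject, SALES.regionName, Literal(row["region"])))
            graph.add((subject, SALES.totalUnits, Literal(row["total_units"], datatype=XSD.integer)))
            graph.add((subject, SALES.totalRevenue, Literal(row["total_revenue"], datatype=XSD.decimal)))

    # Write Turtle to disk; the returned graph is embedded in the provenance dataset
    with graph_ttl.open("w") as handle:
        handle.write(graph.serialize(format="turtle"))

    return graph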
Run the entire workflow, including CSV generation and RDF export, with:
python complex_example.py build-sales-report
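The export rule above reads `region`, `total_units`, and `total_revenue` columns, so an earlier step has to aggregate raw sales rows into that shape. The sketch below shows one way such an aggregation could look using only the standard library. The input layout (`region`, `units`, `revenue`), the sample rows, and the `aggregate` helper are assumptions made for illustration; the actual steps in complex_example.py may differ.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Assumed input layout and made-up sample rows; only the output column
# names (region, total_units, total_revenue) come from the excerpt above.
SALES_CSV = """region,units,revenue
North East,3,30.50
South,2,19.99
North East,1,10.25
"""


def aggregate(rows):
    """Sum units and revenue per region, keeping first-seen region order."""
    sums = defaultdict(lambda: [0, Decimal("0")])
    for row in rows:
        entry = sums[row["region"]]
        entry[0] += int(row["units"])
        entry[1] += Decimal(row["revenue"])  # Decimal avoids float rounding
    return [
        {"region": region, "total_units": units, "total_revenue": str(revenue)}
        for region, (units, revenue) in sums.items()
    ]


region_totals = aggregate(csv.DictReader(io.StringIO(SALES_CSV)))

# Write the aggregated rows in the layout the export rule expects.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["region", "total_units", "total_revenue"])
writer.writeheader()
writer.writerows(region_totals)
print(out.getvalue())
```

Keeping revenue as `Decimal` and writing it back as text means the value later typed as `XSD.decimal` is the exact sum rather than a rounded float.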
Configuration
You can customize the provenance tracking with the following options:
- base_iri (str): Base IRI for new resources
- prov_dir (str): Directory for writing PROV .json-ld or .trig files
- force (bool): Force running of dependencies
- dry_run (bool): Only check workflow, don't run anything
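The `force` and `dry_run` options are easiest to understand next to a make-style freshness check. The sketch below is a conceptual model only: `needs_run` and `execute` are hypothetical helpers, and comparing file modification times is an assumption about how staleness might be decided, not a description of makeprov's internals.

```python
import tempfile
from pathlib import Path


def needs_run(inputs, outputs, force=False):
    """Run if forced, if an output is missing, or if an input is newer.

    Assumes at least one output path is given.
    """
    if force:
        return True
    if not all(Path(p).exists() for p in outputs):
        return True
    newest_input = max((Path(p).stat().st_mtime for p in inputs), default=0.0)
    oldest_output = min(Path(p).stat().st_mtime for p in outputs)
    return newest_input > oldest_output


def execute(action, inputs, outputs, force=False, dry_run=False):
    """Return 'skipped', 'would run' or 'ran' for a single step."""
    if not needs_run(inputs, outputs, force):
        return "skipped"
    if dry_run:
        return "would run"  # report the decision without touching any file
    action()
    return "ran"


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "input.txt"), Path(tmp, "output.txt")
    src.write_text("hello")

    def step():
        dst.write_text(src.read_text().upper())

    results = [
        execute(step, [src], [dst], dry_run=True),  # output missing, nothing written
        execute(step, [src], [dst]),                # output missing, so it runs
        execute(step, [src], [dst]),                # output is up to date
        execute(step, [src], [dst], force=True),    # forced despite being fresh
    ]

print(results)
```

In this model `dry_run` only changes what happens after the staleness decision, while `force` overrides the decision itself, so combining the two reports every step as pending without running any.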
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file makeprov-0.2.2.tar.gz.
File metadata
- Download URL: makeprov-0.2.2.tar.gz
- Upload date:
- Size: 16.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4fd1d44c656b25bd6097b36efc02df00e502ae431d3c39ed31672d10bef33d61 |
| MD5 | 5aa6d23f60850f52026c0ec973d41be6 |
| BLAKE2b-256 | 9697edc26fc73e8f8872b88feafdd67f5be4c8c0e85e38c7435a2205f37e3a1c |
File details
Details for the file makeprov-0.2.2-py3-none-any.whl.
File metadata
- Download URL: makeprov-0.2.2-py3-none-any.whl
- Upload date:
- Size: 14.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 22379d00334c3699d261ca24fcd2cb35b64eef89e938d26cc8518a064b9ae21d |
| MD5 | 1e2ec438c31e12c3b2d8086240e3e327 |
| BLAKE2b-256 | f235accfaec71ed24a697f70a85ec2b3c853335bf3ca73d273dcd90d67ff9826 |