Collection of functions for inserting provenance information into output files.
file-provenance-utils
Collection of functions for deriving provenance information for insertion into the output files.
This will include:
- The absolute path for the primary executable.
- The timestamp for when the file was created.
- The user account that generated the output file.
- The host/server that the software was executed on.
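As a rough illustration of how these four fields can be gathered with the Python standard library, the sketch below collects them into a dictionary. The names `collect_basic_provenance` and `current_user` and the dictionary keys are assumptions chosen to mirror the sample outputs in this README; this is not the package's own implementation.

```python
import getpass
import os
import socket
import sys
from datetime import datetime


def current_user():
    """Return the login name, or "unknown" if it cannot be determined."""
    try:
        return getpass.getuser()
    except Exception:
        # The lookup can fail in minimal containers that lack a passwd entry.
        return "unknown"


def collect_basic_provenance():
    """Hypothetical helper: gather the four fields listed above."""
    return {
        # Absolute path of the script that was launched (sys.argv[0]).
        "executable": os.path.abspath(sys.argv[0]),
        # Local time, formatted like the sample outputs in this README.
        "date-created": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        # Account running the current process.
        "user": current_user(),
        # Name of the machine the process is running on.
        "host": socket.gethostname(),
    }


if __name__ == "__main__":
    for key, value in collect_basic_provenance().items():
        print(f"{key}: {value}")
```

The values depend on the machine and account that run the script, so no fixed output is shown here.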
Installation
Please see the INSTALL guide for instructions.
Usage
Example code (example/json_example.py) for writing a JSON output file:
import json

from file_provenance_utils import get_json_provenance

outfile = "/tmp/test-file-provenance-utils/report.json"
config_file = "config.yaml"
logfile = "/tmp/analysis.log"
outdir = "/tmp/test-file-provenance-utils/instance-outdir"
files = {
    "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
    "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
    "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
}
provenance = get_json_provenance(
    config_file=config_file,
    logfile=logfile,
    outdir=outdir,
    files=files
)
# Assuming the provenance returned is a dictionary and needs to be serialized to JSON
with open(outfile, "w") as f:
    json.dump(provenance, f, indent=4)
print(f"Wrote provenance information to: {outfile}")
Contents of output JSON file:
{
    "provenance": {
        "executable": "/tmp/test-file-provenance-utils/json_example.py",
        "config_file": "config.yaml",
        "logfile": "/tmp/analysis.log",
        "date-created": "2025-02-23 15:41:01",
        "host": "r2d2",
        "user": "sundaram",
        "outdir": "/tmp/test-file-provenance-utils/instance-outdir",
        "files": {
            "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
            "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
            "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
        }
    }
}
Example code (example/xml_example.py) for writing an XML output file:
from file_provenance_utils import get_xml_provenance

outfile = "/tmp/test-file-provenance-utils/report.xml"
config_file = "config.yaml"
logfile = "/tmp/analysis.log"
outdir = "/tmp/test-file-provenance-utils/instance-outdir"
files = {
    "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
    "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
    "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
}
provenance = get_xml_provenance(
    config_file=config_file,
    logfile=logfile,
    outdir=outdir,
    files=files
)
# Write the XML string directly to the output file
with open(outfile, "w") as f:
    f.write(provenance)
print(f"Wrote provenance information to: {outfile}")
Contents of output XML file:
<?xml version="1.0" ?>
<provenance>
    <executable>/tmp/test-file-provenance-utils/xml_example.py</executable>
    <config_file>config.yaml</config_file>
    <logfile>/tmp/analysis.log</logfile>
    <date-created>2025-02-23 15:45:55</date-created>
    <host>r2d2</host>
    <user>sundaram</user>
    <outdir>/tmp/test-file-provenance-utils/instance-outdir</outdir>
    <files>
        <file name="input_file">/tmp/test-file-provenance-utils/instance-input-file.txt</file>
        <file name="data1_file">/tmp/test-file-provenance-utils/data1-file.txt</file>
        <file name="data2_file">/tmp/test-file-provenance-utils/data2-file.txt</file>
    </files>
</provenance>
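To read such a file back, a minimal sketch using the standard library's `xml.etree.ElementTree` could look like the following. It assumes the element and attribute names match the sample output above; the helper name `parse_xml_provenance` and the shortened `SAMPLE` document are hypothetical and not part of the package.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample output above, used only for illustration.
SAMPLE = """<?xml version="1.0" ?>
<provenance>
    <executable>/tmp/test-file-provenance-utils/xml_example.py</executable>
    <host>r2d2</host>
    <files>
        <file name="input_file">/tmp/test-file-provenance-utils/instance-input-file.txt</file>
        <file name="data1_file">/tmp/test-file-provenance-utils/data1-file.txt</file>
    </files>
</provenance>
"""


def parse_xml_provenance(xml_text):
    """Hypothetical helper: turn the <provenance> document into a dict."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        if child.tag == "files":
            # Each <file name="...">path</file> becomes one dictionary entry.
            record["files"] = {f.get("name"): f.text for f in child.findall("file")}
        else:
            # Simple elements map directly from tag name to text content.
            record[child.tag] = child.text
    return record


if __name__ == "__main__":
    print(parse_xml_provenance(SAMPLE))
```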
Example code (example/plain_text_example.py) for writing a tab-delimited, comma-separated, or plain-text output file:
from file_provenance_utils import get_plain_text_provenance

outfile = "/tmp/test-file-provenance-utils/report.txt"
config_file = "config.yaml"
logfile = "/tmp/analysis.log"
outdir = "/tmp/test-file-provenance-utils/instance-outdir"
files = {
    "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
    "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
    "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
}
provenance = get_plain_text_provenance(
    config_file=config_file,
    logfile=logfile,
    outdir=outdir,
    files=files
)
# Write the plain-text provenance string directly to the output file
with open(outfile, "w") as f:
    f.write(provenance)
print(f"Wrote provenance information to: {outfile}")
Contents of output plain-text file:
## executable: /tmp/test-file-provenance-utils/plain_text_example.py
## config_file: config.yaml
## logfile: /tmp/analysis.log
## date-created: 2025-02-23 15:40:52
## host: r2d2
## user: sundaram
## outdir: /tmp/test-file-provenance-utils/instance-outdir
## files:
## input_file: /tmp/test-file-provenance-utils/instance-input-file.txt
## data1_file: /tmp/test-file-provenance-utils/data1-file.txt
## data2_file: /tmp/test-file-provenance-utils/data2-file.txt
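Because every provenance line starts with `##`, the block can sit at the top of a delimited file and be skipped by readers. The sketch below assumes the string returned by `get_plain_text_provenance` looks like the sample above; the shortened header text, the column names, and the helper `read_rows` are made up for illustration.

```python
import csv
import io

# Stand-in for the string returned by get_plain_text_provenance (assumed format).
provenance = (
    "## executable: /tmp/test-file-provenance-utils/plain_text_example.py\n"
    "## host: r2d2\n"
)

# Write the provenance header first, then the tab-delimited data rows.
buffer = io.StringIO()
buffer.write(provenance)
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["sample", "count"])
writer.writerow(["A", 3])
writer.writerow(["B", 5])


def read_rows(text):
    """Hypothetical reader: skip '##' provenance lines, parse the rest."""
    data_lines = (line for line in text.splitlines() if not line.startswith("##"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


rows = read_rows(buffer.getvalue())
print(rows)
```

An in-memory buffer is used here so the example runs without touching the filesystem; the same calls work on a file opened with `open(outfile, "w", newline="")`.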
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
To-Do/Coming Next
Please view the listing of planned improvements here.
CHANGELOG
Please view the CHANGELOG here.
License
History
0.1.0 (2025-02-23)
- First release on PyPI.