Collection of functions for inserting provenance information into output files.

Project description

file-provenance-utils

Collection of functions for deriving provenance information for insertion into the output files.

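The provenance information includes the executable, configuration file, log file, creation date, host, user, output directory, and the files used, as shown in the example outputs under Usage.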
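For orientation, provenance fields of this kind (executable path, timestamp, host, user) can be collected with the Python standard library alone. The sketch below is not the package's implementation: `collect_provenance_fields` and `current_user` are hypothetical names, and the field names are taken from the example outputs in the Usage section.

```python
import getpass
import os
import socket
import sys
from datetime import datetime


def current_user():
    """Return the login name, or 'unknown' if it cannot be determined."""
    try:
        return getpass.getuser()
    except (KeyError, OSError):
        return "unknown"


def collect_provenance_fields(config_file, logfile, outdir, files):
    """Hypothetical helper: gather the fields seen in the example outputs."""
    return {
        # Absolute path of the script that is currently running
        "executable": os.path.abspath(sys.argv[0]),
        "config_file": config_file,
        "logfile": logfile,
        # Same timestamp layout as the examples: YYYY-MM-DD HH:MM:SS
        "date-created": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "host": socket.gethostname(),
        "user": current_user(),
        "outdir": outdir,
        # Copy the mapping so later changes by the caller do not leak in
        "files": dict(files),
    }


fields = collect_provenance_fields(
    config_file="config.yaml",
    logfile="/tmp/analysis.log",
    outdir="/tmp/test-file-provenance-utils/instance-outdir",
    files={"input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt"},
)
print(sorted(fields))
```

The host, user, and timestamp values depend on the machine and the time of the run, so they will differ from the sample outputs in this document.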

Installation

Please see the INSTALL guide for instructions.

Usage

Example code (example/json_example.py) for writing a JSON output file:

import json

from file_provenance_utils import get_json_provenance

outfile = "/tmp/test-file-provenance-utils/report.json"

config_file = "config.yaml"
logfile = "/tmp/analysis.log"
outdir = "/tmp/test-file-provenance-utils/instance-outdir"
files = {
    "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
    "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
    "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
}

provenance = get_json_provenance(
    config_file=config_file,
    logfile=logfile,
    outdir=outdir,
    files=files
)

# Assuming the returned provenance is a dictionary, serialize it to JSON
with open(outfile, "w") as f:
    json.dump(provenance, f, indent=4)

print(f"Wrote provenance information to: {outfile}")

Contents of output JSON file:

{
    "provenance": {
        "executable": "/tmp/test-file-provenance-utils/json_example.py",
        "config_file": "config.yaml",
        "logfile": "/tmp/analysis.log",
        "date-created": "2025-02-23 15:41:01",
        "host": "r2d2",
        "user": "sundaram",
        "outdir": "/tmp/test-file-provenance-utils/instance-outdir",
        "files": {
            "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
            "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
            "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
        }
    }
}
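A downstream script may want to read this provenance back. The following is a minimal sketch using only the standard library; `load_provenance` is a hypothetical helper name, and `REPORT_TEXT` is an abbreviated copy of the report above rather than output from a real run.

```python
import json
from datetime import datetime

# Abbreviated copy of the example JSON report (illustration only)
REPORT_TEXT = """
{
    "provenance": {
        "executable": "/tmp/test-file-provenance-utils/json_example.py",
        "date-created": "2025-02-23 15:41:01",
        "host": "r2d2",
        "user": "sundaram",
        "files": {
            "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt"
        }
    }
}
"""


def load_provenance(text):
    """Return the object stored under the top-level 'provenance' key."""
    return json.loads(text)["provenance"]


provenance = load_provenance(REPORT_TEXT)

# The timestamp layout in the example is YYYY-MM-DD HH:MM:SS
created = datetime.strptime(provenance["date-created"], "%Y-%m-%d %H:%M:%S")

print(f"Created by {provenance['user']} on {provenance['host']} at {created}")
for name, path in provenance["files"].items():
    print(f"{name}: {path}")
```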

Example code (example/xml_example.py) for writing an XML output file:

from file_provenance_utils import get_xml_provenance

outfile = "/tmp/test-file-provenance-utils/report.xml"

config_file = "config.yaml"
logfile = "/tmp/analysis.log"
outdir = "/tmp/test-file-provenance-utils/instance-outdir"
files = {
    "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
    "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
    "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
}

provenance = get_xml_provenance(
    config_file=config_file,
    logfile=logfile,
    outdir=outdir,
    files=files
)

# Write the XML string directly to the output file
with open(outfile, "w") as f:
    f.write(provenance)

print(f"Wrote provenance information to: {outfile}")

Contents of output XML file:

<?xml version="1.0" ?>
<provenance>
    <executable>/tmp/test-file-provenance-utils/xml_example.py</executable>
    <config_file>config.yaml</config_file>
    <logfile>/tmp/analysis.log</logfile>
    <date-created>2025-02-23 15:45:55</date-created>
    <host>r2d2</host>
    <user>sundaram</user>
    <outdir>/tmp/test-file-provenance-utils/instance-outdir</outdir>
    <files>
        <file name="input_file">/tmp/test-file-provenance-utils/instance-input-file.txt</file>
        <file name="data1_file">/tmp/test-file-provenance-utils/data1-file.txt</file>
        <file name="data2_file">/tmp/test-file-provenance-utils/data2-file.txt</file>
    </files>
</provenance>
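The XML form can be read back with `xml.etree.ElementTree` from the standard library. This is a sketch under the assumption that the layout matches the sample above (simple child elements plus a `files` element holding `file` entries with a `name` attribute); `parse_xml_provenance` is a hypothetical helper name, and `XML_TEXT` is an abbreviated copy of the sample.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example XML report (illustration only)
XML_TEXT = """<?xml version="1.0" ?>
<provenance>
    <executable>/tmp/test-file-provenance-utils/xml_example.py</executable>
    <date-created>2025-02-23 15:45:55</date-created>
    <host>r2d2</host>
    <files>
        <file name="input_file">/tmp/test-file-provenance-utils/instance-input-file.txt</file>
        <file name="data1_file">/tmp/test-file-provenance-utils/data1-file.txt</file>
    </files>
</provenance>
"""


def parse_xml_provenance(text):
    """Convert the <provenance> element into a plain dictionary."""
    root = ET.fromstring(text)
    result = {}
    for child in root:
        if child.tag == "files":
            # Each <file name="..."> element becomes one name -> path entry
            result["files"] = {f.get("name"): f.text for f in child.findall("file")}
        else:
            result[child.tag] = child.text
    return result


parsed = parse_xml_provenance(XML_TEXT)
print(parsed["host"])
print(parsed["files"]["data1_file"])
```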

Example code (example/plain_text_example.py) for writing a tab-delimited, comma-separated, or plain-text output file:

from file_provenance_utils import get_plain_text_provenance

outfile = "/tmp/test-file-provenance-utils/report.txt"
config_file = "config.yaml"
logfile = "/tmp/analysis.log"
outdir = "/tmp/test-file-provenance-utils/instance-outdir"
files = {
    "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
    "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
    "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
}

provenance = get_plain_text_provenance(
    config_file=config_file,
    logfile=logfile,
    outdir=outdir,
    files=files
)

# Write the plain-text provenance header directly to the output file
with open(outfile, "w") as f:
    f.write(provenance)

print(f"Wrote provenance information to: {outfile}")

Contents of output plain-text file:

## executable: /tmp/test-file-provenance-utils/plain_text_example.py
## config_file: config.yaml
## logfile: /tmp/analysis.log
## date-created: 2025-02-23 15:40:52
## host: r2d2
## user: sundaram
## outdir: /tmp/test-file-provenance-utils/instance-outdir
## files:
## input_file: /tmp/test-file-provenance-utils/instance-input-file.txt
## data1_file: /tmp/test-file-provenance-utils/data1-file.txt
## data2_file: /tmp/test-file-provenance-utils/data2-file.txt
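Because every provenance line starts with `## `, a reader of a tab-delimited or comma-separated file can separate the header from the data rows. The sketch below assumes that prefix and the `key: value` layout shown above; `split_header` is a hypothetical helper name, and the two tab-delimited data rows are made up for illustration.

```python
# Abbreviated provenance header followed by made-up tab-delimited rows
FILE_TEXT = """\
## executable: /tmp/test-file-provenance-utils/plain_text_example.py
## date-created: 2025-02-23 15:40:52
## host: r2d2
## files:
## input_file: /tmp/test-file-provenance-utils/instance-input-file.txt
col_a\tcol_b
1\t2
"""


def split_header(text, prefix="## "):
    """Split text into a dict of header fields and a list of body lines."""
    header = {}
    body = []
    for line in text.splitlines():
        if line.startswith(prefix):
            # Split on the first colon only, so timestamps keep their colons
            key, _, value = line[len(prefix):].partition(":")
            header[key.strip()] = value.strip()
        else:
            body.append(line)
    return header, body


header, body = split_header(FILE_TEXT)
print(header["date-created"])
print(body)
```

Since the sample lists file entries at the same level as the other fields, this parse returns them as ordinary top-level keys, and `files` maps to an empty string.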

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

To-Do/Coming Next

Please view the listing of planned improvements here.

CHANGELOG

Please view the CHANGELOG here.

License

No License

History

0.1.0 (2025-02-23)

  • First release on PyPI.
