Collection of functions for inserting provenance information into output files.

file-provenance-utils

Collection of functions for deriving provenance information for insertion into the output files.

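This will include the path of the executable, the configuration file, the log file, the creation date, the host, the user, the output directory, and the named files, as shown in the example outputs under Usage.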
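As an illustration only, the following sketch shows how a few of the fields that appear in the example outputs (executable, date-created, host, user) could be gathered with the Python standard library. The helper names `current_user` and `collect_basic_provenance` are hypothetical and are not part of this package; how the package derives these values internally is not documented here.

```python
import getpass
import os
import socket
import sys
from datetime import datetime


def current_user() -> str:
    """Return the login name, or "unknown" if it cannot be determined."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def collect_basic_provenance() -> dict:
    """Hypothetical helper: gather a few provenance fields with the stdlib."""
    return {
        # Absolute path of the script that was launched
        "executable": os.path.abspath(sys.argv[0]),
        # Same timestamp layout as the example outputs in this README
        "date-created": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "host": socket.gethostname(),
        "user": current_user(),
    }


record = collect_basic_provenance()
for key, value in record.items():
    print(f"{key}: {value}")
```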

Installation

Please see the INSTALL guide for instructions.

Usage

Example code (example/json_example.py) for writing a JSON output file:

import json

from file_provenance_utils import get_json_provenance

outfile = "/tmp/test-file-provenance-utils/report.json"

config_file = "config.yaml"
logfile = "/tmp/analysis.log"
outdir = "/tmp/test-file-provenance-utils/instance-outdir"
files = {
    "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
    "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
    "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
}

provenance = get_json_provenance(
    config_file=config_file,
    logfile=logfile,
    outdir=outdir,
    files=files
)

# Assuming the returned provenance is a dictionary, serialize it to JSON
with open(outfile, "w") as f:
    json.dump(provenance, f, indent=4)

print(f"Wrote provenance information to: {outfile}")

Contents of output JSON file:

{
    "provenance": {
        "executable": "/tmp/test-file-provenance-utils/json_example.py",
        "config_file": "config.yaml",
        "logfile": "/tmp/analysis.log",
        "date-created": "2025-02-23 15:41:01",
        "host": "r2d2",
        "user": "sundaram",
        "outdir": "/tmp/test-file-provenance-utils/instance-outdir",
        "files": {
            "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
            "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
            "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
        }
    }
}
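A downstream script may want to read this record back. The sketch below parses an abbreviated copy of the JSON shown above using only the standard `json` module. It assumes the top-level "provenance" key and the field names seen in the sample output; the variable names are illustrative.

```python
import json

# Abbreviated copy of the sample report shown above
report_text = """
{
    "provenance": {
        "executable": "/tmp/test-file-provenance-utils/json_example.py",
        "host": "r2d2",
        "user": "sundaram",
        "files": {
            "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt"
        }
    }
}
"""

report = json.loads(report_text)
provenance = report["provenance"]

# Build a short "user@host" label and list the named files
summary = f'{provenance["user"]}@{provenance["host"]}'
input_names = sorted(provenance["files"])

print(summary)
print(input_names)
```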

Example code (example/xml_example.py) for writing an XML output file:

from file_provenance_utils import get_xml_provenance

outfile = "/tmp/test-file-provenance-utils/report.xml"

config_file = "config.yaml"
logfile = "/tmp/analysis.log"
outdir = "/tmp/test-file-provenance-utils/instance-outdir"
files = {
    "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
    "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
    "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
}

provenance = get_xml_provenance(
    config_file=config_file,
    logfile=logfile,
    outdir=outdir,
    files=files
)

# Write the XML string directly to the output file
with open(outfile, "w") as f:
    f.write(provenance)

print(f"Wrote provenance information to: {outfile}")

Contents of output XML file:

<?xml version="1.0" ?>
<provenance>
    <executable>/tmp/test-file-provenance-utils/xml_example.py</executable>
    <config_file>config.yaml</config_file>
    <logfile>/tmp/analysis.log</logfile>
    <date-created>2025-02-23 15:45:55</date-created>
    <host>r2d2</host>
    <user>sundaram</user>
    <outdir>/tmp/test-file-provenance-utils/instance-outdir</outdir>
    <files>
        <file name="input_file">/tmp/test-file-provenance-utils/instance-input-file.txt</file>
        <file name="data1_file">/tmp/test-file-provenance-utils/data1-file.txt</file>
        <file name="data2_file">/tmp/test-file-provenance-utils/data2-file.txt</file>
    </files>
</provenance>
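To read such a record back, the standard `xml.etree.ElementTree` module is sufficient. This is a minimal sketch that parses an abbreviated copy of the XML shown above; it assumes the element and attribute names seen in the sample output (`host`, `user`, `files/file` with a `name` attribute).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample XML shown above
xml_text = """<?xml version="1.0" ?>
<provenance>
    <host>r2d2</host>
    <user>sundaram</user>
    <files>
        <file name="input_file">/tmp/test-file-provenance-utils/instance-input-file.txt</file>
        <file name="data1_file">/tmp/test-file-provenance-utils/data1-file.txt</file>
    </files>
</provenance>
"""

root = ET.fromstring(xml_text)

# Simple child elements can be read directly by tag name
host = root.findtext("host")
user = root.findtext("user")

# Each <file> element carries its label in the "name" attribute
files = {el.get("name"): el.text for el in root.findall("./files/file")}

print(host, user)
print(files)
```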

Example code (example/plain_text_example.py) for writing a tab-delimited, comma-separated, or other plain-text output file:

from file_provenance_utils import get_plain_text_provenance

outfile = "/tmp/test-file-provenance-utils/report.txt"
config_file = "config.yaml"
logfile = "/tmp/analysis.log"
outdir = "/tmp/test-file-provenance-utils/instance-outdir"
files = {
    "input_file": "/tmp/test-file-provenance-utils/instance-input-file.txt",
    "data1_file": "/tmp/test-file-provenance-utils/data1-file.txt",
    "data2_file": "/tmp/test-file-provenance-utils/data2-file.txt"
}

provenance = get_plain_text_provenance(
    config_file=config_file,
    logfile=logfile,
    outdir=outdir,
    files=files
)

# Write the plain-text provenance string directly to the output file
with open(outfile, "w") as f:
    f.write(provenance)

print(f"Wrote provenance information to: {outfile}")

Contents of output plain-text file:

## executable: /tmp/test-file-provenance-utils/plain_text_example.py
## config_file: config.yaml
## logfile: /tmp/analysis.log
## date-created: 2025-02-23 15:40:52
## host: r2d2
## user: sundaram
## outdir: /tmp/test-file-provenance-utils/instance-outdir
## files:
## input_file: /tmp/test-file-provenance-utils/instance-input-file.txt
## data1_file: /tmp/test-file-provenance-utils/data1-file.txt
## data2_file: /tmp/test-file-provenance-utils/data2-file.txt
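Because every provenance line starts with `## `, a reader of a tab-delimited file can separate the header from the data rows. The sketch below is one way to do that with the standard library; the header is an abbreviated copy of the output above, and the data rows (`sample`, `count`) are made up for illustration.

```python
import csv
import io

# Abbreviated provenance header in the "## key: value" layout shown above
header = (
    "## host: r2d2\n"
    "## user: sundaram\n"
)
# Hypothetical tab-delimited payload that follows the header
rows = [["sample", "count"], ["A", "3"], ["B", "5"]]

# Write header and rows into an in-memory file
buffer = io.StringIO()
buffer.write(header)
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)

# Read it back, splitting provenance lines from data lines
buffer.seek(0)
provenance = {}
data_lines = []
for line in buffer:
    if line.startswith("## "):
        # Split on the first colon only, so timestamps keep their colons
        key, _, value = line[3:].strip().partition(":")
        provenance[key] = value.strip()
    else:
        data_lines.append(line)

parsed_rows = list(csv.reader(data_lines, delimiter="\t"))

print(provenance)
print(parsed_rows)
```

A section line such as `## files:` comes out of this loop as a key with an empty value, so nested entries would need extra handling.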

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

To-Do/Coming Next

Please view the listing of planned improvements here.

CHANGELOG

Please view the CHANGELOG here.

License

No License

History

0.1.0 (2025-02-23)

  • First release on PyPI.
