yProv4DV

A Python utility for automatically packaging the code, inputs and outputs of data visualization scripts.
yProv4DV

yProv4DV (Data Visualization) is a Python utility for packaging the code, inputs and outputs of data visualization scripts. Once integrated, it produces a zip file that includes all the information necessary to reproduce the current script, including a copy of the files used. This library is part of the yProv framework, which means it can also produce W3C PROV-compliant files useful for interpretability and reproducibility.

Installation

pip install yprov4dv

Current Compatibility

The yProv4DV library can currently track input files opened by the following libraries:

  • pandas (read_csv, read_parquet, read_excel, read_json)
  • xarray (open_dataset, open_mfdataset)
  • geopandas (read_file)
  • numpy (load)
  • torch (load)
  • rasterio (open)
  • standard Python calls (such as open())

Additionally, if data is plotted using:

  • matplotlib (plot, bar, ...)
  • seaborn (scatterplot, lineplot, barplot, histplot, boxplot)

then the subset of data used only for visualization can be saved in an isolated file (by setting the save_input_files_subset option to True).

Any output file generated during the execution of the program will also be logged, independently of its file type.
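
To give a feel for how this kind of tracking can work, the sketch below wraps Python's built-in open() and sorts the opened paths into inputs and outputs based on the file mode. It is not yProv4DV's actual implementation: the OpenTracker class and its attributes are hypothetical names used only for illustration, and only the standard library is involved.

```python
import builtins
import os
import tempfile


class OpenTracker:
    """Record which paths are opened for reading or writing while active.

    Hypothetical helper for illustration; not part of yProv4DV.
    """

    def __init__(self):
        self.inputs = []
        self.outputs = []
        self._original_open = None

    def __enter__(self):
        self._original_open = builtins.open

        def tracked_open(file, mode="r", *args, **kwargs):
            # Only path-like arguments are recorded; file descriptors are skipped.
            if isinstance(file, (str, bytes, os.PathLike)):
                path = os.fspath(file)
                # Any mode that can create or modify the file counts as an output.
                if any(flag in mode for flag in "wax+"):
                    self.outputs.append(path)
                else:
                    self.inputs.append(path)
            return self._original_open(file, mode, *args, **kwargs)

        builtins.open = tracked_open
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always restore the real open(), even if the script raised.
        builtins.open = self._original_open
        return False


# Prepare a small input file before tracking starts.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "data.csv")
summary_path = os.path.join(workdir, "summary.txt")
with open(data_path, "w") as f:
    f.write("x,y\n1,2\n")

with OpenTracker() as tracker:
    with open(data_path) as f:
        rows = f.read().splitlines()
    with open(summary_path, "w") as f:
        f.write(f"{len(rows) - 1} data rows\n")

print("inputs:", tracker.inputs)
print("outputs:", tracker.outputs)
```

Wrapping the call at the builtins level means the script itself does not need to change, which is in the same spirit as the automatic tracking described above.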

Example

The examples folder contains a simple data visualization script in Python. It is already integrated with the yProv4DV library and can be run with the command:

python ./examples/simple.py

This execution will create:

  • The prov directory (its location is customizable), which holds all the information for the current execution: inputs, outputs and source code (src), each in its own folder. In the same directory, the library also creates a set of provenance files describing the current execution (in json, dot and svg formats).
  • prov.zip: containing all the aforementioned information in a zipped RO-Crate.
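
The layout described above can be approximated with the standard library alone. The following is a minimal sketch, assuming folder names inputs, outputs and src inside the prov directory; package_directory is a hypothetical helper, not a yProv4DV function, and a real RO-Crate would additionally carry a ro-crate-metadata.json file, which this sketch omits.

```python
import os
import tempfile
import zipfile


def package_directory(source_dir, zip_path):
    """Zip every file under source_dir, storing paths relative to it.

    zip_path should live outside source_dir so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path


# Build a toy provenance directory (folder and file names are assumptions).
workdir = tempfile.mkdtemp()
prov_dir = os.path.join(workdir, "prov")
for folder, name, content in [
    ("inputs", "data.csv", "x,y\n1,2\n"),
    ("outputs", "plot.svg", "<svg></svg>\n"),
    ("src", "script.py", "print('hello')\n"),
]:
    os.makedirs(os.path.join(prov_dir, folder), exist_ok=True)
    with open(os.path.join(prov_dir, folder, name), "w") as f:
        f.write(content)

zip_path = package_directory(prov_dir, os.path.join(workdir, "prov.zip"))
with zipfile.ZipFile(zip_path) as archive:
    packaged = sorted(archive.namelist())
print(packaged)
```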

Parameters

To keep the number of yprov4dv calls to a minimum, the library exposes just three directives:

  • def start_run(*args)
  • def log_input(path_to_untracked_file)
  • def log_output(path_to_untracked_file)
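
As a rough sketch of how the three directives might fit into a script, the example below reads and writes files through os.open, which is assumed here not to be covered by the automatic tracking, and then registers them by hand. Several things are assumptions rather than documented behaviour: that the module is importable as yprov4dv (taken from the pip package name), that start_run can be called with no arguments, and that log_input and log_output accept a plain path string. The import is guarded so the sketch still runs where the package is not installed.

```python
import os
import tempfile

try:
    import yprov4dv  # assumed module name, taken from the pip package name
except ImportError:
    yprov4dv = None  # the sketch still runs without the package installed


def read_low_level(path):
    """Read a small file via os.open/os.read instead of the built-in open()."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


def write_low_level(path, payload):
    """Write bytes via os.open/os.write instead of the built-in open()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)


def run(workdir):
    input_path = os.path.join(workdir, "raw.txt")
    output_path = os.path.join(workdir, "report.txt")
    with open(input_path, "w") as f:  # set up a toy input file
        f.write("3 1 2")

    if yprov4dv is not None:
        yprov4dv.start_run()  # assumption: callable with no arguments

    # The "analysis": sort the numbers found in the input file.
    values = sorted(int(v) for v in read_low_level(input_path).decode().split())
    write_low_level(output_path, " ".join(str(v) for v in values).encode())

    if yprov4dv is not None:
        # Register the files that were not opened through a tracked call.
        yprov4dv.log_input(input_path)
        yprov4dv.log_output(output_path)
    return input_path, output_path


input_path, output_path = run(tempfile.mkdtemp())
```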

The behaviour of yProv4DV can be changed by passing parameters to the start_run function. All possible fields are listed below:

  • provenance_directory: (str) changes where the inputs, outputs and code directories are stored;
  • prefix: (str) changes the prefix given to fields in the provenance document;
  • run_name: (str) changes the run name inside the provenance file;
  • create_json_file: (True or False) whether the json file is created or not;
  • create_dot_file: (True or False) whether the dot file is created or not; cannot be True if YPROV4DV_CREATE_JSON_FILE is False;
  • create_svg_file: (True or False) whether the svg file is created or not; cannot be True if either YPROV4DV_CREATE_JSON_FILE or YPROV4DV_CREATE_DOT_FILE is False;
  • create_rocrate: (True or False) whether the ro-crate zip is created or not;
  • default_namespace: (str) changes the default namespace inside the provenance file;
  • save_input_files_full: (str) decides whether input files are saved in full;
  • save_input_files_subset: (str) decides whether inputs are saved as a subset (only the plotted data);
  • skip_files_larger_than: (int) threshold in Mb; files larger than the threshold will not be copied;
  • verbose: (True or False).

For an example, run:

python ./examples/customized.py
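
The dependencies between the create_* flags and the size threshold can be expressed as a few simple checks. The sketch below is illustrative only and does not reflect how yProv4DV validates its options internally: RunOptions, validate and should_copy are hypothetical names, the default values are made up, and the threshold is assumed to be counted in units of 1024 * 1024 bytes.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class RunOptions:
    """Hypothetical container mirroring a few of the fields listed above."""
    create_json_file: bool = True
    create_dot_file: bool = False
    create_svg_file: bool = False
    skip_files_larger_than: int = 100  # made-up default, not the library's


def validate(options):
    """Return a list of messages for flag combinations that are not allowed."""
    errors = []
    if options.create_dot_file and not options.create_json_file:
        errors.append("create_dot_file requires create_json_file")
    if options.create_svg_file and not (
        options.create_json_file and options.create_dot_file
    ):
        errors.append("create_svg_file requires create_json_file and create_dot_file")
    return errors


def should_copy(path, threshold):
    """Copy a file only if it is not larger than the threshold.

    Assumption: the threshold is measured in units of 1024 * 1024 bytes.
    """
    size = os.path.getsize(path) / (1024 * 1024)
    return size <= threshold


# A valid configuration and two invalid ones.
ok = validate(RunOptions(create_json_file=True, create_dot_file=True, create_svg_file=True))
no_json = validate(RunOptions(create_json_file=False, create_dot_file=True))
no_dot = validate(RunOptions(create_json_file=True, create_dot_file=False, create_svg_file=True))

# A file of exactly two units: skipped at threshold 1, copied at threshold 2.
big_file = os.path.join(tempfile.mkdtemp(), "big.bin")
with open(big_file, "wb") as f:
    f.write(b"\0" * (2 * 1024 * 1024))
print(ok, no_json, no_dot, should_copy(big_file, 1), should_copy(big_file, 2))
```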
