DVC-Stage
About The Project
This Python script provides an easy and parameterizable way of defining typical dvc (sub-)stages for:
- data preprocessing
- data transformation
- data splitting
- data validation
Getting Started
To get a local copy up and running, follow these steps.
Prerequisites
- pandas>=0.20.*
- dvc>=2.12.*
- pyyaml>=5
Installation
This package is available on PyPI. You can install it and all of its dependencies using pip:
pip install dvc-stage
Usage
DVC-Stage works on top of two files: dvc.yaml and params.yaml. Both
are expected to be at the root of an initialized dvc
project. From there, run dvc-stage -h to see the available
commands, or dvc-stage get-config STAGE to generate the dvc stage
definition from the params.yaml file. The tool generates the respective
YAML, which you can then paste manually into the dvc.yaml file. Existing
stages can afterwards be updated in place using dvc-stage update-stage STAGE.
Stages are defined inside params.yaml in the following schema:
STAGE_NAME:
  load: {}
  transformations: []
  validations: []
  write: {}
The load and write sections both require the yaml-keys path and
format to read and save data respectively.
The transformations and validations sections require a sequence of
functions to apply, where transformations return data and
validations return a truth value (derived from data). Functions are
defined by the key id and can be either:
- Methods defined on Pandas DataFrames, e.g.
  transformations:
    - id: transpose
- Imported from any Python module, e.g.
  transformations:
    - id: custom
      description: duplicate rows
      import_from: demo.duplicate
- Predefined by DVC-Stage, e.g.
  validations:
    - id: validate_pandera_schema
      schema:
        import_from: demo.get_schema
When writing a custom function, you need to make sure the function
gracefully handles data being None, which is required for type
inference. Data is passed as the first argument. Further arguments can be
provided as additional keys, as shown above for
validate_pandera_schema, where schema is passed as the second argument to
the function.
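As an illustration of the None-handling requirement, a custom transformation might look like the minimal sketch below. The function name drop_incomplete_rows, its how parameter, and the choice to pass None through unchanged are assumptions made for this example; they are not part of DVC-Stage. The only pandas call used is DataFrame.dropna.

```python
def drop_incomplete_rows(data, how="any"):
    """Hypothetical custom transformation: drop rows with missing values.

    `data` arrives as the first argument; `how` would be supplied as an
    additional key next to `id` and `import_from` in params.yaml.
    """
    # Assumption: returning None unchanged is an acceptable way to
    # "gracefully handle" a call made without data.
    if data is None:
        return None
    # DataFrame.dropna(how="any" | "all") returns a new DataFrame.
    return data.dropna(how=how)
```

Assuming the function lives in a module named demo, it could be referenced from a transformations entry with id: custom and import_from: demo.drop_incomplete_rows, adding how: all as an extra key to change the default.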
A working demonstration can be found at examples/.
Contributing
Any contributions are greatly appreciated! If you have a question or an issue, or would like to contribute, please read our contributing guidelines.
License
Distributed under the GNU General Public License v3
Contact
Marcel Arpogaus - znepry.necbtnhf@tznvy.pbz (encrypted with ROT13)
Project Link: https://github.com/MArpogaus/dvc-stage
Acknowledgments
Parts of this work have been funded by the Federal Ministry for the Environment, Nature Conservation and Nuclear Safety due to a decision of the German Federal Parliament (AI4Grids: 67KI2012A).