DVC-Stage

  1. About The Project
  2. Getting Started
    1. Prerequisites
    2. Installation
  3. Usage
  4. Contributing
  5. License
  6. Contact
  7. Acknowledgments

About The Project

This Python script provides an easy and parameterizable way of defining typical DVC (sub-)stages for:

  • data preprocessing
  • data transformation
  • data splitting
  • data validation

Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

  • pandas>=0.20.*
  • dvc>=2.12.*
  • pyyaml>=5

Installation

This package is available on PyPI. You can install it and all of its dependencies using pip:

pip install dvc-stage

Usage

DVC-Stage works on top of two files: dvc.yaml and params.yaml. Both are expected at the root of an initialized DVC project. From there, run dvc-stage -h to see the available commands, or dvc-stage get-config STAGE to generate the DVC stage definition from params.yaml. The tool generates the corresponding YAML, which you can then paste manually into dvc.yaml. Existing stages can afterwards be updated in place using dvc-stage update-stage STAGE.

Stages are defined inside params.yaml in the following schema:

STAGE_NAME:
  load: {}
  transformations: []
  validations: []
  write: {}

The load and write sections both require the yaml-keys path and format to read and save data respectively.
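The requirement above can be checked before running anything. The following is a minimal sketch, not part of DVC-Stage: the helper `missing_io_keys`, the file paths and the `csv` format value are all hypothetical, and the stage is written as a plain Python dict mirroring the YAML schema.

```python
# Keys that the load and write sections must both provide (per the text above).
REQUIRED_IO_KEYS = ("path", "format")


def missing_io_keys(stage):
    """Hypothetical helper: list the required keys absent from load and write."""
    return {
        section: [
            key
            for key in REQUIRED_IO_KEYS
            # An empty or missing section is treated as an empty mapping.
            if key not in (stage.get(section) or {})
        ]
        for section in ("load", "write")
    }


# Example stage definition; paths and format values are made up for illustration.
stage = {
    "load": {"path": "data/raw.csv", "format": "csv"},
    "transformations": [],
    "validations": [],
    "write": {"path": "data/out.csv"},  # "format" left out on purpose
}

print(missing_io_keys(stage))  # {'load': [], 'write': ['format']}
```

An empty list for a section means both required keys are present.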

The transformations and validations sections require a sequence of functions to apply, where transformations return data and validations return a truth value (derived from data). Functions are defined by the key id and can be either:

  • Methods defined on Pandas Dataframes, e.g.

    transformations:
      - id: transpose
    
  • Imported from any python module, e.g.

    transformations:
      - id: custom
        description: duplicate rows
        import_from: demo.duplicate
    
  • Predefined by DVC-Stage, e.g.

    validations:
      - id: validate_pandera_schema
        schema:
          import_from: demo.get_schema
    

When writing a custom function, make sure it gracefully handles data being None, which is required for type inference. Data is passed as the first argument. Further arguments can be provided as additional keys, as shown above for validate_pandera_schema, where schema is passed as the second argument to the function.
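As a rough illustration of these rules, a custom transformation might look like the sketch below. It is not the actual demo.duplicate from the repository: the function body and the extra parameter `times` are assumptions, as is the choice to return None when data is None.

```python
import pandas as pd


def duplicate(data, times=2):
    """Hypothetical custom transformation: stack `times` copies of the frame."""
    # The function may be called with data=None (used for type inference),
    # so pass None through instead of failing. Returning None here is an
    # assumption of this sketch.
    if data is None:
        return None
    # Concatenate the copies and renumber the index from 0.
    return pd.concat([data] * times, ignore_index=True)
```

Assuming additional keys reach the function as further arguments, as described above, a key such as `times: 3` next to `id` and `import_from` would then produce three copies of each row.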

A working demonstration can be found at examples/.

Contributing

Any contributions are greatly appreciated! If you have a question, an issue or would like to contribute, please read our contributing guidelines.

License

Distributed under the GNU General Public License v3.

Contact

Marcel Arpogaus - znepry.necbtnhf@tznvy.pbz (encrypted with ROT13)

Project Link: https://github.com/MArpogaus/dvc-stage

Acknowledgments

Parts of this work have been funded by the Federal Ministry for the Environment, Nature Conservation and Nuclear Safety due to a decision of the German Federal Parliament (AI4Grids: 67KI2012A).
