Generate DVC pipeline files from Python declarations.

dvcgen

Write your DVC pipeline once, in Python.

dvcgen is an early-stage command-line tool for generating DVC pipeline files from lightweight declarations embedded in Python pipeline scripts.

Current Status

Implemented:

  • A Python package named dvcgen
  • A dvcgen console command
  • CLI argument parsing for pipeline script paths
  • CLI input validation and overwrite protection
  • Public declaration helpers: dep(), out(), and param()
  • Python script inspection for top-level literal declarations
  • dvc.yaml generation
  • params.yaml generation

Installation

uv tool install dvcgen

Or run without installing:

uvx dvcgen --help

Usage

Show CLI help:

dvcgen --help

Generate DVC files from one or more Python pipeline scripts:

dvcgen pipeline/*.py

The command writes dvc.yaml and params.yaml to the current directory. Each stage is named after its script's filename without the .py extension. For example, pipeline/train.py becomes the train stage.
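The filename-to-stage mapping can be sketched in a few lines. This is only an illustration of the rule described above, not dvcgen's actual code; the stage_name function is a hypothetical name.

```python
from pathlib import Path


def stage_name(script_path: str) -> str:
    """Return the script's filename without its directory or .py suffix."""
    # Hypothetical helper: dvcgen's real implementation may differ.
    return Path(script_path).stem


print(stage_name("pipeline/train.py"))
```

Under this rule, two scripts with the same filename in different directories would map to the same stage name. How dvcgen handles that case is not stated here.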

By default, dvcgen refuses to overwrite existing dvc.yaml or params.yaml files. Use --force when you intentionally want to replace them:

dvcgen --force pipeline/*.py
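A minimal sketch of this kind of overwrite protection, assuming the check runs before anything is written. The check_overwrite function, its error message, and the GENERATED_FILES constant are hypothetical and are not taken from dvcgen's source.

```python
from pathlib import Path

# The two files the command is documented to write.
GENERATED_FILES = ("dvc.yaml", "params.yaml")


def check_overwrite(output_dir: str, force: bool = False) -> None:
    """Raise FileExistsError if a generated file exists and force is False."""
    if force:
        return  # the caller explicitly asked to replace existing files
    existing = [
        name for name in GENERATED_FILES if (Path(output_dir) / name).exists()
    ]
    if existing:
        raise FileExistsError(
            f"refusing to overwrite {', '.join(existing)}; pass --force to replace"
        )
```

Checking both files up front avoids a partial run in which one file is replaced and the other is left untouched.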

Write files to another directory with --output-dir:

dvcgen --output-dir generated pipeline/*.py

Invalid inputs fail with an error message and a non-zero exit code. Successful runs print the paths of the files that were written.
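For readers who want to picture the command-line surface, here is a rough argparse sketch that accepts the same arguments shown above (script paths, --force, --output-dir). It is an assumption about how such a parser could look, not dvcgen's real parser; build_parser, the help strings, and the default of "." are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented dvcgen arguments (sketch only)."""
    parser = argparse.ArgumentParser(prog="dvcgen")
    parser.add_argument("scripts", nargs="+", help="Python pipeline scripts")
    parser.add_argument(
        "--force", action="store_true", help="overwrite existing files"
    )
    parser.add_argument(
        "--output-dir", default=".", help="directory for generated files"
    )
    return parser


args = build_parser().parse_args(
    ["--output-dir", "generated", "pipeline/train.py"]
)
print(args.output_dir, args.force, args.scripts)
```

With argparse, a missing required argument prints a usage message to stderr and exits with status 2, which is one common way to get the "error message and non-zero exit code" behavior.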

Inspect declarations from Python without executing the pipeline script:

from dvcgen.inspect import inspect_file

declarations = inspect_file("pipeline/train.py")
print(declarations.deps)
print(declarations.outs)
print(declarations.params)
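One way to read declarations without executing a script is to parse it with the standard ast module and look only at top-level assignments. The sketch below illustrates that idea under stated assumptions: inspect_source, HELPERS, and the dict return shape are hypothetical, and dvcgen's own inspect_file may work differently and returns an object with deps, outs, and params attributes.

```python
import ast

# Names of the declaration helpers documented by dvcgen.
HELPERS = {"dep", "out", "param"}


def inspect_source(source: str) -> dict:
    """Collect literal arguments of top-level dep()/out()/param() assignments."""
    found = {"deps": [], "outs": [], "params": {}}
    for node in ast.parse(source).body:  # top-level statements only
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)):
            continue
        call = node.value
        if not (isinstance(call.func, ast.Name) and call.func.id in HELPERS):
            continue
        try:
            # literal_eval accepts AST nodes and never runs user code.
            args = [ast.literal_eval(arg) for arg in call.args]
        except (ValueError, TypeError):
            continue  # skip declarations whose arguments are not literals
        if call.func.id == "dep" and len(args) == 1:
            found["deps"].append(args[0])
        elif call.func.id == "out" and len(args) == 1:
            found["outs"].append(args[0])
        elif call.func.id == "param" and len(args) == 2:
            found["params"][args[0]] = args[1]
    return found


example = '''
from dvcgen import dep, out, param

TRAIN_DATA = dep("data/processed.csv")
MODEL = out("models/model.pkl")
LR = param("train.lr", 0.001)
'''
print(inspect_source(example))
```

Because the source is only parsed, the import of dvcgen inside the example string is never run, and side effects in the pipeline script cannot fire.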

Release

Publishing is intentionally manual while the project is early stage. Build and validate artifacts before uploading anything:

uv run python -m build
uv run twine check dist/*

Use TestPyPI first when rehearsing a release. Create a TestPyPI API token, then upload with the token as the password:

uv run twine upload --repository testpypi dist/*

Use the production PyPI repository only when the version and changelog are final and the package name has been decided:

uv run twine upload dist/*

For both repositories, use __token__ as the username and the repository API token as the password. Avoid committing tokens or storing them in project files.

Before the first production upload, decide whether to publish the current minimal release to reserve the dvcgen package name on PyPI. Once a version is uploaded to PyPI or TestPyPI, that exact version cannot be uploaded again; bump the version before retrying with changed artifacts.

Planned MVP

The intended MVP is:

  1. Pipeline scripts declare dependencies, outputs, and parameters in Python.
  2. dvcgen inspects those declarations without executing the scripts.
  3. dvcgen writes dvc.yaml and params.yaml.

Example API:

from dvcgen import dep, out, param

TRAIN_DATA = dep("data/processed.csv")
MODEL = out("models/model.pkl")

LR = param("train.lr", 0.001)

Running:

dvcgen pipeline/train.py

Generates dvc.yaml:

"stages":
  "train":
    "cmd": "python pipeline/train.py"
    "deps":
      - "pipeline/train.py"
      - "data/processed.csv"
    "outs":
      - "models/model.pkl"
    "params":
      - "train.lr"

And params.yaml:

"train":
  "lr": 0.001
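The mapping from declarations to the two files above can be sketched with plain dictionaries. This is a hedged illustration that reproduces the structure of the example output. build_stage and nest_params are hypothetical names, and the assumption that dotted parameter keys expand into nested mappings is inferred from the train.lr example only.

```python
def build_stage(script_path, deps, outs, params):
    """Build one stage entry; the script itself is listed as the first dep."""
    return {
        "cmd": f"python {script_path}",
        "deps": [script_path, *deps],
        "outs": list(outs),
        "params": list(params),  # only the dotted keys go into dvc.yaml
    }


def nest_params(params):
    """Expand dotted keys: {"train.lr": 0.001} -> {"train": {"lr": 0.001}}."""
    nested = {}
    for dotted_key, value in params.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


params = {"train.lr": 0.001}
dvc_yaml = {
    "stages": {
        "train": build_stage(
            "pipeline/train.py",
            ["data/processed.csv"],
            ["models/model.pkl"],
            params,
        )
    }
}
params_yaml = nest_params(params)
print(dvc_yaml)
print(params_yaml)
```

Serializing these dictionaries to YAML would need a YAML library, which is left out here to keep the sketch dependency-free.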

Download files

Source Distribution

dvcgen-0.2.0.tar.gz (59.6 kB view details)

Uploaded Source

Built Distribution

dvcgen-0.2.0-py3-none-any.whl (7.1 kB view details)

Uploaded Python 3

File details

Details for the file dvcgen-0.2.0.tar.gz.

File metadata

  • Download URL: dvcgen-0.2.0.tar.gz
  • Upload date:
  • Size: 59.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for dvcgen-0.2.0.tar.gz
Algorithm Hash digest
SHA256 314df39d17e214da1f8a48229f4fe9dfb98fa0c0d43b8cdbc69ffb2af829a56d
MD5 5886c190af35c41edd9291681cfefc85
BLAKE2b-256 fd53041f53cb376a81277f217eb69035803c35cae001e7946000b55dd2a64fe7

File details

Details for the file dvcgen-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: dvcgen-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 7.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for dvcgen-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 159e93c715490f88f77a77c511a4b387169894da26aeaa21efc56588c29ced3e
MD5 711f0ec0fd4ab31263334f180d04e557
BLAKE2b-256 fc74b5562811aeb24789275c4c5c094a0468422415259f7ff96c25dc38ae7801
