pynb-dag-runner
pynb-dag-runner is a Python library that can run ML/data pipelines on stateless compute infrastructure (which may be ephemeral or serverless).
This means that pynb-dag-runner does not need a tracking server (or database) to record task outcomes (such as logged ML metrics, models, and artifacts).
Instead, pipeline outputs are emitted using the OpenTelemetry standard.
Because the structured logs can, as one option, be directed to a file, pipelines can run with limited or no cloud infrastructure;
after pipeline execution one only needs to preserve the structured logs.
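The idea of keeping only structured logs can be illustrated with a minimal sketch. It uses only the Python standard library and is not the pynb-dag-runner API, nor the OpenTelemetry SDK: the helper names (`log_span`, `run_task`, `read_outcomes`, `demo`), the JSON-lines layout, the file name, and the accuracy value are all assumptions made up for illustration. Each task appends one span-like record to a file, and the outcomes are later recovered from that file alone, without a tracking server.

```python
import json
import tempfile
import time
from pathlib import Path


def log_span(log_file: Path, name: str, attributes: dict, start: float, end: float) -> None:
    """Append one span-like record to the log file as a single JSON line."""
    record = {
        "name": name,
        "start_time_unix_s": start,
        "end_time_unix_s": end,
        "attributes": attributes,
    }
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def run_task(log_file: Path, name: str, task_fn) -> None:
    """Run a task and record its outcome (success or failure) in the log file."""
    start = time.time()
    try:
        metrics = task_fn()  # hypothetical convention: a task returns a dict of metrics
        attributes = {"status": "ok", **metrics}
    except Exception as exc:
        attributes = {"status": "error", "error": repr(exc)}
    log_span(log_file, name, attributes, start, time.time())


def read_outcomes(log_file: Path) -> list:
    """Recover all task outcomes from the preserved log file."""
    with log_file.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def demo() -> list:
    """Run two toy tasks and return the outcomes read back from the log file."""
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "pipeline-spans.jsonl"  # illustrative file name
        run_task(log_file, "train", lambda: {"accuracy": 0.93})  # made-up metric value
        run_task(log_file, "evaluate", lambda: 1 / 0)  # deliberately failing task
        return read_outcomes(log_file)


if __name__ == "__main__":
    for outcome in demo():
        print(outcome["name"], outcome["attributes"])
```

In this sketch the log file is the only state that survives the run, which is what makes ephemeral compute workable: the machine that executed the tasks can be discarded once the file has been stored somewhere, for example as a build artifact.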
Documentation and architecture
Demo
- The demo is an ML training pipeline that uses only GitHub infrastructure (that is, GitHub Actions for compute, build artifacts for storage, and GitHub Pages for reporting). It uses pynb-dag-runner and a fork of MLflow that can be deployed as a static website (see https://github.com/pynb-dag-runner/mlflow).
- Code for the pipeline (MIT): https://github.com/pynb-dag-runner/mnist-digits-demo-pipeline
Roadmap and project planning
Install via PyPI
Latest release
pip install pynb-dag-runner
- https://pypi.org/project/pynb-dag-runner
Snapshot of latest commit to main branch
pip install pynb-dag-runner-snapshot
- https://pypi.org/project/pynb-dag-runner-snapshot
Any feedback/ideas welcome!
License
(c) Matias Dahl 2021-2022, MIT, see LICENSE.md.