An open-source tool for teams to build reproducible ML workflows
Reproducible ML workflows
dstack is an open-source tool for running reproducible ML workflows independently of infrastructure and for collaborating on data and models.
dstack is an open-source tool for running reproducible ML workflows independently of infrastructure. Workflows can run locally or remotely, using any configured cloud vendor. dstack also facilitates versioning and reuse of artifacts (such as data and models) across teams.
In brief, dstack simplifies setting up ML training pipelines that are not tied to a particular vendor, and facilitates collaboration within teams on data and models.
How does it work?
- Define workflows via YAML
- Run workflows locally via CLI
- Reuse artifacts (data and models) across workflows
- Run workflows remotely (in any configured cloud) via CLI
- Share artifacts (data and models) across teams
Installation
Use pip to install dstack locally:
pip install dstack --upgrade
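To confirm that the install succeeded, one option is to query the package metadata from Python. This is only a minimal sketch using the standard library; the helper name `installed_version` is hypothetical and not part of dstack.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed dstack version, or None if pip did not install it
# into the current environment.
print(installed_version("dstack"))
```

If this prints None, the most likely cause is that pip installed the package into a different Python environment than the one running the check.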
To run workflows remotely (e.g. in the cloud) or share artifacts outside your machine, you must configure your remote settings using the dstack config command:
dstack config
This command prompts you to choose an AWS profile (used for AWS credentials), an AWS region (where workflows will run), and an S3 bucket (to store remote artifacts and metadata). The sample session below also shows an EC2 subnet setting:
AWS profile: default
AWS region: eu-west-1
S3 bucket: dstack-142421590066-eu-west-1
EC2 subnet: none
Example
Here's an example from dstack-examples.
workflows:
  # Saves the MNIST dataset as a reusable artifact for other workflows
  - name: mnist-data
    provider: bash
    commands:
      - pip install -r mnist/requirements.txt
      - python mnist/download.py
    artifacts:
      # Saves the folder with the dataset as an artifact
      - path: ./data

  # Trains a model using the dataset from the `mnist-data` workflow
  - name: mnist-train
    provider: bash
    deps:
      # Depends on the artifacts from the `mnist-data` workflow
      - workflow: mnist-data
    commands:
      - pip install -r mnist/requirements.txt
      - python mnist/train.py
    artifacts:
      # Saves the folder with logs and checkpoints as an artifact
      - path: ./lightning_logs
With workflows defined this way, dstack can run them either locally or in a configured cloud account, while also versioning their artifacts and making them reusable.
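The deps entries in the YAML above imply an ordering: mnist-train needs the artifacts of mnist-data, so mnist-data has to run first. The sketch below illustrates that idea with a topological sort over the two example workflows. It is a conceptual illustration only, not dstack's actual implementation; the `run_order` helper and the simplified dictionary layout are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The two workflows from the example, reduced to name -> dependencies.
# This flat layout is a simplification of the YAML, assumed for illustration.
workflows = [
    {"name": "mnist-data", "deps": []},
    {"name": "mnist-train", "deps": ["mnist-data"]},
]


def run_order(workflow_list):
    """Return workflow names ordered so each one follows its dependencies."""
    graph = {w["name"]: set(w["deps"]) for w in workflow_list}
    return list(TopologicalSorter(graph).static_order())


order = run_order(workflows)
print(order)  # ['mnist-data', 'mnist-train']
```

A topological sort also rejects circular dependencies: `TopologicalSorter` raises `graphlib.CycleError` if two workflows depend on each other.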
More information
For additional information and examples, see the following links:
Licence