A command-line interface to run ML workflows in the cloud
👋 Intro
To run ML workflows, your local machine is often not enough, which is why you typically have to run them on cloud infrastructure.
Instead of managing infrastructure yourself, writing your own scripts, or using cumbersome MLOps platforms, with dstack you can focus on code while dstack manages dependencies, infrastructure, and data for you.
dstack is an alternative to Kubeflow, SageMaker, Docker, SSH, custom scripts, and other tools often used to run ML workflows.
Primary features of dstack:
- Git-focused: Define workflows and their hardware requirements as code. When you run a workflow, dstack detects the current branch, commit hash, and local changes.
- Data management: Workflow artifacts are first-class citizens. Assign tags to finished workflows to reuse their artifacts from other workflows. Version data using tags.
- Environment setup: No need to build custom Docker images or set up CUDA yourself. Just specify your Conda requirements and they will be pre-configured.
- Interruption-friendly: Because artifacts can be stored in real time, you can leverage interruptible (spot/preemptible) instances. Workflows can be resumed from where they were interrupted.
- Technology-agnostic: No need to use specific APIs in your code. Anything that works locally can run via dstack.
- Dev environments: Workflows can be not only tasks and applications but also dev environments, including IDEs and notebooks.
- Very easy setup: Install the dstack CLI and run workflows in the cloud using your local credentials. The state is stored in an S3 bucket. No need to set up anything else.
📦 Installation
To use dstack, you'll only need the dstack CLI. No other software needs to be installed or deployed.
The CLI will use your local cloud credentials (e.g. the default AWS environment variables or the credentials from ~/.aws/credentials).
To install the CLI, use pip:
pip install dstack
Before you can use dstack, you have to configure the dstack backend:
- In which S3 bucket to store the state and the artifacts
- In which region to create cloud instances
To configure this, run the following command:
dstack config
Configure AWS backend:
AWS profile name (default):
S3 bucket name:
Region name:
The configuration will be stored in ~/.dstack/config.yaml.
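For reference, the stored file might look something like this (the key names below are an assumption based on the prompts above, not the verified format):

backend: aws
profile: default
bucket: my-dstack-bucket
region: eu-west-1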
That's it. Now you can use dstack on your machine.
✨ Usage
Define workflows
Workflows can be defined in the .dstack/workflows.yaml file within your project.
For every workflow, you can specify the provider, dependencies, commands, which output folders to store as artifacts, and which resources the instance needs (e.g. whether it should be a spot/preemptible instance, how much memory, GPU, etc.).
workflows:
  - name: "train"
    provider: bash
    deps:
      - tag: mnist_data
    python: 3.10
    env:
      - PYTHONPATH=src
    commands:
      - pip install -r requirements.txt
      - python src/train.py
    artifacts:
      - path: checkpoint
    resources:
      interruptible: true
      gpu: 1
Run workflows
Once you run the workflow, dstack will create the required cloud instance(s) within a minute and run your workflow. You'll see the output in real time as the workflow runs.
$ dstack run train
Provisioning... It may take up to a minute. ✓
To interrupt, press Ctrl+C.
...
Environment setup: dstack automatically sets up the environment for the workflow. It pre-installs the right CUDA driver, the right version of Python, and Conda.
Git: When you run a workflow within a Git repository, dstack detects the current branch, commit hash, and local changes, and uses them on the cloud instance(s) to run the workflow.
Artifacts and tags
Every workflow may have output artifacts. They can be accessed via the dstack artifacts CLI command.
You can assign tags to finished workflows to reuse their output artifacts from other workflows.
You can also use tags to upload local data and reuse it from other workflows.
If you've added a tag, you can refer to it as a dependency via the deps property of your workflow in .dstack/workflows.yaml:
deps:
  - tag: mnist_data
You can refer not only to tags within your current Git repository but also to tags from your other repositories.
Here's an example of how a workflow refers to a tag from the dstackai/dstack-examples repository:
deps:
  - tag: dstackai/dstack-examples/mnist_data
Tags can be managed via the dstack tags CLI command.
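To illustrate how runs, artifacts, and tags fit together, here's a hypothetical session (the subcommands, arguments, and the run name grumpy-zebra-1 are assumptions, not verified CLI syntax; see the CLI reference at docs.dstack.ai):

$ dstack run train
$ dstack artifacts list grumpy-zebra-1
$ dstack tags add mnist_data grumpy-zebra-1

Once the tag is added, any workflow can pull these artifacts via its deps property, as shown above.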
Providers
dstack offers multiple providers that allow running tasks, applications, and dev environments.
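For instance, instead of the bash provider used above, a workflow could start a dev environment. A minimal sketch, assuming a hypothetical lab provider that launches JupyterLab (provider names and options may differ; see the docs for the actual list):

workflows:
  - name: "notebook"
    provider: lab
    resources:
      gpu: 1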
📘 Docs
More tutorials, examples, and the full CLI reference can be found at docs.dstack.ai.
🛟 Help
If you encounter bugs, please report them directly to the issue tracker.
For questions and support, join the Slack chat.
Licence