A Command Line Interface for https://dstack.ai
dstack: Automate Data and Training Workflows Easily
An open-core platform to automate data and training workflows, provision infrastructure, and version data and models.
High-Level Features
- Define workflows and infrastructure requirements as code using declarative configuration files. When you run a workflow, dstack provisions the required infrastructure on-demand.
- Either use your existing hardware or provision infrastructure on-demand in your existing cloud account (e.g. AWS, GCP, or Azure).
- Version data and models produced by workflows automatically. Assign tags to successful runs to refer to their artifacts from other workflows.
- Use the built-in workflow providers that cover common use cases, or create custom providers for specific use cases using dstack's Python API.
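As an illustration of the declarative approach, a minimal `.dstack/workflows.yaml` might look like the sketch below — a trimmed-down version of the full example used later in this guide:

```yaml
# A minimal sketch of .dstack/workflows.yaml (trimmed for illustration)
workflows:
  - name: train-mnist
    provider: python
    python_script: train.py
    artifacts:
      - model
    resources:
      gpu: 1
```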
Getting Started
Step 1: Set up runners
Runners are machines that run submitted Workflows. dstack supports two types of Runners: on-demand Runners and self-hosted Runners.
On-demand Runners are created automatically by dstack, in the computing vendor configured by the user (e.g. AWS), for the duration of running Workflows. Self-hosted Runners can be set up manually to run Workflows using the user's own hardware.
Option 1: Set up on-demand runners
To use on-demand Runners, go to Settings, then AWS. Here, you have to provide an AWS Access Key ID and AWS Secret Access Key that have the permissions required to create EC2 instances in your AWS account.
Once you've provided Credentials, use the Add limit button to configure limits.
The configured Limits represent the maximum number of EC2 instances of a specific Instance Type, in a specific Region, that dstack can create at one time to run Workflows.
Option 2: Set up self-hosted runners
As an alternative to on-demand Runners, you can run Workflows on your own hardware. To do that, run the following commands on your server:
```shell
curl -fsSL https://get.dstack.ai/runner -o get-dstack-runner.sh
sudo sh get-dstack-runner.sh
dstack-runner config --token <token>
dstack-runner start
```
Your token value can be found in Settings.
If you've done this step properly, you'll see your server on the Runners page.
Step 2: Install the CLI
Now, to be able to run Workflows, install and configure the dstack CLI:
```shell
pip install dstack -U
dstack config --token <token>
```
Step 3: Clone the repo
To get started, we'll run the Workflows defined in github.com/dstackai/dstack-examples.
```shell
git clone https://github.com/dstackai/dstack-examples.git
cd dstack-examples
```
The project includes two Workflows: download-mnist and train-mnist. The first Workflow downloads the MNIST dataset, whilst the second Workflow trains a model using the output of the first Workflow as an input:
.dstack/workflows.yaml:
```yaml
workflows:
  - name: download-mnist
    provider: python
    requirements: requirements.txt
    python_script: download.py
    artifacts:
      - data

  - name: train-mnist
    provider: python
    requirements: requirements.txt
    python_script: train.py
    artifacts:
      - model
    depends-on:
      - download-mnist
    resources:
      gpu: ${{ gpu }}
```
.dstack/variables.yaml:
```yaml
variables:
  train-mnist:
    gpu: 1
    batch-size: 64
    test-batch-size: 1000
    epochs: 1
    lr: 1.0
    gamma: 0.7
    seed: 1
    log-interval: 10
```
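The guide doesn't show how these Variables reach train.py. One common pattern — used here purely as an assumption for illustration, since dstack's actual mechanism may differ — is for the script to accept them as command-line flags, e.g. via argparse:

```python
import argparse

# Hypothetical sketch: accept the Variables declared in .dstack/variables.yaml
# as CLI flags. That dstack passes Variables this way is an assumption made
# for illustration only.
parser = argparse.ArgumentParser(description="train-mnist (illustrative)")
parser.add_argument("--batch-size", type=int, default=64)
parser.add_argument("--epochs", type=int, default=1)
parser.add_argument("--lr", type=float, default=1.0)
parser.add_argument("--seed", type=int, default=1)

# Simulate overriding two Variables from the command line:
args = parser.parse_args(["--epochs", "100", "--seed", "2"])
print(args.epochs, args.seed)  # → 100 2
```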
Step 4: Run workflows
Go ahead and run the train-mnist Workflow using the following command:
```shell
dstack run train-mnist
```
If you want to change any of the Variables, you can do that in .dstack/variables.yaml, or from the CLI:
```shell
dstack run train-mnist --gpu 2 --epochs 100 --seed 2
```
When you run train-mnist, because train-mnist depends on download-mnist, dstack will create a run with two Jobs: one for train-mnist and one for download-mnist.
Step 5: Tag runs
When the Run is finished, you can assign a Tag to it, e.g. latest:
```shell
dstack tag cowardly-goose-1 latest
```
Now, you can refer to this tagged Workflow from .dstack/workflows.yaml:
```yaml
workflows:
  - name: download-mnist
    provider: python
    requirements: requirements.txt
    python_script: download.py
    artifacts:
      - data

  - name: train-mnist
    provider: python
    requirements: requirements.txt
    python_script: train.py
    artifacts:
      - model
    depends-on:
      - download-mnist:latest
    resources:
      gpu: 1
```
Now, if you run the train-mnist Workflow, dstack won't create a Job for the download-mnist Workflow. Instead, it will reuse the Artifacts of the tagged Workflow.
Repository
This repository contains dstack's open-source and public code, documentation, and other key resources:
- providers: The source code of the built-in dstack workflow providers
- cli: The source code of the dstack CLI pip package
- docs: A user guide to the whole dstack platform (docs.dstack.ai)
Here's the list of other packages whose source code is expected to be added to this repository soon:
- runner: The source code of the program that runs dstack workflows
- server: The source code of the program that orchestrates dstack runs and jobs and provides a user interface
- examples: The source code of the examples of using dstack
Contributing
Please check CONTRIBUTING.md if you'd like to get involved in the development of dstack.
License
Please see LICENSE.md for more information about the terms under which the various parts of this repository are made available.
Contact
Find us on Twitter at @dstackai, or join our Slack workspace for quick help and support.
Project permalink: https://github.com/dstackai/dstack