
Easy-to-run ML workflows on any cloud

Reason this release was yanked:

A critical bug: AttributeError: module 'dstack.version' has no attribute 'miniforge_image'

Project description

dstack

A better way to run ML workflows

Define ML workflows as code and run via CLI. Use any cloud. Collaborate within teams.


Docs · Installation · Quick start · Usage


What is dstack?

dstack allows you to define machine learning workflows as code and run them on any cloud.

It helps you set up a reproducible environment, reuse artifacts, and launch interactive development environments and apps.

Installation

Use pip to install dstack:

pip install dstack --upgrade

Configure a remote

To run workflows remotely (e.g. in a configured cloud account), configure a remote using the dstack config command.

dstack config

? Choose backend. Use arrows to move, type to filter
> [aws]
  [gcp]
  [hub]

If you intend to run remote workflows directly in the cloud using local cloud credentials, choose aws or gcp. Refer to the AWS and GCP documentation, respectively, for details.

If you would like to manage cloud credentials, users, and other settings centrally through a user interface, choose hub.

The hub remote is currently in an experimental phase. If you are interested in trying it out, please contact us via Slack.

Define workflows

Define ML workflows, their output artifacts, hardware requirements, and dependencies via YAML.

workflows:
  - name: mnist-data
    provider: bash
    commands:
      - pip install torchvision
      - python mnist/mnist_data.py
    artifacts:
      - path: ./data

  - name: train-mnist
    provider: bash
    deps:
      - workflow: mnist-data
    commands:
      - pip install torchvision pytorch-lightning tensorboard
      - python mnist/train_mnist.py
    artifacts:
      - path: ./lightning_logs

YAML eliminates the need to modify code in your scripts, giving you the freedom to choose frameworks, experiment trackers, and cloud providers.

Providers

dstack supports multiple providers that let you set up an environment, run scripts, launch interactive dev environments and apps, and perform many other tasks.
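For instance, a provider other than bash can launch a dev environment instead of running a script. The provider name below is an assumption based on dstack's documentation for this release line; check the Providers documentation for the exact names and options:

```yaml
workflows:
  - name: dev-lab
    # "lab" (JupyterLab) is a hypothetical provider name here;
    # verify it against the Providers documentation.
    provider: lab
```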

Run workflows

Once a workflow is defined, you can use the dstack run command to run it either locally or remotely.

Run locally

By default, workflows run locally on your machine.

dstack run mnist-data

RUN        WORKFLOW    SUBMITTED  STATUS     TAG  BACKENDS
penguin-1  mnist-data  now        Submitted       local

Provisioning... It may take up to a minute. ✓

To interrupt, press Ctrl+C.

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

The artifacts from local workflows are stored locally and can be reused by other local workflows.

Run remotely

To run a workflow remotely (e.g. in a configured cloud account), add the --remote flag to the dstack run command:

dstack run mnist-data --remote

RUN        WORKFLOW    SUBMITTED  STATUS     TAG  BACKENDS
mangust-1  mnist-data  now        Submitted       aws

Provisioning... It may take up to a minute. ✓

To interrupt, press Ctrl+C.

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

The output artifacts from remote workflows are also stored remotely and can be reused by other remote workflows.

The necessary hardware resources can be configured either via YAML or through arguments in the dstack run command, such as --gpu and --gpu-name.
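For example, the GPU requirement could also live in the workflow YAML itself rather than on the command line. The `resources` schema below is an assumption based on dstack's documentation for this release line, so verify the exact field names against the docs:

```yaml
workflows:
  - name: train-mnist
    provider: bash
    commands:
      - pip install torchvision pytorch-lightning tensorboard
      - python mnist/train_mnist.py
    artifacts:
      - path: ./lightning_logs
    resources:
      gpu: 1  # assumed schema: request one GPU for this workflow
```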

dstack run train-mnist --remote --gpu 1

RUN       WORKFLOW     SUBMITTED  STATUS     TAG  BACKENDS
turtle-1  train-mnist  now        Submitted       aws

Provisioning... It may take up to a minute. ✓

To interrupt, press Ctrl+C.

GPU available: True, used: True

Epoch 1: [00:03<00:00, 280.17it/s, loss=1.35, v_num=0]

Upon running a workflow remotely, dstack automatically creates resources in the configured cloud account and destroys them once the workflow is complete.

Ports

When a workflow uses ports to host interactive dev environments or applications, the dstack run command automatically forwards these ports to your local machine, allowing you to access them. Refer to Providers and Apps for the details.
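As a sketch, a workflow that serves TensorBoard might reserve a port like this. The `ports` field and the `$PORT_0` environment variable are assumptions about the bash provider's interface in this release line; confirm them in the Providers documentation:

```yaml
workflows:
  - name: tensorboard
    provider: bash
    ports: 1  # assumed syntax: reserve one port; dstack forwards it locally
    commands:
      - pip install tensorboard
      # $PORT_0 (the first reserved port) is an assumption; check the docs.
      - tensorboard --port $PORT_0 --logdir ./lightning_logs
```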

More information

For additional information and examples, see the documentation.

License

Mozilla Public License 2.0

Project details



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dstack-0.2.4.tar.gz (109.9 kB)

Uploaded Source

Built Distribution

dstack-0.2.4-py3-none-any.whl (13.6 MB)

Uploaded Python 3

File details

Details for the file dstack-0.2.4.tar.gz.

File metadata

  • Download URL: dstack-0.2.4.tar.gz
  • Upload date:
  • Size: 109.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for dstack-0.2.4.tar.gz:

  • SHA256: 7d8ab5328b96105049fba37fba4b9eb0c87f6cf05ba8cfa624d37c3f38a2c4df
  • MD5: 7f8f33565c708bc9268481b8e0dc0530
  • BLAKE2b-256: 3928e1871e35a3347f028126550bd1af1fd229b81772dde03921d8aa3eb0ef00

See more details on using hashes here.
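If you want to check a downloaded file against the published digests yourself, Python's standard hashlib module is enough; the file name in the commented assertion is just the distribution from this page, and the path is a placeholder for wherever you saved it:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Compare against the digest published above:
expected = "7d8ab5328b96105049fba37fba4b9eb0c87f6cf05ba8cfa624d37c3f38a2c4df"
# assert sha256_of("dstack-0.2.4.tar.gz") == expected
```

pip can also enforce this automatically via its hash-checking mode when the digest is pinned in a requirements file.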

File details

Details for the file dstack-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: dstack-0.2.4-py3-none-any.whl
  • Upload date:
  • Size: 13.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for dstack-0.2.4-py3-none-any.whl:

  • SHA256: 945d8e15d7c513660cc671afc94f045d786bce09c4cd4f4122a8dba485ca05d9
  • MD5: a788808262d90f703d0b60602e30057e
  • BLAKE2b-256: 3c0b62878908af12a485d1cf0b914b3cf9d236ae0d1e9d64daf383ec63844115

See more details on using hashes here.
