
An open-source tool for teams to build reproducible ML workflows

Project description

dstack

Reproducible ML workflows

dstack is an open-source tool that allows running reproducible ML workflows independently of infrastructure, and collaborating on data and models.


dstack lets you run ML workflows locally or remotely, using any configured cloud vendor. Additionally, it facilitates the versioning and reuse of artifacts (such as data and models) across teams.

In brief, dstack simplifies establishing ML training pipelines that are independent of any particular vendor, and facilitates team collaboration on data and models.

How does it work?

  • Define workflows via YAML
  • Run workflows locally via CLI
  • Reuse artifacts (data and models) across workflows
  • Run workflows remotely (in any configured cloud) via CLI
  • Share artifacts (data and models) across teams
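Concretely, workflows are declared in a YAML file. A minimal sketch is shown below; the file path (.dstack/workflows.yaml) and the workflow name are illustrative assumptions, and the full example later in this description shows a more realistic setup.

# .dstack/workflows.yaml — minimal illustrative workflow definition
workflows:
  - name: hello
    provider: bash      # the bash provider runs shell commands
    commands:
      - echo "Hello, dstack!"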

Installation

Use pip to install dstack locally:

pip install dstack --upgrade

To run workflows remotely (e.g. in the cloud) or share artifacts outside your machine, you must configure your remote settings using the dstack config command:

dstack config

This command will ask you to choose an AWS profile (which will be used for AWS credentials), an AWS region (where workflows will be run), and an S3 bucket (to store remote artifacts and metadata).

AWS profile: default
AWS region: eu-west-1
S3 bucket: dstack-142421590066-eu-west-1
EC2 subnet: none

Example

Here's an example from dstack-examples.

workflows:
  # Saves the MNIST dataset as a reusable artifact for other workflows
  - name: mnist-data
    provider: bash
    commands:
      - pip install -r mnist/requirements.txt
      - python mnist/download.py
    artifacts:
      # Saves the folder with the dataset as an artifact
      - path: ./data

  # Trains a model using the dataset from the `mnist-data` workflow
  - name: mnist-train
    provider: bash
    deps:
      # Depends on the artifacts from the `mnist-data` workflow
      - workflow: mnist-data
    commands:
      - pip install -r mnist/requirements.txt
      - python mnist/train.py
    artifacts:
      # Saves the folder with logs and checkpoints as an artifact
      - path: ./lightning_logs

With workflows defined in this manner, dstack lets you run them either locally or in a configured cloud account, while also versioning and reusing their artifacts.
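For instance, the two workflows above could be run from the repository root roughly as follows. This is a sketch based on the CLI behavior described in this README; the exact command forms and flags (in particular --remote) are assumptions, so check the docs for the precise invocation.

# Run the data-preparation workflow locally
dstack run mnist-data

# Train using the artifacts of mnist-data, in the configured cloud
dstack run mnist-train --remote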

More information

For additional information and examples, see the dstack documentation and the dstack-examples repository.

License

Mozilla Public License 2.0


Download files

Download the file for your platform.

Source Distribution

dstack-0.1rc1.tar.gz (72.6 kB)


Built Distribution

dstack-0.1rc1-py3-none-any.whl (13.1 MB)


File details

Details for the file dstack-0.1rc1.tar.gz.

File metadata

  • Download URL: dstack-0.1rc1.tar.gz
  • Upload date:
  • Size: 72.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for dstack-0.1rc1.tar.gz:

  • SHA256: d071d1c0a5b79ba57eec325e749c792f431f7d15eae599576e4d74573a0cc02a
  • MD5: 1fc06a5acb3df3f8c908dfd2054a546a
  • BLAKE2b-256: 4a0a1aba984a6666ea83553da77e105fa5eed6db4f2b2fe12f9c2d86c48cad5a


File details

Details for the file dstack-0.1rc1-py3-none-any.whl.

File metadata

  • Download URL: dstack-0.1rc1-py3-none-any.whl
  • Upload date:
  • Size: 13.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for dstack-0.1rc1-py3-none-any.whl:

  • SHA256: 411468eb84df1abe4a47e622f57049a063be3e796a375ea8b8992a80858748d4
  • MD5: de4ca9c69d418b092854c03aa5fc83f9
  • BLAKE2b-256: a592d8ff595477db35d7ea4bfcdafa1d31242ccb0e3433e07412d9f2b47b1e63

