dagster-slurm
Integrates Dagster to orchestrate Slurm jobs on HPC systems, together with compute-scaling frameworks such as Ray, for a better developer experience on supercomputers.
dagster-slurm lets you take the same Dagster assets from a laptop to a Slurm-backed supercomputer with minimal configuration changes.
A European sovereign GPU cloud does not come out of nowhere; perhaps this project can help make HPC systems more accessible.
Basic example
https://github.com/ascii-supply-networks/dagster-slurm/tree/main/examples
prerequisites
- installation of pixi: https://pixi.sh/latest/installation/
  curl -fsSL https://pixi.sh/install.sh | sh
  pixi global install git
- a container runtime like Docker or Podman; for now we assume docker compose is available to you. You could absolutely also use nerdctl or something similar.
usage
Example
git clone https://github.com/ascii-supply-networks/dagster-slurm.git
cd dagster-slurm/examples
docker compose up -d --build
local execution
Execute without Slurm.
- Small data
- Rapid local prototyping
pixi run start
Go to http://localhost:3000 and you should see the Dagster webserver running.
docker local execution
- Test that everything works on Slurm
- Still small data
- Mainly used for developing this integration
Ensure you have a .env file with the following content:
SLURM_EDGE_NODE_HOST=localhost
SLURM_EDGE_NODE_PORT=2223
SLURM_EDGE_NODE_USER=submitter
SLURM_EDGE_NODE_PASSWORD=submitter
SLURM_DEPLOYMENT_BASE_PATH=/home/submitter/pipelines/deployments
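Before starting Dagster, you can sanity-check that the edge node's SSH port from the .env above is actually reachable. A minimal stdlib sketch (not part of the project, just a connectivity probe):

```python
import socket


def edge_node_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Values from the .env above (SLURM_EDGE_NODE_HOST / SLURM_EDGE_NODE_PORT):
# edge_node_reachable("localhost", 2223)
```

If this returns False, check that the docker compose stack is up before debugging Dagster itself.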
pixi run start-staging
Go to http://localhost:3000 and you should see the Dagster webserver running.
prod docker local execution
- Test that everything works on Slurm
- Still small data
- Mainly used for developing this integration
- This target additionally supports faster job startup
Ensure you have a .env file with the following content:
SLURM_EDGE_NODE_HOST=localhost
SLURM_EDGE_NODE_PORT=2223
SLURM_EDGE_NODE_USER=submitter
SLURM_EDGE_NODE_PASSWORD=submitter
SLURM_DEPLOYMENT_BASE_PATH=/home/submitter/pipelines/deployments
# see the JQ command below for dynamically setting this
# DAGSTER_PROD_ENV_PATH=/home/submitter/pipelines/deployments/<<<your deployment >>>
# we assume your CI/CD pipeline deploys the environment out of band
# this allows your jobs to start up faster
pixi run deploy-prod-docker
cat deployment_metadata.json
export DAGSTER_PROD_ENV_PATH="$(jq -er '.deployment_path' deployment_metadata.json)"
pixi run start-prod-docker
Go to http://localhost:3000 and you should see the Dagster webserver running.
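If jq is not available, the same value can be extracted with Python's standard library; a minimal sketch, assuming the metadata layout produced by the deploy step (a top-level "deployment_path" key):

```python
import json
from pathlib import Path


def read_deployment_path(metadata_file: str = "deployment_metadata.json") -> str:
    """Extract deployment_path, mirroring `jq -er '.deployment_path'`."""
    data = json.loads(Path(metadata_file).read_text())
    path = data.get("deployment_path")
    if not path:
        raise KeyError(f"no deployment_path in {metadata_file}")
    return path
```

You can then export the result, e.g. `export DAGSTER_PROD_ENV_PATH="$(python -c 'from read_meta import read_deployment_path; print(read_deployment_path())')"` (module name here is hypothetical).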
real HPC supercomputer execution
- Targets clusters like VSC-5 (Austrian Scientific Computing (ASC)) and Leonardo (CINECA).
- Assets run against the real scheduler, so ensure the account has queue access and quotas.
Create a .env file with the edge-node credentials and select the site profile:
# example for VSC-5
SLURM_EDGE_NODE_HOST=vsc5.vsc.ac.at
SLURM_EDGE_NODE_PORT=22
SLURM_EDGE_NODE_USER=<<your_user>>
SLURM_EDGE_NODE_PASSWORD=<<your_password>>
SLURM_EDGE_NODE_JUMP_HOST=vmos.vsc.ac.at
SLURM_EDGE_NODE_JUMP_USER=<<your_user>>
SLURM_EDGE_NODE_JUMP_PASSWORD=<<your_password>>
SLURM_DEPLOYMENT_BASE_PATH=/home/<<your_user>>/pipelines/deployments
SLURM_PARTITION=zen3_0512
SLURM_QOS=zen3_0512_devel
SLURM_RESERVATION=dagster-slurm_21
SLURM_SUPERCOMPUTER_SITE=vsc5
DAGSTER_DEPLOYMENT=staging_supercomputer
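DAGSTER_DEPLOYMENT selects which deployment profile the example project loads. The profile contents below are illustrative placeholders, not the project's real settings; the sketch only shows the env-var-driven selection pattern:

```python
import os

# Illustrative profiles; the real project defines its own settings.
PROFILES = {
    "local": {"executor": "in-process"},
    "staging_supercomputer": {"executor": "slurm", "package_env_on_demand": True},
    "production_supercomputer": {"executor": "slurm", "package_env_on_demand": False},
}


def select_profile(default: str = "local") -> dict:
    """Pick a profile based on the DAGSTER_DEPLOYMENT environment variable."""
    name = os.environ.get("DAGSTER_DEPLOYMENT", default)
    if name not in PROFILES:
        raise ValueError(f"unknown DAGSTER_DEPLOYMENT: {name!r}")
    return PROFILES[name]
```

Failing loudly on an unknown name catches typos in the .env before any job is submitted.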
If your account relies on passwords (or passwords + OTP), provide them for both the jump host and the final login node. The automation will answer the standard prompts; any time-based OTP still has to be supplied interactively once per validity window. When an extra prompt appears, Dagster writes Enter ... for <host>: to your terminal (via /dev/tty). Enter the code there to continue.
TTY allocation is handled automatically for password-based sessions, so you do not need to set SLURM_EDGE_NODE_FORCE_TTY unless your centre requires it explicitly.
With the variables in place, validate connectivity and job submission using the staging supercomputer profile:
pixi run start-staging-supercomputer
Staging mode packages dependencies on demand. Expect the first asset run to upload a new environment bundle before dispatching the Slurm job.
For production you should pre-build and upload the execution environment via your CI/CD pipeline (see examples/scripts/deploy_environment.py). Capture the output path and expose it to Dagster as CI_DEPLOYED_ENVIRONMENT_PATH:
python scripts/deploy_environment.py --platform linux-64 # run from CI
# -> produces deployment_metadata.json with "deployment_path"
export CI_DEPLOYED_ENVIRONMENT_PATH=/home/submitter/pipelines/deployments/prod-env-20251018
export DAGSTER_DEPLOYMENT=production_supercomputer
pixi run start-production-supercomputer
If CI_DEPLOYED_ENVIRONMENT_PATH is missing, the production profile refuses to start, preventing accidental live environment builds on the cluster.
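That fail-fast behaviour can be sketched as a small guard; the helper name is hypothetical and this is not dagster-slurm's actual implementation, only the documented contract:

```python
import os


def require_deployed_environment() -> str:
    """Refuse to start the production profile without a pre-deployed environment.

    Hypothetical guard mirroring the documented behaviour.
    """
    path = os.environ.get("CI_DEPLOYED_ENVIRONMENT_PATH")
    if not path:
        raise RuntimeError(
            "CI_DEPLOYED_ENVIRONMENT_PATH is not set; refusing to start the "
            "production profile to prevent an accidental live build on the cluster."
        )
    return path
```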
To confirm a submission landed on the expected queue, run:
ssh -J <<your_user>>@vmos.vsc.ac.at <<your_user>>@vsc5.vsc.ac.at \
"squeue -j <jobid> -o '%i %P %q %v %T'"
The partition (%P), QOS (%q), and reservation (%v) columns should match your .env. (Note that squeue's %R prints the pending reason or node list, not the reservation.)
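To check the fields in a script rather than by eye, split the single output line into named columns. A minimal sketch, assuming the reservation-aware format string `squeue -o '%i %P %q %v %T'` (%v prints the job's reservation):

```python
def parse_squeue_line(line: str) -> dict:
    """Split one line of `squeue -o '%i %P %q %v %T'` output into named fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    jobid, partition, qos, reservation, state = fields
    return {
        "jobid": jobid,
        "partition": partition,
        "qos": qos,
        "reservation": reservation,
        "state": state,
    }
```

A follow-up check can then compare the parsed values against SLURM_PARTITION, SLURM_QOS, and SLURM_RESERVATION from the .env.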
contributing
See the docs for how to contribute. Help building and maintaining this project is welcome.