
Slurm extension for Metaflow

This extension adds support for executing steps in Metaflow Flows on SLURM clusters.

Basic Usage

  • Have a SLURM cluster that you can reach over SSH.
    • At minimum, you need the username, the IP address, and the PEM (private key) file.
  • Add the @slurm decorator to the step you want to run on the SLURM cluster:
@slurm(
    username="ubuntu",
    address="A.B.C.D",
    ssh_key_file="~/path/to/ssh/pem/file.pem"
)

Note that the above parameters can also be configured via the following environment variables:

  • METAFLOW_SLURM_USERNAME
  • METAFLOW_SLURM_ADDRESS
  • METAFLOW_SLURM_SSH_KEY_FILE
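The following is a minimal sketch of how decorator arguments and these variables could be combined into one set of settings. It assumes, without confirmation from the extension's code, that an explicit decorator argument takes precedence over the matching environment variable. `resolve_slurm_settings` and `SlurmSettings` are hypothetical names and are not part of metaflow-slurm.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Mapping from @slurm argument names to the environment variables listed above.
ENV_VARS = {
    "username": "METAFLOW_SLURM_USERNAME",
    "address": "METAFLOW_SLURM_ADDRESS",
    "ssh_key_file": "METAFLOW_SLURM_SSH_KEY_FILE",
}


@dataclass
class SlurmSettings:
    username: str
    address: str
    ssh_key_file: str


def resolve_slurm_settings(
    env: Mapping[str, str] = os.environ, **explicit: Optional[str]
) -> SlurmSettings:
    """Hypothetical resolver. ASSUMPTION: an explicit argument wins over its env variable."""
    values = {}
    for arg, env_var in ENV_VARS.items():
        value = explicit.get(arg) or env.get(env_var)
        if not value:
            raise ValueError(f"{arg!r} was not passed and {env_var} is not set")
        values[arg] = value
    # Expand "~" so paths like ~/path/to/ssh/pem/file.pem point into the home directory.
    values["ssh_key_file"] = os.path.expanduser(values["ssh_key_file"])
    return SlurmSettings(**values)


settings = resolve_slurm_settings(
    env={
        "METAFLOW_SLURM_ADDRESS": "A.B.C.D",
        "METAFLOW_SLURM_SSH_KEY_FILE": "~/path/to/ssh/pem/file.pem",
    },
    username="ubuntu",
)
print(settings)
```

A missing setting raises an error instead of failing later during the SSH connection.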

A step decorated with @slurm creates the following directory structure on the cluster:

metaflow/
├── assets
│   └── madhurMovies218892mid13433160
│       └── metaflow
│           ├── INFO
│           ├── demo.py
│           ├── job.tar
│           ├── linux-64
│           ├── metaflow
│           ├── metaflow_extensions
│           └── micromamba
├── madhurMovies218892mid13433160.sh
├── stderr
│   └── madhurMovies218892mid13433160.stderr
└── stdout
    └── madhurMovies218892mid13433160.stdout

In the tree above, demo.py is the name of the flow file.
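When you are debugging, it helps to know where a given job's files live. The following minimal sketch derives those paths from the job name and mirrors the tree above. `slurm_job_paths` is a hypothetical helper and is not part of the extension. The sketch does not cover how the job name itself is generated.

```python
from pathlib import PurePosixPath
from typing import Dict


def slurm_job_paths(job_name: str, root: str = "metaflow") -> Dict[str, PurePosixPath]:
    """Hypothetical helper: cluster-side paths for one @slurm job, per the tree above."""
    base = PurePosixPath(root)
    return {
        # Files Metaflow staged for the job (flow file, job.tar, micromamba, ...).
        "assets": base / "assets" / job_name,
        # Generated shell script submitted via sbatch.
        "script": base / f"{job_name}.sh",
        # Output streams captured for the job.
        "stdout": base / "stdout" / f"{job_name}.stdout",
        "stderr": base / "stderr" / f"{job_name}.stderr",
    }


paths = slurm_job_paths("madhurMovies218892mid13433160")
print(paths["stderr"])
```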

Passing cleanup=True to the decorator clears out the contents of the assets folder, i.e. all the artifacts Metaflow created for the job.

Using cleanup=True will not delete:

  • stdout folder
  • stderr folder
  • the generated shell script, i.e. madhurMovies218892mid13433160.sh

These are kept for later debugging and can be deleted manually by logging into the SLURM cluster.
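Once you no longer need a job's leftover files, you can remove them by hand. The following minimal sketch builds the corresponding rm command for one job, assuming the directory layout shown above. `manual_cleanup_command` is a hypothetical helper. You would run the command it returns on the cluster after logging in.

```python
import shlex


def manual_cleanup_command(job_name: str, root: str = "metaflow") -> str:
    """Hypothetical helper: an rm command for the files cleanup=True leaves behind."""
    leftovers = [
        f"{root}/{job_name}.sh",             # generated shell script
        f"{root}/stdout/{job_name}.stdout",  # captured standard output
        f"{root}/stderr/{job_name}.stderr",  # captured standard error
    ]
    # shlex.join (Python 3.8+) quotes each path so unusual characters cannot break the command.
    return shlex.join(["rm", "-f", *leftovers])


print(manual_cleanup_command("madhurMovies218892mid13433160"))
```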

Supplying Credentials

Credentials are needed to download the code package. They can:

  • already exist on the Slurm cluster itself, i.e. the compute instances have access to the blob store
  • be supplied via the @environment decorator:
@environment(vars={
    "AWS_ACCESS_KEY_ID": "XXXX",
    "AWS_SECRET_ACCESS_KEY": "YYYY"
})

Note that this exposes the credentials in the generated shell script. For example,

madhurMovies218892mid13433160.sh will contain the following lines:

export AWS_ACCESS_KEY_ID='XXXX'
export AWS_SECRET_ACCESS_KEY='YYYY'
  • be hydrated as environment variables from a secret manager with the @secrets decorator.
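The @environment approach exposes credentials because the values are written into the script verbatim. The following minimal sketch renders variables into export lines like the ones shown for madhurMovies218892mid13433160.sh. It only reproduces the quoting seen in that script and is not the extension's actual code. `render_exports` and `shell_single_quote` are hypothetical names.

```python
from typing import Mapping


def shell_single_quote(value: str) -> str:
    """Wrap value in single quotes, escaping embedded single quotes as '\\''."""
    return "'" + value.replace("'", "'\\''") + "'"


def render_exports(env: Mapping[str, str]) -> str:
    """Hypothetical renderer: one export line per variable.

    The values are quoted, not encrypted, so anyone who can read the .sh file
    can read the credentials.
    """
    return "\n".join(
        f"export {name}={shell_single_quote(value)}" for name, value in env.items()
    )


print(render_exports({"AWS_ACCESS_KEY_ID": "XXXX", "AWS_SECRET_ACCESS_KEY": "YYYY"}))
```

Fetching the values at runtime from a secret manager avoids persisting them in the script this way.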

PS: If you are on the Outerbounds platform, authentication is handled for you, so there is nothing to configure here.

Things to keep in mind

  • The extension runs workloads via shell scripts and sbatch in a native Linux environment
    • i.e. the workloads are NOT run inside Docker containers
    • As such, the compute instances should not have python2 installed, and both python and python3 should point to Python 3, preferably a version above 3.8.
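Jobs use whatever interpreter the compute nodes provide, so it can help to check a node before submitting work. The following minimal sketch is meant to run on a compute node. The preferred version (3, 8) comes from the note above, and the helper names are hypothetical rather than part of the extension.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_python_version(text: str) -> Optional[Tuple[int, ...]]:
    """Parse output such as 'Python 3.10.12' into (3, 10, 12)."""
    match = re.search(r"Python (\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def interpreter_version(command: str) -> Optional[Tuple[int, ...]]:
    """Return the version behind `command` (e.g. "python"), or None if it is absent."""
    if shutil.which(command) is None:
        return None
    proc = subprocess.run([command, "--version"], capture_output=True, text=True)
    # Python 2 prints its version to stderr; Python 3.4+ prints it to stdout.
    return parse_python_version(proc.stdout or proc.stderr)


def check_node(preferred: Tuple[int, int] = (3, 8)) -> bool:
    """Return False if python or python3 is missing or is not Python 3."""
    ok = True
    for command in ("python", "python3"):
        version = interpreter_version(command)
        if version is None or version[0] != 3:
            print(f"{command}: expected Python 3, found {version}")
            ok = False
        elif version[:2] < preferred:
            print(f"{command}: {version} is Python 3 but older than the preferred {preferred}")
        else:
            print(f"{command}: {version} OK")
    return ok


if __name__ == "__main__":
    print("node looks fine" if check_node() else "fix the interpreters listed above")
```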

Fin.

Download files

Source Distribution

metaflow_slurm-0.0.1.tar.gz (20.4 kB)

Uploaded Source

Built Distribution

metaflow_slurm-0.0.1-py3-none-any.whl (23.6 kB)

Uploaded Python 3

File details

Details for the file metaflow_slurm-0.0.1.tar.gz.

File metadata

  • Download URL: metaflow_slurm-0.0.1.tar.gz
  • Upload date:
  • Size: 20.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for metaflow_slurm-0.0.1.tar.gz
Algorithm Hash digest
SHA256 65e074233b39a8ca4f24fb318835a8961c55841b791f035886d430a4a1e69cb8
MD5 e6199509f7ed7f71366c90a1d77a54fb
BLAKE2b-256 ea4cc9247c4308a8cfcdaee71f431519953fcee89f2ddd6fe3f88f0fd2925efd
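You can use the digests above to check a downloaded file before installing it. The following minimal sketch uses Python's standard hashlib module. The expected value is the SHA256 digest listed for metaflow_slurm-0.0.1.tar.gz, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for metaflow_slurm-0.0.1.tar.gz.
EXPECTED_SDIST_SHA256 = "65e074233b39a8ca4f24fb318835a8961c55841b791f035886d430a4a1e69cb8"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, assuming the download was saved as metaflow_slurm-0.0.1.tar.gz in the current directory, verify(Path("metaflow_slurm-0.0.1.tar.gz"), EXPECTED_SDIST_SHA256) should return True for an untampered file.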

Provenance

The following attestation bundles were made for metaflow_slurm-0.0.1.tar.gz:

Publisher: publish.yml on outerbounds/metaflow-slurm

Attestations:

File details

Details for the file metaflow_slurm-0.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for metaflow_slurm-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6b75adf9a58350030af30196780001d17810e27d787e71fab2d8c612acf15519
MD5 8366a1cfa5b37911a9b19d1a0cb4fd7e
BLAKE2b-256 84db27338af8a19ff91acbe724d1fb786f892891440d62b75901b548fbd16df5

Provenance

The following attestation bundles were made for metaflow_slurm-0.0.1-py3-none-any.whl:

Publisher: publish.yml on outerbounds/metaflow-slurm

Attestations:
