Prefect-Slurm

A Prefect worker for running flows on Slurm HPC clusters


Execute your Prefect flows on high-performance computing clusters using the Slurm workload manager. This worker seamlessly integrates with Slurm's REST API to submit, monitor, and manage flow runs as Slurm jobs.

Features

Automatic API Version Detection - Supports Slurm REST API versions 0.0.40-0.0.42 with automatic detection
🔒 Secure Token Management - JWT-based authentication with file locking and proper permissions
🔄 Zombie Job Recovery - Automatically detects and handles orphaned flow runs after worker restarts
📊 Resource Management - Full Slurm job specification support for CPU, memory, and time limits
🛠️ CLI Tools - Built-in utilities for token management and worker administration
🧪 Comprehensive Testing - Both unit and integration tests

Quick Start

Installation

pip install prefect-slurm

Basic Setup

  1. Create a work pool using the Slurm worker type:

    prefect work-pool create slurm-pool --type slurm
    
  2. Configure the connection - Set your Slurm username and REST API URL:

    export PREFECT_SLURM_USER_NAME=your_username
    export PREFECT_SLURM_API_URL=http://your-slurm-server:6820
    
  3. Set up authentication token:

    # Generate and store token using built-in CLI
    scontrol token username=$USER lifespan=3600 | prefect-slurm token
    
    # Or set token directly via environment variable
    export PREFECT_SLURM_USER_TOKEN=your_jwt_token
    
  4. Start the worker:

    prefect worker start --pool slurm-pool --type slurm
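
Once the worker is running, deployments that target the pool execute as Slurm jobs. As a minimal sketch (the repository URL and entrypoint below are placeholders), a flow can be deployed to the pool from Python:

from prefect import flow

@flow(log_prints=True)
def hello_slurm() -> None:
    print("Hello from a Slurm job!")

if __name__ == "__main__":
    # Placeholder URL; assumes this file is committed as flows/hello.py
    # so the Slurm job can pull the flow source at run time.
    flow.from_source(
        source="https://github.com/your-org/your-repo",
        entrypoint="flows/hello.py:hello_slurm",
    ).deploy(name="hello-slurm", work_pool_name="slurm-pool")

Triggering a run (prefect deployment run 'hello-slurm/hello-slurm') should then submit a Slurm job through the worker.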
    

Configuration

Environment Variables

Variable                                       Description                                      Default
PREFECT_SLURM_USER_NAME                        Slurm username                                   Required
PREFECT_SLURM_API_URL                          Slurm REST API URL                               Required
PREFECT_SLURM_USER_TOKEN                       JWT authentication token                         Optional
PREFECT_SLURM_TOKEN_FILE                       Path to token file                               ~/.prefect_slurm.jwt
PREFECT_SLURM_LOCK_TIMEOUT                     File lock timeout (seconds)                      60
PREFECT_SLURM_ENV_FILE                         Override environment file path                   Optional
PREFECT_SLURM_MAX_ATTEMPTS                     Max retries for Slurm REST API requests          3
PREFECT_SLURM_RETRY_MIN_DELAY_SECONDS          Minimum delay between retries (seconds)          10
PREFECT_SLURM_RETRY_MIN_DELAY_JITTER_SECONDS   Minimum jitter added to retry delays (seconds)   0
PREFECT_SLURM_RETRY_MAX_DELAY_JITTER_SECONDS   Maximum jitter added to retry delays (seconds)   20
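
How the retry settings combine is an assumption from the variable names rather than documented behaviour, but a plausible model is:

import random

# Assumed model, not taken from the worker's source: each of up to
# MAX_ATTEMPTS retries waits the minimum delay plus a jitter drawn
# uniformly from [min_jitter, max_jitter] seconds.
def retry_delay(min_delay: float = 10.0,
                min_jitter: float = 0.0,
                max_jitter: float = 20.0) -> float:
    return min_delay + random.uniform(min_jitter, max_jitter)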

Environment Files

The worker supports loading configuration from environment files using a hierarchical discovery system. Files are loaded in priority order (later files override earlier ones):

  1. System-wide: /etc/prefect-slurm/.env
  2. XDG Config: ~/.config/prefect-slurm/.env (or $XDG_CONFIG_HOME/prefect-slurm/.env)
  3. User Home: ~/.prefect_slurm.env
  4. Current Directory (app-specific): ./.prefect_slurm.env
  5. Current Directory: ./.env
  6. Environment Variable Override: $PREFECT_SLURM_ENV_FILE
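
For intuition, the discovery order behaves roughly like the following sketch (an illustration using python-dotenv, not the package's actual implementation):

import os
from pathlib import Path

from dotenv import load_dotenv  # pip install python-dotenv

# Lowest priority first; override=True lets each later file win.
xdg = Path(os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config")))
candidates = [
    Path("/etc/prefect-slurm/.env"),
    xdg / "prefect-slurm" / ".env",
    Path.home() / ".prefect_slurm.env",
    Path(".prefect_slurm.env"),
    Path(".env"),
]
if os.environ.get("PREFECT_SLURM_ENV_FILE"):
    candidates.append(Path(os.environ["PREFECT_SLURM_ENV_FILE"]))

for env_file in candidates:
    if env_file.is_file():
        load_dotenv(env_file, override=True)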

Example environment file (.prefect_slurm.env):

# Slurm connection settings
PREFECT_SLURM_USER_NAME=your_username
PREFECT_SLURM_API_URL=http://your-slurm-server:6820

# Optional token (alternative to token file)
PREFECT_SLURM_USER_TOKEN=your_jwt_token_here

# Optional custom token file location
PREFECT_SLURM_TOKEN_FILE=~/my_custom_token.jwt

# Optional custom lock timeout
PREFECT_SLURM_LOCK_TIMEOUT=120

You can override the automatic discovery by setting PREFECT_SLURM_ENV_FILE to point to a specific file:

export PREFECT_SLURM_ENV_FILE=/path/to/my/custom.env
prefect worker start --pool slurm-pool --type slurm

Note: CLI commands (prefect-slurm token) also support environment files, though only PREFECT_SLURM_TOKEN_FILE and PREFECT_SLURM_LOCK_TIMEOUT are relevant for CLI operations.

Work Pool Configuration

Configure your Slurm work pool with job specifications:

job_configuration:
  partition: "compute"
  cpu: 4
  memory: 8
  time_limit: 2
  working_dir: "/path/to/working/directory"
  source_files:  # Optional - omit for default Python environment
    - "~/.bashrc"
    - "~/envs/conda/bin/activate"

Environment Setup

The worker supports two environment configuration modes:

Custom Environment (when source_files are specified):

job_configuration:
  source_files:
    - "~/.bashrc"
    - "/opt/conda/bin/activate"
    - "/opt/modules/init.sh"

The worker will source these files before executing your flow. Use this for conda environments, module systems, or custom shell configurations.

Default Python Environment (when source_files is empty or omitted):

job_configuration:
  partition: "compute"
  cpu: 4
  memory: 8

The worker automatically creates a temporary Python virtual environment with the matching Prefect version installed. The environment is created in $TMPDIR/.venv_$SLURM_JOB_ID and cleaned up after job completion.
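
Conceptually, the job performs something like the following (a sketch of the behaviour just described, not the worker's actual code):

import os
import subprocess
import venv

import prefect

# Per-job throwaway environment under $TMPDIR, named after the Slurm job.
venv_dir = os.path.join(os.environ.get("TMPDIR", "/tmp"),
                        f".venv_{os.environ['SLURM_JOB_ID']}")
venv.EnvironmentBuilder(with_pip=True).create(venv_dir)

# Install a matching Prefect version; this process's version is a stand-in.
pip = os.path.join(venv_dir, "bin", "pip")
subprocess.run([pip, "install", f"prefect=={prefect.__version__}"], check=True)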

CLI Tools

The package includes a command-line utility for token management:

# Store token from scontrol output at default location
scontrol token username=$USER lifespan=3600 | prefect-slurm token

# Store token to custom location
echo "jwt_token_here" | prefect-slurm token ~/my_token.jwt

# Get help
prefect-slurm token --help

The default token location is ~/.prefect_slurm.jwt (override it with PREFECT_SLURM_TOKEN_FILE), and the token file is created with permissions 600 (read/write for the owner only).
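
The storage behaviour amounts to roughly the following (a sketch under the stated defaults, not the CLI's actual code):

import os
import sys
import tempfile
from pathlib import Path

def store_token(token: str,
                path: Path = Path.home() / ".prefect_slurm.jwt") -> None:
    # Write to a temp file in the same directory, then move it into place
    # so readers never observe a half-written token.
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        os.write(fd, token.strip().encode())
    finally:
        os.close(fd)
    os.chmod(tmp, 0o600)   # read/write for the owner only
    os.replace(tmp, path)

if __name__ == "__main__":
    store_token(sys.stdin.read())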

Running the Examples

You can test the examples in the examples/ directory using the local Docker Compose Slurm cluster:

  1. Start the local cluster:

    cd slurm_environment/
    docker-compose up -d
    
  2. Wait for services to be healthy (check with docker-compose ps)

  3. Deploy and run example flows (from the prefect_server container):

    # Enter the Prefect server container
    docker-compose exec prefect_server bash
    
    # Navigate to examples and deploy the hello world example interactively
    cd /opt/data/examples
    prefect deploy
    
    # Run the deployment
    prefect deployment run slurm-hello-world/slurm-hello-world-deployment
    
  4. Monitor execution:

    • Prefect UI: http://localhost:4200
    • Check Slurm jobs (from slurm_node container): docker-compose exec slurm_node squeue
    • View worker logs: docker-compose logs slurm_submitter

The Docker environment provides a complete Slurm cluster with the worker automatically configured and example flows ready to deploy.

Architecture

The Slurm worker integrates with Prefect's execution model:

  1. Worker Polling - Continuously polls Prefect API for scheduled flow runs
  2. Job Submission - Converts flow runs to Slurm job specifications
  3. Execution - Submits jobs via Slurm REST API with proper resource allocation
  4. Monitoring - Tracks job status and reports back to Prefect
  5. Cleanup - Handles zombie jobs and ensures proper flow run state management

graph TB
    B[Slurm Worker] -->|polls for flows| A[Prefect Server]
    B -->|submits jobs| C[Slurm REST API]
    C -->|schedules| D[Slurm Cluster]
    D -->|executes| E[Flow Run]
    E -->|reports status| B
    B -->|updates state| A
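
In Prefect terms, the worker follows the standard BaseWorker contract; a hedged skeleton (class and field names here are illustrative, not necessarily those used by prefect-slurm):

from prefect.workers.base import BaseJobConfiguration, BaseWorker

class SlurmJobConfiguration(BaseJobConfiguration):
    # Illustrative fields mirroring the work pool configuration above.
    partition: str = "compute"
    cpu: int = 1
    memory: int = 4
    time_limit: int = 1

class SlurmWorker(BaseWorker):
    type = "slurm"
    job_configuration = SlurmJobConfiguration

    async def run(self, flow_run, configuration, task_status=None):
        # Steps 2-4 above: build a Slurm job spec from `configuration`,
        # submit it via the REST API, then poll until a terminal state.
        ...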

Requirements

  • Python: 3.11+ (< 3.14)
  • Prefect: 3.4.13+
  • Slurm: Cluster with REST API enabled (versions 0.0.40-0.0.42 supported)
  • Network: Access from worker node to both Prefect API and Slurm REST API

Development

Running Tests

# Unit tests only
pytest -m unit

# Integration tests (requires Docker)
pytest -m integration

# CLI tests
pytest -m cli

# All tests
pytest

Test Environment

The project includes a Docker-based Slurm cluster for integration testing:

cd slurm_environment/
docker-compose up -d

Contributing

Contributions are welcome! This project is developed by the EBI Metagenomics team.

Development Workflow

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Run the full test suite
  5. Submit a pull request

License

Licensed under the Apache License 2.0. See LICENSE for details.
