Submit and monitor SLURM jobs via SSH

SSH SLURM Client

A Python library and CLI tool for submitting and monitoring SLURM jobs on DGX servers via SSH.

Features

  • SLURM job submission via SSH connections
  • Automatic job ID extraction
  • Periodic job status monitoring
  • Job log output retrieval
  • File handling:
    • Local files: automatically uploaded to the server's temporary folder (/tmp/ssh-slurm) and executed
    • Remote files: existing files on the server are executed directly (specified with an absolute path)
  • Connection management:
    • .ssh/config support
    • Custom profile management (SSH config hosts or direct connection info)
  • Automatic environment variable transfer:
    • Auto-detection and transfer of common environment variables like HF_TOKEN, WANDB_API_KEY, etc.
  • Available as both CLI tool and Python library
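
To make "automatic job ID extraction" concrete: on success, sbatch normally prints a line of the form "Submitted batch job <id>". The following is a minimal sketch of pulling the ID out of that line. The function name extract_job_id is hypothetical and is not taken from this library.

```python
import re
from typing import Optional

# sbatch normally reports success with a line such as "Submitted batch job 12345".
_JOB_ID_PATTERN = re.compile(r"Submitted batch job (\d+)")


def extract_job_id(sbatch_output: str) -> Optional[str]:
    """Return the job ID found in sbatch output, or None if there is none.

    Hypothetical helper for illustration; not part of ssh_slurm's API.
    """
    match = _JOB_ID_PATTERN.search(sbatch_output)
    return match.group(1) if match else None


print(extract_job_id("Submitted batch job 12345"))
```

Returning None rather than raising leaves it to the caller to decide how to report a failed submission.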

Installation

# Editable install from a source checkout
pip install -e .

Usage

As CLI Tool

Basic Usage

# Execute local script file
ssb my_script.sh --host dgx1

# Execute remote script file
ssb /home/user/scripts/remote_script.sh --host dgx1

# Use profile
ssb my_script.sh --profile dgx1

# Use SSH config settings
ssb my_script.sh --host my-dgx-server

# Direct specification
ssb my_script.sh --hostname dgx.example.com --username user --key-file ~/.ssh/id_rsa

Detailed Options

# Verbose logging, job name specification, monitoring interval
ssb script.sh --host dgx1 --job-name my_job --poll-interval 5 --verbose

# Don't delete uploaded files
ssb script.sh --host dgx1 --no-cleanup

# Submit job without monitoring
ssb script.sh --host dgx1 --no-monitor

# Environment variables are automatically detected and transferred (HF_TOKEN, WANDB_API_KEY, etc.)
ssb script.sh --host dgx1

# Additionally pass local environment variables
ssb script.sh --host dgx1 --env-local CUSTOM_TOKEN

# Set custom environment variables
ssb script.sh --host dgx1 --env "CUSTOM_VAR=value" --env "DEBUG=true"

# Combined usage
ssb script.sh --host dgx1 --env-local CUSTOM_TOKEN --env "MODEL_NAME=llama3" --verbose
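
The three sources of environment variables above (auto-detected, --env-local, --env) can be pictured as one merge step. The sketch below is an illustration only: collect_env, the AUTO_DETECTED tuple, and the rule that explicit --env values override the others are all assumptions, not the tool's documented behavior.

```python
# Hypothetical default list: the README names only these two, followed by "etc.".
AUTO_DETECTED = ("HF_TOKEN", "WANDB_API_KEY")


def collect_env(local_env, env_local=(), env_pairs=(), auto_keys=AUTO_DETECTED):
    """Merge auto-detected, --env-local, and --env values into one dict.

    Illustrative helper; not part of ssh_slurm's API.
    """
    merged = {}
    # Auto-detected and --env-local names are copied only if set locally.
    for key in (*auto_keys, *env_local):
        if key in local_env:
            merged[key] = local_env[key]
    # --env entries are KEY=VALUE strings; split on the first "=" only,
    # so values may themselves contain "=".
    for pair in env_pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        merged[key] = value  # assumption: explicit values win
    return merged


fake_env = {"HF_TOKEN": "hf_xxx", "CUSTOM_TOKEN": "abc", "HOME": "/home/user"}
print(collect_env(fake_env, env_local=["CUSTOM_TOKEN"], env_pairs=["MODEL_NAME=llama3"]))
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the merge easy to test.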

Profile Management

Adding Profiles

Using SSH config host:

# Specify host configured in SSH config
ssb profile add dgx1 --ssh-host my-dgx-host --description "DGX-1 server"

Specifying direct connection info:

ssb profile add dgx1 --hostname dgx1.example.com --username user --key-file ~/.ssh/id_rsa --description "DGX-1 server"
ssb profile add dgx2 --hostname dgx2.example.com --username user --key-file ~/.ssh/id_rsa --port 2222 --description "DGX-2 server"

Profile List and Management

# List profiles
ssb profile list

# Set current profile
ssb profile set dgx1

# Show profile details
ssb profile show dgx1

# Update profile
ssb profile update dgx1 --hostname new-dgx1.example.com

# Change to SSH config host
ssb profile update dgx1 --ssh-host my-dgx-host

# Remove profile
ssb profile remove dgx1

Using SSH Config

Hosts defined in ~/.ssh/config can be used directly:

Host dgx1
    HostName dgx1.example.com
    User username
    Port 22
    IdentityFile ~/.ssh/id_rsa

Host dgx-a100
    HostName 192.168.1.100
    User gpu_user
    Port 2222
    IdentityFile ~/.ssh/dgx_key

Usage examples:

ssb my_script.sh --host dgx1
ssb my_script.sh --host dgx-a100
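
To show how a --host alias maps to connection settings, here is a deliberately simplified SSH config reader using only the standard library. It is not how this package parses the file: parse_ssh_config is a hypothetical name, and the sketch ignores wildcards, Match blocks, Include directives, and the "key=value" syntax.

```python
def parse_ssh_config(text):
    """Parse 'Host' blocks into {alias: {lowercased_option: value}}.

    Simplified illustration; not part of ssh_slurm's API.
    """
    hosts = {}
    current = []  # option dicts of the Host block being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "host":
            # One Host line may list several aliases.
            current = [hosts.setdefault(alias, {}) for alias in value.split()]
        else:
            for options in current:
                # Keep the first value seen for each option.
                options.setdefault(key, value)
    return hosts


SAMPLE = """
Host dgx-a100
    HostName 192.168.1.100
    User gpu_user
    Port 2222
    IdentityFile ~/.ssh/dgx_key
"""

host = parse_ssh_config(SAMPLE)["dgx-a100"]
print(host["hostname"], host["user"], int(host["port"]))
```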

As Python Library

from ssh_slurm import SSHSlurmClient
from ssh_slurm.config import ConfigManager
from ssh_slurm.ssh_config import get_ssh_config_host

# Get settings from SSH config
ssh_host = get_ssh_config_host("dgx1")

# Create client
with SSHSlurmClient(
    hostname=ssh_host.effective_hostname,
    username=ssh_host.effective_user,
    key_filename=ssh_host.effective_identity_file,
    port=ssh_host.effective_port
) as client:
    
    # Submit a local file (it is uploaded to the server first)
    job = client.submit_sbatch_file("./my_script.sh", job_name="test_job")

    # Alternatively, submit a file that already exists on the server:
    # job = client.submit_sbatch_file("/home/user/remote_script.sh", job_name="remote_job")
    
    if job:
        print(f"Job submitted: {job.job_id}")
        if job.is_local_script:
            print(f"Script uploaded to: {job.script_path}")
        
        # Monitor job
        job = client.monitor_job(job, poll_interval=10)
        
        # Get results
        stdout, stderr = client.get_job_output(job.job_id)
        print(f"Output: {stdout}")
        
        # Cleanup
        client.cleanup_job_files(job)
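
The monitoring step in the example above amounts to polling the job state until it is terminal. The following standalone sketch shows that loop under stated assumptions: wait_for_job and its get_state callback are hypothetical names, the state set is a subset of standard SLURM job states, and none of this is the library's actual monitor_job implementation.

```python
import time

# A subset of SLURM job states that mean the job has finished.
TERMINAL_STATES = {
    "COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "NODE_FAIL", "OUT_OF_MEMORY",
}


def wait_for_job(get_state, poll_interval=10.0, timeout=None,
                 sleep=time.sleep, clock=time.monotonic):
    """Call get_state() until it returns a terminal state or the timeout passes.

    Illustrative helper; get_state stands in for a remote status query.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"job still {state} after {timeout} seconds")
        sleep(poll_interval)


# Simulated state sequence instead of a real cluster query.
states = iter(["PENDING", "RUNNING", "COMPLETED"])
print(wait_for_job(lambda: next(states), poll_interval=0, sleep=lambda _: None))
```

Injecting sleep and clock lets the loop be tested without waiting in real time.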

File Handling

Local Files

  • Specified with a relative path (not starting with /)
  • Automatically uploaded to server's /tmp/ssh-slurm/
  • Executable permissions automatically granted (.sh, .py, .pl, .r files)
  • Automatically deleted after job completion (can be disabled with --no-cleanup)

Remote Files

  • Specified with absolute path (starting with /)
  • Direct execution of existing files on server
  • File existence verification performed
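
The local/remote rules above can be summarized as a small decision function. This is a sketch, not the package's code: plan_script is a hypothetical name, and keeping the original file name under /tmp/ssh-slurm/ is an assumption, since the actual naming of uploaded files is not stated here.

```python
import posixpath

REMOTE_TMP_DIR = "/tmp/ssh-slurm"
EXECUTABLE_SUFFIXES = (".sh", ".py", ".pl", ".r")


def plan_script(path):
    """Describe how a script path would be handled under the rules above.

    Illustrative helper; not part of ssh_slurm's API.
    """
    if path.startswith("/"):
        # Absolute path: treated as a file that already exists on the server.
        return {"is_local": False, "remote_path": path, "make_executable": False}
    # Anything else is treated as a local file that is uploaded first.
    name = posixpath.basename(path)
    return {
        "is_local": True,
        # Assumption: the upload keeps the original file name.
        "remote_path": posixpath.join(REMOTE_TMP_DIR, name),
        "make_executable": name.endswith(EXECUTABLE_SUFFIXES),
    }


print(plan_script("./my_script.sh"))
print(plan_script("/home/user/scripts/remote_script.sh"))
```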

Configuration Files

Profile Settings (~/.config/ssh-slurm.json)

{
  "current_profile": "dgx1",
  "profiles": {
    "dgx1": {
      "hostname": "dgx1.example.com",
      "username": "user",
      "key_filename": "/home/user/.ssh/id_rsa",
      "port": 22,
      "description": "DGX-1 server",
      "ssh_host": null
    },
    "dgx2": {
      "hostname": "dgx2.internal.com",
      "username": "gpuuser",
      "key_filename": "/home/user/.ssh/dgx_key",
      "port": 22,
      "description": "DGX-2 via SSH config",
      "ssh_host": "dgx2-internal"
    }
  }
}
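
As an illustration of how this file might be consumed, the sketch below selects a profile from JSON text shaped like the example above. resolve_profile is a hypothetical helper, not the library's ConfigManager, and falling back to "current_profile" when no name is given is an assumption.

```python
import json


def resolve_profile(config_text, name=None):
    """Return (profile_name, settings) for `name`, or for the current profile.

    Illustrative helper; not part of ssh_slurm's API.
    """
    config = json.loads(config_text)
    selected = name or config.get("current_profile")
    profiles = config.get("profiles", {})
    if selected not in profiles:
        raise KeyError(f"unknown profile: {selected!r}")
    return selected, profiles[selected]


EXAMPLE = """
{
  "current_profile": "dgx1",
  "profiles": {
    "dgx1": {"hostname": "dgx1.example.com", "username": "user",
             "key_filename": "/home/user/.ssh/id_rsa", "port": 22,
             "description": "DGX-1 server", "ssh_host": null}
  }
}
"""

profile_name, settings = resolve_profile(EXAMPLE)
print(profile_name, settings["hostname"], settings["port"])
```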

SSH Config (~/.ssh/config)

Supports standard SSH configuration files:

Host pattern
    HostName hostname
    User username
    Port port
    IdentityFile ~/.ssh/key_file
    ProxyJump jump_host
    # Other SSH settings

Security

  • Passwords are not stored in configuration files
  • Only SSH private key file authentication is supported
  • Uploaded files are stored temporarily on the server and deleted after job completion (unless --no-cleanup is used)

Command Reference

ssb (Job Execution)

ssb <script_path> [options]

Connection options:
  --host, -H          SSH host from .ssh/config
  --profile, -p       Use saved profile  
  --hostname          DGX server hostname
  --username          SSH username
  --key-file          SSH private key file path
  --port              SSH port (default: 22)
  --ssh-config        SSH config file path (default: ~/.ssh/config)

Job options:
  --job-name          Job name
  --poll-interval     Job status polling interval in seconds (default: 10)
  --timeout           Job monitoring timeout in seconds
  --no-monitor        Submit job without monitoring
  --no-cleanup        Do not cleanup uploaded script files

Environment options:
  --env KEY=VALUE     Pass environment variable to remote job (can be used multiple times)
  --env-local KEY     Pass local environment variable to remote job (can be used multiple times)
  
  Note: Common environment variables (HF_TOKEN, WANDB_API_KEY, etc.) are automatically detected and transferred

Other options:
  --verbose, -v       Enable verbose logging

ssb profile (Profile Management)

ssb profile <command> [options]

Commands:
  add <name>        - Add a new profile
    --ssh-host      Use SSH config host
    --hostname      Direct hostname (requires --username, --key-file)
    --username      SSH username
    --key-file      SSH key file path
    --port          SSH port (default: 22)
    --description   Profile description
    
  list              - List all profiles
  set <name>        - Set current profile
  show [name]       - Show profile details
  update <name>     - Update a profile
  remove <name>     - Remove a profile

Usage Examples

# Use SSH config host
ssb train.sh --host dgx1

# Use profile
ssb train.sh --profile my-dgx

# Execute remote script
ssb /shared/scripts/training.sh --host dgx1

# Detailed settings
ssb local_script.sh --host dgx1 --job-name training_job --poll-interval 30 --verbose

Dependencies

  • Python 3.13+
  • paramiko 4.0.0+

License

MIT
