Client-side contention management for Slurm HPC clusters using CSMA/CA-inspired backoff
polite_submit
Overview
polite_submit probes cluster state before job submission and backs off when resources are congested, improving queue health for all users without requiring scheduler modifications.
Key Features:
- Reduces queue congestion from batch job floods
- Zero server-side changes required (pure client)
- Drop-in replacement for sbatch
- Configurable politeness levels
- Supports batch and array job chunking
- Exponential backoff with jitter (like WiFi CSMA/CA)
Installation
pip install polite_submit
Or from source:
git clone https://github.com/ahb-sjsu/polite-submit
cd polite-submit
pip install -e .
Quick Start
# Single job
polite_submit job.sh
# Multiple scripts
polite_submit --batch job1.sh job2.sh job3.sh
# Array job in chunks
polite_submit --array sweep.sh --range 0-99 --chunk 10
# Dry run (see what would happen)
polite_submit --dry-run job.sh
# Skip politeness (late night, aggressive mode)
polite_submit --aggressive job.sh
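As an illustration of what chunked array submission means, the sketch below splits an inclusive range such as 0-99 into consecutive sub-ranges of at most --chunk tasks, each of which could be passed to sbatch --array. The function name chunk_array_range is hypothetical, and the way polite_submit actually divides the range internally is an assumption here, not something the project documents.

```python
from __future__ import annotations


def chunk_array_range(range_spec: str, chunk: int) -> list[str]:
    """Split an inclusive 'start-end' range into sub-ranges of at most `chunk` tasks.

    Hypothetical helper: each returned string is a valid value for
    `sbatch --array=<spec>`.
    """
    start_text, end_text = range_spec.split("-")
    start, end = int(start_text), int(end_text)
    if chunk < 1 or end < start:
        raise ValueError("chunk must be >= 1 and the range must be ascending")

    specs = []
    for low in range(start, end + 1, chunk):
        high = min(low + chunk - 1, end)  # the last chunk may be shorter
        specs.append(f"{low}-{high}")
    return specs


chunks = chunk_array_range("0-99", 10)
print(len(chunks), chunks[0], chunks[-1])  # prints: 10 0-9 90-99
```

Under this reading, each chunk would go through the same politeness checks before it is submitted, so a 100-task sweep enters the queue ten tasks at a time rather than all at once.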
How It Works
Before each submission, polite_submit:
- Probes cluster state via sinfo and squeue
- Checks thresholds:
  - Am I running too many jobs? (default: 4)
  - Do I have too many jobs pending? (default: 2)
  - Are too many other users' jobs waiting? (default: 10)
  - Is cluster utilization high? (default: 85%)
- If any threshold is exceeded: backs off with an exponentially growing delay
- If clear: submits via sbatch
This mirrors CSMA/CA (Carrier-Sense Multiple Access with Collision Avoidance) from WiFi protocols.
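The threshold check described above can be sketched as a small pure function. This is a minimal illustration, not the package's code: ClusterState, Limits, and reasons_to_back_off are hypothetical names, the default values are the ones listed above, and treating a limit as exceeded once the count reaches it (>=) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot parsed from sinfo/squeue output."""
    my_running: int       # my jobs currently running
    my_pending: int       # my jobs waiting in the queue
    others_pending: int   # other users' pending jobs
    utilization: float    # fraction of the cluster in use, 0.0 to 1.0


@dataclass
class Limits:
    """Defaults taken from the thresholds listed in this README."""
    max_concurrent_jobs: int = 4
    max_pending_jobs: int = 2
    queue_depth_threshold: int = 10
    utilization_threshold: float = 0.85


def reasons_to_back_off(state: ClusterState, limits: Limits | None = None) -> list[str]:
    """Return the thresholds that are hit; an empty list means 'clear to submit'."""
    limits = limits or Limits()
    reasons = []
    # Assumption: a limit counts as exceeded once the value reaches it.
    if state.my_running >= limits.max_concurrent_jobs:
        reasons.append("too many running jobs")
    if state.my_pending >= limits.max_pending_jobs:
        reasons.append("too many pending jobs")
    if state.others_pending >= limits.queue_depth_threshold:
        reasons.append("others are waiting")
    if state.utilization >= limits.utilization_threshold:
        reasons.append("cluster utilization is high")
    return reasons


print(reasons_to_back_off(ClusterState(1, 0, 3, 0.50)))   # prints: []
print(reasons_to_back_off(ClusterState(4, 0, 12, 0.90)))
```

Returning the list of reasons rather than a bare boolean makes it easy to log why a submission was deferred, which is also the kind of output a --dry-run mode could report.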
Configuration
Create ~/.polite_submit.yaml or polite_submit.yaml in your working directory:
cluster:
  host: hpc                      # SSH host alias (null for local)
  partition: gpu                 # Default partition
politeness:
  max_concurrent_jobs: 4         # Max running at once
  max_pending_jobs: 2            # Max waiting in queue
  queue_depth_threshold: 10      # Back off if this many others pending
  utilization_threshold: 0.85    # Back off if cluster this full
peak_hours:
  enabled: true
  schedule:
    - [9, 17]                    # 9 AM - 5 PM
  max_concurrent: 2              # Stricter during peak
  weekend_exempt: true
backoff:
  initial_seconds: 30
  max_seconds: 1800              # 30 minutes
  multiplier: 2.0
  max_attempts: 20
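To make the peak_hours and backoff settings concrete, here is one plausible interpretation in Python. Both functions are hypothetical sketches: treating the schedule end hour as exclusive, and drawing jitter uniformly between half the base delay and the full base delay, are assumptions, since the configuration above does not specify either detail.

```python
from __future__ import annotations

import random
from datetime import datetime


def is_peak(now: datetime, schedule=((9, 17),), weekend_exempt: bool = True) -> bool:
    """True if `now` falls inside any [start_hour, end_hour) window.

    Assumption: the end hour is exclusive, so 17:00 is already off-peak.
    """
    if weekend_exempt and now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return any(start <= now.hour < end for start, end in schedule)


def backoff_delay(attempt: int, initial: float = 30.0, multiplier: float = 2.0,
                  maximum: float = 1800.0, rng: random.Random | None = None) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    The base delay grows geometrically and is capped at `maximum`.
    Assumption: with an rng, jitter is uniform in [base / 2, base].
    """
    base = min(initial * multiplier ** attempt, maximum)
    if rng is None:
        return base  # deterministic, useful for inspecting the schedule
    return rng.uniform(base / 2, base)


print([backoff_delay(n) for n in range(8)])
# prints: [30.0, 60.0, 120.0, 240.0, 480.0, 960.0, 1800.0, 1800.0]
print(is_peak(datetime(2024, 1, 8, 10)))  # a Monday at 10:00, prints: True
```

With the defaults shown, the delay reaches the 30-minute cap on the seventh attempt and stays there. Jitter matters because many clients that back off on identical schedules would otherwise retry in lockstep and collide again.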
CLI Options
Usage: polite_submit [OPTIONS] [SCRIPT]
Options:
  -b, --batch PATH     Submit multiple scripts (can be repeated)
  -a, --array PATH     Submit as array job
  --range TEXT         Array range (e.g., 0-99). Required with --array
  --chunk INTEGER      Chunk size for array jobs
  --aggressive         Skip politeness checks
  -n, --dry-run        Show what would happen without submitting
  -c, --config PATH    Path to config file
  -p, --partition      Override partition
  -H, --host TEXT      SSH host for remote cluster
  --version            Show version
  --help               Show this message
SSH Setup
For remote clusters, configure SSH:
# ~/.ssh/config
Host hpc
    HostName your-cluster.edu
    User yourusername
    IdentityFile ~/.ssh/id_ed25519
Then use:
polite_submit --host hpc job.sh
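For readers curious how a client can probe a remote cluster through such an alias, the sketch below builds the argument list for running a Slurm query either locally or over ssh. It only constructs the command and does not run it. The helper build_probe_command is hypothetical, and whether polite_submit wraps its sinfo and squeue calls exactly this way is an assumption. The squeue flags used (-h, -u, -t, -o) are standard Slurm options.

```python
from __future__ import annotations

import shlex


def build_probe_command(remote_argv: list[str], host: str | None = None) -> list[str]:
    """Return an argv list that runs `remote_argv` locally or on `host` via ssh.

    Hypothetical helper: with host=None the command runs on the local machine,
    matching the 'null for local' convention in the config file.
    """
    if host is None:
        return list(remote_argv)
    # ssh passes the remote command to the remote shell as a single string,
    # so quote each argument before joining.
    return ["ssh", host, shlex.join(remote_argv)]


# List the IDs of one user's pending jobs, without a header line.
squeue = ["squeue", "-h", "-u", "yourusername", "-t", "PENDING", "-o", "%i"]
print(build_probe_command(squeue, host="hpc"))
# prints: ['ssh', 'hpc', 'squeue -h -u yourusername -t PENDING -o %i']
```

The resulting list can be handed to subprocess.run(..., capture_output=True, text=True), and counting the non-empty output lines gives the number of pending jobs.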
Theory: Fairness as Gauge Invariance
This tool implements voluntary compliance with fairness constraints. By limiting your own submissions when others are waiting, you preserve approximate user-permutation invariance: the principle that your identity should not change your expected wait time.
For more on the theoretical foundation, see the ErisML library and the SQND framework.
License
MIT License
File details
Details for the file polite_submit-0.1.0.tar.gz.
File metadata
- Download URL: polite_submit-0.1.0.tar.gz
- Upload date:
- Size: 16.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc30722e876005c3d10a12d13da51104197f05209232ff3427a8cd8e3fc64bd0 |
| MD5 | b1000cebc5a4c79f4ad2c5b5af622758 |
| BLAKE2b-256 | e80e1a14980a1bc2a5e507e6df1662cb91acc1029c7f24163b11df0392c39f9a |
File details
Details for the file polite_submit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: polite_submit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ccd3bfe3449227dda800689fcad8af8a43bbb9b5774f58d8d6a9a5ea9328e0c2 |
| MD5 | cfecd454f99f964dc7bff4775adca6cc |
| BLAKE2b-256 | ac80a0c7a1dc55f3716e4ad3388d31bedb03eaa13a66f9b9a83215d38158a414 |