SLURM command emulator with time manipulation for testing periodic limits
SLURM Emulator - Time Travel Edition
A comprehensive SLURM command emulator with time manipulation capabilities for testing periodic limits and decay calculations.
Features
- Interactive CLI - Full command-line interface with time travel
- Time Manipulation - Advance time by days, months, or quarters
- Usage Injection - Add specific node-hour usage at any time point
- Decay Calculations - 15-day half-life fairshare decay simulation
- QoS Management - Threshold-based QoS switching (normal → slowdown → blocked)
- Periodic Limits - Quarterly allocation with carryover logic
- Scenario Runner - Complete SLURM_PERIODIC_LIMITS_SEQUENCE.md validation
- API Integration - REST API for waldur-site-agent integration
- State Management - Checkpoint/restore functionality for testing
Quick Start
Installation
# Clone the repository
git clone https://github.com/waldur/slurm-emulator.git
cd slurm-emulator
# Install dependencies using uv
uv sync
Interactive CLI (CMD-based)
# Run with default configuration
uv run slurm-emulator
# Run with SLURM configuration file
uv run slurm-emulator --config examples/slurm.conf
# Run with a custom configuration file (same options apply)
uv run slurm-emulator --config examples/custom_slurm.conf
# Validate configuration only
uv run slurm-emulator --validate-only --config /etc/slurm/slurm.conf
SLURM Emulator - Time Travel Edition (CMD Interface)
Type 'help' or '?' for commands. TAB for auto-completion.
Type 'help <command>' for detailed help on specific commands.
slurm-emulator> help
# Shows all available commands
slurm-emulator> help time_advance
# Shows detailed help for specific command
slurm-emulator> time_advance 2 months
Advanced 2 months
slurm-emulator> account_create test "Test Account" 1000
Created account test with 1000Nh allocation
slurm-emulator> account create test-account "Test Account" 1000
Created account test-account with 1000Nh allocation
slurm-emulator> usage inject user1 200 test-account
Injected 200.0Nh usage for user1 in test-account at 2024-01-01 00:00:00
slurm-emulator> time advance 2 months
Advanced 2 months
New time: 2024-03-01 00:00:00
slurm-emulator> usage inject user1 400 test-account
Injected 400.0Nh usage for user1 in test-account at 2024-03-01 00:00:00
slurm-emulator> limits calculate test-account
Periodic Limits for test-account:
Period: 2024-Q1
Base allocation: 1000Nh
Total allocation: 1000.0Nh
Fairshare: 333
QoS threshold: 1200.0Nh
Grace limit: 1200.0Nh
Billing minutes: 60000
Complete Sequence Scenario
Run the full scenario from SLURM_PERIODIC_LIMITS_SEQUENCE.md:
slurm-emulator> scenario run sequence --interactive
Starting SLURM Periodic Limits Sequence Scenario
============================================================
Press Enter to execute Step 1: Initial Q1 setup...
Step 1: Initial Q1 2024 Setup
Setting up 1000Nh quarterly allocation with 20% grace period
Set fairshare to 333
Set GrpTRESMins to 72000 billing-minutes
QoS threshold set to 1200.0Nh
Checkpoint 'initial_setup' created
# ... continues through all 9 steps of the sequence
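The values printed in Step 1 follow from simple arithmetic, which can be sketched as below. This is a minimal illustration, not the emulator's code: the function name `grace_limits` is hypothetical, and the conversion of 60 billing-minutes per node-hour is an assumption inferred from the numbers shown above (1000Nh allocation, 20% grace, 72000 billing-minutes).

```python
def grace_limits(allocation_nh: float, grace_ratio: float = 0.2) -> dict:
    """Derive the QoS threshold and billing-minute limit from an allocation.

    Assumption: one node-hour corresponds to 60 billing-minutes, which is
    what the sample output suggests (1200Nh -> 72000 billing-minutes).
    """
    threshold_nh = allocation_nh * (1 + grace_ratio)
    return {
        "qos_threshold_nh": threshold_nh,
        "grp_tres_mins_billing": round(threshold_nh * 60),
    }


limits = grace_limits(1000)
print(limits)
```

With a 1000Nh allocation this gives a 1200Nh threshold and a 72000 billing-minute limit, matching the scenario output.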
Direct SLURM Commands
The emulator intercepts and emulates real SLURM commands:
slurm-emulator> sacctmgr add account test-account description="Test"
Adding Account(s)
test-account
Settings
Parent = root
Description = Test
slurm-emulator> sacctmgr modify account test-account set fairshare=333
Modified account...
test-account
Settings
fairshare=333
slurm-emulator> sacctmgr modify account test-account set GrpTRESMins=billing=72000
Modified account...
test-account
Settings
GrpTRESMins=billing=72000
slurm-emulator> sacct --accounts=test-account --starttime=2024-01-01 --endtime=2024-12-31
test-account|cpu=12800,mem=102400,gres/gpu=800|08:00:00|user1
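When consuming the emulated `sacct` output in a test, a small parser can help. The sketch below assumes the four-column layout of the sample line above (account, TRES, elapsed time, user); real `sacct` output depends on the requested `--format`, and `parse_sacct_line` is a hypothetical helper, not part of the emulator.

```python
def parse_sacct_line(line: str) -> dict:
    """Split one pipe-delimited record into account, TRES, elapsed, user.

    Assumes the column order of the sample output in this README.
    """
    account, tres, elapsed, user = line.strip().split("|")
    tres_map = {}
    for item in tres.split(","):
        name, value = item.split("=", 1)
        tres_map[name] = int(value)
    hours, minutes, seconds = (int(part) for part in elapsed.split(":"))
    return {
        "account": account,
        "tres": tres_map,
        "elapsed_seconds": hours * 3600 + minutes * 60 + seconds,
        "user": user,
    }


record = parse_sacct_line(
    "test-account|cpu=12800,mem=102400,gres/gpu=800|08:00:00|user1"
)
print(record)
```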
API Integration
Start the API server for waldur-site-agent integration:
# From the slurm-emulator directory
uv run uvicorn emulator.api.emulator_server:app --host 0.0.0.0 --port 8080
API Endpoints
- POST /api/apply-periodic-settings - Apply periodic limits settings
- POST /api/downscale-resource - Set QoS to slowdown
- POST /api/restore-resource - Restore QoS to normal
- POST /api/submit-report - Submit usage reports
- GET /api/status - Get emulator status
- POST /api/time/advance - Advance emulator time
Example API Usage
# Apply periodic settings (from Waldur Mastermind)
curl -X POST http://localhost:8080/api/apply-periodic-settings \
-H "Content-Type: application/json" \
-d '{
"resource_id": "slurm_account_123",
"fairshare": 333,
"grp_tres_mins": {"billing": 72000},
"qos_threshold": {"billing": 1000}
}'
# Submit usage report (from site agent)
curl -X POST http://localhost:8080/api/submit-report \
-H "Content-Type: application/json" \
-d '{
"resource_id": "slurm_account_123",
"usage": {"billing": 167},
"billing_period": "2024-01-01",
"date": "2024-01-31T23:59:59Z",
"users": {
"user1": {"billing": 100},
"user2": {"billing": 67}
}
}'
# Advance time for testing
curl -X POST "http://localhost:8080/api/time/advance?months=3"
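The same requests can be issued from a Python test. The sketch below uses only the standard library and builds the request without sending it; `BASE_URL` and the helper name `build_post` are assumptions for illustration, and the payload is copied from the curl example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed: API server started as shown above


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an emulator endpoint (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_post(
    "/api/apply-periodic-settings",
    {
        "resource_id": "slurm_account_123",
        "fairshare": 333,
        "grp_tres_mins": {"billing": 72000},
        "qos_threshold": {"billing": 1000},
    },
)

# To send it, the emulator API must be running:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect in unit tests that run without a live server.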
Waldur Site Agent Integration
Configure waldur-site-agent to use the emulator:
# waldur-site-agent-config.yaml
offerings:
  - name: "SLURM HPC Cluster - Emulator"
    backend_type: "slurm"
    backend_settings:
      # Enable emulator mode
      emulator_mode: true
      emulator_base_url: "http://localhost:8080"
      # Override SLURM commands to use emulator
      command_prefix: ["python", "/path/to/slurm-emulator/emulator/commands/dispatcher.py"]
      # Periodic limits configuration
      periodic_limits:
        enabled: true
        limit_type: "GrpTRESMins"
        tres_billing_enabled: true
        tres_billing_weights:
          CPU: 0.015625
          Mem: 0.001953125G
          "GRES/gpu": 0.25
        fairshare_decay_half_life: 15
        api_endpoints:
          apply_periodic_settings: "http://localhost:8080/api/apply-periodic-settings"
          downscale_resource: "http://localhost:8080/api/downscale-resource"
          restore_resource: "http://localhost:8080/api/restore-resource"
SLURM Configuration Support
The emulator now supports real SLURM configuration files to match actual deployment behavior:
Loading Configuration
# Use system SLURM configuration
uv run slurm-emulator --config /etc/slurm/slurm.conf
# Use custom configuration
uv run slurm-emulator --config examples/slurm.conf
# Validate configuration
uv run slurm-emulator --validate-only --config slurm.conf
Supported Configuration Parameters
The emulator parses and applies these SLURM configuration parameters:
Priority and Decay Settings:
- PriorityDecayHalfLife - Fairshare decay half-life (e.g., "15-00:00:00")
- PriorityUsageResetPeriod - Usage reset period ("None" for manual reset)
- PriorityWeightFairShare - Fairshare weight for priority calculations
- PriorityWeightQOS - QoS weight for priority calculations
- FairShareDampeningFactor - Dampening factor for fairshare
TRES Billing:
- TRESBillingWeights - Billing weights (e.g., "CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25")
Priority Flags:
- PriorityFlags - Priority calculation flags (e.g., "MAX_TRES,NO_NORMAL_ASSOC")
Example Configuration
# SLURM Configuration
PriorityDecayHalfLife = 15-00:00:00
PriorityUsageResetPeriod = None # manual reset via sacctmgr RawUsage=0
PriorityWeightFairShare = 259200
PriorityWeightQOS = 500000
FairShareDampeningFactor = 3
TRESBillingWeights="CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25"
PriorityFlags=MAX_TRES,NO_NORMAL_ASSOC
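To make the value formats concrete, here is a minimal sketch of how two of these settings could be interpreted. It is not the emulator's parser: both function names are hypothetical, and only the `days-HH:MM:SS` form of `PriorityDecayHalfLife` shown above is handled.

```python
def parse_half_life_days(value: str) -> float:
    """Convert a 'days-HH:MM:SS' string such as '15-00:00:00' to days.

    Only this one form is handled; other time formats are out of scope.
    """
    days, _, clock = value.partition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days) + (hours * 3600 + minutes * 60 + seconds) / 86400


def parse_billing_weights(value: str) -> dict:
    """Map each TRES name to a (weight, unit_suffix) pair.

    A trailing letter such as 'G' is kept as the unit suffix.
    """
    weights = {}
    for item in value.split(","):
        name, raw = item.split("=", 1)
        unit = raw[-1] if raw[-1].isalpha() else ""
        number = float(raw[:-1] if unit else raw)
        weights[name] = (number, unit)
    return weights


half_life = parse_half_life_days("15-00:00:00")
weights = parse_billing_weights("CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25")
print(half_life, weights)
```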
Understanding Decay Calculations
The emulator implements SLURM's fairshare decay using the configured half-life:
# Decay formula matches SLURM's implementation
decay_factor = 2 ** (-days_elapsed / half_life_days)
# With default 15-day half-life, after 90 days (1 quarter):
decay_factor = 2 ** (-90 / 15)  # 0.015625, about 1.56%
# With 7-day half-life, after 90 days:
decay_factor = 2 ** (-90 / 7)  # about 0.000135, roughly 0.01%
Example with 15-day half-life: User consumes 2000 hours in Q1. After Q1 ends (90 days later):
- Original impact: 2000 hours
- Decayed impact: 2000 × 0.0156 = 31 hours equivalent
- Q2 allocation: 1000 + (1000 - 31) = 1969 hours available
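The worked example above can be reproduced with a few lines of Python. This is a sketch under the assumptions stated in the example (carryover equals the previous allocation minus the decayed usage); the function names are hypothetical, and how the emulator treats a negative carryover is not covered here.

```python
def decay_factor(days_elapsed: float, half_life_days: float = 15.0) -> float:
    """Exponential half-life decay, as in the formula above."""
    return 2 ** (-days_elapsed / half_life_days)


def next_period_allocation(
    base_nh: float,
    previous_allocation_nh: float,
    previous_usage_nh: float,
    days_elapsed: float,
    half_life_days: float = 15.0,
) -> float:
    """Base allocation plus carryover, following the Q1 -> Q2 example."""
    decayed_usage = previous_usage_nh * decay_factor(days_elapsed, half_life_days)
    carryover = previous_allocation_nh - decayed_usage
    return base_nh + carryover


factor = decay_factor(90)
q2_allocation = next_period_allocation(1000, 1000, 2000, 90)
print(factor, q2_allocation)
```

The unrounded result is 1968.75 hours; the example rounds the decayed usage to 31 hours, which gives 1969.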
Key Commands Reference
Time Manipulation
time_show # Show current time and period
time_advance <amount> <unit> # Advance time (units: days, months, quarters)
time_set YYYY-MM-DD [HH:MM:SS] # Set specific date/time
# Examples:
time_advance 2 months
time_advance 30 days
time_set 2024-05-20
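For intuition, the calendar arithmetic behind these commands can be sketched with the standard library. This is an illustration only: `advance`, `add_months`, and `period_label` are hypothetical names, the `YYYY-Qn` label format is taken from the sample output in this README, and clamping the day to the end of the target month is an assumption.

```python
import calendar
from datetime import datetime, timedelta


def add_months(moment: datetime, months: int) -> datetime:
    """Shift by whole months, clamping the day to the target month's length."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def advance(moment: datetime, amount: int, unit: str) -> datetime:
    """Advance by days, months, or quarters (one quarter = three months)."""
    if unit == "days":
        return moment + timedelta(days=amount)
    if unit == "months":
        return add_months(moment, amount)
    if unit == "quarters":
        return add_months(moment, 3 * amount)
    raise ValueError(f"unsupported unit: {unit}")


def period_label(moment: datetime) -> str:
    """Return a quarter label such as '2024-Q1'."""
    return f"{moment.year}-Q{(moment.month - 1) // 3 + 1}"


start = datetime(2024, 1, 1)
later = advance(start, 2, "months")
print(later, period_label(later))
```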
Usage Simulation
usage_inject <user> <amount> [account] # Inject node-hour usage
usage_show [account] [period] # Show usage summary with user breakdown
# Examples:
usage_inject user1 200 test-account
usage_show test-account
usage_show test-account 2024-Q1
Account Management
account_create <name> [description] [allocation] # Create account
account_list # List all accounts with status
account_show <name> # Show detailed account info
account_delete <name> # Delete account
# Examples:
account_create test "Test Account" 1000
account_show test
account_list
QoS Management
qos_show [account] # Show QoS status and details
qos_set <account> <qos> # Set QoS level (normal/slowdown/blocked)
qos_check [account] # Check thresholds and auto-update QoS
# Examples:
qos_check test-account
qos_set test-account slowdown
qos_show test-account
Limits Calculation
limits_calculate [account] # Calculate and display periodic limits
# Example:
limits_calculate test-account
Scenario Management
scenario_list [type] # List scenarios (optionally filter by type)
scenario_describe <name> # Show detailed description and learning objectives
scenario_steps <name> # Show step-by-step command breakdown
scenario_run <name> # Run scenario automatically
scenario_run <name> --interactive # Run with confirmation prompts
scenario_run <name> --step-by-step # Run with detailed step output
scenario_search <query> # Search scenarios by keyword
# Examples:
scenario_list qos_management
scenario_describe qos_thresholds
scenario_run qos_thresholds --step-by-step
scenario_search decay
Configuration Management
config_show # Show current SLURM configuration
config_reload <path> # Hot-reload configuration file
# Examples:
config_show
config_reload examples/slurm.conf
State Management
cleanup_all # Clean all accounts and reset to fresh state
cleanup_scenario <name> # Clean specific scenario accounts
cleanup_account <name> # Clean specific account completely
# Examples:
cleanup_all
cleanup_scenario qos_thresholds
cleanup_account test-account
SLURM Commands
sacctmgr <args> # Run sacctmgr command
sacct <args> # Run sacct command
sinfo <args> # Run sinfo command
# Examples:
sacctmgr list accounts
sacctmgr modify account test set fairshare=333
sacct --accounts=test --format=Account,User,Elapsed
Testing Scenarios
Basic Usage Pattern
# Setup with specific configuration
uv run slurm-emulator --config examples/slurm.conf
# In emulator CLI:
time set 2024-01-01
account create test-account "Test" 1000
# Month 1: Light usage
usage inject user1 100 test-account
time advance 1 months
# Month 2: Heavy usage
usage inject user1 600 test-account
limits calculate test-account
qos check test-account
# Quarter transition (2024-03-01 is still in Q1, so advance to 2024-04-01)
time advance 2 months
limits apply test-account
Configuration Testing
# Test different decay rates
uv run slurm-emulator --config examples/custom_slurm.conf
# Compare configurations
uv run slurm-emulator --validate-only --config examples/slurm.conf
uv run slurm-emulator --validate-only --config examples/custom_slurm.conf
Decay Validation
# Q1: Heavy usage
time set 2024-01-01
account create test-account "Test" 1000
usage inject user1 1500 test-account
# Q2: Check decay impact
time set 2024-04-01
limits calculate test-account
# Should show ~23Nh effective previous usage (1500 * 0.0156)
QoS Threshold Testing
# Setup with 1000Nh allocation (1200Nh threshold with 20% grace)
account create test-account "Test" 1000
qos show test-account # Should show "normal"
usage inject user1 1100 test-account
qos check test-account # Should show approaching threshold
usage inject user1 200 test-account # Total: 1300Nh
qos check test-account # Should trigger slowdown QoS
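The threshold walk-through above can be modelled in a few lines. This sketch is not the emulator's QoS manager: the class name `QosTracker` is hypothetical, the strict `>` comparison is an assumption, and only the normal and slowdown levels are modelled because the blocking threshold is not specified here.

```python
class QosTracker:
    """Accumulate injected usage and report the resulting QoS level."""

    def __init__(self, allocation_nh: float, grace_ratio: float = 0.2):
        # Threshold = allocation plus grace (1000Nh with 20% grace -> 1200Nh).
        self.threshold_nh = allocation_nh * (1 + grace_ratio)
        self.usage_nh = 0.0

    def inject(self, amount_nh: float) -> None:
        self.usage_nh += amount_nh

    def check(self) -> str:
        # Assumption: slowdown applies once usage exceeds the threshold.
        return "slowdown" if self.usage_nh > self.threshold_nh else "normal"


tracker = QosTracker(1000)
tracker.inject(1100)
first_check = tracker.check()
tracker.inject(200)
second_check = tracker.check()
print(first_check, second_check)
```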
Architecture
slurm-emulator/
├── emulator/
│   ├── core/
│   │   ├── time_engine.py          # Time manipulation
│   │   ├── database.py             # In-memory state
│   │   ├── slurm_config.py         # SLURM config parsing
│   │   └── usage_simulator.py      # Usage injection
│   ├── commands/
│   │   ├── sacctmgr.py             # sacctmgr emulator
│   │   ├── sacct.py                # sacct emulator
│   │   └── dispatcher.py           # Command routing
│   ├── periodic_limits/
│   │   ├── calculator.py           # Decay & carryover
│   │   └── qos_manager.py          # QoS management
│   ├── scenarios/
│   │   ├── sequence_scenario.py    # Complete scenario
│   │   ├── scenario_registry.py    # Scenario discovery & running
│   │   └── limits_configuration_scenarios.py
│   ├── cli/
│   │   ├── main.py                 # Interactive CLI
│   │   └── cmd_cli.py              # CMD-based CLI
│   └── api/
│       └── emulator_server.py      # REST API
├── scripts/
│   ├── release.py                  # Release management
│   ├── changelog.sh                # Changelog generation
│   ├── generate_changelog_data.py  # Commit data collection
│   └── prompts/
│       └── changelog-prompt.md     # Changelog prompt template
└── tests/                          # Test suites
Development
Running Tests
uv run pytest
Releasing
# Full release: update version, generate changelog, tag, push
uv run scripts/release.py release X.Y.Z
# Skip changelog generation
uv run scripts/release.py release X.Y.Z --skip-changelog
Pushing the tag triggers GitHub Actions for testing and PyPI publishing.
Adding New Scenarios
# Create new scenario class
class CustomScenario:
    def __init__(self, time_engine, database):
        self.time_engine = time_engine
        self.database = database

    def run_scenario(self):
        # Implement scenario steps
        pass
Extending Commands
# Add new SLURM command support
class NewCommandEmulator:
    def handle_command(self, args):
        # Implement command logic
        return "command output"
Troubleshooting
State Persistence
Emulator state is saved to:
- /tmp/slurm_emulator_time.json - Current time
- /tmp/slurm_emulator_db.json - Database state
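When debugging, it can help to peek at these files from Python. The sketch below only assumes that they contain JSON, as the file extensions suggest; the internal schema is not documented here, so the code just lists top-level keys. `load_state` is a hypothetical helper.

```python
import json
from pathlib import Path

STATE_FILES = [
    Path("/tmp/slurm_emulator_time.json"),
    Path("/tmp/slurm_emulator_db.json"),
]


def load_state(path: Path):
    """Return the parsed JSON content, or None if the file does not exist."""
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


for path in STATE_FILES:
    state = load_state(path)
    if state is None:
        print(f"{path}: not found")
    elif isinstance(state, dict):
        print(f"{path}: top-level keys {sorted(state)}")
    else:
        print(f"{path}: {type(state).__name__}")
```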
Common Issues
"Account not found": Create account first with account create
"No usage records": Inject usage with usage inject
"Time not advancing": Check time with time command
"API connection failed": Ensure server is running on port 8080
Reset Emulator
# Remove saved state, then start fresh
rm /tmp/slurm_emulator_*.json
slurm-emulator
License
MIT License - See LICENSE file for details.