SLURM command emulator with time manipulation for testing periodic limits

SLURM Emulator - Time Travel Edition

A comprehensive SLURM command emulator with time manipulation capabilities for testing periodic limits and decay calculations.

Features

  • 🎮 Interactive CLI - Full command-line interface with time travel
  • ⏰ Time Manipulation - Advance time by days, months, or quarters
  • 💾 Usage Injection - Add specific node-hour usage at any time point
  • 🔄 Decay Calculations - 15-day half-life fairshare decay simulation
  • 🎯 QoS Management - Threshold-based QoS switching (normal → slowdown → blocked)
  • 📊 Periodic Limits - Quarterly allocation with carryover logic
  • 🎬 Scenario Runner - Complete SLURM_PERIODIC_LIMITS_SEQUENCE.md validation
  • 🔌 API Integration - REST API for waldur-site-agent integration
  • 💾 State Management - Checkpoint/restore functionality for testing

Quick Start

Installation

# Clone the repository
git clone https://github.com/waldur/slurm-emulator.git
cd slurm-emulator

# Install dependencies using uv
uv sync

Interactive CLI (CMD-based)

# Run with default configuration
uv run slurm-emulator

# Run with SLURM configuration file
uv run slurm-emulator --config examples/slurm.conf

# Run with a custom configuration file (all features work the same way)
uv run slurm-emulator --config examples/custom_slurm.conf

# Validate configuration only
uv run slurm-emulator --validate-only --config /etc/slurm/slurm.conf

🎮 SLURM Emulator - Time Travel Edition (CMD Interface)
Type 'help' or '?' for commands. TAB for auto-completion.
Type 'help <command>' for detailed help on specific commands.

slurm-emulator> help
# Shows all available commands

slurm-emulator> help time_advance
# Shows detailed help for specific command

slurm-emulator> time_advance 2 months
⏭️  Advanced 2 months

slurm-emulator> account_create test "Test Account" 1000
✅ Created account test with 1000Nh allocation

slurm-emulator> account create test-account "Test Account" 1000
✅ Created account test-account with 1000Nh allocation

slurm-emulator> usage inject user1 200 test-account
💾 Injected 200.0Nh usage for user1 in test-account at 2024-01-01 00:00:00

slurm-emulator> time advance 2 months
⏭️  Advanced 2 months
⏰ New time: 2024-03-01 00:00:00

slurm-emulator> usage inject user1 400 test-account
💾 Injected 400.0Nh usage for user1 in test-account at 2024-03-01 00:00:00

slurm-emulator> limits calculate test-account
📊 Periodic Limits for test-account:
   Period: 2024-Q1
   Base allocation: 1000Nh
   Total allocation: 1000.0Nh
   Fairshare: 333
   QoS threshold: 1200.0Nh
   Grace limit: 1200.0Nh
   Billing minutes: 60000
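
The figures in this output appear to be related by simple arithmetic: a 20% grace on 1000Nh gives the 1200Nh limit, and 1000Nh corresponds to 60000 billing minutes. The Python sketch below reproduces that arithmetic. It assumes one node-hour maps to 60 billing minutes, which is inferred from the sample output rather than confirmed against the emulator's code, and the function names are hypothetical.

```python
def grace_limit_nh(allocation_nh: float, grace_ratio: float = 0.2) -> float:
    """Allocation plus grace, in node-hours (20% grace is taken from the example)."""
    return allocation_nh * (1 + grace_ratio)


def billing_minutes(node_hours: float) -> int:
    """Convert node-hours to billing minutes, ASSUMING 60 minutes per node-hour."""
    return round(node_hours * 60)


if __name__ == "__main__":
    limit = grace_limit_nh(1000)
    print(limit, billing_minutes(1000), billing_minutes(limit))
```

Under the same assumption, the 1200Nh grace limit corresponds to the 72000 billing-minutes value that the sequence scenario sets for GrpTRESMins.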

Complete Sequence Scenario

Run the full scenario from SLURM_PERIODIC_LIMITS_SEQUENCE.md:

slurm-emulator> scenario run sequence --interactive

🎬 Starting SLURM Periodic Limits Sequence Scenario
============================================================

⏸️  Press Enter to execute Step 1: Initial Q1 setup...

Step 1: Initial Q1 2024 Setup
   Setting up 1000Nh quarterly allocation with 20% grace period
   ⚖️  Set fairshare to 333
   🚫 Set GrpTRESMins to 72000 billing-minutes
   🎯 QoS threshold set to 1200.0Nh
   💾 Checkpoint 'initial_setup' created

# ... continues through all 9 steps of the sequence

Direct SLURM Commands

The emulator intercepts and emulates real SLURM commands:

slurm-emulator> sacctmgr add account test-account description="Test"
 Adding Account(s)
  test-account
 Settings
  Parent     = root
  Description = Test

slurm-emulator> sacctmgr modify account test-account set fairshare=333
 Modified account...
  test-account
 Settings
  fairshare=333

slurm-emulator> sacctmgr modify account test-account set GrpTRESMins=billing=72000
 Modified account...
  test-account
 Settings
  GrpTRESMins=billing=72000

slurm-emulator> sacct --accounts=test-account --starttime=2024-01-01 --endtime=2024-12-31
test-account|cpu=12800,mem=102400,gres/gpu=800|08:00:00|user1
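
When consuming the emulated sacct output from a script, the pipe-delimited line above can be split into fields. The sketch below assumes the field order is account, TRES usage, elapsed time, user, as suggested by the sample line; `SacctRecord` and `parse_sacct_line` are hypothetical names, not part of the emulator.

```python
from dataclasses import dataclass


@dataclass
class SacctRecord:
    account: str
    tres: dict[str, int]
    elapsed: str
    user: str


def parse_sacct_line(line: str) -> SacctRecord:
    """Parse one pipe-delimited line shaped like the sample output above."""
    # ASSUMED field order: account | TRES list | elapsed | user
    account, tres_field, elapsed, user = line.strip().split("|")
    tres = {}
    for item in tres_field.split(","):
        name, _, value = item.partition("=")
        tres[name] = int(value)
    return SacctRecord(account, tres, elapsed, user)


if __name__ == "__main__":
    sample = "test-account|cpu=12800,mem=102400,gres/gpu=800|08:00:00|user1"
    print(parse_sacct_line(sample))
```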

API Integration

Start the API server for waldur-site-agent integration:

# From the slurm-emulator directory
uv run uvicorn emulator.api.emulator_server:app --host 0.0.0.0 --port 8080

API Endpoints

  • POST /api/apply-periodic-settings - Apply periodic limits settings
  • POST /api/downscale-resource - Set QoS to slowdown
  • POST /api/restore-resource - Restore QoS to normal
  • POST /api/submit-report - Submit usage reports
  • GET /api/status - Get emulator status
  • POST /api/time/advance - Advance emulator time

Example API Usage

# Apply periodic settings (from Waldur Mastermind)
curl -X POST http://localhost:8080/api/apply-periodic-settings \
  -H "Content-Type: application/json" \
  -d '{
    "resource_id": "slurm_account_123",
    "fairshare": 333,
    "grp_tres_mins": {"billing": 72000},
    "qos_threshold": {"billing": 1000}
  }'

# Submit usage report (from site agent)
curl -X POST http://localhost:8080/api/submit-report \
  -H "Content-Type: application/json" \
  -d '{
    "resource_id": "slurm_account_123",
    "usage": {"billing": 167},
    "billing_period": "2024-01-01",
    "date": "2024-01-31T23:59:59Z",
    "users": {
      "user1": {"billing": 100},
      "user2": {"billing": 67}
    }
  }'

# Advance time for testing
curl -X POST "http://localhost:8080/api/time/advance?months=3"
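
The same apply-periodic-settings call can be made from Python, for example in a test script. The sketch below uses only the standard library and mirrors the endpoint path and payload fields of the curl example; the helper name `build_apply_settings_request` is hypothetical. It only builds the request, so it can be inspected without a running server.

```python
import json
import urllib.request


def build_apply_settings_request(base_url: str, resource_id: str, fairshare: int,
                                 billing_limit: int, qos_threshold: int) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "resource_id": resource_id,
        "fairshare": fairshare,
        "grp_tres_mins": {"billing": billing_limit},
        "qos_threshold": {"billing": qos_threshold},
    }
    return urllib.request.Request(
        url=f"{base_url}/api/apply-periodic-settings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_apply_settings_request(
        "http://localhost:8080", "slurm_account_123", 333, 72000, 1000
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually send it, pass the request to `urllib.request.urlopen(req)` while the API server is running on port 8080.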

Waldur Site Agent Integration

Configure waldur-site-agent to use the emulator:

# waldur-site-agent-config.yaml
offerings:
  - name: "SLURM HPC Cluster - Emulator"
    backend_type: "slurm"
    backend_settings:
      # Enable emulator mode
      emulator_mode: true
      emulator_base_url: "http://localhost:8080"

      # Override SLURM commands to use emulator
      command_prefix: ["python", "/path/to/slurm-emulator/emulator/commands/dispatcher.py"]

      # Periodic limits configuration
      periodic_limits:
        enabled: true
        limit_type: "GrpTRESMins"
        tres_billing_enabled: true
        tres_billing_weights:
          CPU: 0.015625
          Mem: 0.001953125G
          "GRES/gpu": 0.25
        fairshare_decay_half_life: 15
        api_endpoints:
          apply_periodic_settings: "http://localhost:8080/api/apply-periodic-settings"
          downscale_resource: "http://localhost:8080/api/downscale-resource"
          restore_resource: "http://localhost:8080/api/restore-resource"

SLURM Configuration Support

The emulator supports real SLURM configuration files, so its behavior can match an actual deployment:

Loading Configuration

# Use system SLURM configuration
uv run slurm-emulator --config /etc/slurm/slurm.conf

# Use custom configuration
uv run slurm-emulator --config examples/slurm.conf

# Validate configuration
uv run slurm-emulator --validate-only --config slurm.conf

Supported Configuration Parameters

The emulator parses and applies these SLURM configuration parameters:

Priority and Decay Settings:

  • PriorityDecayHalfLife - Fairshare decay half-life (e.g., "15-00:00:00")
  • PriorityUsageResetPeriod - Usage reset period ("None" for manual reset)
  • PriorityWeightFairShare - Fairshare weight for priority calculations
  • PriorityWeightQOS - QoS weight for priority calculations
  • FairShareDampeningFactor - Dampening factor for fairshare

TRES Billing:

  • TRESBillingWeights - Billing weights (e.g., "CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25")

Priority Flags:

  • PriorityFlags - Priority calculation flags (e.g., "MAX_TRES,NO_NORMAL_ASSOC")

Example Configuration

# SLURM Configuration
PriorityDecayHalfLife   = 15-00:00:00
PriorityUsageResetPeriod = None # manual reset via sacctmgr RawUsage=0
PriorityWeightFairShare = 259200
PriorityWeightQOS       = 500000
FairShareDampeningFactor = 3
TRESBillingWeights="CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25"
PriorityFlags=MAX_TRES,NO_NORMAL_ASSOC
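
To illustrate how such a file can be read, here is a minimal Python sketch that parses `Key = Value` lines and converts the half-life value to days. It is not the emulator's own parser (that lives in `slurm_config.py`); the function names are hypothetical, and only the `days-HH:MM:SS` form shown above is handled, although real SLURM accepts other time formats.

```python
SAMPLE_CONF = """
PriorityDecayHalfLife   = 15-00:00:00
PriorityUsageResetPeriod = None # manual reset via sacctmgr RawUsage=0
TRESBillingWeights="CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25"
"""


def parse_slurm_conf(text: str) -> dict[str, str]:
    """Parse 'Key = Value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, since values may contain '=' themselves.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def half_life_days(value: str) -> float:
    """Convert a 'days-HH:MM:SS' value such as '15-00:00:00' to days."""
    days, _, clock = value.partition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return int(days) + (hours * 3600 + minutes * 60 + seconds) / 86400


def billing_weights(value: str) -> dict[str, str]:
    """Split 'CPU=0.015625,Mem=0.001953125G' into a name -> weight mapping."""
    # Weights stay strings because some carry a unit suffix (e.g. 'G').
    return dict(item.split("=", 1) for item in value.split(","))


if __name__ == "__main__":
    conf = parse_slurm_conf(SAMPLE_CONF)
    print(half_life_days(conf["PriorityDecayHalfLife"]))
    print(billing_weights(conf["TRESBillingWeights"]))
```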

Understanding Decay Calculations

The emulator implements SLURM's fairshare decay using the configured half-life:

# Decay formula matches SLURM's implementation
decay_factor = 2 ** (-days_elapsed / half_life_days)

# With default 15-day half-life, after 90 days (1 quarter):
decay_factor = 2 ** (-90 / 15)  # 0.015625 (about 1.56%)

# With 7-day half-life, after 90 days:
decay_factor = 2 ** (-90 / 7)   # about 0.000135 (about 0.01%)

Example with 15-day half-life: User consumes 2000 hours in Q1. After Q1 ends (90 days later):

  • Original impact: 2000 hours
  • Decayed impact: 2000 × 0.0156 = 31 hours equivalent
  • Q2 allocation: 1000 + (1000 - 31) = 1969 hours available
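
The worked example above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, not the emulator's `calculator.py`; the function names are hypothetical, and details such as clamping a negative carryover are not modeled because the text does not specify them.

```python
def decay_factor(days_elapsed: float, half_life_days: float = 15.0) -> float:
    """Fraction of past usage that still counts after days_elapsed."""
    return 2 ** (-days_elapsed / half_life_days)


def next_quarter_allocation(base_nh: float, previous_usage_nh: float,
                            days_elapsed: float = 90,
                            half_life_days: float = 15.0) -> float:
    """Base allocation plus carryover, following the worked example above."""
    decayed_usage = previous_usage_nh * decay_factor(days_elapsed, half_life_days)
    carryover = base_nh - decayed_usage
    return base_nh + carryover


if __name__ == "__main__":
    print(decay_factor(90))                     # 2 ** -6
    print(next_quarter_allocation(1000, 2000))  # unrounded version of the 1969 figure
```

Without rounding the decay factor to 0.0156, the decayed usage is 31.25 hours and the Q2 figure is 1968.75 hours, which matches the rounded 1969 in the example.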

Key Commands Reference

Time Manipulation

time_show                             # Show current time and period
time_advance <amount> <unit>          # Advance time (units: days, months, quarters)
time_set YYYY-MM-DD [HH:MM:SS]       # Set specific date/time

# Examples:
time_advance 2 months
time_advance 30 days
time_set 2024-05-20

Usage Simulation

usage_inject <user> <amount> [account]  # Inject node-hour usage
usage_show [account] [period]           # Show usage summary with user breakdown

# Examples:
usage_inject user1 200 test-account
usage_show test-account
usage_show test-account 2024-Q1

Account Management

account_create <name> [description] [allocation]  # Create account
account_list                                      # List all accounts with status
account_show <name>                               # Show detailed account info
account_delete <name>                             # Delete account

# Examples:
account_create test "Test Account" 1000
account_show test
account_list

QoS Management

qos_show [account]                    # Show QoS status and details
qos_set <account> <qos>              # Set QoS level (normal/slowdown/blocked)
qos_check [account]                  # Check thresholds and auto-update QoS

# Examples:
qos_check test-account
qos_set test-account slowdown
qos_show test-account

Limits Calculation

limits_calculate [account]           # Calculate and display periodic limits

# Example:
limits_calculate test-account

Scenario Management

scenario_list [type]                 # List scenarios (optionally filter by type)
scenario_describe <name>             # Show detailed description and learning objectives
scenario_steps <name>                # Show step-by-step command breakdown
scenario_run <name>                  # Run scenario automatically
scenario_run <name> --interactive    # Run with confirmation prompts
scenario_run <name> --step-by-step   # Run with detailed step output
scenario_search <query>              # Search scenarios by keyword

# Examples:
scenario_list qos_management
scenario_describe qos_thresholds
scenario_run qos_thresholds --step-by-step
scenario_search decay

Configuration Management

config_show                          # Show current SLURM configuration
config_reload <path>                 # Hot-reload configuration file

# Examples:
config_show
config_reload examples/slurm.conf

State Management

cleanup_all                          # Clean all accounts and reset to fresh state
cleanup_scenario <name>              # Clean specific scenario accounts
cleanup_account <name>               # Clean specific account completely

# Examples:
cleanup_all
cleanup_scenario qos_thresholds
cleanup_account test-account

SLURM Commands

sacctmgr <args>                      # Run sacctmgr command
sacct <args>                         # Run sacct command
sinfo <args>                         # Run sinfo command

# Examples:
sacctmgr list accounts
sacctmgr modify account test set fairshare=333
sacct --accounts=test --format=Account,User,Elapsed

Testing Scenarios

Basic Usage Pattern

# Setup with specific configuration
uv run slurm-emulator --config examples/slurm.conf

# In emulator CLI:
time set 2024-01-01
account create test-account "Test" 1000

# Month 1: Light usage
usage inject user1 100 test-account
time advance 1 months

# Month 2: Heavy usage
usage inject user1 600 test-account
limits calculate test-account
qos check test-account

# Quarter transition
time advance 1 months
limits apply test-account

Configuration Testing

# Test different decay rates
uv run slurm-emulator --config examples/custom_slurm.conf

# Compare configurations
uv run slurm-emulator --validate-only --config examples/slurm.conf
uv run slurm-emulator --validate-only --config examples/custom_slurm.conf

Decay Validation

# Q1: Heavy usage
time set 2024-01-01
account create test-account "Test" 1000
usage inject user1 1500 test-account

# Q2: Check decay impact
time set 2024-04-01
limits calculate test-account
# Should show ~23Nh effective previous usage (1500 * 0.0156)

QoS Threshold Testing

# Setup with 1000Nh allocation (1200Nh threshold with 20% grace)
account create test-account "Test" 1000
qos show test-account  # Should show "normal"

usage inject user1 1100 test-account
qos check test-account  # Should show approaching threshold

usage inject user1 200 test-account  # Total: 1300Nh
qos check test-account  # Should trigger slowdown QoS
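
The threshold check in this walkthrough can be sketched in Python as follows. The sketch covers only the normal and slowdown levels, because the text does not say where blocked begins. The function name is hypothetical, and treating usage exactly at the threshold as still normal is an assumption.

```python
def qos_for_usage(usage_nh: float, allocation_nh: float,
                  grace_ratio: float = 0.2) -> str:
    """Return 'slowdown' once usage exceeds allocation plus grace, else 'normal'."""
    threshold_nh = allocation_nh * (1 + grace_ratio)
    # ASSUMPTION: strictly greater than the threshold triggers slowdown.
    return "slowdown" if usage_nh > threshold_nh else "normal"


if __name__ == "__main__":
    print(qos_for_usage(1100, 1000))  # below the 1200Nh threshold
    print(qos_for_usage(1300, 1000))  # above the 1200Nh threshold
```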

Architecture

slurm-emulator/
├── emulator/
│   ├── core/
│   │   ├── time_engine.py          # Time manipulation
│   │   ├── database.py             # In-memory state
│   │   ├── slurm_config.py         # SLURM config parsing
│   │   └── usage_simulator.py      # Usage injection
│   ├── commands/
│   │   ├── sacctmgr.py             # sacctmgr emulator
│   │   ├── sacct.py                # sacct emulator
│   │   └── dispatcher.py           # Command routing
│   ├── periodic_limits/
│   │   ├── calculator.py           # Decay & carryover
│   │   └── qos_manager.py          # QoS management
│   ├── scenarios/
│   │   ├── sequence_scenario.py    # Complete scenario
│   │   ├── scenario_registry.py    # Scenario discovery & running
│   │   └── limits_configuration_scenarios.py
│   ├── cli/
│   │   ├── main.py                 # Interactive CLI
│   │   └── cmd_cli.py              # CMD-based CLI
│   └── api/
│       └── emulator_server.py      # REST API
├── scripts/
│   ├── release.py                  # Release management
│   ├── changelog.sh                # Changelog generation
│   ├── generate_changelog_data.py  # Commit data collection
│   └── prompts/
│       └── changelog-prompt.md     # Changelog prompt template
└── tests/                          # Test suites

Development

Running Tests

uv run pytest

Releasing

# Full release: update version, generate changelog, tag, push
uv run scripts/release.py release X.Y.Z

# Skip changelog generation
uv run scripts/release.py release X.Y.Z --skip-changelog

Pushing the tag triggers GitHub Actions for testing and PyPI publishing.

Adding New Scenarios

# Create new scenario class
class CustomScenario:
    def __init__(self, time_engine, database):
        self.time_engine = time_engine
        self.database = database

    def run_scenario(self):
        # Implement scenario steps
        pass

Extending Commands

# Add new SLURM command support
class NewCommandEmulator:
    def handle_command(self, args):
        # Implement command logic
        return "command output"

Troubleshooting

State Persistence

Emulator state is saved to:

  • /tmp/slurm_emulator_time.json - Current time
  • /tmp/slurm_emulator_db.json - Database state

Common Issues

  • "Account not found": Create the account first with account create
  • "No usage records": Inject usage with usage inject
  • "Time not advancing": Check the current time with the time command
  • "API connection failed": Ensure the server is running on port 8080

Reset Emulator

# Remove the persisted state files, then start fresh
rm -f /tmp/slurm_emulator_*.json
slurm-emulator
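
In a test suite, the same reset can be done from Python before each test. The sketch below removes files matching the state-file pattern listed under State Persistence; the function name `reset_emulator_state` and its `directory` parameter are hypothetical and exist so the helper can be pointed at a scratch directory.

```python
import glob
import os


def reset_emulator_state(directory: str = "/tmp") -> list[str]:
    """Delete slurm_emulator_*.json files in directory and return their paths."""
    removed = []
    pattern = os.path.join(directory, "slurm_emulator_*.json")
    for path in sorted(glob.glob(pattern)):
        os.remove(path)
        removed.append(path)
    return removed
```

Calling `reset_emulator_state()` with no arguments targets `/tmp`, so it should only be run when discarding the current emulator state is intended.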

License

MIT License - See LICENSE file for details.
