Skip to main content

SLURM command emulator with time manipulation for testing periodic limits

Project description

SLURM Emulator - Time Travel Edition

A comprehensive SLURM command emulator with time manipulation capabilities for testing periodic limits and decay calculations.

Features

  • ๐ŸŽฎ Interactive CLI - Full command-line interface with time travel
  • โฐ Time Manipulation - Advance time by days, months, or quarters
  • ๐Ÿ’พ Usage Injection - Add specific node-hour usage at any time point
  • ๐Ÿ”„ Decay Calculations - 15-day half-life fairshare decay simulation
  • ๐ŸŽฏ QoS Management - Threshold-based QoS switching (normal โ†’ slowdown โ†’ blocked)
  • ๐Ÿ“Š Periodic Limits - Quarterly allocation with carryover logic
  • ๐ŸŽฌ Scenario Runner - Complete SLURM_PERIODIC_LIMITS_SEQUENCE.md validation
  • ๐Ÿ”Œ API Integration - REST API for waldur-site-agent integration
  • ๐Ÿ’พ State Management - Checkpoint/restore functionality for testing

Quick Start

Installation

# Clone the repository
git clone https://github.com/waldur/slurm-emulator.git
cd slurm-emulator

# Install dependencies using uv
uv sync

Interactive CLI (CMD-based)

# Run with default configuration
uv run slurm-emulator

# Run with SLURM configuration file
uv run slurm-emulator --config examples/slurm.conf

# Advanced features work the same way
uv run slurm-emulator --config examples/custom_slurm.conf

# Validate configuration only
uv run slurm-emulator --validate-only --config /etc/slurm/slurm.conf

๐ŸŽฎ SLURM Emulator - Time Travel Edition (CMD Interface)
Type 'help' or '?' for commands. TAB for auto-completion.
Type 'help <command>' for detailed help on specific commands.

slurm-emulator> help
# Shows all available commands

slurm-emulator> help time_advance
# Shows detailed help for specific command

slurm-emulator> time_advance 2 months
โญ๏ธ  Advanced 2 months

slurm-emulator> account_create test "Test Account" 1000
โœ… Created account test with 1000Nh allocation

slurm-emulator> account create test-account "Test Account" 1000
โœ… Created account test-account with 1000Nh allocation

slurm-emulator> usage inject user1 200 test-account
๐Ÿ’พ Injected 200.0Nh usage for user1 in test-account at 2024-01-01 00:00:00

slurm-emulator> time advance 2 months
โญ๏ธ  Advanced 2 months
โฐ New time: 2024-03-01 00:00:00

slurm-emulator> usage inject user1 400 test-account
๐Ÿ’พ Injected 400.0Nh usage for user1 in test-account at 2024-03-01 00:00:00

slurm-emulator> limits calculate test-account
๐Ÿ“Š Periodic Limits for test-account:
   Period: 2024-Q1
   Base allocation: 1000Nh
   Total allocation: 1000.0Nh
   Fairshare: 333
   QoS threshold: 1200.0Nh
   Grace limit: 1200.0Nh
   Billing minutes: 60000

Complete Sequence Scenario

Run the full scenario from SLURM_PERIODIC_LIMITS_SEQUENCE.md:

slurm-emulator> scenario run sequence --interactive

๐ŸŽฌ Starting SLURM Periodic Limits Sequence Scenario
============================================================

โธ๏ธ  Press Enter to execute Step 1: Initial Q1 setup...

๐Ÿ“ Step 1: Initial Q1 2024 Setup
   Setting up 1000Nh quarterly allocation with 20% grace period
   โš–๏ธ  Set fairshare to 333
   ๐Ÿšซ Set GrpTRESMins to 72000 billing-minutes
   ๐ŸŽฏ QoS threshold set to 1200.0Nh
   ๐Ÿ’พ Checkpoint 'initial_setup' created

# ... continues through all 9 steps of the sequence

Direct SLURM Commands

The emulator intercepts and emulates real SLURM commands:

slurm-emulator> sacctmgr add account test-account description="Test"
 Adding Account(s)
  test-account
 Settings
  Parent     = root
  Description = Test

slurm-emulator> sacctmgr modify account test-account set fairshare=333
 Modified account...
  test-account
 Settings
  fairshare=333

slurm-emulator> sacctmgr modify account test-account set GrpTRESMins=billing=72000
 Modified account...
  test-account
 Settings
  GrpTRESMins=billing=72000

slurm-emulator> sacct --accounts=test-account --starttime=2024-01-01 --endtime=2024-12-31
test-account|cpu=12800,mem=102400,gres/gpu=800|08:00:00|user1

API Integration

Start the API server for waldur-site-agent integration:

# From the slurm-emulator directory
uv run uvicorn emulator.api.emulator_server:app --host 0.0.0.0 --port 8080

API Endpoints

  • POST /api/apply-periodic-settings - Apply periodic limits settings
  • POST /api/downscale-resource - Set QoS to slowdown
  • POST /api/restore-resource - Restore QoS to normal
  • POST /api/submit-report - Submit usage reports
  • GET /api/status - Get emulator status
  • POST /api/time/advance - Advance emulator time

Example API Usage

# Apply periodic settings (from Waldur Mastermind)
curl -X POST http://localhost:8080/api/apply-periodic-settings \\
  -H "Content-Type: application/json" \\
  -d '{
    "resource_id": "slurm_account_123",
    "fairshare": 333,
    "grp_tres_mins": {"billing": 72000},
    "qos_threshold": {"billing": 1000}
  }'

# Submit usage report (from site agent)
curl -X POST http://localhost:8080/api/submit-report \\
  -H "Content-Type: application/json" \\
  -d '{
    "resource_id": "slurm_account_123",
    "usage": {"billing": 167},
    "billing_period": "2024-01-01",
    "date": "2024-01-31T23:59:59Z",
    "users": {
      "user1": {"billing": 100},
      "user2": {"billing": 67}
    }
  }'

# Advance time for testing
curl -X POST "http://localhost:8080/api/time/advance?months=3"

Waldur Site Agent Integration

Configure waldur-site-agent to use the emulator:

# waldur-site-agent-config.yaml
offerings:
  - name: "SLURM HPC Cluster - Emulator"
    backend_type: "slurm"
    backend_settings:
      # Enable emulator mode
      emulator_mode: true
      emulator_base_url: "http://localhost:8080"

      # Override SLURM commands to use emulator
      command_prefix: ["python", "/path/to/slurm-emulator/emulator/commands/dispatcher.py"]

      # Periodic limits configuration
      periodic_limits:
        enabled: true
        limit_type: "GrpTRESMins"
        tres_billing_enabled: true
        tres_billing_weights:
          CPU: 0.015625
          Mem: 0.001953125G
          "GRES/gpu": 0.25
        fairshare_decay_half_life: 15
        api_endpoints:
          apply_periodic_settings: "http://localhost:8080/api/apply-periodic-settings"
          downscale_resource: "http://localhost:8080/api/downscale-resource"
          restore_resource: "http://localhost:8080/api/restore-resource"

SLURM Configuration Support

The emulator now supports real SLURM configuration files to match actual deployment behavior:

Loading Configuration

# Use system SLURM configuration
uv run slurm-emulator --config /etc/slurm/slurm.conf

# Use custom configuration
uv run slurm-emulator --config examples/slurm.conf

# Validate configuration
uv run slurm-emulator --validate-only --config slurm.conf

Supported Configuration Parameters

The emulator parses and applies these SLURM configuration parameters:

Priority and Decay Settings:

  • PriorityDecayHalfLife - Fairshare decay half-life (e.g., "15-00:00:00")
  • PriorityUsageResetPeriod - Usage reset period ("None" for manual reset)
  • PriorityWeightFairShare - Fairshare weight for priority calculations
  • PriorityWeightQOS - QoS weight for priority calculations
  • FairShareDampeningFactor - Dampening factor for fairshare

TRES Billing:

  • TRESBillingWeights - Billing weights (e.g., "CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25")

Priority Flags:

  • PriorityFlags - Priority calculation flags (e.g., "MAX_TRES,NO_NORMAL_ASSOC")

Example Configuration

# SLURM Configuration
PriorityDecayHalfLife   = 15-00:00:00
PriorityUsageResetPeriod = None # manual reset via sacctmgr RawUsage=0
PriorityWeightFairShare = 259200
PriorityWeightQOS       = 500000
FairShareDampeningFactor = 3
TRESBillingWeights="CPU=0.015625,Mem=0.001953125G,GRES/gpu=0.25"
PriorityFlags=MAX_TRES,NO_NORMAL_ASSOC

Understanding Decay Calculations

The emulator implements SLURM's fairshare decay using the configured half-life:

# Decay formula matches SLURM's implementation
decay_factor = 2 ** (-days_elapsed / half_life_days)

# With default 15-day half-life, after 90 days (1 quarter):
decay_factor = 2 ** (-90 / 15) = 0.0156 (1.56%)

# With 7-day half-life, after 90 days:
decay_factor = 2 ** (-90 / 7) = 0.000135 (0.01%)

Example with 15-day half-life: User consumes 2000 hours in Q1. After Q1 ends (90 days later):

  • Original impact: 2000 hours
  • Decayed impact: 2000 ร— 0.0156 = 31 hours equivalent
  • Q2 allocation: 1000 + (1000 - 31) = 1969 hours available

Key Commands Reference

Time Manipulation

time_show                             # Show current time and period
time_advance <amount> <unit>          # Advance time (units: days, months, quarters)
time_set YYYY-MM-DD [HH:MM:SS]       # Set specific date/time

# Examples:
time_advance 2 months
time_advance 30 days
time_set 2024-05-20

Usage Simulation

usage_inject <user> <amount> [account]  # Inject node-hour usage
usage_show [account] [period]           # Show usage summary with user breakdown

# Examples:
usage_inject user1 200 test-account
usage_show test-account
usage_show test-account 2024-Q1

Account Management

account_create <name> [description] [allocation]  # Create account
account_list                                      # List all accounts with status
account_show <name>                               # Show detailed account info
account_delete <name>                             # Delete account

# Examples:
account_create test "Test Account" 1000
account_show test
account_list

QoS Management

qos_show [account]                    # Show QoS status and details
qos_set <account> <qos>              # Set QoS level (normal/slowdown/blocked)
qos_check [account]                  # Check thresholds and auto-update QoS

# Examples:
qos_check test-account
qos_set test-account slowdown
qos_show test-account

Limits Calculation

limits_calculate [account]           # Calculate and display periodic limits

# Example:
limits_calculate test-account

Scenario Management

scenario_list [type]                 # List scenarios (optionally filter by type)
scenario_describe <name>             # Show detailed description and learning objectives
scenario_steps <name>                # Show step-by-step command breakdown
scenario_run <name>                  # Run scenario automatically
scenario_run <name> --interactive    # Run with confirmation prompts
scenario_run <name> --step-by-step   # Run with detailed step output
scenario_search <query>              # Search scenarios by keyword

# Examples:
scenario_list qos_management
scenario_describe qos_thresholds
scenario_run qos_thresholds --step-by-step
scenario_search decay

Configuration Management

config_show                          # Show current SLURM configuration
config_reload <path>                 # Hot-reload configuration file

# Examples:
config_show
config_reload examples/slurm.conf

State Management

cleanup_all                          # Clean all accounts and reset to fresh state
cleanup_scenario <name>              # Clean specific scenario accounts
cleanup_account <name>               # Clean specific account completely

# Examples:
cleanup_all
cleanup_scenario qos_thresholds
cleanup_account test-account

SLURM Commands

sacctmgr <args>                      # Run sacctmgr command
sacct <args>                         # Run sacct command
sinfo <args>                         # Run sinfo command

# Examples:
sacctmgr list accounts
sacctmgr modify account test set fairshare=333
sacct --accounts=test --format=Account,User,Elapsed

Testing Scenarios

Basic Usage Pattern

# Setup with specific configuration
uv run slurm-emulator --config examples/slurm.conf

# In emulator CLI:
time set 2024-01-01
account create test-account "Test" 1000

# Month 1: Light usage
usage inject user1 100 test-account
time advance 1 months

# Month 2: Heavy usage
usage inject user1 600 test-account
limits calculate test-account
qos check test-account

# Quarter transition
time advance 1 months
limits apply test-account

Configuration Testing

# Test different decay rates
uv run slurm-emulator --config examples/custom_slurm.conf

# Compare configurations
uv run slurm-emulator --validate-only --config examples/slurm.conf
uv run slurm-emulator --validate-only --config examples/custom_slurm.conf

Decay Validation

# Q1: Heavy usage
time set 2024-01-01
account create test-account "Test" 1000
usage inject user1 1500 test-account

# Q2: Check decay impact
time set 2024-04-01
limits calculate test-account
# Should show ~23Nh effective previous usage (1500 * 0.0156)

QoS Threshold Testing

# Setup with 1000Nh allocation (1200Nh threshold with 20% grace)
account create test-account "Test" 1000
qos show test-account  # Should show "normal"

usage inject user1 1100 test-account
qos check test-account  # Should show approaching threshold

usage inject user1 200 test-account  # Total: 1300Nh
qos check test-account  # Should trigger slowdown QoS

Architecture

slurm-emulator/
โ”œโ”€โ”€ emulator/
โ”‚   โ”œโ”€โ”€ core/
โ”‚   โ”‚   โ”œโ”€โ”€ time_engine.py          # Time manipulation
โ”‚   โ”‚   โ”œโ”€โ”€ database.py             # In-memory state
โ”‚   โ”‚   โ””โ”€โ”€ usage_simulator.py      # Usage injection
โ”‚   โ”œโ”€โ”€ commands/
โ”‚   โ”‚   โ”œโ”€โ”€ sacctmgr.py            # sacctmgr emulator
โ”‚   โ”‚   โ”œโ”€โ”€ sacct.py               # sacct emulator
โ”‚   โ”‚   โ””โ”€โ”€ dispatcher.py          # Command routing
โ”‚   โ”œโ”€โ”€ periodic_limits/
โ”‚   โ”‚   โ”œโ”€โ”€ calculator.py          # Decay & carryover
โ”‚   โ”‚   โ””โ”€โ”€ qos_manager.py         # QoS management
โ”‚   โ”œโ”€โ”€ scenarios/
โ”‚   โ”‚   โ””โ”€โ”€ sequence_scenario.py   # Complete scenario
โ”‚   โ”œโ”€โ”€ cli/
โ”‚   โ”‚   โ””โ”€โ”€ main.py                # Interactive CLI
โ”‚   โ””โ”€โ”€ api/
โ”‚       โ””โ”€โ”€ emulator_server.py     # REST API
โ””โ”€โ”€ tests/                         # Test suites

Development

Running Tests

uv run python -m pytest tests/ -v

Adding New Scenarios

# Create new scenario class
class CustomScenario:
    def __init__(self, time_engine, database):
        self.time_engine = time_engine
        self.database = database

    def run_scenario(self):
        # Implement scenario steps
        pass

Extending Commands

# Add new SLURM command support
class NewCommandEmulator:
    def handle_command(self, args):
        # Implement command logic
        return "command output"

Troubleshooting

State Persistence

Emulator state is saved to:

  • /tmp/slurm_emulator_time.json - Current time
  • /tmp/slurm_emulator_db.json - Database state

Common Issues

"Account not found": Create account first with account create "No usage records": Inject usage with usage inject "Time not advancing": Check time with time command "API connection failed": Ensure server is running on port 8080

Reset Emulator

rm /tmp/slurm_emulator_*.json
slurm-emulator
# Start fresh

License

MIT License - See LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

slurm_emulator-0.1.1.tar.gz (68.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

slurm_emulator-0.1.1-py3-none-any.whl (78.8 kB view details)

Uploaded Python 3

File details

Details for the file slurm_emulator-0.1.1.tar.gz.

File metadata

  • Download URL: slurm_emulator-0.1.1.tar.gz
  • Upload date:
  • Size: 68.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for slurm_emulator-0.1.1.tar.gz
Algorithm Hash digest
SHA256 34da258f00268c990671b588a8621eed480e823135b877d6fa324624eef80b96
MD5 ec189409f7bc2e44a8fd8ae6aeb643a3
BLAKE2b-256 728db7a21c4d230fd72290b02fed5e82066ab415e241ecc6ce95c5cc4342378c

See more details on using hashes here.

Provenance

The following attestation bundles were made for slurm_emulator-0.1.1.tar.gz:

Publisher: publish.yml on waldur/slurm-emulator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file slurm_emulator-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: slurm_emulator-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 78.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for slurm_emulator-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 60ecab0958e45b73563f7c99bb9bf048b1a11a8eef01a36c2aeb3d04e33abef6
MD5 aa756e4b6b56ae31372ae853a46e225c
BLAKE2b-256 93ea4ec4e06dffd75fe17de68c0f37d7e19308ff308a520606e71a4c7ac44ab9

See more details on using hashes here.

Provenance

The following attestation bundles were made for slurm_emulator-0.1.1-py3-none-any.whl:

Publisher: publish.yml on waldur/slurm-emulator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page