
ML Load Testing Tool

Adaptive load testing tool for ML inference APIs. Uses Locust to dynamically scale concurrent users based on P99 latency targets, then analyzes results for regression detection and rate limit recommendations.

Features

  • Adaptive Scaling: Automatically adjusts concurrent users based on P99 latency targets
  • Multi-Mode Testing: Individual endpoint, production mix, exploration, and spike test modes
  • Regression Detection: Compare test results against baselines to catch performance degradations
  • Rate Limit Recommendations: Calculates safe rate limits with configurable safety factors
  • Notion Integration: Sync results to Notion for tracking and visualization

Installation

From GitHub

pip install git+https://github.com/IncodeTechnologies/ml-load-testing-tool.git

From Source

git clone https://github.com/IncodeTechnologies/ml-load-testing-tool.git
cd ml-load-testing-tool
poetry install

Quick Start

The tool requires a weights module that defines which endpoints to test. Use bundled examples or create your own:

# Test with bundled example TaskSets
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module ml_loadtest.examples.distribution_weights

# Run headless with specific parameters
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module ml_loadtest.examples.distribution_weights \
  --users 32 \
  --spawn-rate 4 \
  --run-time 60s \
  --headless

The ml-loadtest-file command prints the path to the installed locustfile, giving you full access to all Locust CLI parameters.

Note: The --weights-module parameter is required. It specifies a Python module containing a production_weights dictionary that maps TaskSet classes to their relative weights.
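A weights module is just a Python file exposing a production_weights dictionary. A minimal sketch of its shape (plain classes stand in for locust.TaskSet subclasses here, and the class names and endpoints are illustrative):

```python
# my_weights.py -- the shape of a weights module. Plain classes stand in for
# locust.TaskSet subclasses; names and endpoints are illustrative.
class PredictTaskSet:
    endpoint = "/predict"

class HealthTaskSet:
    endpoint = "/health"

# Relative weights: here 80% of traffic hits /predict, 20% hits /health.
production_weights = {
    PredictTaskSet: 80,
    HealthTaskSet: 20,
}
```

You would then pass the module's dotted path (e.g. my_weights) to --weights-module.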

Usage

Two Usage Patterns

Pattern 1: Direct Path (Recommended for CLI)

# Get full locust CLI access with installed package
locust -f $(ml-loadtest-file) --host http://api:8000 [any locust params]

Pattern 2: Local Import (Recommended for Customization)

Create a local locustfile.py in your project:

# Import everything from the installed package
from ml_loadtest.locustfile import *

# Optionally override settings or add custom logic here

Then run:

locust -f locustfile.py --host http://api:8000

Load Testing

The tool supports four test modes via the --test-mode parameter. You can run a single mode or multiple modes in sequence (space-separated).

1. Individual Endpoint Testing

Test each endpoint separately to find individual capacity:

locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module loadtest.distribution_weights \
  --test-mode INDIVIDUAL \
  --target-p99-ms 500 \
  --max-users 100

2. Production Mix Testing

Test with production traffic distribution:

locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module ml_loadtest.examples.distribution_weights \
  --test-mode PRODUCTION \
  --target-p99-ms 500

3. Exploration Mode

Test multiple weight distributions to find optimal mix:

locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module loadtest.distribution_weights \
  --test-mode EXPLORATION \
  --target-p99-ms 1000

4. Spike Testing

Test sudden traffic spikes:

locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module ml_loadtest.examples.distribution_weights \
  --test-mode SPIKE \
  --spike-target-rps 1000 \
  --spike-duration 30

5. Multiple Modes

Run multiple test modes in sequence:

# Run individual and production tests (default)
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module loadtest.distribution_weights \
  --test-mode INDIVIDUAL PRODUCTION

# Run all test modes
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module loadtest.distribution_weights \
  --test-mode INDIVIDUAL PRODUCTION EXPLORATION SPIKE

Key Configuration Options

All standard Locust parameters are available, plus:

  • --target-p99-ms: Target P99 latency in milliseconds (default: 1000)
  • --max-users: Maximum concurrent users (default: 32)
  • --min-users: Minimum concurrent users (default: 1)
  • --test-mode: Test modes to run - INDIVIDUAL, PRODUCTION, EXPLORATION, or SPIKE. Space-separated for multiple (default: INDIVIDUAL PRODUCTION)
  • --increase-rate: User increase multiplier when under target (default: 1.2)
  • --decrease-rate: User decrease multiplier when over target (default: 0.8)
  • --check-interval: Seconds between scaling checks (default: 30)
  • --output-file: Output filename prefix (default: "report_loadtest_results")
  • --weights-module: Python module with custom production_weights (required)
  • --spike-target-rps: Target RPS for spike mode (default: 100.0)
  • --spike-duration: Duration in seconds for spike mode (default: 30)

Full list of Locust parameters: https://docs.locust.io/en/stable/configuration.html

Configuration File

The package includes a locust.conf configuration file that provides default settings for load tests. This allows you to avoid repeating common parameters on the command line.

What is locust.conf?

A Locust configuration file that sets default values for both standard Locust parameters and custom ml-loadtest parameters.

Configuration example:

; Locust configuration file
host = http://api:8000
headless
only-summary
run-time = 2h
loglevel = INFO
csv = report
html = report.html

; Load test settings
target-p99-ms = 1000
min-users = 1
max-users = 32
increase-rate = 1.2
decrease-rate = 0.8
check-interval = 30
tolerance = 0.1
production-run-duration = 1200
weights-module = loadtest.distribution_weights

How to use it:

# Use the bundled config file (from installed package location)
locust -f $(ml-loadtest-file) \
  --config locust.conf

# Override specific settings from config file
locust -f $(ml-loadtest-file) \
  --config locust.conf \
  --max-users 64

Note: Command-line arguments always override config file settings.

Analysis

After running tests, analyze results for regressions and get rate limit recommendations:

# Basic analysis
python -m ml_loadtest.analyze

# Update baseline after confirming results are good
python -m ml_loadtest.analyze --update-baseline

# Custom input/output files
python -m ml_loadtest.analyze \
  --input-file custom_report_loadtest_results.json \
  --baseline-file my_baseline.json \
  --output-file analysis_output.txt

The analyzer will:

  • Compare current results against baseline
  • Detect performance regressions (default 10% threshold)
  • Recommend safe rate limits (default 70% of measured capacity)
  • Generate detailed reports with statistics

Notion Integration

Sync test results to Notion for tracking:

# Set Notion credentials (environment variables)
export NOTION_TOKEN="notion-integration-token"
export NOTION_TEST_RESULTS_DATABASE_ID="test-results-database-id"
export NOTION_ENDPOINT_DATABASE_ID="endpoint-database-id"

python -m ml_loadtest.notion_sync "service-name" "v1.0.0" \
    --report-file report_loadtest_results.json \
    --baseline-file baseline.json

Extending with Custom TaskSets

Creating Custom TaskSets

Each TaskSet must implement this interface:

from locust import TaskSet

class MyCustomTaskSet(TaskSet):
    # Required: endpoint identifier
    endpoint = "/my-endpoint"

    # Required: test implementation
    def test_endpoint(self) -> None:
        with self.client.post(
            self.endpoint,
            json={"data": "example"},
            name=self.endpoint,
            catch_response=True,
        ) as response:
            if response.status_code == 200:
                response.success()
            else:
                response.failure(f"Failed with {response.status_code}")

Using Custom TaskSets

Create a weights module (e.g., distribution_weights.py):

from my_tasks import TaskSet1, TaskSet2, TaskSet3

production_weights = {
    TaskSet1: 50,  # 50% of requests
    TaskSet2: 30,  # 30% of requests
    TaskSet3: 20,  # 20% of requests
}

Run with custom weights:

locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module distribution_weights

Architecture

Core Components

  1. locustfile.py - Test orchestration with adaptive scaling

    • EndpointCapacityExplorer: Manages test modes and adaptive scaling
    • LoadTestHttpUser: Executes weighted endpoint tasks
    • Daemon thread monitors P99 and adjusts user count dynamically
  2. analyze.py - Post-test analysis

    • LoadTestAnalyzer: Regression detection and rate limit calculation
    • Compares against baselines (10% regression threshold)
    • Recommends safe limits (70% of measured capacity by default)
  3. distribution_weights.py - Production traffic weights

    • Example weight configuration for bundled TaskSets
    • Template for custom weight modules
  4. notion_sync.py - Notion integration

    • Syncs test results to Notion databases
    • Tracks performance metrics over time

Data Flow

  1. Locust users send requests to target endpoints
  2. Response times captured in circular buffers (maxlen=2000)
  3. Daemon thread checks P99 every --check-interval seconds
  4. User count adjusted based on P99 vs target comparison
  5. After test completion, JSON/TXT reports saved
  6. Analyzer loads reports for regression detection and recommendations
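The scaling decision in step 4 can be sketched as a single function (illustrative, not the package's internal API; the defaults mirror the CLI options above):

```python
import math

def next_user_count(current_users: int, p99_ms: float, target_p99_ms: float,
                    increase_rate: float = 1.2, decrease_rate: float = 0.8,
                    min_users: int = 1, max_users: int = 32,
                    tolerance: float = 0.1) -> int:
    """One scaling decision: grow while under target, shrink when over,
    hold steady inside the tolerance band around the target."""
    if abs(p99_ms - target_p99_ms) / target_p99_ms <= tolerance:
        return current_users  # close enough to target: hold
    if p99_ms < target_p99_ms:
        scaled = math.ceil(current_users * increase_rate)
    else:
        scaled = math.floor(current_users * decrease_rate)
    return max(min_users, min(max_users, scaled))

# Under target: 10 users * 1.2 -> 12; over target: 10 users * 0.8 -> 8.
next_user_count(10, p99_ms=400, target_p99_ms=1000)   # 12
next_user_count(10, p99_ms=1500, target_p99_ms=1000)  # 8
```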

Development

Running Tests

make test

Linting and Formatting

make type-check
make format
make lint

