# ML Load Testing Tool
Adaptive load testing tool for ML inference APIs. Uses Locust to dynamically scale concurrent users based on P99 latency targets, then analyzes results for regression detection and rate limit recommendations.
## Features
- **Adaptive Scaling**: Automatically adjusts concurrent users based on P99 latency targets
- **Multi-Mode Testing**: Individual endpoint, production mix, exploration, and spike test modes
- **Regression Detection**: Compares test results against baselines to catch performance degradations
- **Rate Limit Recommendations**: Calculates safe rate limits with configurable safety factors
- **Notion Integration**: Syncs results to Notion for tracking and visualization
## Installation

### From GitHub

```shell
pip install git+https://github.com/IncodeTechnologies/ml-load-testing-tool.git
```

### From Source

```shell
git clone https://github.com/IncodeTechnologies/ml-load-testing-tool.git
cd ml-load-testing-tool
poetry install
```
## Quick Start
The tool requires a weights module that defines which endpoints to test. Use bundled examples or create your own:
```shell
# Test with bundled example TaskSets
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module loadtest.distribution_weights

# Run headless with specific parameters
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module loadtest.distribution_weights \
  --users 32 \
  --spawn-rate 4 \
  --run-time 60s \
  --headless
```
The `ml-loadtest-file` command prints the path to the installed locustfile, giving you full access to all Locust CLI parameters.

**Note:** The `--weights-module` parameter is required. It specifies a Python module containing a `production_weights` dictionary that maps TaskSet classes to their relative weights.
## Usage

### Two Usage Patterns

#### Pattern 1: Direct Path (Recommended for CLI)

```shell
# Get full locust CLI access with installed package
locust -f $(ml-loadtest-file) --host http://api:8000 [any locust params]
```
#### Pattern 2: Local Import (Recommended for Customization)

Create a local `locustfile.py` in your project:

```python
# Import everything from the installed package
from ml_loadtest.locustfile import *

# Optionally override settings or add custom logic here
```

Then run:

```shell
locust -f locustfile.py --host http://api:8000
```
### Load Testing

The tool supports four test modes via the `--test-mode` parameter. You can run a single mode or multiple modes in sequence (space-separated).
#### 1. Individual Endpoint Testing

Test each endpoint separately to find individual capacity:

```shell
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module loadtest.distribution_weights \
  --test-mode INDIVIDUAL \
  --target-p99-ms 500 \
  --max-users 100
```
#### 2. Production Mix Testing

Test with production traffic distribution:

```shell
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module ml_loadtest.examples.distribution_weights \
  --test-mode PRODUCTION \
  --target-p99-ms 500
```
#### 3. Exploration Mode

Test multiple weight distributions to find the optimal mix:

```shell
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module loadtest.distribution_weights \
  --test-mode EXPLORATION \
  --target-p99-ms 1000
```
#### 4. Spike Testing

Test sudden traffic spikes:

```shell
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module ml_loadtest.examples.distribution_weights \
  --test-mode SPIKE \
  --spike-target-rps 1000 \
  --spike-duration 30
```
#### 5. Multiple Modes

Run multiple test modes in sequence:

```shell
# Run individual and production tests (default)
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module loadtest.distribution_weights \
  --test-mode INDIVIDUAL PRODUCTION

# Run all test modes
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module loadtest.distribution_weights \
  --test-mode INDIVIDUAL PRODUCTION EXPLORATION SPIKE
```
### Key Configuration Options

All standard Locust parameters are available, plus:

- `--target-p99-ms`: Target P99 latency in milliseconds (default: 1000)
- `--max-users`: Maximum concurrent users (default: 32)
- `--min-users`: Minimum concurrent users (default: 1)
- `--test-mode`: Test modes to run: INDIVIDUAL, PRODUCTION, EXPLORATION, or SPIKE. Space-separated for multiple (default: INDIVIDUAL PRODUCTION)
- `--increase-rate`: User increase multiplier when under target (default: 1.2)
- `--decrease-rate`: User decrease multiplier when over target (default: 0.8)
- `--check-interval`: Seconds between scaling checks (default: 30)
- `--output-file`: Output filename prefix (default: "report_loadtest_results")
- `--weights-module`: Python module with custom `production_weights` (required)
- `--spike-target-rps`: Target RPS for spike mode (default: 100.0)
- `--spike-duration`: Duration in seconds for spike mode (default: 30)
Full list of Locust parameters: https://docs.locust.io/en/stable/configuration.html
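The increase/decrease multipliers and tolerance band combine into a simple multiplicative control loop. The following is a simplified illustration of that idea; the function name and structure are hypothetical, not the tool's actual implementation:

```python
# Simplified sketch of one adaptive-scaling decision (illustrative only;
# names and structure are hypothetical, not the tool's API).

def next_user_count(
    current_users: int,
    p99_ms: float,
    target_p99_ms: float = 1000.0,
    increase_rate: float = 1.2,
    decrease_rate: float = 0.8,
    min_users: int = 1,
    max_users: int = 32,
    tolerance: float = 0.1,
) -> int:
    """Return the user count for the next check interval."""
    if p99_ms < target_p99_ms * (1 - tolerance):
        users = current_users * increase_rate   # under target: scale up
    elif p99_ms > target_p99_ms * (1 + tolerance):
        users = current_users * decrease_rate   # over target: scale down
    else:
        users = current_users                   # within tolerance: hold
    return max(min_users, min(max_users, round(users)))

print(next_user_count(10, p99_ms=400.0))   # 12 (scales up)
print(next_user_count(10, p99_ms=1500.0))  # 8 (scales down)
```

Applying one such step every `--check-interval` seconds would nudge the user count toward the highest load that keeps P99 near the target, clamped between `--min-users` and `--max-users`.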
## Configuration File

The package includes a `locust.conf` configuration file that provides default settings for load tests. This allows you to avoid repeating common parameters on the command line.

### What is locust.conf?

A Locust configuration file that sets default values for both standard Locust parameters and custom ml-loadtest parameters.
Configuration example:

```ini
; Locust configuration file
host = http://api:8000
headless
only-summary
run-time = 2h
loglevel = INFO
csv = report
html = report.html

; Load test settings
target-p99-ms = 1000
min-users = 1
max-users = 32
increase-rate = 1.2
decrease-rate = 0.8
check-interval = 30
tolerance = 0.1
production-run-duration = 1200
weights-module = loadtest.distribution_weights
```
**How to use it:**

```shell
# Use the bundled config file (from installed package location)
locust -f $(ml-loadtest-file) \
  --config locust.conf

# Override specific settings from config file
locust -f $(ml-loadtest-file) \
  --config locust.conf \
  --max-users 64
```

**Note:** Command-line arguments always override config file settings.
## Analysis

After running tests, analyze results for regressions and get rate limit recommendations:

```shell
# Basic analysis
python -m ml_loadtest.analyze

# Update baseline after confirming results are good
python -m ml_loadtest.analyze --update-baseline

# Custom input/output files
python -m ml_loadtest.analyze \
  --input-file custom_report_loadtest_results.json \
  --baseline-file my_baseline.json \
  --output-file analysis_output.txt
```
The analyzer will:
- Compare current results against baseline
- Detect performance regressions (default 10% threshold)
- Recommend safe rate limits (default 70% of measured capacity)
- Generate detailed reports with statistics
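The regression and rate-limit checks reduce to two comparisons. A minimal sketch of that arithmetic (function names are hypothetical, not the `ml_loadtest.analyze` API):

```python
# Illustrative sketch of the analysis math (hypothetical names,
# not the ml_loadtest.analyze API).

def is_regression(current_p99_ms: float, baseline_p99_ms: float,
                  threshold: float = 0.10) -> bool:
    """Flag a regression when P99 worsens by more than the threshold."""
    return current_p99_ms > baseline_p99_ms * (1 + threshold)

def recommended_rate_limit(measured_rps: float,
                           safety_factor: float = 0.70) -> float:
    """Recommend a safe rate limit as a fraction of measured capacity."""
    return measured_rps * safety_factor

print(is_regression(560.0, 500.0))    # True: 12% slower than baseline
print(recommended_rate_limit(200.0))  # ~140 rps
```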
## Notion Integration

Sync test results to Notion for tracking:

```shell
# Set Notion credentials (environment variables)
export NOTION_TOKEN="notion-integration-token"
export NOTION_TEST_RESULTS_DATABASE_ID="test-results-database-id"
export NOTION_ENDPOINT_DATABASE_ID="endpoint-database-id"

python -m ml_loadtest.notion_sync "service-name" "v1.0.0" \
  --report-file report_loadtest_results.json \
  --baseline-file baseline.json
```
## Extending with Custom TaskSets

### Creating Custom TaskSets

Each TaskSet must implement this interface:

```python
from locust import TaskSet


class MyCustomTaskSet(TaskSet):
    # Required: endpoint identifier
    endpoint = "/my-endpoint"

    # Required: test implementation
    def test_endpoint(self) -> None:
        with self.client.post(
            self.endpoint,
            json={"data": "example"},
            name=self.endpoint,
            catch_response=True,
        ) as response:
            if response.status_code == 200:
                response.success()
            else:
                response.failure(f"Failed with {response.status_code}")
```
### Using Custom TaskSets

Create a weights module (e.g., `distribution_weights.py`):

```python
from my_tasks import TaskSet1, TaskSet2, TaskSet3

production_weights = {
    TaskSet1: 50,  # 50% of requests
    TaskSet2: 30,  # 30% of requests
    TaskSet3: 20,  # 20% of requests
}
```
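After normalization, relative weights behave like sampling probabilities. A rough sketch of weighted selection (string stand-ins replace the TaskSet classes; this is conceptual, not Locust's actual scheduler):

```python
import random

# Stand-in weight table mirroring the example above (strings instead
# of TaskSet classes; conceptual, not Locust's internal scheduler).
production_weights = {"TaskSet1": 50, "TaskSet2": 30, "TaskSet3": 20}

def pick_taskset(weights: dict, rng: random.Random) -> str:
    """Pick one TaskSet with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

rng = random.Random(0)
counts = {name: 0 for name in production_weights}
for _ in range(10_000):
    counts[pick_taskset(production_weights, rng)] += 1
print(counts)  # roughly 5000 / 3000 / 2000
```

Because only the ratios matter, weights of 50/30/20 and 5/3/2 yield the same traffic mix.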
Run with custom weights:

```shell
locust -f $(ml-loadtest-file) \
  --host http://api:8000 \
  --weights-module distribution_weights
```
## Architecture

### Core Components

- **locustfile.py** - Test orchestration with adaptive scaling
  - `EndpointCapacityExplorer`: Manages test modes and adaptive scaling
  - `LoadTestHttpUser`: Executes weighted endpoint tasks
  - Daemon thread monitors P99 and adjusts user count dynamically
- **analyze.py** - Post-test analysis
  - `LoadTestAnalyzer`: Regression detection and rate limit calculation
  - Compares against baselines (10% regression threshold)
  - Recommends safe limits (70% of measured capacity by default)
- **distribution_weights.py** - Production traffic weights
  - Example weight configuration for bundled TaskSets
  - Template for custom weight modules
- **notion_sync.py** - Notion integration
  - Syncs test results to Notion databases
  - Tracks performance metrics over time
### Data Flow

1. Locust users send requests to target endpoints
2. Response times captured in circular buffers (maxlen=2000)
3. Daemon thread checks P99 every `--check-interval` seconds
4. User count adjusted based on P99 vs target comparison
5. After test completion, JSON/TXT reports saved
6. Analyzer loads reports for regression detection and recommendations
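The circular-buffer P99 computation can be sketched as follows (illustrative only; assumes a `deque` as the circular buffer and the nearest-rank percentile method, not the tool's exact code):

```python
import math
from collections import deque

# Circular buffer: keeps only the most recent 2000 latency samples
latencies_ms: deque = deque(maxlen=2000)

def p99(samples: deque) -> float:
    """Nearest-rank 99th percentile of the buffered latencies."""
    ordered = sorted(samples)
    rank = math.ceil(0.99 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

for ms in range(1, 1001):                  # 1..1000 ms of fake samples
    latencies_ms.append(float(ms))
print(p99(latencies_ms))  # 990.0
```

The `maxlen` bound makes the statistic a sliding window: older samples are evicted automatically, so the P99 reflects only recent traffic.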
## Development

### Running Tests

```shell
make test
```

### Linting and Formatting

```shell
make type-check
make format
make lint
```