Multi-provider rate limiter for LLM API pipelines
ratelimiter
Multi-provider rate limiter for LLM API pipelines. Drop-in module that keeps you out of 429 Too Many Requests trouble across all the providers you use.
- Per (provider, model) limits: RPM, TPM, RPD with sliding window
- Auto 429 backoff: exponential with jitter, reads Retry-After when present
- Thread-safe + async-safe: same RateLimiter for both worlds
- YAML config: plans.yaml with glob wildcards, easy to edit
- Zero hard dependencies: only pyyaml
- Empirically tested: 300 RPM on tokenrouter/MiniMax-M3 verified via burst test
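To make the "sliding window" idea concrete, here is a minimal sketch of a requests-per-minute counter. It is not the package's implementation; the class name `SlidingWindow` and its `try_acquire` method are made up for illustration, and the injectable `clock` argument is an assumption that keeps the example testable.

```python
import threading
import time
from collections import deque


class SlidingWindow:
    """Allow at most `limit` events in any `window`-second span (illustrative)."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps = deque()  # timestamps of accepted events, oldest first
        self._lock = threading.Lock()

    def try_acquire(self):
        """Record one event and return True, or return False if the window is full."""
        with self._lock:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self._stamps and now - self._stamps[0] >= self.window:
                self._stamps.popleft()
            if len(self._stamps) < self.limit:
                self._stamps.append(now)
                return True
            return False
```

Unlike a fixed per-minute counter that resets on the minute boundary, this approach never lets more than `limit` requests fall inside any 60-second span, which matches the behavior described for tokenrouter.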
Install
The PyPI package is published as agamenox-ratelimiter because the plain ratelimiter name has been taken on PyPI since 2013. The Python import stays ratelimiter: different names, same module.
pip install agamenox-ratelimiter
# or, from source:
git clone https://github.com/Agamenox/ratelimiter.git
cd ratelimiter
pip install -e .
Quick start
from ratelimiter import RateLimiter, call_with_retry
limiter = RateLimiter.from_yaml("plans.yaml")
# Wrap any function — auto 429 retry
def call_m3(prompt: str) -> str:
    return call_with_retry(
        limiter, "tokenrouter", "MiniMax-M3",
        api_call_fn, prompt,
        max_retries=5, base_backoff=1.0, max_backoff=60.0,
    )
# Or acquire manually
limiter.acquire("tokenrouter", "MiniMax-M3")
response = api_call(...)
# Monitor usage
print(limiter.status("tokenrouter", "MiniMax-M3"))
# {'rpm_used': 5, 'rpm_limit': 300, 'tpm_used': 0, ...}
Async:
ok = await limiter.acquire_async("openrouter", "minimax/MiniMax-M2.5-highspeed")
Why this exists
I was running a batch pipeline through tokenrouter's free MiniMax-M3 model
and getting mysterious 429s. After a sustained-rate test I learned the limit
was 300 requests/minute, no rate limit headers, sliding 60s window — and
that the error only told me anything after I hit the wall.
So I built a small limiter, measured more providers, and packaged it up. Now my pipelines throttle themselves before they reach the wall, and when they do hit it they back off cleanly using exponential backoff with jitter.
The key insight: different providers have different limits, different header conventions, and different retry semantics. Hardcoding any of it is a maintenance trap. Hence the YAML registry — you measure once, write it down, and the limiter does the right thing for every provider.
Plans (registry format)
tokenrouter:
  MiniMax-M3:
    rpm: 300
    tier: free
    notes: "Empirically 300 RPM, no headers, sliding window."
openrouter:
  "*:free":
    rpm: 20
    rpd: 200
    tier: free
Fields: rpm (required), tpm, rpd, burst, tier (free|paid|enterprise|local), notes. Wildcards (*, ?) supported in the model field.
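As a rough illustration of how wildcard model keys could be resolved, the sketch below uses the standard library's `fnmatch` on a dict shaped like the registry above. The function `resolve_plan`, the model name `some-vendor/some-model:free`, and the precedence rule (exact key first, then the first matching pattern in file order) are all assumptions; the package may resolve patterns differently.

```python
from fnmatch import fnmatchcase

# Same shape as plans.yaml after loading: provider -> model pattern -> fields.
PLANS = {
    "tokenrouter": {"MiniMax-M3": {"rpm": 300, "tier": "free"}},
    "openrouter": {"*:free": {"rpm": 20, "rpd": 200, "tier": "free"}},
}


def resolve_plan(plans, provider, model):
    """Return the plan dict for (provider, model), or None if nothing matches."""
    models = plans.get(provider, {})
    # Assumed precedence: an exact key wins over any wildcard pattern.
    if model in models:
        return models[model]
    for pattern, plan in models.items():
        if fnmatchcase(model, pattern):  # supports * and ?
            return plan
    return None


print(resolve_plan(PLANS, "openrouter", "some-vendor/some-model:free"))
```

Note that `*` in `fnmatch` also matches `/`, so a single `"*:free"` entry covers every vendor-prefixed model that ends in `:free`.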
Detected limits (2026-06-14)
| Provider / Model | RPM | TPM | RPD | Source |
|---|---|---|---|---|
| tokenrouter/MiniMax-M3 (free) | 300 | - | - | Empirical burst test |
| openrouter/*:free | 20 | - | 200 | OpenRouter docs |
| nvidia/* (NIM) | 40 | 800K | - | Conservative default |
| zai/glm-5-turbo | 10 | 500K | - | User report |
| minimax/M2.5-highspeed | 60 | 1M | - | Conservative |
| opencode-go/* | 60 | 500K | - | Conservative |
| lmstudio/* | ∞ | - | - | Local |
Naming note: PyPI package is agamenox-ratelimiter; Python import is ratelimiter (the module directory name). Use pip install agamenox-ratelimiter to install; from ratelimiter import ... to use.
If you've measured a different limit, open a rate-limit data issue so the registry stays honest.
API
lim = RateLimiter.from_yaml("plans.yaml") # or from_dict({...})
# Sync
ok = lim.acquire(provider, model, estimated_tokens=0, timeout=300)
lim.release(provider, model, estimated_tokens=0) # refund a slot
status = lim.status(provider, model) # snapshot dict
# Async
ok = await lim.acquire_async(provider, model, estimated_tokens=0)
# Auto-429 wrapper
result = call_with_retry(lim, provider, model, fn, *args,
                         max_retries=5, base_backoff=1.0, max_backoff=60.0,
                         estimated_tokens_fn=None)
Detects 429s from urllib, requests, httpx, and any object with
.status_code == 429. Reads Retry-After and X-RateLimit-* headers when
present.
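The detection and backoff logic described above might look roughly like the sketch below. The helper names (`is_rate_limited`, `retry_after_seconds`, `backoff_delay`) are hypothetical and not part of the package's API. Two choices here are assumptions rather than documented behavior: the jitter is "full jitter" (uniform between 0 and the exponential ceiling), and a server-supplied Retry-After is capped at `max_backoff`.

```python
import random


def is_rate_limited(obj):
    """True if a response or exception carries HTTP status 429."""
    # requests/httpx responses expose .status_code; urllib's HTTPError exposes .code.
    for attr in ("status_code", "code"):
        if getattr(obj, attr, None) == 429:
            return True
    # requests/httpx HTTP errors carry the response object on .response.
    response = getattr(obj, "response", None)
    return getattr(response, "status_code", None) == 429


def retry_after_seconds(headers):
    """Parse a Retry-After value given in seconds; None if absent or not numeric."""
    value = headers.get("Retry-After") if headers else None
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return None  # the HTTP-date form is not handled in this sketch


def backoff_delay(attempt, base_backoff=1.0, max_backoff=60.0,
                  retry_after=None, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return min(retry_after, max_backoff)  # assumed policy: trust the server, but cap it
    ceiling = min(max_backoff, base_backoff * (2 ** attempt))
    return ceiling * rng()  # full jitter: uniform in [0, ceiling)
```

With the defaults, the ceiling grows 1s, 2s, 4s, and so on until it reaches 60s; the random factor spreads concurrent workers out so they do not all retry at the same instant.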
Documentation
- API reference — full method signatures, algorithm notes
- Tokenrouter specifics — empirical test methodology
- Contributing
- Changelog
Tests
python tests/test_limiter.py # 20 unit tests, < 0.5s, no network
python examples/integration_test.py # 5 real API calls to tokenrouter
The unit tests use a FakeClock so the time-dependent ones run in
milliseconds. The integration test requires pyyaml and a tokenrouter API key
(loaded from F:\dev\ratelimiter\examples\integration_test.py config).
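For readers unfamiliar with the fake-clock pattern, here is a small self-contained sketch. It is a guess at the general technique, not the `FakeClock` that ships in the test suite; `wait_until` is a toy function invented purely to have something time-dependent to exercise.

```python
class FakeClock:
    """Manually advanced stand-in for time.monotonic and time.sleep."""

    def __init__(self, start=0.0):
        self.now = start

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        # "Sleeping" just moves the clock forward; no real waiting happens.
        self.now += seconds


def wait_until(deadline, clock, sleep, step=1.0):
    """Toy time-dependent function: poll until the clock reaches `deadline`."""
    polls = 0
    while clock() < deadline:
        sleep(step)
        polls += 1
    return polls


clock = FakeClock()
polls = wait_until(60.0, clock, clock.sleep)  # returns immediately in real time
print(polls, clock())
```

Because both the clock and the sleep function are injected, a test that simulates a full 60-second window finishes without any real delay.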
CI
GitHub Actions runs on every push and PR:
- Unit tests on Python 3.9–3.13, Ubuntu + Windows + macOS
- Lint + syntax check on every .py file
- YAML round-trip: verify plans.yaml is loadable
- CodeQL: security analysis, weekly schedule
- Publish to PyPI — auto-triggered on GitHub release (trusted publishing)
Roadmap
- Per-key / per-project quotas (multi-tenant)
- Prometheus metrics export
- Redis backend for distributed pipelines
- Async context manager: async with limiter.guard(...) as ok:
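Until the context manager lands, something similar can be approximated outside the library. The sketch below is speculative: `guard` is written as a free function rather than a method, and `StubLimiter` is a stand-in that only mimics the `acquire_async` signature from the API section so the example runs without the package installed.

```python
import asyncio
from contextlib import asynccontextmanager


class StubLimiter:
    """Stand-in exposing only the method the sketch needs (not the real class)."""

    def __init__(self):
        self.calls = []

    async def acquire_async(self, provider, model, estimated_tokens=0):
        self.calls.append(("acquire", provider, model))
        return True


@asynccontextmanager
async def guard(limiter, provider, model, estimated_tokens=0):
    """Acquire a slot on entry and yield whether it was granted."""
    ok = await limiter.acquire_async(provider, model, estimated_tokens=estimated_tokens)
    yield ok


async def demo():
    limiter = StubLimiter()
    async with guard(limiter, "tokenrouter", "MiniMax-M3") as ok:
        if ok:
            pass  # the API call would go here
    return limiter.calls


print(asyncio.run(demo()))
```

Whether a real `guard` should call `release` when the body raises is an open design question: a request that failed after being sent has probably still counted against the provider's quota, so this sketch deliberately does not refund.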
License
MIT — see LICENSE.