
Multi-provider rate limiter for LLM API pipelines


ratelimiter

CI CodeQL Python 3.9+ License: MIT

Multi-provider rate limiter for LLM API pipelines. A drop-in module that keeps you clear of 429 Too Many Requests errors across every provider you use.

  • Per (provider, model) limits: RPM, TPM, and RPD, each enforced over a sliding window
  • Automatic 429 backoff: exponential with jitter, honoring Retry-After when present
  • Thread-safe and async-safe: the same RateLimiter works in both sync and async code
  • YAML config: plans.yaml with glob wildcards, easy to edit
  • Minimal dependencies: pyyaml is the only hard dependency
  • Empirically tested: 300 RPM on tokenrouter/MiniMax-M3 verified via a burst test
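The package's internals aren't shown on this page, so here is a minimal sketch of one common way to enforce several limits at once: a sliding-window log per limit, where a request is admitted only if it fits the RPM, TPM and RPD windows together. SlidingWindow, PlanLimiter and try_acquire are hypothetical names, not the ratelimiter API, and treating RPD as a rolling 24-hour window is an assumption (some providers reset daily quotas at a fixed time instead).

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple


class SlidingWindow:
    """Log of (timestamp, cost) entries for one limit over `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.entries: Deque[Tuple[float, int]] = deque()

    def _evict(self, now: float) -> None:
        # Drop entries that have slid out of the window.
        while self.entries and now - self.entries[0][0] >= self.window:
            self.entries.popleft()

    def fits(self, cost: int, now: float) -> bool:
        self._evict(now)
        return sum(c for _, c in self.entries) + cost <= self.limit


class PlanLimiter:
    """Hypothetical per-(provider, model) limiter with RPM, TPM and RPD windows."""

    def __init__(self, rpm: int, tpm: Optional[int] = None, rpd: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self._lock = threading.Lock()
        # (window, cost function): requests cost 1, the TPM window costs tokens.
        self._windows: List[Tuple[SlidingWindow, Callable[[int], int]]] = [
            (SlidingWindow(rpm, 60.0), lambda tokens: 1)
        ]
        if tpm is not None:
            self._windows.append((SlidingWindow(tpm, 60.0), lambda tokens: tokens))
        if rpd is not None:
            self._windows.append((SlidingWindow(rpd, 86_400.0), lambda tokens: 1))

    def try_acquire(self, estimated_tokens: int = 0) -> bool:
        """Admit the request only if it fits every window, then record it in all."""
        with self._lock:
            now = self.clock()
            if not all(w.fits(cost(estimated_tokens), now) for w, cost in self._windows):
                return False
            for w, cost in self._windows:
                w.entries.append((now, cost(estimated_tokens)))
            return True
```

Summing the log on every check is O(n) per window; a running total would be cheaper, but the plain log keeps the sketch easy to verify.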

Install

The PyPI package is published as agamenox-ratelimiter (the plain ratelimiter name has been taken on PyPI since 2013). The Python import stays ratelimiter: different names, same module.

pip install agamenox-ratelimiter
# or, from source:
git clone https://github.com/Agamenox/ratelimiter.git
cd ratelimiter
pip install -e .

Quick start

from ratelimiter import RateLimiter, call_with_retry

limiter = RateLimiter.from_yaml("plans.yaml")

# Wrap any function for automatic 429 retry.
# api_call_fn is a placeholder for your own function that sends the request.
def call_m3(prompt: str) -> str:
    return call_with_retry(
        limiter, "tokenrouter", "MiniMax-M3",
        api_call_fn, prompt,
        max_retries=5, base_backoff=1.0, max_backoff=60.0,
    )

# Or acquire a slot manually, then make the call yourself
ok = limiter.acquire("tokenrouter", "MiniMax-M3")
if ok:
    response = api_call_fn(...)

# Monitor usage
print(limiter.status("tokenrouter", "MiniMax-M3"))
# {'rpm_used': 5, 'rpm_limit': 300, 'tpm_used': 0, ...}

Async (inside a coroutine):

async def main() -> None:
    ok = await limiter.acquire_async("openrouter", "minimax/MiniMax-M2.5-highspeed")
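The feature list says one RateLimiter serves both threads and coroutines. How the package does this isn't documented here; below is a minimal sketch of one common pattern, assuming a simple RPM-only window. A threading.Lock guards the bookkeeping and is never held across an await, the sync path waits with time.sleep, and the async path waits with asyncio.sleep so the event loop keeps running. TinyLimiter and its methods are hypothetical.

```python
import asyncio
import threading
import time
from collections import deque
from typing import Deque


class TinyLimiter:
    """Hypothetical RPM-only limiter shared by threads and coroutines."""

    def __init__(self, rpm: int, window: float = 60.0) -> None:
        self.rpm = rpm
        self.window = window
        self._stamps: Deque[float] = deque()
        # Held only for the quick bookkeeping below, never across a sleep or await.
        self._lock = threading.Lock()

    def _try(self) -> float:
        """Record a request and return 0.0, or return the seconds to wait."""
        with self._lock:
            now = time.monotonic()
            while self._stamps and now - self._stamps[0] >= self.window:
                self._stamps.popleft()
            if len(self._stamps) < self.rpm:
                self._stamps.append(now)
                return 0.0
            # Time until the oldest request slides out of the window.
            return self.window - (now - self._stamps[0])

    def acquire(self, timeout: float = 300.0) -> bool:
        deadline = time.monotonic() + timeout
        while True:
            wait = self._try()
            if wait == 0.0:
                return True
            if time.monotonic() + wait > deadline:
                return False  # a slot would not free up before the timeout
            time.sleep(wait)  # blocks only the calling thread

    async def acquire_async(self, timeout: float = 300.0) -> bool:
        deadline = time.monotonic() + timeout
        while True:
            wait = self._try()
            if wait == 0.0:
                return True
            if time.monotonic() + wait > deadline:
                return False
            await asyncio.sleep(wait)  # yields to the event loop instead of blocking it
```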

Why this exists

I was running a batch pipeline through tokenrouter's free MiniMax-M3 model and getting mysterious 429s. A sustained-rate test showed the limit was 300 requests per minute over a sliding 60-second window, with no rate-limit headers, so the only signal was the 429 itself, after I had already hit the wall.

So I built a small limiter, measured more providers, and packaged it up. Now my pipelines throttle themselves before the wall, and when they do hit it they back off cleanly with exponentially growing, jittered delays.

The key insight: different providers have different limits, different header conventions, and different retry semantics. Hardcoding any of it is a maintenance trap. Hence the YAML registry — you measure once, write it down, and the limiter does the right thing for every provider.

Plans (registry format)

tokenrouter:
  MiniMax-M3:
    rpm: 300
    tier: free
    notes: "Empirically 300 RPM, no headers, sliding window."

openrouter:
  "*:free":
    rpm: 20
    rpd: 200
    tier: free

Fields: rpm (required), tpm, rpd, burst, tier (free|paid|enterprise|local), and notes. Model keys may use glob wildcards (* and ?), as in "*:free" above.
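Wildcard keys can be resolved with the standard library's fnmatch module. This is a minimal sketch, assuming exact model names take precedence over patterns and patterns are tried in file order (the package's actual precedence rules aren't stated here). find_plan is hypothetical, and the dict layout simply mirrors the YAML example above after loading. Note that fnmatch also understands [seq] character classes, which go beyond the * and ? documented above.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Optional

# Hypothetical in-memory form of the plans.yaml example above.
PLANS: Dict[str, Dict[str, Dict[str, Any]]] = {
    "tokenrouter": {"MiniMax-M3": {"rpm": 300, "tier": "free"}},
    "openrouter": {"*:free": {"rpm": 20, "rpd": 200, "tier": "free"}},
}


def find_plan(plans: Dict[str, Dict[str, Dict[str, Any]]],
              provider: str, model: str) -> Optional[Dict[str, Any]]:
    """Exact model key first, then the first matching glob pattern, else None."""
    models = plans.get(provider, {})
    if model in models:
        return models[model]
    for pattern, plan in models.items():  # dicts keep insertion order
        if fnmatchcase(model, pattern):  # case-sensitive on every OS, unlike fnmatch()
            return plan
    return None
```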

Detected limits (2026-06-14)

Provider / Model                 RPM   TPM    RPD   Source
tokenrouter/MiniMax-M3 (free)    300   -      -     Empirical burst test
openrouter/*:free                20    -      200   OpenRouter docs
nvidia/* (NIM)                   40    800K   -     Conservative default
zai/glm-5-turbo                  10    500K   -     User report
minimax/M2.5-highspeed           60    1M     -     Conservative
opencode-go/*                    60    500K   -     Conservative
lmstudio/*                       -     -      -     Local
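These limits interact, and a little arithmetic shows which one bites first. Below is a minimal sketch using numbers from the table; the helper names and the 30K-token request size are hypothetical, and the spacing figure assumes you pace requests evenly rather than bursting.

```python
from typing import Optional


def min_interval_s(rpm: int) -> float:
    """Even spacing between requests that stays within `rpm`."""
    return 60.0 / rpm


def minutes_until_daily_cap(rpm: int, rpd: int) -> float:
    """Minutes of back-to-back requests at full RPM before RPD is used up."""
    return rpd / rpm


def effective_rpm(rpm: int, tpm: Optional[int], tokens_per_request: int) -> int:
    """Requests per minute actually achievable once TPM is accounted for."""
    if tpm is None or tokens_per_request <= 0:
        return rpm
    return min(rpm, tpm // tokens_per_request)


# tokenrouter/MiniMax-M3: 300 RPM means at most one request every 0.2 s if paced evenly.
print(min_interval_s(300))
# openrouter/*:free: 20 RPM with 200 RPD exhausts the day in 10 minutes.
print(minutes_until_daily_cap(20, 200))
# nvidia/* (NIM): at a hypothetical 30K tokens per request, TPM caps you at 26 RPM.
print(effective_rpm(40, 800_000, 30_000))
```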

Naming note: PyPI package is agamenox-ratelimiter; Python import is ratelimiter (the module directory name). Use pip install agamenox-ratelimiter to install; from ratelimiter import ... to use.

If you've measured a different limit, open a rate-limit data issue so the registry stays honest.

API

lim = RateLimiter.from_yaml("plans.yaml")      # or from_dict({...})

# Sync
ok = lim.acquire(provider, model, estimated_tokens=0, timeout=300)
lim.release(provider, model, estimated_tokens=0)        # refund a slot
status = lim.status(provider, model)                    # snapshot dict

# Async
ok = await lim.acquire_async(provider, model, estimated_tokens=0)

# Auto-429 wrapper
result = call_with_retry(lim, provider, model, fn, *args,
                         max_retries=5, base_backoff=1.0, max_backoff=60.0,
                         estimated_tokens_fn=None)
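call_with_retry's internals aren't documented here. As a minimal sketch of the technique the feature list names (exponential backoff with jitter, preferring Retry-After), this uses the "full jitter" variant: sleep a uniform random time between 0 and min(max_backoff, base_backoff * 2**attempt). backoff_delay, retry_on_429 and the RateLimited exception are hypothetical stand-ins, and max_retries is assumed to count retries after the first attempt.

```python
import random
import time
from typing import Any, Callable, Optional


class RateLimited(Exception):
    """Hypothetical error a wrapped call raises on HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after


def backoff_delay(attempt: int, base_backoff: float = 1.0, max_backoff: float = 60.0,
                  retry_after: Optional[float] = None,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # The server said how long to wait; honor it as given.
        return max(retry_after, 0.0)
    # Cap the exponent so 2**attempt can't overflow a float.
    ceiling = min(max_backoff, base_backoff * 2 ** min(attempt, 30))
    source = rng if rng is not None else random
    return source.uniform(0.0, ceiling)  # "full jitter"


def retry_on_429(fn: Callable[..., Any], *args: Any, max_retries: int = 5,
                 base_backoff: float = 1.0, max_backoff: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None) -> Any:
    """Call fn(*args), retrying on RateLimited up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args)
        except RateLimited as exc:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            sleep(backoff_delay(attempt, base_backoff, max_backoff,
                                exc.retry_after, rng))
```

Full jitter spreads retries from many workers across the whole interval, which avoids synchronized retry storms; the trade-off is that an individual retry may fire sooner than a plain doubling schedule would.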

Detects 429s from urllib, requests, httpx, and any object with .status_code == 429. Reads Retry-After and X-RateLimit-* headers when present.
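The detection logic isn't shown here. Below is a minimal duck-typed sketch, assuming the status lives in .status_code (requests and httpx responses), .code (urllib's HTTPError) or on an attached .response (the exceptions requests and httpx raise), and that Retry-After holds either delay-seconds or an HTTP-date, the two forms HTTP allows. status_of, is_rate_limited and retry_after_seconds are hypothetical; X-RateLimit-* parsing is left out because those header names vary by provider.

```python
import email.utils
import time
from datetime import timezone
from typing import Any, Optional


def status_of(obj: Any) -> Optional[int]:
    """Best-effort HTTP status from a response or an exception carrying one."""
    for candidate in (obj, getattr(obj, "response", None)):
        if candidate is None:
            continue
        for attr in ("status_code", "code", "status"):
            value = getattr(candidate, attr, None)
            if isinstance(value, int):
                return value
    return None


def is_rate_limited(obj: Any) -> bool:
    return status_of(obj) == 429


def retry_after_seconds(headers: Any, now: Optional[float] = None) -> Optional[float]:
    """Parse Retry-After as delay-seconds or an HTTP-date; None if absent or bad."""
    raw = headers.get("Retry-After") if headers is not None else None
    if raw is None:
        return None
    raw = str(raw).strip()
    if raw.isascii() and raw.isdigit():
        return float(raw)
    try:
        when = email.utils.parsedate_to_datetime(raw)
    except (TypeError, ValueError):  # TypeError on older Pythons, ValueError on 3.10+
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive UTC
    current = time.time() if now is None else now
    return max(0.0, when.timestamp() - current)
```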


Tests

python tests/test_limiter.py        # 20 unit tests, < 0.5s, no network
python examples/integration_test.py # 5 real API calls to tokenrouter

The unit tests use a FakeClock, so the time-dependent tests run in milliseconds. The integration test requires pyyaml and a tokenrouter API key, read from the config in examples/integration_test.py.
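The test suite's FakeClock isn't shown on this page. Below is a minimal sketch of the general pattern, assuming the code under test takes its clock and sleep functions as parameters: a fake clock whose time advances only when sleep() is called, so a test that "waits" a full minute finishes instantly. FakeClock and wait_for_slot are hypothetical names, not the repo's code.

```python
import time
from typing import Callable, List


class FakeClock:
    """Test clock: time moves only when sleep() is called."""

    def __init__(self, start: float = 0.0) -> None:
        self.t = start
        self.sleeps: List[float] = []

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.sleeps.append(seconds)
        self.t += seconds


def wait_for_slot(stamps: List[float], rpm: int,
                  now: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep,
                  window: float = 60.0) -> None:
    """Block until a request fits the sliding window, then record it."""
    while True:
        t = now()
        stamps[:] = [s for s in stamps if t - s < window]  # drop expired entries
        if len(stamps) < rpm:
            stamps.append(t)
            return
        sleep(window - (t - stamps[0]))  # until the oldest entry expires
```

In production the defaults (time.monotonic and time.sleep) apply; a test passes clock.now and clock.sleep instead and then asserts on clock.sleeps.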

CI

GitHub Actions runs on every push and PR:

  • Unit tests on Python 3.9–3.13, Ubuntu + Windows + macOS
  • Lint + syntax check on every .py file
  • YAML round-trip — verify plans.yaml is loadable
  • CodeQL — security analysis, weekly schedule
  • Publish to PyPI — auto-triggered on GitHub release (trusted publishing)

Roadmap

  • Per-key / per-project quotas (multi-tenant)
  • Prometheus metrics export
  • Redis backend for distributed pipelines
  • Async context manager: async with limiter.guard(...) as ok:

License

MIT — see LICENSE.


Download files


Source Distribution

agamenox_ratelimiter-0.1.0.tar.gz (13.8 kB)

Uploaded Source

Built Distribution


agamenox_ratelimiter-0.1.0-py3-none-any.whl (13.1 kB)

Uploaded Python 3

File details

Details for the file agamenox_ratelimiter-0.1.0.tar.gz.

File metadata

  • Download URL: agamenox_ratelimiter-0.1.0.tar.gz
  • Upload date:
  • Size: 13.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agamenox_ratelimiter-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1773b63de7e4b9480a33134ba8378df30ab8bddf4b3ef51483fef4c785df642c
MD5 8bedc8b2e04478724ab2dec0f8bf14b1
BLAKE2b-256 cabe65d05af445b832a2cacfd2aa2245370403fd9bf08b2f4ec02af80fdd9966


Provenance

The following attestation bundles were made for agamenox_ratelimiter-0.1.0.tar.gz:

Publisher: publish.yml on Agamenox/ratelimiter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agamenox_ratelimiter-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for agamenox_ratelimiter-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bf5d278b4f821c1ee8cb1acb965f7118262e3709978ed4b18c3e1b2878711a53
MD5 b29f826b9c3f3026588872f18e5bdd64
BLAKE2b-256 07c966be5a7cac7d724d599df22855f1d9d39e388c3cb30bc348aedd2d4c7bf9


Provenance

The following attestation bundles were made for agamenox_ratelimiter-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Agamenox/ratelimiter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
