Anticipatory rate-limit orchestration engine for multi-provider systems — predict 429s before they happen

grate-limiter

Anticipatory rate-limit orchestration engine for multi-provider systems.

Stop reacting to 429 Too Many Requests. grate-limiter predicts quota exhaustion before it happens and routes requests to the best available provider — all in-process, with zero network overhead.

Anticipatory routing — scores every provider on quota, health, priority, and latency before each request
Automatic failover — cooldown tracking with EWMA health decay means degraded providers are bypassed automatically
Multiple quota dimensions — requests-per-minute, tokens-per-minute, concurrency limits, all at once
Thread-safe — uses threading.Lock internally; safe to call from multiple threads
Deterministic testing — built-in MockClock lets you simulate time-based behavior in unit tests
Fully typed — ships with py.typed marker; works with mypy strict mode

Part of a multi-language monorepo — identical algorithm and conformance tests across Rust, Python, and TypeScript.

Installation

pip install grate-limiter

Requirements: Python 3.10+. No external runtime dependencies.

Quick Start

from grate_limiter import (
    GrateLimiter, EngineConfig,
    ProviderConfig, CapabilityConfig, CapabilityProvider,
    QuotaConfig, Observation, Usage, Outcome,
    Dimension, Window, StatusClass,
)

# Create the engine
engine = GrateLimiter(EngineConfig())

# Register providers with their rate-limit quotas
engine.upsert_provider(ProviderConfig(
    name="openai",
    quotas=[QuotaConfig(dimension=Dimension.REQUESTS, limit=5000, window=Window.MINUTE)],
    priority=10,
    cooldown_seconds=30,
))

engine.upsert_provider(ProviderConfig(
    name="anthropic",
    quotas=[QuotaConfig(dimension=Dimension.REQUESTS, limit=3000, window=Window.MINUTE)],
    priority=8,
    cooldown_seconds=30,
))

# Register a capability (logical operation served by multiple providers)
engine.upsert_capability(CapabilityConfig(
    name="chat-completion",
    providers=[
        CapabilityProvider(provider="openai", priority=10),
        CapabilityProvider(provider="anthropic", priority=8),
    ],
))

# Select the best provider for the next request
decision = engine.select("chat-completion")
print(f"Use: {decision.provider} (score: {decision.score:.2f})")
# → "Use: openai (score: 0.94)"

# After the request completes, report the outcome
engine.observe(Observation(
    provider="openai",
    capability="chat-completion",
    usage=Usage(requests=1, tokens=1200),
    outcome=Outcome(status=StatusClass.SUCCESS, latency_ms=830),
))

Core Concepts

Providers and Capabilities

A provider is a named upstream service (e.g. "openai", "anthropic") with associated rate-limit quotas. A capability is a logical operation (e.g. "chat-completion", "embeddings") that can be served by one or more providers.

# Provider with multiple quota dimensions
engine.upsert_provider(ProviderConfig(
    name="openai-gpt4",
    quotas=[
        QuotaConfig(dimension=Dimension.REQUESTS, limit=500, window=Window.MINUTE),
        QuotaConfig(dimension=Dimension.TOKENS, limit=150_000, window=Window.MINUTE),
        QuotaConfig(dimension=Dimension.CONCURRENCY, limit=20),
    ],
    priority=10,
    cooldown_seconds=60,
))

Scoring Algorithm

Every call to select() scores all eligible providers using a weighted formula:

score = quota_score  × 0.40
      + health_score × 0.35
      + priority_score × 0.20
      + latency_score  × 0.05

The provider with the highest score wins. Providers in cooldown or below minimum health are excluded entirely.
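As a rough illustration of the weighted formula and the exclusion rule above, the sketch below scores a few candidates and picks the highest. The `Candidate` class, the `pick_best` helper, and the assumption that every component is already normalized to 0.0–1.0 are hypothetical and not taken from the library; the `min_health` default of 0.1 is likewise only an illustrative value.

```python
from dataclasses import dataclass
from typing import Optional

# Default weights copied from the formula above.
WEIGHTS = {"quota": 0.40, "health": 0.35, "priority": 0.20, "latency": 0.05}


@dataclass
class Candidate:
    """Hypothetical container; each component is assumed to lie in 0.0-1.0."""
    name: str
    quota: float
    health: float
    priority: float
    latency: float
    in_cooldown: bool = False


def composite_score(c: Candidate) -> float:
    """Weighted sum of the four components."""
    return (c.quota * WEIGHTS["quota"]
            + c.health * WEIGHTS["health"]
            + c.priority * WEIGHTS["priority"]
            + c.latency * WEIGHTS["latency"])


def pick_best(candidates: list, min_health: float = 0.1) -> Optional[Candidate]:
    """Drop providers in cooldown or below min_health, then take the top score."""
    eligible = [c for c in candidates
                if not c.in_cooldown and c.health >= min_health]
    return max(eligible, key=composite_score, default=None)
```

Because quota and health together carry 75% of the weight in this formula, a high-priority provider that is close to its limit can lose to a lower-priority one with plenty of headroom.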

Health Tracking

Health decays with each failure using an Exponential Weighted Moving Average (EWMA) and recovers gradually with successes. Providers that hit consecutive failures enter a cooldown period and are bypassed until it expires.

# Observe a rate-limit response — health decays, cooldown may trigger
engine.observe(Observation(
    provider="openai",
    outcome=Outcome(status=StatusClass.RATE_LIMITED, latency_ms=200),
    usage=Usage(requests=1),
))

# Query provider state
in_cooldown = engine.provider_in_cooldown("openai")   # bool
health = engine.provider_health("openai")              # 0.0–1.0 or None
remaining = engine.remaining_quota("openai", Dimension.REQUESTS)  # int or None
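For intuition about the EWMA behaviour described in this section, here is a minimal sketch of one common form of the update, where each outcome is treated as a sample of 1.0 (success) or 0.0 (failure). The function name, the sample values, and the `alpha` default are assumptions for illustration; the library's exact update rule may differ.

```python
def update_health(health: float, success: bool, alpha: float = 0.3) -> float:
    """One EWMA step: blend the previous health with the newest sample."""
    sample = 1.0 if success else 0.0
    return (1.0 - alpha) * health + alpha * sample


# Three failures followed by one success, starting from full health.
health = 1.0
history = []
for ok in (False, False, False, True):
    health = update_health(health, ok)
    history.append(round(health, 4))
```

With `alpha=0.3`, three consecutive failures take health from 1.0 to roughly 0.34, and a single success lifts it back to about 0.54. A larger `alpha` makes both the decay and the recovery faster.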

Quota Strategies

Strategy	When to use
`Dimension.REQUESTS`	Per-request rate limits (RPM / RPD)
`Dimension.TOKENS`	Token-based limits (TPM / TPD)
`Dimension.CONCURRENCY`	Max simultaneous in-flight requests
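To make the windowed dimensions concrete, the following sketch tracks usage for one quota over a sliding window and reports what remains. `WindowQuota` is a hypothetical class written for this example, and timestamps are passed in explicitly so the behaviour is easy to follow; it is not a description of how the library stores usage.

```python
from collections import deque


class WindowQuota:
    """Hypothetical sliding-window counter for a single quota dimension."""

    def __init__(self, limit: int, window_secs: float = 60.0) -> None:
        self.limit = limit
        self.window_secs = window_secs
        self._events = deque()  # (timestamp, amount) pairs, oldest first
        self._used = 0

    def _expire(self, now: float) -> None:
        # Forget usage that has aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_secs:
            _, amount = self._events.popleft()
            self._used -= amount

    def record(self, now: float, amount: int = 1) -> None:
        """Record usage: 1 per request, or the token count for a token quota."""
        self._expire(now)
        self._events.append((now, amount))
        self._used += amount

    def remaining(self, now: float) -> int:
        """Units still available at time `now`, never below zero."""
        self._expire(now)
        return max(self.limit - self._used, 0)
```

The same counter serves a request quota (`amount=1`) and a token quota (`amount` set to the tokens consumed). A concurrency limit has no window, so it would be tracked as a plain in-flight count instead.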

Deterministic Testing

Use MockClock to write fully deterministic tests — no real timers, no time.sleep():

import pytest
from grate_limiter import (
    GrateLimiter, EngineConfig, MockClock,
    ProviderConfig, CapabilityConfig, CapabilityProvider,
    QuotaConfig, Observation, Usage, Outcome,
    Dimension, Window, StatusClass, NoAvailableProviders,
)

def test_failover_after_rate_limit():
    clock = MockClock()
    engine = GrateLimiter(EngineConfig(clock=clock))

    engine.upsert_provider(ProviderConfig(
        name="primary",
        quotas=[QuotaConfig(dimension=Dimension.REQUESTS, limit=2, window=Window.MINUTE)],
        priority=10, cooldown_seconds=30,
    ))
    engine.upsert_provider(ProviderConfig(
        name="backup",
        quotas=[QuotaConfig(dimension=Dimension.REQUESTS, limit=100, window=Window.MINUTE)],
        priority=5, cooldown_seconds=30,
    ))
    engine.upsert_capability(CapabilityConfig(
        name="api",
        providers=[
            CapabilityProvider(provider="primary", priority=10),
            CapabilityProvider(provider="backup", priority=5),
        ],
    ))

    # Exhaust primary with rate-limited responses
    for _ in range(3):
        clock.advance_ms(1000)
        engine.observe(Observation(
            provider="primary",
            outcome=Outcome(status=StatusClass.RATE_LIMITED, latency_ms=50),
            usage=Usage(requests=1),
        ))

    # Should now route to backup
    decision = engine.select("api")
    assert decision.provider == "backup"

    # After cooldown expires, primary is eligible again
    clock.advance_secs(60)
    recovered = engine.select("api")
    assert recovered.provider == "primary"
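The reason an injectable clock makes such tests deterministic can be shown with a small standalone sketch. `FakeClock`, `SystemClock`, `Cooldown`, and the `now_ms()` method are hypothetical names used only here; the source confirms just that `MockClock` offers `advance_ms` and `advance_secs`.

```python
import time
from typing import Protocol


class Clock(Protocol):
    """Assumed clock interface: anything with a now_ms() method."""
    def now_ms(self) -> int: ...


class SystemClock:
    """Real time, for production use."""
    def now_ms(self) -> int:
        return time.monotonic_ns() // 1_000_000


class FakeClock:
    """Manually advanced time, for tests."""
    def __init__(self, start_ms: int = 0) -> None:
        self._now_ms = start_ms

    def now_ms(self) -> int:
        return self._now_ms

    def advance_ms(self, ms: int) -> None:
        self._now_ms += ms

    def advance_secs(self, secs: float) -> None:
        self._now_ms += int(secs * 1000)


class Cooldown:
    """Time-dependent logic that reads time only through the injected clock."""
    def __init__(self, clock: Clock, duration_secs: int) -> None:
        self._clock = clock
        self._duration_ms = duration_secs * 1000
        self._until_ms = 0

    def trip(self) -> None:
        self._until_ms = self._clock.now_ms() + self._duration_ms

    def active(self) -> bool:
        return self._clock.now_ms() < self._until_ms
```

Since `Cooldown` never calls `time` directly, a test can trip it, advance the fake clock by exactly 30 seconds, and assert that it has expired without sleeping.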

API Reference

`GrateLimiter`

class GrateLimiter:
    def __init__(self, config: EngineConfig | None = None) -> None

    # Register or update a provider and its quota configuration
    def upsert_provider(self, config: ProviderConfig) -> None

    # Register or update a capability and its provider mappings
    def upsert_capability(self, config: CapabilityConfig) -> None

    # Select the best provider for a capability.
    # Raises UnknownCapability if capability is not registered.
    # Raises NoAvailableProviders if every provider is in cooldown or below minimum health.
    def select(self, capability: str) -> Decision

    # Record the outcome of a completed request.
    # Raises UnknownProvider if provider is not registered.
    def observe(self, obs: Observation) -> None

    # Query provider state
    def provider_health(self, provider: str) -> float | None
    def provider_in_cooldown(self, provider: str) -> bool
    def remaining_quota(self, provider: str, dimension: Dimension) -> int | None

`Decision`

@dataclass
class Decision:
    provider: str               # Chosen provider name
    score: float                # Composite score (0.0–1.0)
    alternatives: list[Alternative]   # Other eligible providers, ranked
    breakdown: ScoreBreakdown         # Score components for observability

`EngineConfig`

@dataclass
class EngineConfig:
    clock: Clock | None = None           # Override for testing (use MockClock)
    scoring: ScoringWeights | None = None
    health: HealthConfig | None = None

Advanced Configuration

from grate_limiter import GrateLimiter, EngineConfig, ScoringWeights, HealthConfig

engine = GrateLimiter(EngineConfig(
    scoring=ScoringWeights(
        quota=0.50,     # Weight remaining quota more heavily
        health=0.30,
        priority=0.15,
        latency=0.05,
    ),
    health=HealthConfig(
        ewma_alpha=0.3,                  # Faster decay on failures
        cooldown_threshold=0.2,          # Enter cooldown below 20% health
        min_health_for_selection=0.1,    # Exclude below 10%
        max_cooldown_secs=300,           # Cap cooldown at 5 minutes
    ),
))

Error Handling

from grate_limiter import UnknownCapability, UnknownProvider, NoAvailableProviders

try:
    decision = engine.select("chat-completion")
    # use decision...
except NoAvailableProviders:
    # All providers are in cooldown or unhealthy
    # Implement circuit-breaker or return 503
    raise
except UnknownCapability:
    # Capability was never registered
    raise
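One way to act on the "return 503" suggestion is to translate the two exceptions into HTTP-style responses at the edge of a service. The sketch below uses stand-in exception classes so it runs without the package installed, and takes the selection function as a parameter; the `route` helper and the choice of status codes are assumptions, not part of the library.

```python
from typing import Callable, Tuple


class NoAvailableProviders(Exception):
    """Stand-in for the library exception of the same name."""


class UnknownCapability(Exception):
    """Stand-in for the library exception of the same name."""


def route(select: Callable[[str], str], capability: str) -> Tuple[int, str]:
    """Return (status_code, body) for a selection attempt."""
    try:
        provider = select(capability)
    except NoAvailableProviders:
        # Transient: every provider is cooling down or unhealthy.
        return 503, "no provider available, retry later"
    except UnknownCapability:
        # Configuration error: the capability was never registered.
        return 500, f"capability {capability!r} is not registered"
    return 200, provider
```

In real code, `select` could be a small wrapper such as `lambda cap: engine.select(cap).provider`. Treating `NoAvailableProviders` as retryable and `UnknownCapability` as a server-side bug keeps the two failure modes distinct for callers.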

Contributing

Issues and pull requests are welcome at github.com/dev-kasibhatla/grate-limiter.

Related

Rust crate — the original, highest-performance implementation
TypeScript package — identical algorithm for browser and Node.js
GitHub repository — monorepo with all three implementations

License

Apache-2.0 © Aditya Kasibhatla

Release history

0.1.1 (this version), May 13, 2026
0.1.0, May 13, 2026

Download files

Source Distribution: grate_limiter-0.1.1.tar.gz (17.2 kB), uploaded May 13, 2026
Built Distribution: grate_limiter-0.1.1-py3-none-any.whl (16.4 kB), uploaded May 13, 2026, Python 3

File details

Details for the file grate_limiter-0.1.1.tar.gz.

File metadata

Download URL: grate_limiter-0.1.1.tar.gz
Upload date: May 13, 2026
Size: 17.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for grate_limiter-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`8d9a7963343709a8af3a5c910dbce2e68df832d1a6e23aa9a4185cf3f077e5f9`
MD5	`ba42c0b0d24f1fad7b4313e3f1c7d87e`
BLAKE2b-256	`499f08883acf13c05061252a523d0491dd2df1c9af1d84105bab72d71451774b`


Provenance

The following attestation bundles were made for grate_limiter-0.1.1.tar.gz:

Publisher: release.yml on dev-kasibhatla/grate-limiter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: grate_limiter-0.1.1.tar.gz
- Subject digest: 8d9a7963343709a8af3a5c910dbce2e68df832d1a6e23aa9a4185cf3f077e5f9
- Sigstore transparency entry: 1523309439
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: dev-kasibhatla/grate-limiter@ce80c17031007a85a53f3254401c4932b624a7a4
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/dev-kasibhatla
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ce80c17031007a85a53f3254401c4932b624a7a4
- Trigger Event: push

File details

Details for the file grate_limiter-0.1.1-py3-none-any.whl.

File metadata

Download URL: grate_limiter-0.1.1-py3-none-any.whl
Upload date: May 13, 2026
Size: 16.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for grate_limiter-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7a6a06795066164698e29c6ad049b6f60f54b8a31378298222a07b3ac3bc4130`
MD5	`2720eeda46a369a359296d9f83ce33b3`
BLAKE2b-256	`ba5e55283a96bf8a06c15749b409e1b717ac13a877d6510e5235735dd1009b5d`


Provenance

The following attestation bundles were made for grate_limiter-0.1.1-py3-none-any.whl:

Publisher: release.yml on dev-kasibhatla/grate-limiter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: grate_limiter-0.1.1-py3-none-any.whl
- Subject digest: 7a6a06795066164698e29c6ad049b6f60f54b8a31378298222a07b3ac3bc4130
- Sigstore transparency entry: 1523309448
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: dev-kasibhatla/grate-limiter@ce80c17031007a85a53f3254401c4932b624a7a4
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/dev-kasibhatla
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ce80c17031007a85a53f3254401c4932b624a7a4
- Trigger Event: push
