
High-performance rate limiter engine for MCP Gateway

Project description

Rate Limiter Plugin

Author: Mihai Criveti

Enforces rate limits per user, tenant, and tool across tool_pre_invoke and prompt_pre_fetch hooks. Supports pluggable counting algorithms (fixed window, sliding window, token bucket), an in-process memory backend (single-instance), and a Redis backend (shared across all gateway instances).

Hooks

| Hook | When it runs |
|------|--------------|
| `tool_pre_invoke` | Before every tool call — checks `by_user`, `by_tenant`, `by_tool` |
| `prompt_pre_fetch` | Before every prompt fetch — checks `by_user`, `by_tenant`, `by_tool` |

If any configured dimension is exceeded, the plugin returns a violation with HTTP 429. Every response carries X-RateLimit-* headers. The most restrictive active dimension is surfaced (e.g. if both user and tenant limits are active, the one closest to exhaustion is reported).

Configuration

```yaml
- name: RateLimiterPlugin
  kind: cpex_rate_limiter.rate_limiter.RateLimiterPlugin
  hooks:
    - prompt_pre_fetch
    - tool_pre_invoke
  mode: enforce          # enforce | permissive | disabled
  config:
    by_user: "30/m"      # per-user limit across all tools
    by_tenant: "300/m"   # shared limit across all users in a tenant
    by_tool:             # per-tool overrides (applied on top of by_user)
      search: "10/m"
      summarise: "5/m"

    # Algorithm — choose one (default: fixed_window)
    algorithm: "fixed_window"    # fixed_window | sliding_window | token_bucket

    # Backend — choose one
    backend: "memory"    # default: single-process, resets on restart
    # backend: "redis"   # shared across all gateway instances

    # Redis options (required when backend: redis)
    redis_url: "redis://redis:6379/0"
    redis_key_prefix: "rl"
```

Configuration reference

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `by_user` | string | null | Per-user rate limit, e.g. `"60/m"` |
| `by_tenant` | string | null | Per-tenant rate limit, e.g. `"600/m"` |
| `by_tool` | dict | `{}` | Per-tool overrides, e.g. `{"search": "10/m"}` |
| `algorithm` | string | `"fixed_window"` | Counting algorithm: `"fixed_window"`, `"sliding_window"`, or `"token_bucket"` |
| `backend` | string | `"memory"` | `"memory"` or `"redis"` |
| `redis_url` | string | null | Redis connection URL (required when `backend: redis`) |
| `redis_key_prefix` | string | `"rl"` | Prefix for all Redis keys |

Rate string format: "<count>/<unit>" where unit is s/sec/second, m/min/minute, or h/hr/hour. Malformed strings raise ValueError at startup.
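The rate-string grammar can be sketched as a small parser. This is an illustration of the format described above, not the plugin's actual code; `parse_rate` and `_UNITS` are hypothetical names:

```python
import re

# Window lengths in seconds for each accepted unit alias.
_UNITS = {
    "s": 1, "sec": 1, "second": 1,
    "m": 60, "min": 60, "minute": 60,
    "h": 3600, "hr": 3600, "hour": 3600,
}

def parse_rate(rate: str) -> tuple[int, int]:
    """Parse "<count>/<unit>" into (count, window_seconds).

    Raises ValueError on malformed input, mirroring the
    startup-time validation described above.
    """
    match = re.fullmatch(r"(\d+)/(\w+)", rate.strip())
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"malformed rate string: {rate!r}")
    count = int(match.group(1))
    if count == 0:
        raise ValueError(f"rate count must be positive: {rate!r}")
    return count, _UNITS[match.group(2)]
```

Under this sketch, `parse_rate("30/m")` yields `(30, 60)`: 30 requests per 60-second window.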

Omitting a dimension (e.g. no by_tenant) means that dimension is unlimited — no counter is tracked for it.

Response headers

Every response (for allowed and blocked requests alike) includes:

| Header | Description |
|--------|-------------|
| `X-RateLimit-Limit` | Configured limit for the most restrictive active dimension |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the current window resets |
| `Retry-After` | Seconds until the window resets (blocked requests only) |
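For example, a request blocked by a `30/m` per-user limit might produce a response like the following (illustrative values only):

```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1718000460
Retry-After: 27
```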

Algorithms

Three counting algorithms are available, selected via the algorithm config field.

| Algorithm | Config value | Best for | Trade-off |
|-----------|--------------|----------|-----------|
| Fixed window | `fixed_window` | General use, lowest overhead | Up to 2× the limit at window boundaries |
| Sliding window | `sliding_window` | Smooth enforcement, no boundary burst | Higher memory: stores one timestamp per request per key |
| Token bucket | `token_bucket` | Bursty workloads — allows short spikes up to capacity | Slightly higher Redis overhead: stores `{tokens, last_refill}` hash per key |

Fixed window (default)

Counts requests in a fixed time slot (e.g. "minute 14:03"). Resets at the slot boundary. Simple and fast. The 2× burst at a boundary (N requests at the end of slot T, N requests at the start of T+1) is a known trade-off; use by_user with headroom if this matters.

Sliding window

Stores a timestamp for every request in the current window. At each check, expired timestamps are discarded and the remaining count is compared against the limit. Prevents boundary bursts entirely. Memory usage grows with request volume — roughly one float per request per active key.

Token bucket

Each identity (user, tenant, tool) has a bucket that holds up to count tokens. Tokens refill at a steady rate of count/window. A request consumes one token. Bursts up to the bucket capacity are allowed; sustained rate above count/window is rejected. Useful for APIs where short spikes are acceptable but sustained overload is not.

Redis support: token_bucket with backend: redis is fully supported. The plugin stores {tokens, last_refill} in a Redis hash per key and uses an atomic Lua script to refill and consume tokens in a single round-trip — the same pattern as the other two algorithms. This means token_bucket enforces a true cluster-wide limit in multi-instance deployments.

Backends

Memory backend (default, single-instance only)

  • Counters are stored in a process-local MemoryStore (Rust, per-key RwLock — no single global lock)
  • An amortized sweep evicts expired keys every ~128 calls — for fixed_window, keys are evicted once the window elapses; for sliding_window, keys with empty timestamp deques are evicted; for token_bucket, keys inactive for >1 hour are evicted
  • Limitation: state is not shared across processes or hosts. In a multi-instance deployment (e.g. 3 gateway instances behind nginx), each instance tracks its own counter — the effective limit is N × configured_limit

Redis backend

  • fixed_window: atomic Lua INCR+EXPIRE — one Redis round-trip per check, no race condition
  • sliding_window: atomic Lua ZADD+ZREMRANGEBYSCORE+ZCARD+EXPIRE — one round-trip, no race condition
  • token_bucket: atomic Lua script — reads {tokens, last_refill} hash, refills proportionally, consumes 1 token, writes back — one round-trip, no race condition
  • All gateway instances share the same counter — the configured limit is the true cluster-wide limit
  • Requires redis_url to be set
  • If Redis is unavailable, the plugin fails open — the request is allowed through without rate limiting. This is a deliberate design choice: an infrastructure failure must never block legitimate traffic. Operators should monitor for rate-limiter error logs and treat them as high-priority alerts
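The fixed-window round-trip described above can be sketched as a Lua script using the well-known INCR+EXPIRE pattern. This illustrates the technique; it is not the plugin's actual script:

```lua
-- KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = window in seconds
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  -- first hit in this window: start the expiry clock
  redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if current > tonumber(ARGV[1]) then
  return 0  -- over limit
end
return 1    -- allowed
```

Because Redis executes a script atomically, the read-modify-write cannot interleave with another instance's check, which is what makes the counter safe to share cluster-wide.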

Multi-instance deployment (important): The memory backend is local to a single gateway instance — rate limit counters are not shared across replicas. For multi-instance deployments (e.g., behind nginx or on OpenShift with multiple gateway pods), always use backend: redis to ensure rate limits are enforced correctly across all instances.

Examples

Single-instance (default config)

```yaml
config:
  by_user: "60/m"
  by_tenant: "600/m"
```

Multi-instance with Redis

```yaml
config:
  backend: "redis"
  redis_url: "redis://redis:6379/0"
  by_user: "30/m"
  by_tenant: "3000/m"
  by_tool:
    search: "10/m"
```

Sliding window (no boundary bursts)

```yaml
config:
  algorithm: "sliding_window"
  by_user: "30/m"
  by_tenant: "300/m"
```

Token bucket — memory backend (default)

```yaml
config:
  algorithm: "token_bucket"
  by_user: "30/m"   # bucket holds 30 tokens, refills at 30/min
```

Token bucket — Redis backend (multi-instance)

```yaml
config:
  algorithm: "token_bucket"
  backend: "redis"
  redis_url: "redis://redis:6379/0"
  by_user: "30/m"
```

Permissive mode (observe without blocking)

```yaml
mode: permissive
config:
  by_user: "60/m"
```

In permissive mode the plugin records violations and emits X-RateLimit-* headers but does not block requests. Useful for baselining traffic before switching to enforce.

Limitations

| Limitation | Severity | Status |
|------------|----------|--------|
| Memory backend not shared across processes | HIGH | Use the Redis backend for multi-instance deployments |
| Fixed window allows up to 2× limit at window boundary | LOW | Use `sliding_window`, or use `by_user` with headroom |
| `by_tool` matching is case-sensitive | LOW | Fixed — tool names are normalised with `.strip().lower()` |
| Whitespace-only user identity bypasses anonymous bucket | LOW | Fixed — `_extract_user_identity` strips whitespace and falls back to `"anonymous"` |
| No per-server limits (`server_id` dimension missing) | LOW | Not implemented |
| No config hot-reload — rate string changes require restart | LOW | Not implemented |

Download files


Source Distribution

cpex_rate_limiter-0.0.2.tar.gz (65.1 kB)

Uploaded: Source

Built Distributions


cpex_rate_limiter-0.0.2-cp311-abi3-win_amd64.whl (689.3 kB)

Uploaded: CPython 3.11+, Windows x86-64

cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_x86_64.whl (727.2 kB)

Uploaded: CPython 3.11+, manylinux glibc 2.34+, x86-64

cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_s390x.whl (805.1 kB)

Uploaded: CPython 3.11+, manylinux glibc 2.34+, s390x

cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_ppc64le.whl (787.8 kB)

Uploaded: CPython 3.11+, manylinux glibc 2.34+, ppc64le

cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_aarch64.whl (688.1 kB)

Uploaded: CPython 3.11+, manylinux glibc 2.34+, ARM64

cpex_rate_limiter-0.0.2-cp311-abi3-macosx_11_0_arm64.whl (663.7 kB)

Uploaded: CPython 3.11+, macOS 11.0+ ARM64

File details

Details for the file cpex_rate_limiter-0.0.2.tar.gz.

File metadata

  • Download URL: cpex_rate_limiter-0.0.2.tar.gz
  • Size: 65.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cpex_rate_limiter-0.0.2.tar.gz
Algorithm Hash digest
SHA256 20652585d736ac276d47e3702eb664f3c81116a75e7908f9dfcb35a704741ee8
MD5 243c3f61c314d2318f049702bd72fdaa
BLAKE2b-256 5b3aff004bd20c5b7ab71cb94d9dae44f2d9e2f71c8da1cbbb60f7fd7ed9886e


Provenance

The following attestation bundles were made for cpex_rate_limiter-0.0.2.tar.gz:

Publisher: pypi-rate-limiter.yaml on IBM/cpex-plugins

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cpex_rate_limiter-0.0.2-cp311-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for cpex_rate_limiter-0.0.2-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 1248481db12208c964007df9cd68a770d23c355796eafb7e41aa0cf545b143de
MD5 418298358d3ac7627e2d52d43c3d4912
BLAKE2b-256 577e2bba8261ed01b2d34947ba9c54a42b8ff241bbc46a65c3f8c28eb2465e7d


File details

Details for the file cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 2f3ff3474a22f9db0a4935fae187ca3208bdb283e67c6f743bedc9167845551a
MD5 bba43873187de320dace017d06f6b607
BLAKE2b-256 8c38c96eb069439d0d4d7acc43d85b94cfba1fc5771d8ed9997a1d1469b8ac57


File details

Details for the file cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_s390x.whl.

File metadata

File hashes

Hashes for cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_s390x.whl
Algorithm Hash digest
SHA256 683ba91c96970a2718d30bb6fb7e5c87aed25e3edb103f35b86b463d2680af48
MD5 6e48024fba44d92d5160843f71e3f918
BLAKE2b-256 2d3eb5af42fe26eb978eba42a123bc49d24a429c76ed25b90483133a88c48f6f


File details

Details for the file cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_ppc64le.whl.

File metadata

File hashes

Hashes for cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_ppc64le.whl
Algorithm Hash digest
SHA256 050bece142dafcb3932bc2a88a9424af90ceda385e06735d377a91467916b165
MD5 35a022bd746d49d2b15ff0232d704f13
BLAKE2b-256 2a97e4d7fb1239efd7e2b88f1e5c0126d4f0b6309281a69fa9de8e4c0adc9f1c


File details

Details for the file cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_aarch64.whl.

File metadata

File hashes

Hashes for cpex_rate_limiter-0.0.2-cp311-abi3-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 fb245c975835d9d8e7061b7f376767342dc72440208796949c16327ba18b4fe4
MD5 ec45255a66eb642c27a180046fc30d13
BLAKE2b-256 346ddf10df04e5cfe65b71695f296e1fedcfa11513a1c27d33f35e35eaa6521b


File details

Details for the file cpex_rate_limiter-0.0.2-cp311-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cpex_rate_limiter-0.0.2-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7f959d98789eb7e74fba217a9a4051ebe293a05749a83cabfaf05d14aff7e8ba
MD5 d8612507aafd978502f9a87dadd6dd19
BLAKE2b-256 8bc43953fb4402c93a9f5db7d7a86c67ebf4da0be867e42169e228fc49b8fa6c

