High-performance rate limiter engine for MCP Gateway

Rate Limiter Plugin

Author: ContextForge Contributors

Enforces rate limits per user, tenant, and tool across tool_pre_invoke and prompt_pre_fetch hooks. Supports pluggable counting algorithms (fixed window, sliding window, token bucket), an in-process memory backend (single-instance), and a Redis backend (shared across all gateway instances).

Hooks

| Hook | When it runs |
|---|---|
| tool_pre_invoke | Before every tool call; checks by_user, by_tenant, by_tool |
| prompt_pre_fetch | Before every prompt fetch; checks by_user, by_tenant, by_tool |

If any configured dimension is exceeded, the plugin returns a violation with HTTP 429. Every response carries X-RateLimit-* headers, which describe the most restrictive active dimension (e.g. if both user and tenant limits are active, the one closest to exhaustion is reported).
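As a rough illustration of how the most restrictive dimension could be picked, here is a minimal sketch. The DimensionResult type and most_restrictive function are hypothetical names, not the plugin's API, and measuring "closest to exhaustion" as the smallest fraction of the limit remaining is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DimensionResult:
    """Outcome of one dimension check (hypothetical type, for illustration)."""
    name: str        # "user", "tenant", or "tool"
    limit: int       # configured limit, assumed > 0
    remaining: int   # requests left in the current window
    allowed: bool    # False when this dimension is exceeded


def most_restrictive(results: list[DimensionResult]) -> DimensionResult:
    """Pick the dimension to report.

    Assumption: an exceeded dimension always wins; otherwise the one with
    the smallest fraction of its limit remaining is reported.
    """
    return min(results, key=lambda r: (r.allowed, r.remaining / r.limit))


checks = [
    DimensionResult("tenant", limit=300, remaining=200, allowed=True),
    DimensionResult("user", limit=30, remaining=3, allowed=True),
]
reported = most_restrictive(checks)  # the user dimension: 3 of 30 left
```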

Configuration

- name: RateLimiterPlugin
  kind: cpex_rate_limiter.rate_limiter.RateLimiterPlugin
  hooks:
    - prompt_pre_fetch
    - tool_pre_invoke
  mode: enforce          # enforce | permissive | disabled
  config:
    by_user: "30/m"      # per-user limit across all tools
    by_tenant: "300/m"   # shared limit across all users in a tenant
    by_tool:             # per-tool overrides (applied on top of by_user)
      search: "10/m"
      summarise: "5/m"

    # Algorithm — choose one (default: fixed_window)
    algorithm: "fixed_window"    # fixed_window | sliding_window | token_bucket

    # Backend — choose one
    backend: "memory"    # default: single-process, resets on restart
    # backend: "redis"   # shared across all gateway instances

    # Redis options (only redis_url is required when backend: redis)
    redis_url: "redis://redis:6379/0"
    redis_key_prefix: "rl"

Configuration reference

| Field | Type | Default | Description |
|---|---|---|---|
| by_user | string | null | Per-user rate limit, e.g. "60/m" |
| by_tenant | string | null | Per-tenant rate limit, e.g. "600/m" |
| by_tool | dict | {} | Per-tool overrides, e.g. {"search": "10/m"} |
| algorithm | string | "fixed_window" | Counting algorithm: "fixed_window", "sliding_window", or "token_bucket" |
| backend | string | "memory" | "memory" or "redis" |
| redis_url | string | null | Redis connection URL (required when backend: redis) |
| redis_key_prefix | string | "rl" | Prefix for all Redis keys |

Rate string format: "<count>/<unit>" where unit is s/sec/second, m/min/minute, or h/hr/hour. Malformed strings raise ValueError at startup.
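The format above can be sketched as a small parser. This is an illustration only: parse_rate is a hypothetical name, and the exact matching rules (no surrounding whitespace, lowercase units) are assumptions rather than the plugin's documented behaviour.

```python
import re

# Unit aliases listed in the format description, mapped to seconds.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1,
    "m": 60, "min": 60, "minute": 60,
    "h": 3600, "hr": 3600, "hour": 3600,
}


def parse_rate(rate: str) -> tuple[int, int]:
    """Return (count, window_seconds) for a string such as "30/m".

    Raises ValueError for anything that is not "<count>/<unit>".
    """
    match = re.fullmatch(r"(\d+)/([a-z]+)", rate)
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"malformed rate string: {rate!r}")
    return int(match.group(1)), _UNIT_SECONDS[match.group(2)]
```

For example, parse_rate("30/m") yields a count of 30 over a 60-second window, while parse_rate("30 per minute") raises ValueError.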

Omitting a dimension (e.g. no by_tenant) means that dimension is unlimited — no counter is tracked for it.

Response headers

Every response, whether the request was allowed or blocked, includes:

| Header | Description |
|---|---|
| X-RateLimit-Limit | Configured limit for the most restrictive active dimension |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the current window resets |
| Retry-After | Seconds until the window resets (blocked requests only) |
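A minimal sketch of how these headers might be assembled from a check result. The rate_limit_headers function and its parameters are hypothetical, and rounding Retry-After up to a whole second is an assumption.

```python
import math


def rate_limit_headers(limit: int, remaining: int, reset_at: float,
                       now: float, blocked: bool) -> dict[str, str]:
    """Build the response headers for one rate-limit decision."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_at)),
    }
    if blocked:
        # Only blocked requests are told how long to wait.
        headers["Retry-After"] = str(max(0, math.ceil(reset_at - now)))
    return headers
```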

Algorithms

Three counting algorithms are available, selected via the algorithm config field.

| Algorithm | Config value | Best for | Trade-off |
|---|---|---|---|
| Fixed window | fixed_window | General use, lowest overhead | Up to 2× the limit at window boundaries |
| Sliding window | sliding_window | Smooth enforcement, no boundary burst | Higher memory: stores one timestamp per request per key |
| Token bucket | token_bucket | Bursty workloads; allows short spikes up to capacity | Slightly higher Redis overhead: stores {tokens, last_refill} hash per key |

Fixed window (default)

Counts requests in a fixed time slot (e.g. "minute 14:03") and resets at the slot boundary. Simple and fast. The known trade-off is a burst of up to 2× the limit at a boundary: N requests at the end of slot T followed by N requests at the start of T+1. If this matters, set by_user below the true ceiling to leave headroom, or switch to sliding_window.
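The boundary burst is easiest to see in code. The following is a minimal in-memory sketch of a fixed-window counter, not the plugin's Rust implementation; the FixedWindowCounter class is a hypothetical name and time is passed in explicitly to keep the example deterministic.

```python
class FixedWindowCounter:
    """Illustrative fixed-window limiter keyed by identity."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        self.counts: dict[str, tuple[int, int]] = {}  # key -> (slot, count)

    def allow(self, key: str, now: float) -> bool:
        slot = int(now // self.window)           # index of the current slot
        stored_slot, count = self.counts.get(key, (slot, 0))
        if stored_slot != slot:
            count = 0                            # a new slot starts from zero
        if count >= self.limit:
            self.counts[key] = (slot, count)
            return False
        self.counts[key] = (slot, count + 1)
        return True


limiter = FixedWindowCounter(limit=3, window_seconds=60)
end_of_slot = [limiter.allow("alice", 59.0) for _ in range(4)]
start_of_next = [limiter.allow("alice", 60.0) for _ in range(3)]
```

Here the fourth request at t=59 is rejected, yet three more are admitted at t=60, so six requests pass within about one second against a limit of three per minute.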

Sliding window

Stores a timestamp for every request in the current window. At each check, expired timestamps are discarded and the remaining count is compared against the limit. Prevents boundary bursts entirely. Memory usage grows with request volume — roughly one float per request per active key.
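A minimal in-memory sketch of the sliding-window check described above. The SlidingWindowLimiter class is a hypothetical name, and not recording rejected requests is an assumption.

```python
from collections import deque


class SlidingWindowLimiter:
    """Illustrative sliding-window limiter: one timestamp per admitted request."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self.hits: dict[str, deque] = {}  # key -> timestamps, oldest first

    def allow(self, key: str, now: float) -> bool:
        timestamps = self.hits.setdefault(key, deque())
        cutoff = now - self.window
        # Discard timestamps that have fallen out of the window.
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

With a limit of three per 60 seconds, three requests at t=59 are admitted and a request at t=60 is still rejected, because the window always looks back a full 60 seconds from the current request.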

Token bucket

Each identity (user, tenant, tool) has a bucket that holds up to count tokens. Tokens refill at a steady rate of count/window. A request consumes one token. Bursts up to the bucket capacity are allowed; sustained rate above count/window is rejected. Useful for APIs where short spikes are acceptable but sustained overload is not.
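The refill-and-consume step can be sketched as follows. This is an illustration for a single identity, not the plugin's implementation; the TokenBucket class is a hypothetical name, and starting with a full bucket is an assumption.

```python
class TokenBucket:
    """Illustrative token bucket for one identity."""

    def __init__(self, count: int, window_seconds: float, now: float) -> None:
        self.capacity = float(count)
        self.refill_rate = count / window_seconds  # tokens per second
        self.tokens = float(count)                 # assumed to start full
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With "30/m" the bucket refills at 0.5 tokens per second: a burst of 30 requests is admitted at once, the 31st is rejected, and one more request becomes possible two seconds later.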

Redis support: token_bucket with backend: redis is fully supported. The plugin stores {tokens, last_refill} in a Redis hash per key and uses an atomic Lua script to refill and consume tokens in a single round-trip — the same pattern as the other two algorithms. This means token_bucket enforces a true cluster-wide limit in multi-instance deployments.

Backends

Memory backend (default, single-instance only)

  • Counters are stored in a process-local MemoryStore (Rust, per-key RwLock — no single global lock)
  • An amortized sweep evicts expired keys roughly every 128 calls: fixed_window keys once the window elapses, sliding_window keys once their timestamp deque is empty, and token_bucket keys after more than 1 hour of inactivity
  • Limitation: state is not shared across processes or hosts. In a multi-instance deployment (e.g. 3 gateway instances behind nginx), each instance tracks its own counter, so the effective limit can reach N × configured_limit
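The multi-instance limitation can be illustrated with a small simulation. This sketch assumes round-robin load balancing and a single time window; the simulate function is hypothetical and only counts admissions.

```python
from itertools import cycle


def simulate(instances: int, limit: int, requests: int) -> int:
    """Count requests admitted when each instance keeps its own counter."""
    counters = [0] * instances
    admitted = 0
    # Assumption: the load balancer hands out requests round-robin.
    for _, idx in zip(range(requests), cycle(range(instances))):
        if counters[idx] < limit:
            counters[idx] += 1
            admitted += 1
    return admitted


single = simulate(instances=1, limit=30, requests=200)   # 30 admitted
cluster = simulate(instances=3, limit=30, requests=200)  # 90 admitted
```

One user sending 200 requests in a window gets 30 through on a single instance but 90 through on three instances, which is 3 × the configured limit.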

Redis backend

  • fixed_window: atomic Lua INCR+EXPIRE — one Redis round-trip per check, no race condition
  • sliding_window: atomic Lua ZADD+ZREMRANGEBYSCORE+ZCARD+EXPIRE — one round-trip, no race condition
  • token_bucket: atomic Lua script — reads {tokens, last_refill} hash, refills proportionally, consumes 1 token, writes back — one round-trip, no race condition
  • All gateway instances share the same counter — the configured limit is the true cluster-wide limit
  • Requires redis_url to be set
  • If Redis is unavailable, the plugin fails open — the request is allowed through without rate limiting. This is a deliberate design choice: an infrastructure failure must never block legitimate traffic. Operators should monitor for rate-limiter error logs and treat them as high-priority alerts
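The fail-open behaviour in the last bullet can be sketched as a thin wrapper around the backend check. The check_fail_open function and backend_check parameter are hypothetical names; catching every Exception is a simplification, since the exact exception types the plugin handles are not stated here.

```python
import logging
from typing import Callable

logger = logging.getLogger("rate_limiter_sketch")


def check_fail_open(backend_check: Callable[[str], bool], key: str) -> bool:
    """Return the backend's decision, or allow the request if the backend fails."""
    try:
        return backend_check(key)
    except Exception:
        # Fail open: log loudly so operators can alert on it, then allow.
        logger.exception("rate limiter backend unavailable; allowing %s", key)
        return True
```

The error log is the only signal that limits are not being enforced, which is why it deserves a high-priority alert.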

Multi-instance deployment (important): The memory backend is local to a single gateway instance — rate limit counters are not shared across replicas. For multi-instance deployments (e.g., behind nginx or on OpenShift with multiple gateway pods), always use backend: redis to ensure rate limits are enforced correctly across all instances.

Examples

Single-instance (default config)

config:
  by_user: "60/m"
  by_tenant: "600/m"

Multi-instance with Redis

config:
  backend: "redis"
  redis_url: "redis://redis:6379/0"
  by_user: "30/m"
  by_tenant: "3000/m"
  by_tool:
    search: "10/m"

Sliding window (no boundary bursts)

config:
  algorithm: "sliding_window"
  by_user: "30/m"
  by_tenant: "300/m"

Token bucket — memory backend (default)

config:
  algorithm: "token_bucket"
  by_user: "30/m"   # bucket holds 30 tokens, refills at 30/min

Token bucket — Redis backend (multi-instance)

config:
  algorithm: "token_bucket"
  backend: "redis"
  redis_url: "redis://redis:6379/0"
  by_user: "30/m"

Permissive mode (observe without blocking)

mode: permissive
config:
  by_user: "60/m"

In permissive mode the plugin records violations and emits X-RateLimit-* headers but does not block requests. Useful for baselining traffic before switching to enforce.
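The difference between the three modes can be summarised in a few lines. This is a sketch of the decision only: the evaluate function is a hypothetical name, and treating disabled as "no check at all" is an assumption.

```python
def evaluate(mode: str, limit_exceeded: bool) -> tuple[bool, bool]:
    """Return (blocked, violation_recorded) for one request."""
    if mode == "enforce":
        return limit_exceeded, limit_exceeded
    if mode == "permissive":
        # The violation is recorded, but the request still goes through.
        return False, limit_exceeded
    if mode == "disabled":
        return False, False
    raise ValueError(f"unknown mode: {mode!r}")
```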

Limitations

| Limitation | Severity | Status |
|---|---|---|
| Memory backend not shared across processes | HIGH | Use Redis backend for multi-instance deployments |
| Fixed window allows up to 2× limit at window boundary | LOW | Use sliding_window algorithm, or use by_user with headroom |
| by_tool matching is case-sensitive | LOW | Fixed: tool names are normalised with .strip().lower() |
| Whitespace-only user identity bypasses anonymous bucket | LOW | Fixed: _extract_user_identity strips whitespace and falls back to 'anonymous' |
| No per-server limits (server_id dimension missing) | LOW | Not implemented |
| No config hot-reload; rate string changes require restart | LOW | Not implemented |
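The two fixes noted in the table amount to input normalisation. The sketch below illustrates the described behaviour under stated assumptions; normalise_tool_name and extract_user_identity are hypothetical stand-ins, not the plugin's actual _extract_user_identity.

```python
from typing import Optional


def normalise_tool_name(name: str) -> str:
    """Make by_tool lookups insensitive to case and surrounding whitespace."""
    return name.strip().lower()


def extract_user_identity(user: Optional[str]) -> str:
    """Return a usable identity, falling back to the shared anonymous bucket."""
    if isinstance(user, str) and user.strip():
        return user.strip()
    return "anonymous"
```

With this normalisation, " Search " and "search" share one by_tool counter, and a whitespace-only identity is counted against the anonymous bucket rather than getting a bucket of its own.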
