High-performance rate limiter engine for MCP Gateway

Rate Limiter Plugin

Author: ContextForge Contributors

Enforces rate limits per user, tenant, and tool across tool_pre_invoke and prompt_pre_fetch hooks. Supports pluggable counting algorithms (fixed window, sliding window, token bucket), an in-process memory backend (single-instance), and a Redis backend (shared across all gateway instances).

Runtime Requirements

This plugin depends on cpex>=0.1.0rc1,<0.2 and imports hook models from cpex.framework. The compiled Rust extension is mandatory; there is no Python fallback implementation.

Hooks

| Hook | When it runs |
| --- | --- |
| tool_pre_invoke | Before every tool call; checks by_user, by_tenant, by_tool |
| prompt_pre_fetch | Before every prompt fetch; checks by_user, by_tenant, by_tool |

If any configured dimension is exceeded, the plugin returns a violation with HTTP 429. Every response carries X-RateLimit-* headers. When several dimensions are active, the most restrictive one is surfaced (e.g. if both user and tenant limits are active, the one closest to exhaustion is reported).
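The selection rule above can be sketched as follows. This is only an illustration: the DimensionResult type and most_restrictive function are hypothetical names, and treating "closest to exhaustion" as "fewest requests remaining" (with blocked dimensions taking priority) is an assumption, not the plugin's confirmed behaviour.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DimensionResult:
    """Outcome of checking one dimension (hypothetical type for this sketch)."""
    name: str        # "user", "tenant" or "tool"
    limit: int       # configured limit for the window
    remaining: int   # requests left in the current window
    allowed: bool    # False when this dimension is exceeded


def most_restrictive(results: Sequence[DimensionResult]) -> Optional[DimensionResult]:
    """Pick the dimension to surface in headers.

    Assumption: a blocked dimension always wins; otherwise the one with the
    fewest remaining requests is reported.
    """
    if not results:
        return None
    # False sorts before True, so blocked dimensions come first.
    return min(results, key=lambda r: (r.allowed, r.remaining))
```

With a user at 5 remaining and a tenant at 120 remaining, this sketch reports the user dimension.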

Configuration

- name: RateLimiterPlugin
  kind: cpex_rate_limiter.rate_limiter.RateLimiterPlugin
  hooks:
    - prompt_pre_fetch
    - tool_pre_invoke
  mode: enforce          # enforce | permissive | disabled
  config:
    by_user: "30/m"      # per-user limit across all tools
    by_tenant: "300/m"   # shared limit across all users in a tenant
    by_tool:             # per-tool overrides (applied on top of by_user)
      search: "10/m"
      summarise: "5/m"

    # Algorithm — choose one (default: fixed_window)
    algorithm: "fixed_window"    # fixed_window | sliding_window | token_bucket

    # Backend — choose one
    backend: "memory"    # default: single-process, resets on restart
    # backend: "redis"   # shared across all gateway instances

    # Redis options (required when backend: redis)
    redis_url: "redis://redis:6379/0"
    redis_key_prefix: "rl"

    # Backend failure policy (default: "open" — fail-open)
    # "closed" — return HTTP 503 BACKEND_UNAVAILABLE violation when the
    # backend can't be reached (correctness over availability)
    fail_mode: "open"

Configuration reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| by_user | string | null | Per-user rate limit, e.g. "60/m" |
| by_tenant | string | null | Per-tenant rate limit, e.g. "600/m" |
| by_tool | dict | {} | Per-tool overrides, e.g. {"search": "10/m"} |
| algorithm | string | "fixed_window" | Counting algorithm: "fixed_window", "sliding_window", or "token_bucket" |
| backend | string | "memory" | "memory" or "redis" |
| redis_url | string | null | Redis connection URL (required when backend: redis). Use rediss:// for TLS. |
| redis_key_prefix | string | "rl" | Prefix for all Redis keys |
| fail_mode | string | "open" | Behaviour when the backend can't be reached: "open" allows the request through, "closed" blocks with a 503 BACKEND_UNAVAILABLE violation |
| redis_ssl_ca_certs | string | null | Path to a PEM CA bundle to use instead of the OS trust store. Requires rediss:// URL. |
| redis_ssl_certfile | string | null | Path to a PEM client certificate for mTLS. Must be paired with redis_ssl_keyfile. |
| redis_ssl_keyfile | string | null | Path to a PEM private key for mTLS. Must be paired with redis_ssl_certfile. |
| redis_ssl_check_hostname | bool | true | When false, ALL TLS certificate validation is disabled (see security note below). |

Rate string format: "<count>/<unit>" where unit is s/sec/second, m/min/minute, or h/hr/hour. Malformed strings raise ValueError at startup. Counts above 1_000_000 are rejected as a sanity ceiling — anything higher is almost certainly a misconfig or a denial-of-service vector against the memory backend.
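As a rough illustration of the rate string format, the parser below accepts the units listed above and applies the 1_000_000 ceiling. The function name parse_rate, the strict regular expression, and the error messages are assumptions made for this sketch; the real parser lives in the Rust extension and may differ in details such as whitespace handling.

```python
import re

# Unit aliases taken from the format description above, mapped to seconds.
_UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1,
    "m": 60, "min": 60, "minute": 60,
    "h": 3600, "hr": 3600, "hour": 3600,
}
MAX_COUNT = 1_000_000  # sanity ceiling from the documentation
_RATE_RE = re.compile(r"^([0-9]+)/([a-z]+)$")


def parse_rate(rate: str) -> tuple[int, int]:
    """Return (count, window_seconds) for a string like "30/m"."""
    match = _RATE_RE.match(rate)
    if match is None:
        raise ValueError(f"malformed rate string: {rate!r}")
    count, unit = int(match.group(1)), match.group(2)
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unknown unit in rate string: {rate!r}")
    if count > MAX_COUNT:
        raise ValueError(f"count exceeds {MAX_COUNT}: {rate!r}")
    return count, _UNIT_SECONDS[unit]
```

For example, parse_rate("30/m") yields a count of 30 over a 60-second window, while "30/x" and "2000000/m" raise ValueError.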

Unknown config keys (e.g. a typo like redis_ur) are logged at WARN at engine init alongside the accepted-key list, instead of being silently ignored.

Invalid fail_mode values (e.g. "clsoed") are logged at WARN and fall back to "open" so an operator's typo surfaces instead of silently disabling the hardening they asked for.
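The two warning behaviours above (unknown keys and invalid fail_mode values) might look roughly like this. The function validate_config and the logger name are hypothetical, the accepted-key set is copied from the configuration reference table, and dropping unknown keys from the returned dict is an assumption of this sketch.

```python
import logging

logger = logging.getLogger("rate_limiter_sketch")  # hypothetical logger name

# Keys copied from the configuration reference table.
ACCEPTED_KEYS = frozenset({
    "by_user", "by_tenant", "by_tool", "algorithm", "backend",
    "redis_url", "redis_key_prefix", "fail_mode",
    "redis_ssl_ca_certs", "redis_ssl_certfile", "redis_ssl_keyfile",
    "redis_ssl_check_hostname",
})
VALID_FAIL_MODES = ("open", "closed")


def validate_config(config: dict) -> dict:
    """Warn about typos instead of silently ignoring them."""
    unknown = sorted(set(config) - ACCEPTED_KEYS)
    if unknown:
        logger.warning(
            "unknown config keys %s; accepted keys: %s",
            unknown, sorted(ACCEPTED_KEYS),
        )
    fail_mode = config.get("fail_mode", "open")
    if fail_mode not in VALID_FAIL_MODES:
        logger.warning("invalid fail_mode %r; falling back to 'open'", fail_mode)
        fail_mode = "open"
    # Assumption: unknown keys are dropped from the effective config.
    cleaned = {k: v for k, v in config.items() if k in ACCEPTED_KEYS}
    cleaned["fail_mode"] = fail_mode
    return cleaned
```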

Omitting a dimension (e.g. no by_tenant) means that dimension is unlimited — no counter is tracked for it.

Response headers

Every request (allowed or blocked) includes:

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Configured limit for the most restrictive active dimension |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the current window resets |
| Retry-After | Seconds until the window resets (blocked requests only) |
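A small sketch of how these headers could be assembled from a check result. The function rate_limit_headers and its parameters are hypothetical, and rounding Retry-After up to a whole second is an assumption.

```python
import math


def rate_limit_headers(limit: int, remaining: int, reset_at: float,
                       now: float, blocked: bool) -> dict[str, str]:
    """Build the response headers described in the table above."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_at)),  # Unix timestamp
    }
    if blocked:
        # Assumption: round up so clients never retry slightly too early.
        headers["Retry-After"] = str(max(0, math.ceil(reset_at - now)))
    return headers
```

Allowed requests get the three X-RateLimit-* headers only; blocked requests additionally get Retry-After.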

Algorithms

Three counting algorithms are available, selected via the algorithm config field.

| Algorithm | Config value | Best for | Trade-off |
| --- | --- | --- | --- |
| Fixed window | fixed_window | General use, lowest overhead | Up to 2× the limit at window boundaries |
| Sliding window | sliding_window | Smooth enforcement, no boundary burst | Higher memory: stores one timestamp per request per key |
| Token bucket | token_bucket | Bursty workloads: allows short spikes up to capacity | Slightly higher Redis overhead: stores {tokens, last_refill} hash per key |

Fixed window (default)

Counts requests in a fixed time slot (e.g. "minute 14:03"). Resets at the slot boundary. Simple and fast. The 2× burst at a boundary (N requests at the end of slot T, N requests at the start of T+1) is a known trade-off; if this matters, set by_user below the true ceiling to leave headroom, or switch to sliding_window.
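To make the boundary burst concrete, here is a minimal in-memory fixed-window counter. It is a teaching sketch, not the plugin's Rust implementation; the class name FixedWindowCounter is hypothetical and the clock is passed in explicitly so the behaviour is easy to reproduce.

```python
class FixedWindowCounter:
    """Count requests per key in aligned time slots of window_seconds."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        self._slots: dict[str, tuple[int, int]] = {}  # key -> (slot index, count)

    def allow(self, key: str, now: float) -> bool:
        slot = int(now // self.window)
        current_slot, count = self._slots.get(key, (slot, 0))
        if current_slot != slot:
            count = 0  # a new slot started, so the counter resets
        if count >= self.limit:
            self._slots[key] = (slot, count)
            return False
        self._slots[key] = (slot, count + 1)
        return True
```

With a limit of 3 per 60 seconds, three requests at t=59.0 and three more at t=60.0 are all allowed: six requests within about one second, which is the 2× burst described above.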

Sliding window

Stores a timestamp for every request in the current window. At each check, expired timestamps are discarded and the remaining count is compared against the limit. Prevents boundary bursts entirely. Memory usage grows with request volume — roughly one float per request per active key.
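The same idea as a minimal in-memory sketch, using a deque of timestamps per key. The class name SlidingWindowLog is hypothetical, and treating a timestamp exactly window_seconds old as expired is an assumption about the boundary.

```python
from collections import deque


class SlidingWindowLog:
    """Keep one timestamp per accepted request and count those still in the window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._log: dict[str, deque] = {}

    def allow(self, key: str, now: float) -> bool:
        timestamps = self._log.setdefault(key, deque())
        cutoff = now - self.window
        # Discard timestamps that have fallen out of the window.
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

Unlike the fixed window, three requests at t=59.0 keep blocking new ones until t=119.0, so there is no burst at the minute boundary.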

Token bucket

Each identity (user, tenant, tool) has a bucket that holds up to count tokens. Tokens refill at a steady rate of count/window. A request consumes one token. Bursts up to the bucket capacity are allowed; sustained rate above count/window is rejected. Useful for APIs where short spikes are acceptable but sustained overload is not.
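A minimal single-bucket sketch of the refill-and-consume logic. The class name TokenBucket is hypothetical, and starting with a full bucket is an assumption; the plugin keeps one such bucket per identity.

```python
class TokenBucket:
    """Bucket of `count` tokens refilled continuously at count/window_seconds."""

    def __init__(self, count: int, window_seconds: float, now: float) -> None:
        self.capacity = float(count)
        self.rate = count / window_seconds  # tokens per second
        self.tokens = float(count)          # assumption: bucket starts full
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each request consumes one token
            return True
        return False
```

For "30/m" the bucket allows a burst of 30 requests at once, then one further request every 2 seconds.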

Redis support: token_bucket with backend: redis is fully supported. The plugin stores {tokens, last_refill} in a Redis hash per key and uses an atomic Lua script to refill and consume tokens in a single round-trip — the same pattern as the other two algorithms. This means token_bucket enforces a true cluster-wide limit in multi-instance deployments.

Backends

Memory backend (default, single-instance only)

  • Counters are stored in a process-local MemoryStore (Rust, per-key RwLock — no single global lock)
  • An amortized sweep evicts expired keys every ~128 calls — for fixed_window, keys are evicted once the window elapses; for sliding_window, keys with empty timestamp deques are evicted; for token_bucket, keys inactive for >1 hour are evicted
  • Limitation: state is not shared across processes or hosts. In a multi-instance deployment (e.g. 3 gateway instances behind nginx), each instance tracks its own counter — the effective limit is N × configured_limit
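The amortized sweep mentioned above can be illustrated with a plain dictionary of expiry times. This is a simplified sketch: SweepingStore and touch are hypothetical names, there is no locking, and sweeping on exactly every 128th call is an assumption based on the "~128 calls" figure.

```python
class SweepingStore:
    """Track key expiry times and evict expired keys every SWEEP_INTERVAL calls."""

    SWEEP_INTERVAL = 128

    def __init__(self) -> None:
        self._expires_at: dict[str, float] = {}
        self._calls = 0

    def touch(self, key: str, expires_at: float, now: float) -> None:
        self._expires_at[key] = expires_at
        self._calls += 1
        # Amortize cleanup: most calls skip the scan entirely.
        if self._calls % self.SWEEP_INTERVAL == 0:
            self._sweep(now)

    def _sweep(self, now: float) -> None:
        expired = [k for k, exp in self._expires_at.items() if exp <= now]
        for key in expired:
            del self._expires_at[key]

    def __len__(self) -> int:
        return len(self._expires_at)
```

The cost of scanning all keys is spread over many requests, so memory stays bounded without a background thread.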

Redis backend

  • fixed_window: atomic Lua INCR+EXPIRE — one Redis round-trip per check, no race condition
  • sliding_window: atomic Lua ZADD+ZREMRANGEBYSCORE+ZCARD+EXPIRE — one round-trip, no race condition
  • token_bucket: atomic Lua script — reads {tokens, last_refill} hash, refills proportionally, consumes 1 token, writes back — one round-trip, no race condition
  • All gateway instances share the same counter — the configured limit is the true cluster-wide limit
  • Requires redis_url to be set
  • Backend failure policy is governed by fail_mode:
    • "open" (default) — the request is allowed through without rate limiting. Availability over correctness; an infrastructure failure must never block legitimate traffic. Operators should monitor for rate-limiter error logs and treat them as high-priority alerts.
    • "closed" — the request is blocked with a PluginViolation (code BACKEND_UNAVAILABLE, HTTP 503, Retry-After: 1). Correctness over availability; pick this when a failed rate-limit check is less acceptable than a brief outage.
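The fail_mode policy above can be sketched as a thin wrapper around the backend check. BackendUnavailable, Violation, and check_with_fail_mode are hypothetical names for this illustration; only the violation code, HTTP status, and Retry-After value come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class BackendUnavailable(Exception):
    """Raised by the (hypothetical) backend when Redis can't be reached."""


@dataclass(frozen=True)
class Violation:
    code: str
    http_status: int
    headers: dict = field(default_factory=dict)


def check_with_fail_mode(check: Callable[[], Optional[Violation]],
                         fail_mode: str) -> Optional[Violation]:
    """Run a rate-limit check; None means the request may proceed."""
    try:
        return check()
    except BackendUnavailable:
        if fail_mode == "closed":
            # Correctness over availability: block with a 503.
            return Violation("BACKEND_UNAVAILABLE", 503, {"Retry-After": "1"})
        # "open": availability over correctness, let the request through.
        return None
```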

Multi-instance deployment (important): The memory backend is local to a single gateway instance — rate limit counters are not shared across replicas. For multi-instance deployments (e.g., behind nginx or on OpenShift with multiple gateway pods), always use backend: redis to ensure rate limits are enforced correctly across all instances.

Redis TLS configuration

Use rediss:// (double-s) in redis_url to enable TLS. Three levels of TLS hardening are supported:

OS trust store (default for rediss://): no extra config; the Redis server's certificate must be signed by a CA in the system certificate store:

config:
  backend: redis
  redis_url: "rediss://redis:6380/0"
  by_user: "60/m"

Custom CA bundle — use when your Redis server uses a private CA not in the OS trust store:

config:
  backend: redis
  redis_url: "rediss://redis:6380/0"
  redis_ssl_ca_certs: "/etc/certs/my-ca.pem"
  by_user: "60/m"

Mutual TLS (mTLS) — present a client certificate so Redis can authenticate the plugin:

config:
  backend: redis
  redis_url: "rediss://redis:6380/0"
  redis_ssl_ca_certs: "/etc/certs/ca.pem"
  redis_ssl_certfile: "/etc/certs/client.pem"
  redis_ssl_keyfile: "/etc/certs/client-key.pem"
  by_user: "60/m"

Security note — redis_ssl_check_hostname: false: Due to the underlying redis client API surface, setting this to false disables all TLS certificate validation (both CA chain and hostname), not only hostname verification. A WARN log is emitted at startup. This option is intended only for isolated environments such as local development or integration test rigs. In production, ensure your certificate's CN or SAN matches the hostname instead.

All TLS file paths are validated at plugin init time: missing files and malformed PEM content are surfaced as startup errors rather than at the first request.

Note: The REDIS_SSL_* environment variables used by some Redis clients have no effect on this plugin; use the config keys above.

Tenant-scoped Redis key layout

When the plugin context carries a tenant_id, every dimension key is prefixed with it so counters are isolated per tenant:

rl:{tenant_id}:user:{email}:{window_seconds}
rl:{tenant_id}:tenant:{tenant_id}:{window_seconds}
rl:{tenant_id}:tool:{tool_name}:{window_seconds}

When tenant_id is absent (single-tenant deployments), the prefix is omitted and keys revert to the pre-tenant-scoping layout (rl:user:{email}:{window_seconds}), so single-tenant behaviour is unchanged.
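The key layout above can be reproduced with a small helper. The function redis_key is a hypothetical name used only for illustration; the layout itself follows the patterns listed above.

```python
from typing import Optional


def redis_key(prefix: str, dimension: str, identity: str,
              window_seconds: int, tenant_id: Optional[str] = None) -> str:
    """Build a counter key, adding the tenant segment only when one is present."""
    parts = [prefix]
    if tenant_id:
        parts.append(tenant_id)
    parts += [dimension, identity, str(window_seconds)]
    return ":".join(parts)
```

For instance, a user counter for tenant "acme" with a 60-second window becomes rl:acme:user:a@example.com:60, and the same call without a tenant gives rl:user:a@example.com:60.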

Upgrade note: the first deploy of the tenant-scoping change causes counters under rl:user:* / rl:tool:* to be orphaned while new writes land at rl:{tenant}:user:*. Counters effectively reset once for all in-flight windows, which is a non-event for typical second/minute windows.

Examples

Single-instance (default config)

config:
  by_user: "60/m"
  by_tenant: "600/m"

Multi-instance with Redis

config:
  backend: "redis"
  redis_url: "redis://redis:6379/0"
  by_user: "30/m"
  by_tenant: "3000/m"
  by_tool:
    search: "10/m"

Sliding window (no boundary bursts)

config:
  algorithm: "sliding_window"
  by_user: "30/m"
  by_tenant: "300/m"

Token bucket — memory backend (default)

config:
  algorithm: "token_bucket"
  by_user: "30/m"   # bucket holds 30 tokens, refills at 30/min

Token bucket — Redis backend (multi-instance)

config:
  algorithm: "token_bucket"
  backend: "redis"
  redis_url: "redis://redis:6379/0"
  by_user: "30/m"

Permissive mode (observe without blocking)

mode: permissive
config:
  by_user: "60/m"

In permissive mode the plugin records violations and emits X-RateLimit-* headers but does not block requests. Useful for baselining traffic before switching to enforce.

Lifecycle

The plugin participates in the plugin manager's lifecycle contract:

  • async def initialize(self) — invoked once when the plugin manager constructs the plugin. Logs one INFO record naming the active backend (memory / redis).
  • async def shutdown(self) — invoked when the plugin manager tears the plugin down (runtime disable, re-instantiation after a config change). Releases backend-held resources — specifically, drops the Rust core's cached Redis multiplexed connection and the SCRIPT LOAD SHA cache. In-flight requests already hold their own clones of the connection and remain valid; the cached reference is replaced on the next request.

Without shutdown, the cached Redis connection would leak across plugin re-instantiation, producing connection churn on the server.

Limitations

| Limitation | Severity | Status / mitigation |
| --- | --- | --- |
| Memory backend not shared across processes | HIGH | Use Redis backend for multi-instance deployments |
| Fixed window allows up to 2× limit at window boundary | LOW | Use sliding_window algorithm, or use by_user with headroom |
| by_tool matching is case-sensitive | LOW | Fixed: tool names are normalised with .strip().lower() |
| Whitespace-only user identity bypasses anonymous bucket | LOW | Fixed: _extract_user_identity strips whitespace and falls back to 'anonymous' |
| No per-server limits (server_id dimension missing) | LOW | Not implemented |
| No config hot-reload: rate string changes require restart | LOW | Not implemented |

Download files

Source distribution

  • cpex_rate_limiter-0.1.3.tar.gz (138.0 kB)

Built distributions

  • cpex_rate_limiter-0.1.3-cp311-abi3-win_amd64.whl (1.4 MB): CPython 3.11+, Windows x86-64
  • cpex_rate_limiter-0.1.3-cp311-abi3-manylinux_2_34_x86_64.whl (1.5 MB): CPython 3.11+, manylinux (glibc 2.34+), x86-64
  • cpex_rate_limiter-0.1.3-cp311-abi3-manylinux_2_34_s390x.whl (1.4 MB): CPython 3.11+, manylinux (glibc 2.34+), s390x
  • cpex_rate_limiter-0.1.3-cp311-abi3-manylinux_2_34_ppc64le.whl (1.4 MB): CPython 3.11+, manylinux (glibc 2.34+), ppc64le
  • cpex_rate_limiter-0.1.3-cp311-abi3-manylinux_2_34_aarch64.whl (1.4 MB): CPython 3.11+, manylinux (glibc 2.34+), ARM64
  • cpex_rate_limiter-0.1.3-cp311-abi3-macosx_11_0_arm64.whl (1.3 MB): CPython 3.11+, macOS 11.0+, ARM64

Provenance

The following attestation bundles were made for cpex_rate_limiter-0.1.3.tar.gz:

Publisher: release-rust-python-package.yaml on IBM/cpex-plugins

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
