Skip to main content

Production-grade Redis-backed distributed rate limiting library for Python with FastAPI middleware.

Project description

Distributed Rate Limiter

This project is a Redis-backed rate limiting library for Python, implementing three algorithms with atomic distributed guarantees. Designed as a plug-and-play FastAPI middleware that works correctly across multiple application instances.


Table of Contents


Overview

Rate limiters are a fundamental building block of any public API. The crucial challenge in a distributed environment is that multiple instances of the same application must share the rate limit state, a counter stored in a single application instance is invisible to the others, making it trivially bypassable.

This library solves this problem by storing all rate limit state in Redis and executing all check-and-update operations as atomic Lua scripts using Redis EVAL. Because Redis executes Lua scripts atomically, the check-and-increment is a single indivisible operation regardless of how many application instances are running concurrently, so we have no race conditions or double-counting.


Algorithms

Three algorithms are implemented, to represent different points in terms of the accuracy vs. memory trade-off space.

Token Bucket

Maintains a bucket of tokens per client, refilled at a fixed rate. Each request consumes one token, if the bucket is empty the request is rejected. Allows bursting up to the configured capacity.

  • Memory: O(1) per client — stores token count and last refill timestamp
  • Accuracy: Approximate — burst-tolerant by design
  • Best for: APIs where short bursts are acceptable

Sliding Window Log

Stores the exact timestamp of every request in a Redis sorted set. On each request, entries older than the window are removed and the remaining count is compared against the limit.

  • Memory: O(n) per client — stores one entry per request
  • Accuracy: Exact — enforces the limit with perfect precision
  • Best for: APIs requiring strict per-window enforcement regardless of memory cost

Sliding Window Counter

Maintains integer counters for the current and previous fixed windows, using a weighted interpolation to approximate the rolling count: weighted = prev_count * (1 - elapsed/window) + curr_count. O(1) memory with accuracy approaching the sliding window log.

  • Memory: O(1) per client — stores two integer counters
  • Accuracy: Approximate — small error at window boundaries
  • Best for: High-traffic APIs where memory efficiency and accuracy are both important

Architecture

┌─────────────┐     ┌─────────────────────────┐     ┌───────────────────┐
│  Client A   │────▶│                         │────▶│                   │
├─────────────┤     │  RateLimiterMiddleware  │     │       Redis       │
│  Client B   │────▶│                         │     │                   │
├─────────────┤     │  • Extracts client key  │     │    Atomic Lua     │
│  Client C   │────▶│  • Calls algorithm      │◀────│  scripts via EVAL │
└─────────────┘     │  • Returns 200 or 429   │     │                   │
                    └─────────────────────────┘     └───────────────────┘
                               │ Allow
                               ▼
                    ┌─────────────────────────┐
                    │       App Service       │
                    │    (FastAPI routes)     │
                    └─────────────────────────┘

Project Structure

rate-limiter/
├── src/
│   ├── algorithms/
│   │   ├── token_bucket.py
│   │   ├── sliding_window_log.py
│   │   └── sliding_window_counter.py
│   ├── backends/
│   │   └── redis_backend.py
│   ├── middleware/
│   │   └── fastapi.py
│   └── config.py
├── benchmarks/
│   └── benchmark.py
├── tests/
│   ├── test_token_bucket.py
│   ├── test_sliding_window_log.py
│   ├── test_sliding_window_counter.py
│   └── test_middleware.py
├── docker-compose.yml
└── pyproject.toml

Installation

Requirements: Python 3.11+, Redis 7+

git clone https://github.com/gabrigio30/rate-limiter
cd rate-limiter
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

Start Redis via Docker:

docker compose up -d

Usage

As FastAPI Middleware

from fastapi import FastAPI
from src import RateLimiterMiddleware, RateLimiterConfig

app = FastAPI()

app.add_middleware(
    RateLimiterMiddleware,
    config=RateLimiterConfig(limit=100, window=60),
    algorithm="token_bucket",   # or "sliding_window_log" / "sliding_window_counter"
)

@app.get("/")
async def root():
    return {"message": "ok"}

Requests exceeding the limit receive a 429 Too Many Requests response:

{
  "error": "Too many requests",
  "detail": "Rate limit exceeded. Please try again later."
}

Directly

from src import RedisBackend, RateLimiterConfig, SlidingWindowCounter

backend = RedisBackend(host="localhost", port=6379)
config = RateLimiterConfig(limit=100, window=60)
limiter = SlidingWindowCounter(config, backend)

if limiter.is_allowed("user_123"):
    print("Request allowed")
else:
    print("Rate limit exceeded")

Configuration

Parameter Type Description
limit int Maximum requests allowed per window
window int Time window in seconds
capacity int Token bucket only: max burst size (defaults to limit)

Selecting an Algorithm

Use case Recommended algorithm
General purpose API rate limiting sliding_window_counter
APIs tolerating short bursts token_bucket
Strict per-window enforcement sliding_window_log

Benchmarks

Benchmarked on a MacBook Pro 2019, Redis 7 (Docker), Python 3.12. Each scenario run 5 times and averaged. Settings: limit=100, window=10 (seconds), 200 total requests, 10 concurrent threads.

Steady Traffic

Algorithm Throughput Allowed Rejected Error Mean latency P99 latency
Token bucket 2,415 req/s 100 100 0.0% 0.41ms 0.64ms
Sliding window log 1,987 req/s 100 100 0.0% 0.51ms 0.97ms
Sliding window counter 2,505 req/s 100 100 0.0% 0.40ms 0.56ms

Burst Traffic

Algorithm Throughput Allowed Rejected Error Mean latency P99 latency
Token bucket 2,483 req/s 100 100 0.0% 0.40ms 0.58ms
Sliding window log 2,422 req/s 100 100 0.0% 0.41ms 0.63ms
Sliding window counter 2,481 req/s 100 100 0.0% 0.40ms 0.72ms

Concurrent Traffic (10 threads)

Algorithm Throughput Allowed Rejected Error Mean latency P99 latency
Token bucket 5,072 req/s 100 100 0.0% 1.78ms 6.76ms
Sliding window log 5,569 req/s 100 100 0.0% 1.60ms 4.94ms
Sliding window counter 5,621 req/s 100 100 0.0% 1.60ms 5.25ms

Key Observations

All three algorithms achieve 0% accuracy error across all scenarios, including under 10 concurrent threads. This validates that the Redis-Lua atomicity guarantee eliminates race conditions entirely, no request was incorrectly allowed or rejected regardless of concurrency.

Sliding window log is ~20% slower under steady traffic due to the cost of sorted set operations (ZREMRANGEBYSCORE, ZCARD, ZADD) compared to hash fields (HMGET/HMSET) or simple counters (GET/INCR). This is the direct measurable cost of its exact accuracy guarantee.

All algorithms roughly double throughput under concurrent load. Python releases the GIL during Redis I/O, allowing threads to pipeline requests: while one thread waits for a Redis response, others proceed, effectively overlapping the network round trips.

Token bucket shows higher P99 latency under concurrency (6.76ms vs ~5ms for the others) due to its more computationally intensive Lua script, which amplifies under contention.


Design Decisions

Why Redis Lua scripts for atomicity

The check-and-update operation (read the current count, decide allow/reject, write the new count) must be atomic in a distributed environment. Without atomicity, two concurrent requests can both read the same counter value before either writes back, causing both to be allowed even when only one should be. Redis EVAL executes Lua scripts atomically: no other command runs on the server while the script executes, making the entire operation indivisible. This is preferable to distributed locks, which add latency and introduce their own failure modes.

Why the RedisBackend abstraction

Wrapping the Redis client behind a RedisBackend class applies the dependency inversion principle — algorithms depend on an abstraction rather than a concrete Redis client. This makes the algorithms independently testable (a mock backend can be injected in unit tests) and decouples the rate limiting logic from the specific Redis client library used.

Client identification via X-Forwarded-For

The middleware uses X-Forwarded-For headers when present rather than request.client.host. Behind a load balancer or reverse proxy, request.client.host is always the proxy's IP — all clients would share a single bucket and hit the limit almost immediately. X-Forwarded-For carries the original client IP set by the proxy, with the leftmost value being the true originating address in multi-hop scenarios.

Key namespacing

All Redis keys are namespaced by algorithm and identifier, for example rate_limit:token_bucket:user_123. This prevents key collisions when multiple algorithms are used against the same Redis instance, and follows the standard Redis key naming convention for production deployments.


Known Limitations

Sliding window counter approximation at boundaries. The weighted interpolation can theoretically allow slightly more than limit requests when a high-traffic previous window transitions into a new one. In the benchmarks this error was 0% because all requests fell within a single window, but real deployments spanning window boundaries may show a small deviation. This is an accepted trade-off for the memory efficiency gained.

No Redis Cluster support. The current implementation targets a single Redis node. Redis Cluster requires that all keys accessed in a single Lua script reside on the same node — the sliding window counter's two-key script would need hash tags ({user_123}) to guarantee co-location. This is a known extension point which could be covered with future work.

No EVALSHA optimisation. The implementation uses EVAL on every request, sending the full Lua script each time. Replacing this with EVALSHA (which sends only the script's SHA hash after the first call) would reduce network overhead under very high traffic.

FastAPI middleware initialisation is lazy. FastAPI defers middleware instantiation until the first request rather than at add_middleware call time. Configuration errors such as an invalid algorithm name are therefore only raised on the first request, not at startup. Instantiating the middleware class directly at application startup is the recommended workaround if eager validation is required.

Clock skew across application instances. All algorithms use time.time() on the application side to generate timestamps passed to Redis. In deployments where application servers have unsynchronised clocks, this can cause minor inaccuracies in window boundary calculations. NTP synchronisation across all nodes is assumed.


Author

Gabriele Giordanelli
M.Sc. Computer Science and Engineering — Politecnico di Milano
LinkedIn · GitHub

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

atomic_rate_limiter-0.1.0.tar.gz (10.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

atomic_rate_limiter-0.1.0-py3-none-any.whl (10.8 kB view details)

Uploaded Python 3

File details

Details for the file atomic_rate_limiter-0.1.0.tar.gz.

File metadata

  • Download URL: atomic_rate_limiter-0.1.0.tar.gz
  • Upload date:
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.4

File hashes

Hashes for atomic_rate_limiter-0.1.0.tar.gz
Algorithm Hash digest
SHA256 858debd454c0b425fc6aacc940a0461e06b36a5e1822719165944cf259a98ef8
MD5 a8c895b2594047c30201bcc459e8076c
BLAKE2b-256 14beb08f2ee2bb65b89fc581e3238a43871667afba17cf52c97beeb52bfa2421

See more details on using hashes here.

File details

Details for the file atomic_rate_limiter-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for atomic_rate_limiter-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a3d0641a809b17381d184f53967c63228874d6a7849dbabddbaff533e66f409d
MD5 9140576298bebce49f96cf9b113eefe7
BLAKE2b-256 a8e6cf32ba4c70ba54387a60eef337de0c0438772efb71beca4196d412bf0b14

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page