Async-native rate limiting for Python — token bucket, fixed window, and sliding window algorithms with in-memory and Redis backends

rate-limiter

Python 3.11+ | License: MIT

A pluggable, async-native rate-limiting library for Python. Supports token bucket, fixed window, and sliding window log algorithms across in-memory and Redis backends, with a drop-in ASGI middleware for FastAPI/Starlette.

Features

  • 3 algorithms — fixed window, sliding window log, token bucket — selectable at construction time
  • 2 backends — in-memory (single-process, ~70k ops/s) and Redis (distributed, ~27k ops/s via atomic Lua scripts)
  • ASGI middleware — drop into any FastAPI/Starlette app with X-RateLimit-* headers, 429 responses, and Retry-After
  • Async-native — built on asyncio and redis-py async; no blocking calls
  • Bring-your-own Redis — the caller owns the redis.asyncio.Redis instance and its lifecycle
  • Configurable failure policy — fail-open (favour availability) or fail-closed (favour safety)
  • Fully typed — Pydantic models, mypy-clean

Algorithm trade-offs

| Algorithm | Accuracy | Redis memory | Redis ops | Best for |
|---|---|---|---|---|
| Fixed window | Can allow 2x burst at window boundaries | O(1), 64 B/key | INCR + EXPIRE | Simple rate caps where boundary bursts are acceptable |
| Sliding window log | Exact, no boundary bursts | O(N), grows with max_requests | ZADD + ZREMRANGEBYSCORE + ZCARD | Strict per-client fairness |
| Token bucket | Smooth, allows controlled bursts up to bucket capacity | O(1), 141 B/key | HGET + HSET + TIME | APIs that want to allow short bursts while enforcing an average rate |
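
The 2x boundary burst in the fixed-window row can be reproduced with a toy counter. This is a minimal sketch, not the library's implementation: the FixedWindowCounter class and its allow() method are hypothetical names introduced only for illustration, and timestamps are passed in explicitly so the behaviour is deterministic.

```python
from collections import defaultdict


class FixedWindowCounter:
    """Toy fixed-window limiter (illustrative only, not the library's class)."""

    def __init__(self, max_requests: int, time_window: float) -> None:
        self.max_requests = max_requests
        self.time_window = time_window
        self.counts: dict[int, int] = defaultdict(int)

    def allow(self, now: float) -> bool:
        window = int(now // self.time_window)  # index of the window containing `now`
        if self.counts[window] >= self.max_requests:
            return False
        self.counts[window] += 1
        return True


limiter = FixedWindowCounter(max_requests=100, time_window=60)

# 100 requests just before the boundary at t=60, then 100 just after it.
before = sum(limiter.allow(59.9) for _ in range(100))
after = sum(limiter.allow(60.1) for _ in range(100))

burst = before + after  # all accepted within about 0.2 seconds
print(burst)
```

Both batches are accepted because each falls in a different window, so the effective rate around the boundary is twice max_requests. A sliding window log or a token bucket would not accept the second batch in full.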

Quick start

Installation

pip install async-ratelimit

In-memory rate limiting

import asyncio
from rate_limiter import RateLimiterOrchestrator, AlgorithmType, RateLimiterType

limiter = RateLimiterOrchestrator(
    rate_limiter_type=RateLimiterType.IN_MEMORY,
    algorithm_type=AlgorithmType.TOKEN_BUCKET,
    max_requests=100,       # 100 requests
    time_window=60,         # per 60-second window
)

async def main():
    response = await limiter.get_response(uId="user-123")
    print(response.allowed)             # True
    print(response.remaining_requests)  # 99
    print(response.reset_time)          # seconds until the bucket refills

asyncio.run(main())

Distributed rate limiting with Redis

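import asyncio
from redis.asyncio import Redis
from rate_limiter import RateLimiterOrchestrator, AlgorithmType, RateLimiterType

async def main():
    # Bring your own client: the caller creates it and owns its lifecycle
    redis_client = Redis(host="localhost", port=6379, decode_responses=True)

    limiter = RateLimiterOrchestrator(
        rate_limiter_type=RateLimiterType.REDIS,
        algorithm_type=AlgorithmType.SLIDING_WINDOW,
        max_requests=1000,      # 1000 requests
        time_window=3600,       # per hour
        redis_client=redis_client,
    )

    try:
        # Same call as the in-memory backend; the Redis backend is transparent
        response = await limiter.get_response(uId="user-123")
        print(response.allowed)
    finally:
        # Clean up when done
        await redis_client.aclose()

asyncio.run(main())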

FastAPI middleware

from fastapi import FastAPI
from redis.asyncio import Redis
from rate_limiter import RateLimiterOrchestrator, RateLimiterMiddleware, AlgorithmType, RateLimiterType

app = FastAPI()

redis_client = Redis(host="localhost", port=6379, decode_responses=True)
limiter = RateLimiterOrchestrator(
    rate_limiter_type=RateLimiterType.REDIS,
    algorithm_type=AlgorithmType.TOKEN_BUCKET,
    max_requests=10,
    time_window=60,
    redis_client=redis_client,
)

app.add_middleware(
    RateLimiterMiddleware,
    limiter=limiter,
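    # r.client can be None (e.g. under some test clients), so guard before reading .host
    key_func=lambda r: r.headers.get("X-API-Key") or (r.client.host if r.client else "unknown"),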
    exclude_routes=["/health", "/docs"],
    fail_open=True,     # let requests through if the limiter errors
)

@app.get("/ping")
async def ping():
    return {"message": "pong"}

Every response includes the conventional X-RateLimit-* headers:

X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 45

When the limit is exceeded, the middleware returns 429 Too Many Requests with a Retry-After header — the downstream handler is never invoked.
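
The header and 429 behaviour described above, together with the fail_open switch, can be sketched as a small decision function. This is a hedged illustration rather than the middleware's actual code: the Decision dataclass and evaluate() function are hypothetical, the 503 status for fail-closed is an assumption, and so is reusing the reset time as the Retry-After value.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Decision:
    # Hypothetical stand-in for the library's response model.
    allowed: bool
    remaining_requests: int
    reset_time: int  # seconds until the limit resets


async def evaluate(
    check: Callable[[], Awaitable[Decision]],
    limit: int,
    fail_open: bool,
) -> tuple[int, dict[str, str]]:
    """Return (status_code, headers) for one request."""
    try:
        decision = await check()
    except Exception:
        # Limiter backend failed: apply the configured failure policy.
        # The 503 for fail-closed is an assumption made for this sketch.
        return (200, {}) if fail_open else (503, {})

    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(decision.remaining_requests),
        "X-RateLimit-Reset": str(decision.reset_time),
    }
    if not decision.allowed:
        # Assumption: Retry-After mirrors the reset time.
        headers["Retry-After"] = str(decision.reset_time)
        return 429, headers
    return 200, headers


async def ok() -> Decision:
    return Decision(allowed=True, remaining_requests=7, reset_time=45)


async def denied() -> Decision:
    return Decision(allowed=False, remaining_requests=0, reset_time=45)


async def broken() -> Decision:
    raise ConnectionError("limiter backend unavailable")


print(asyncio.run(evaluate(ok, limit=10, fail_open=True)))
print(asyncio.run(evaluate(denied, limit=10, fail_open=True)))
print(asyncio.run(evaluate(broken, limit=10, fail_open=False)))
```

Fail-open trades protection for availability: a backend outage lets traffic through unmetered. Fail-closed does the reverse, so the choice depends on whether the limiter guards a cost or a safety boundary.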

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     RateLimiterOrchestrator                      │
│  (public API — validates config, delegates to backend)           │
├──────────────┬───────────────────────────────────────────────────┤
│              │                                                   │
│  ┌───────────▼──────────┐       ┌───────────────────────────┐   │
│  │  InMemoryRateLimiter │       │    RedisRateLimiter       │   │
│  │  (asyncio.Lock)      │       │    (BYO redis client)     │   │
│  └───────────┬──────────┘       └───────────┬───────────────┘   │
│              │                              │                    │
│              ▼                              ▼                    │
│     AlgorithmFactory.create()       AlgorithmFactory.create()    │
│              │                              │                    │
│   ┌──────────┼──────────┐       ┌───────────┼──────────┐        │
│   │          │          │       │           │          │         │
│   ▼          ▼          ▼       ▼           ▼          ▼         │
│ FixedW   SlidingW   TokenB   FixedW    SlidingW    TokenB       │
│ InMem    InMem      InMem    InRedis   InRedis     InRedis      │
│ (dict)   (dict)     (dict)   (Lua)     (Lua)       (Lua)        │
└──────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────┐
│                    RateLimiterMiddleware                          │
│  (ASGI — key_func, exclude_routes, fail_open/closed)             │
│  Wraps RateLimiterOrchestrator, adds X-RateLimit-* headers       │
└──────────────────────────────────────────────────────────────────┘

Design decisions:

  • Strategy + Factory pattern — algorithms are interchangeable behind a common Algorithm ABC. The factory dispatches on (backend, algorithm) to the correct implementation.
  • Bring-your-own Redis — the library never creates or configures a Redis connection. The caller passes in a redis.asyncio.Redis instance and owns its lifecycle (aclose()). This avoids env-var coupling and lets the caller control connection pooling, TLS, sentinels, etc.
  • Atomic Lua scripts — all Redis algorithms use EVAL/EVALSHA to run their logic server-side in a single atomic operation. No distributed locks needed.
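
The Strategy + Factory decision above can be sketched as a lookup table keyed on (backend, algorithm). This is only an illustration under stated assumptions: the Backend and Algo enums, the describe() method, and the create() function are hypothetical stand-ins, and only the class names FixedWindowInMemory and TokenBucketInRedis come from the project layout.

```python
from abc import ABC, abstractmethod
from enum import Enum


class Backend(Enum):  # hypothetical stand-in for RateLimiterType
    IN_MEMORY = "in_memory"
    REDIS = "redis"


class Algo(Enum):  # hypothetical stand-in for AlgorithmType
    FIXED_WINDOW = "fixed_window"
    TOKEN_BUCKET = "token_bucket"


class Algorithm(ABC):
    """Common interface every strategy implements."""

    @abstractmethod
    def describe(self) -> str: ...


class FixedWindowInMemory(Algorithm):
    def describe(self) -> str:
        return "fixed window, counter in a dict"


class TokenBucketInRedis(Algorithm):
    def describe(self) -> str:
        return "token bucket, Lua script in Redis"


# The factory is a lookup table keyed on (backend, algorithm).
_REGISTRY: dict[tuple[Backend, Algo], type[Algorithm]] = {
    (Backend.IN_MEMORY, Algo.FIXED_WINDOW): FixedWindowInMemory,
    (Backend.REDIS, Algo.TOKEN_BUCKET): TokenBucketInRedis,
}


def create(backend: Backend, algo: Algo) -> Algorithm:
    try:
        return _REGISTRY[(backend, algo)]()
    except KeyError:
        raise ValueError(
            f"unsupported combination: {backend.name}/{algo.name}"
        ) from None


print(create(Backend.REDIS, Algo.TOKEN_BUCKET).describe())
```

Adding a new algorithm or backend then means registering one more entry, and callers keep depending only on the Algorithm interface.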

Redis Lua scripts

Each Redis algorithm runs as a single Lua script executed atomically by Redis:

| Algorithm | Script | Operations | Atomicity guarantee |
|---|---|---|---|
| Fixed window | INCR → conditional EXPIRE → TTL | Increment counter, set TTL on first request | Counter + expiry can never diverge |
| Sliding window | ZREMRANGEBYSCORE → ZCARD → ZADD → EXPIRE | Prune expired entries, count, add if under limit | No request can slip between prune and count |
| Token bucket | TIME → HGET → refill math → HSET → EXPIRE | Server-authoritative clock, lazy token refill | Refill + consume is one atomic step |
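
The "refill math" step of the token bucket row can be illustrated in plain Python. This is a sketch of lazy refill in general, not a transcription of the project's Lua script: the Bucket dataclass, its field names, and the consume() function are assumptions, and the `now` argument stands in for the server clock that the script reads with TIME.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float       # tokens currently available
    last_refill: float  # timestamp of the last update, in seconds


def consume(bucket: Bucket, now: float, capacity: int, time_window: float) -> bool:
    """Refill lazily from elapsed time, then try to take one token."""
    rate = capacity / time_window  # tokens added per second
    elapsed = max(0.0, now - bucket.last_refill)
    bucket.tokens = min(capacity, bucket.tokens + elapsed * rate)
    bucket.last_refill = now
    if bucket.tokens >= 1:
        bucket.tokens -= 1
        return True
    return False


bucket = Bucket(tokens=10, last_refill=0.0)

# A burst of 11 requests at t=0 against a bucket of capacity 10.
burst = [consume(bucket, 0.0, capacity=10, time_window=10.0) for _ in range(11)]

# One second later exactly one token has been refilled.
later = consume(bucket, 1.0, capacity=10, time_window=10.0)
print(burst.count(True), later)
```

No background timer is needed: tokens are credited only when a request arrives, which is why the whole update fits in a single atomic script call.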

Benchmarks

Microbenchmark results on Apple Silicon (M-series), Python 3.14.2, Redis 7 on localhost. Full methodology and commands in BENCHMARKS.md.

Throughput (ops/s) — hot key, 1 client

| Backend | Algorithm | Conc=1 | Conc=10 | Conc=50 | Conc=100 | Conc=250 |
|---|---|---|---|---|---|---|
| in_memory | fixed_window | 68,404 | 69,823 | 67,906 | 66,705 | 59,387 |
| in_memory | sliding_window | 57,341 | 59,153 | 60,474 | 60,604 | 58,528 |
| in_memory | token_bucket | 70,934 | 69,237 | 71,112 | 69,173 | 69,721 |
| redis | fixed_window | 8,533 | 25,317 | 29,456 | 27,889 | 27,315 |
| redis | sliding_window | 7,631 | 21,407 | 24,718 | 25,011 | 23,390 |
| redis | token_bucket | 7,721 | 23,101 | 26,864 | 27,676 | 25,393 |

Latency at peak load (concurrency=250)

| Backend | Algorithm | p50 | p99 | p99.9 |
|---|---|---|---|---|
| in_memory | fixed_window | 7.2 µs | 24.2 µs | 85.5 µs |
| in_memory | sliding_window | 8.8 µs | 9.4 µs | 14.6 µs |
| in_memory | token_bucket | 6.7 µs | 7.1 µs | 11.5 µs |
| redis | fixed_window | 6.7 ms | 25.0 ms | 58.6 ms |
| redis | sliding_window | 7.9 ms | 27.7 ms | 50.5 ms |
| redis | token_bucket | 7.2 ms | 28.1 ms | 53.7 ms |

Redis memory per client key

| Algorithm | Memory | Data structure |
|---|---|---|
| Fixed window | 64 B | String (counter), O(1) |
| Token bucket | 141 B | Hash (3 fields), O(1) |
| Sliding window | 6.4 MB (with 50k max_requests) | Sorted set (1 member/request), O(N) |

Key observations

  • In-memory throughput is ~8x higher than Redis (~70k vs ~8k ops/s at concurrency=1) because Redis operations pay a network round-trip (~100 µs) even on localhost.
  • Redis throughput scales 3x with concurrency (8k → 27k ops/s from c=1 → c=50) because asyncio coroutines overlap I/O waits. It plateaus as Redis's single-threaded command processing saturates.
  • In-memory throughput is flat across concurrency because asyncio is single-threaded — all coroutines serialize through asyncio.Lock.
  • Sliding window is the most memory-expensive at O(N) per client. With 50k max_requests, a single key uses 6.4 MB. Fixed window and token bucket use O(1) space regardless of request volume.
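
The O(N) cost of the sliding window log can be seen in a toy version that keeps one timestamp per accepted request. This is a minimal sketch, not the library's class: SlidingWindowLog and allow() are hypothetical names. The per-entry figure at the end is derived from the 6.4 MB and 50k numbers above, taking MB as 10^6 bytes, which is an assumption.

```python
from collections import deque


class SlidingWindowLog:
    """Toy sliding-window log (illustrative only)."""

    def __init__(self, max_requests: int, time_window: float) -> None:
        self.max_requests = max_requests
        self.time_window = time_window
        self.log: deque[float] = deque()  # one timestamp per accepted request

    def allow(self, now: float) -> bool:
        cutoff = now - self.time_window
        # Prune entries that have slid out of the window.
        while self.log and self.log[0] <= cutoff:
            self.log.popleft()
        if len(self.log) >= self.max_requests:
            return False
        self.log.append(now)
        return True


limiter = SlidingWindowLog(max_requests=50_000, time_window=3600)
for i in range(50_000):
    limiter.allow(i * 0.01)  # 50,000 requests within the first 500 seconds

entries = len(limiter.log)                 # the log holds every accepted request
bytes_per_entry = 6.4e6 / 50_000           # implied Redis cost per sorted-set member
print(entries, bytes_per_entry)
```

At roughly 128 B per member, memory scales linearly with max_requests per client, which is the price paid for exact accounting with no boundary bursts.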

For the full benchmark suite, methodology, multi-client results, and chart generation, see BENCHMARKS.md.

Testing

# Run all tests (Redis required — via local instance, testcontainers, or REDIS_HOST env)
pytest -v

# In-memory tests only (no Redis needed)
pytest -v tests/test_fixed_window.py tests/test_sliding_window.py tests/test_token_bucket.py

# With Docker Compose (spins up Redis automatically)
docker compose up --build

The test suite includes:

  • Behavioral tests — allow/deny, window expiry, token refill, weight decay, reset
  • Redis integration tests — same behavioral coverage against a live Redis with Lua scripts
  • Concurrency tests — 50-way asyncio.gather across all algorithms
  • Middleware tests — 429 responses, X-RateLimit-* headers, Retry-After, fail-open/closed
  • Edge-case tests — zero/negative config validation

55 tests total, all passing in CI with a Redis service container.
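
A 50-way asyncio.gather check of the kind listed above might look like the following. This is a sketch under assumptions, not one of the project's tests: LockedCounter and run_concurrent() are hypothetical, and the toy limiter only mimics the idea of an in-memory counter guarded by asyncio.Lock.

```python
import asyncio


class LockedCounter:
    """Toy in-memory limiter: one counter guarded by asyncio.Lock."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self.count = 0
        self._lock = asyncio.Lock()

    async def allow(self) -> bool:
        async with self._lock:
            if self.count >= self.max_requests:
                return False
            await asyncio.sleep(0)  # yield to other tasks while holding the lock
            self.count += 1
            return True


async def run_concurrent(n_tasks: int, max_requests: int) -> int:
    limiter = LockedCounter(max_requests)
    results = await asyncio.gather(*(limiter.allow() for _ in range(n_tasks)))
    return sum(results)


allowed = asyncio.run(run_concurrent(n_tasks=50, max_requests=10))
print(allowed)
```

The deliberate yield inside the critical section is what makes the lock matter: without it, the check and increment could interleave across tasks and admit more than max_requests.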

Development

# Clone and install dev dependencies
git clone https://github.com/freq31/rate_limiter.git
cd rate_limiter
pip install -r requirements.txt

# Lint and type-check
ruff check .
black --check rate_limiter tests
mypy rate_limiter

# Run benchmarks
python -m benchmarks.run --ops 50000 --concurrency 1,10,50,100,250

# Generate charts
python -m benchmarks.plot benchmarks/results/<csv_file>.csv

Project structure

rate_limiter/
├── __init__.py                 # Public API re-exports
├── main.py                     # RateLimiterOrchestrator (public API) + Factory
├── app.py                      # FastAPI demo application
├── settings.py                 # Pydantic settings for the demo app
├── algorithms/
│   ├── base.py                 # Algorithm ABC + AlgorithmFactory
│   ├── fixed_window.py         # FixedWindowInMemory, FixedWindowInRedis
│   ├── sliding_window.py       # SlidingWindowInMemory, SlidingWindowInRedis
│   └── token_bucket.py         # TokenBucketInMemory, TokenBucketInRedis
├── backend/
│   ├── base.py                 # RateLimiter ABC
│   ├── memory.py               # InMemoryRateLimiter
│   ├── redis.py                # RedisRateLimiter (BYO client)
│   ├── middleware.py           # RateLimiterMiddleware (ASGI)
│   ├── request.py              # Rules, Client, AlgorithmType, RateLimiterType
│   └── response.py             # Response model
└── scripts/
    ├── fixed_window.py         # Lua: INCR + EXPIRE
    ├── sliding_window.py       # Lua: sorted set log
    └── token_bucket.py         # Lua: hash + TIME + refill math
tests/                          # 55 tests — behavioral, integration, concurrency, middleware
benchmarks/                     # Async microbenchmark harness + chart generation

License

MIT
