
โฑ๏ธ ratelimit

Multi-algorithm rate limiter with pluggable backends



Choose the right rate limiting algorithm for your use case -- 7 algorithms, one unified async API

Token Bucket + Fixed Window + Sliding Window (Log & Counter) + Leaky Bucket + GCRA + Concurrency Limiter

Quick Start | Features | Algorithms | Architecture


Why This Exists

Every API needs rate limiting, but no single algorithm fits all cases. Token Bucket allows bursts, Leaky Bucket smooths traffic, Sliding Window avoids boundary issues, GCRA powers Stripe and Shopify at scale, and Concurrency Limiter caps parallelism. Most libraries force you into one algorithm. If your needs change, you rewrite.

ratelimit gives you seven algorithms behind a single acquire/peek/reset interface. Swap algorithms without touching your application code. Add multi-tier limits with groups and chains. Get production-ready presets for common scenarios like login protection, API tiers, and webhook delivery. All async-first, all zero dependencies.

  • 7 algorithms -- pick the right one for your use case, swap anytime without code changes
  • Async-first -- native async/await API designed for modern Python applications
  • Production presets -- one-line setup for login protection, API tiers, webhook delivery, and more
  • Zero dependencies -- pure Python, no external packages required

Stop implementing rate limiting from scratch. Start choosing the right algorithm.


Features

| Category | Feature | Description |
|----------|---------|-------------|
| Algorithms | Token Bucket | Smooth rate limiting with configurable burst |
| Algorithms | Fixed Window | Simple time-window counters |
| Algorithms | Sliding Window Log | Exact request counting with per-request timestamps |
| Algorithms | Sliding Window Counter | Balanced accuracy/memory with weighted window overlap |
| Algorithms | Leaky Bucket | Constant-rate output for traffic smoothing |
| Algorithms | GCRA | Generic Cell Rate Algorithm (used by Stripe, Shopify) |
| Algorithms | Concurrency Limiter | Cap parallel connections/operations |
| API | Factory Function | `create_limiter(100, 60)` one-line setup |
| API | Decorator | `@rate_limit(limiter)` for function-level limiting |
| API | Context Manager | `async with RateLimitContext(...)` for scoped limiting |
| API | Wait Mode | `wait_and_acquire()` with automatic backpressure |
| API | HTTP Headers | `result.to_headers()` for standard rate limit headers |
| API | Callbacks | `on_limited()` and `on_allowed()` event hooks |
| Composition | Groups | Multi-tier limits (10/sec AND 1000/hour) -- all must allow |
| Composition | Chains | Sequential rate limit evaluation |
| Composition | Weighted Limiter | Different costs per endpoint with priority reserves |
| Protection | Circuit Breaker | Automatic failure protection (closed/open/half-open) |
| Protection | Penalty Tracker | Progressive backoff for repeat offenders |
| Analytics | Stats Collector | Per-key metrics (allowed, denied, latency) |
| Analytics | Rate Estimator | Real-time request rate estimation and prediction |
| Analytics | Quota Manager | Hourly/daily/weekly/monthly usage quota tracking |
| Utilities | Key Extractors | IP, user, API key, and endpoint pattern extraction |
| Utilities | Retry Strategies | Fixed, exponential backoff, retry-after header parsing |
| Utilities | Snapshots | State serialization for debugging and persistence |
| Utilities | Algorithm Info | Introspection and recommendation engine |
| Presets | 10 Presets | api_standard, api_strict, api_generous, login_protection, webhook_delivery, search_api, upload_limit, free_tier, pro_tier, enterprise_tier |
| Tooling | CLI Benchmark | Compare algorithm performance from the command line |
| Backends | Memory Backend | In-memory storage with TTL support |

🚀 Quick Start

# 1. Install ratelimit
pip install -e .

# 2. Use in your application
python -c "
import asyncio
from ratelimit import create_limiter

async def main():
    limiter = create_limiter(100, 60)  # 100 requests per minute
    result = await limiter.acquire('user:123')
    print(f'Allowed: {result.allowed}, Remaining: {result.remaining}')

asyncio.run(main())
"

# 3. Or use presets
python -c "
import asyncio
from ratelimit import get_preset

async def main():
    limiter = get_preset('login_protection')  # 5 attempts / 15 min
    result = await limiter.acquire('user:login')
    print(f'Allowed: {result.allowed}')

asyncio.run(main())
"

📊 Algorithms

| Algorithm | Best For | Burst | Memory | Boundary Issues |
|-----------|----------|-------|--------|-----------------|
| Token Bucket | General API limiting | Yes (configurable) | O(1) | None |
| Fixed Window | Simple counters, dashboards | 2x at boundary | O(1) | Yes (2x at boundary) |
| Sliding Window Log | Exact counting, compliance | No | O(n) | None |
| Sliding Window Counter | Balanced accuracy/memory | Minimal | O(1) | Approximate |
| Leaky Bucket | Traffic smoothing, webhooks | Configurable | O(1) | None |
| GCRA | Production (Stripe/Shopify) | Yes | O(1) | None |
| Concurrency Limiter | Parallel connection caps | N/A | O(n) | N/A |

Token Bucket

Tokens refill at a constant rate. Each request consumes one token. If the bucket is empty, the request is denied. Supports burst by starting with a full bucket.

from ratelimit import create_limiter

limiter = create_limiter(100, 60, algorithm="token_bucket", burst_size=20)
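The mechanics fit in a few lines of stdlib-only Python. This is an illustrative sketch of the algorithm, not the library's implementation; `TokenBucketSketch` and its injected `now` parameter are invented for the example:

```python
class TokenBucketSketch:
    """Illustrative token bucket; not the library's implementation."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # bucket size = maximum burst
        self.tokens = capacity    # start full, so an initial burst passes
        self.updated = 0.0

    def acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucketSketch(rate=100 / 60, capacity=20)   # ~100/min, burst 20
burst = sum(bucket.acquire(now=0.0) for _ in range(25))  # 25 requests at once
print(burst, bucket.acquire(now=1.2))  # → 20 True (1.2 s refills two tokens)
```

Note how the burst of 25 is capped at the bucket capacity, while a request arriving slightly later succeeds because refill is continuous rather than window-based.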

Fixed Window

Simple counter per time window. Resets at window boundaries. Can allow 2x the limit at window boundaries.

limiter = create_limiter(100, 60, algorithm="fixed_window")
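The 2x boundary effect is easy to demonstrate with a stdlib-only sketch (illustrative only; `fixed_window_allow` is a made-up helper, not the library's code):

```python
def fixed_window_allow(counts: dict, now: float, limit: int, window: float) -> bool:
    bucket = int(now // window)                 # which window this instant is in
    counts[bucket] = counts.get(bucket, 0) + 1  # O(1): a single counter per window
    return counts[bucket] <= limit

counts: dict = {}
# 100 requests at t=59.9s and 100 more at t=60.1s: the boundary resets the
# counter, so all 200 pass within 0.2 seconds of each other.
allowed = sum(fixed_window_allow(counts, t, limit=100, window=60.0)
              for t in [59.9] * 100 + [60.1] * 100)
print(allowed)  # → 200
```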

Sliding Window Log

Stores timestamp of every request. Most accurate but uses O(n) memory. Best for compliance and exact counting.

limiter = create_limiter(100, 60, algorithm="sliding_window_log")
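A minimal sketch of the idea (not the library's implementation): keep one timestamp per request in a deque, evict entries older than the window, and allow only while the log holds fewer than `limit`:

```python
from collections import deque

def sliding_log_allow(log: deque, now: float, limit: int, window: float) -> bool:
    while log and log[0] <= now - window:  # evict timestamps outside the window
        log.popleft()
    if len(log) < limit:
        log.append(now)                    # O(n): one timestamp per request
        return True
    return False

log: deque = deque()
# The same boundary burst that slips past Fixed Window: only 100 of 200 pass.
allowed = sum(sliding_log_allow(log, t, limit=100, window=60.0)
              for t in [59.9] * 100 + [60.1] * 100)
print(allowed)  # → 100
```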

Sliding Window Counter

Approximates sliding window using weighted overlap between current and previous fixed windows. O(1) memory with good accuracy.

limiter = create_limiter(100, 60, algorithm="sliding_window_counter")
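The weighted estimate can be sketched directly (illustrative formula only; `sliding_counter_estimate` is a made-up name): the previous window's count is scaled by how much of it still overlaps the sliding window.

```python
def sliding_counter_estimate(prev_count: int, curr_count: int,
                             now: float, window: float) -> float:
    # Weight the previous window by its remaining overlap with the
    # sliding window that ends at `now`.
    elapsed = (now % window) / window   # fraction of the current window used
    return prev_count * (1.0 - elapsed) + curr_count

# 80 requests last minute, 30 so far this minute, 15 s into the window:
# 75% of the previous window still overlaps -> 80 * 0.75 + 30 = 90.
estimate = sliding_counter_estimate(80, 30, now=75.0, window=60.0)
print(estimate)  # → 90.0
```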

Leaky Bucket

Requests enter a bucket that drains at a constant rate. Produces smooth, constant-rate output.

limiter = create_limiter(10, 1, algorithm="leaky_bucket")
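A stdlib-only sketch of the drain behavior (not the library's implementation; `LeakyBucketSketch` and its injected `now` are invented for the example):

```python
class LeakyBucketSketch:
    """Illustrative leaky bucket; not the library's implementation."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # drain rate: requests leaving per second
        self.capacity = capacity  # bucket depth before requests are shed
        self.level = 0.0
        self.updated = 0.0

    def acquire(self, now: float) -> bool:
        # Drain for the time elapsed since the last call.
        self.level = max(0.0, self.level - (now - self.updated) * self.rate)
        self.updated = now
        if self.level + 1 <= self.capacity:
            self.level += 1       # request enters the bucket
            return True
        return False              # bucket full: excess is shed, output stays smooth

bucket = LeakyBucketSketch(rate=10, capacity=10)
burst = sum(bucket.acquire(now=0.0) for _ in range(50))  # 50 at once: 10 fit
print(burst, bucket.acquire(now=0.1))  # → 10 True (one slot drains in 0.1 s)
```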

GCRA (Generic Cell Rate Algorithm)

Used by Stripe and Shopify. Elegant single-value algorithm that tracks the next allowed request time. Best all-rounder for production.

limiter = create_limiter(100, 60, algorithm="gcra")
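The "single value" is the theoretical arrival time (TAT) of the next conforming request. A stdlib-only sketch of the idea (not the library's implementation; `GCRASketch` and its parameters are invented for the example):

```python
class GCRASketch:
    """Illustrative GCRA; not the library's implementation."""

    def __init__(self, limit: int, period: float, burst: int = 1):
        self.interval = period / limit                # ideal gap between requests
        self.tolerance = self.interval * (burst - 1)  # how far ahead a burst may run
        self.tat = 0.0  # theoretical arrival time of the next conforming request

    def acquire(self, now: float) -> bool:
        tat = max(self.tat, now)
        if tat - now > self.tolerance:
            return False                  # arrived too far ahead of schedule
        self.tat = tat + self.interval    # single stored value: O(1) per key
        return True

g = GCRASketch(limit=10, period=10.0, burst=5)     # 10 req / 10 s, burst of 5
print(sum(g.acquire(now=0.0) for _ in range(10)))  # → 5: burst passes, rest denied
```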

Concurrency Limiter

Caps the number of concurrent operations rather than request rate. Perfect for database connection pools or parallel API calls.

from ratelimit import ConcurrencyLimiter, MemoryBackend, RateLimiter, RateLimitConfig

config = RateLimitConfig(max_requests=10, window_seconds=1)
limiter = RateLimiter(ConcurrencyLimiter(MemoryBackend(), config))

📋 Usage Patterns

Decorator

from ratelimit import rate_limit, create_limiter

limiter = create_limiter(100, 60)

@rate_limit(limiter, key=lambda user_id: f"user:{user_id}")
async def get_data(user_id: str):
    return await fetch_data(user_id)

# With wait mode (blocks instead of raising)
@rate_limit(limiter, wait=True, timeout=30.0)
async def get_data_wait(user_id: str):
    return await fetch_data(user_id)

Context Manager

from ratelimit import RateLimitContext, ConcurrencyContext

# Rate limiting context
async with RateLimitContext(limiter, "user:123") as result:
    if result.allowed:
        process_request()

# Concurrency context (auto-release on exit)
async with ConcurrencyContext(concurrency_limiter, "user:123"):
    await long_running_task()

Multi-Tier Rate Limiting

from ratelimit import create_limiter, RateLimitGroup

per_second = create_limiter(10, 1, key_prefix="sec")
per_minute = create_limiter(100, 60, key_prefix="min")
per_hour = create_limiter(1000, 3600, key_prefix="hour")

group = RateLimitGroup(per_second, per_minute, per_hour)
result = await group.acquire("user:123")  # All three must allow

Presets

from ratelimit import get_preset, list_presets

# See all available presets
print(list_presets())

# Use presets
limiter = get_preset("api_standard")       # 100 req/min, 20 burst
limiter = get_preset("api_strict")         # 30 req/min, no burst
limiter = get_preset("api_generous")       # 1000 req/min, 200 burst
limiter = get_preset("login_protection")   # 5 attempts / 15 min
limiter = get_preset("webhook_delivery")   # 10 req/sec, smoothed
limiter = get_preset("search_api")         # 10/sec AND 60/min (dual)
limiter = get_preset("upload_limit")       # 10 uploads / hour
limiter = get_preset("free_tier")          # 100 req/hour
limiter = get_preset("pro_tier")           # 5000 req/hour, 100 burst
limiter = get_preset("enterprise_tier")    # 50000 req/hour, 500 burst

Circuit Breaker

from ratelimit import CircuitBreaker

breaker = CircuitBreaker(
    failure_threshold=5,      # Open after 5 failures
    recovery_timeout=30.0,    # Try again after 30 seconds
    half_open_max_calls=3,    # Allow 3 test calls in half-open
)

if breaker.allow_request():
    try:
        result = await external_api_call()
        breaker.record_success()
    except Exception:
        breaker.record_failure()

Penalty Tracker

from ratelimit import PenaltyTracker

tracker = PenaltyTracker(
    base_penalty=60.0,        # 1 minute base penalty
    multiplier=2.0,           # Double each time
    max_penalty=3600.0,       # Cap at 1 hour
)

# Record violation
tracker.record_violation("abuser:ip")

# Check if penalized
penalty = tracker.get_penalty("abuser:ip")
if penalty > 0:
    print(f"Penalized for {penalty:.0f} more seconds")

HTTP Headers

result = await limiter.acquire("user:123")

# Standard rate limit headers
headers = result.to_headers()
# {
#   "X-RateLimit-Limit": "100",
#   "X-RateLimit-Remaining": "99",
#   "X-RateLimit-Reset": "1711468800",
#   "Retry-After": "60"  (only when denied)
# }

Statistics

from ratelimit import StatsCollector

stats = StatsCollector()
stats.record(key="user:123", allowed=True, latency_ms=1.2)
stats.record(key="user:123", allowed=False, latency_ms=0.8)

summary = stats.get_summary("user:123")
# {"total": 2, "allowed": 1, "denied": 1, "avg_latency_ms": 1.0}

Quota Manager

from ratelimit import QuotaManager

quota = QuotaManager()
quota.set_quota("user:123", hourly=1000, daily=10000, monthly=100000)

result = quota.check("user:123")
print(f"Hourly: {result.hourly_remaining}, Daily: {result.daily_remaining}")

Algorithm Recommendation

from ratelimit.info import recommend_algorithm, list_algorithms

# Get recommendation based on requirements
info = recommend_algorithm(needs_burst=True, memory_constrained=True)
# => GCRA - best all-rounder for production

# List all algorithms with descriptions
for algo in list_algorithms():
    print(f"{algo.algorithm.value}: {algo.name} - {algo.best_for}")

๐Ÿ—๏ธ Architecture

ratelimit/
├── core.py               # RateLimiter, RateLimitResult, RateLimitConfig, Backend ABC
├── algorithms/
│   ├── token_bucket.py   # Token Bucket algorithm
│   ├── fixed_window.py   # Fixed Window counter
│   ├── sliding_window.py # Sliding Window (Log + Counter)
│   ├── leaky_bucket.py   # Leaky Bucket algorithm
│   ├── gcra.py           # Generic Cell Rate Algorithm
│   └── concurrency.py    # Concurrency Limiter
├── backends/
│   └── memory.py         # In-memory storage backend with TTL
├── factory.py            # create_limiter() one-line factory
├── decorator.py          # @rate_limit decorator (sync + async)
├── context.py            # RateLimitContext, ConcurrencyContext
├── groups.py             # RateLimitGroup, RateLimitChain
├── presets.py            # 10 pre-configured policies
├── circuit.py            # CircuitBreaker (closed/open/half-open)
├── penalty.py            # PenaltyTracker with exponential backoff
├── stats.py              # StatsCollector for per-key metrics
├── estimator.py          # RateEstimator for traffic prediction
├── quota.py              # QuotaManager (hourly/daily/weekly/monthly)
├── events.py             # Event system with async-compatible emitter
├── keys.py               # Key extraction (IP, user, API key, endpoint)
├── retry.py              # Retry strategies (fixed, exponential, retry-after)
├── snapshot.py           # State serialization and debugging
├── weighted.py           # Weighted limiter with priority reserves
├── info.py               # Algorithm introspection and recommender
├── cli.py                # CLI benchmarking tool
└── middleware/           # Framework middleware (extensible)

Request Flow

    Request
      │
      ▼
┌──────────────┐
│  Key Extract │  (IP, user, API key, endpoint)
└──────┬───────┘
       │
       ▼
┌──────────────┐    ┌──────────────┐
│   Penalty    │───▶│   Circuit    │
│   Tracker    │    │   Breaker    │
└──────┬───────┘    └──────┬───────┘
       │                   │
       ▼                   ▼
┌──────────────────────────────┐
│     RateLimiter.acquire()    │
│  ┌────────────────────────┐  │
│  │   Algorithm Engine     │  │
│  │  (Token Bucket, GCRA,  │  │
│  │   Sliding Window, etc) │  │
│  └───────────┬────────────┘  │
│              │               │
│  ┌───────────▼────────────┐  │
│  │   Memory Backend       │  │
│  │   (get/set/increment)  │  │
│  └────────────────────────┘  │
└──────────────┬───────────────┘
               │
      ┌────────┼────────┐
      ▼                 ▼
  Allowed            Denied
      │                 │
      ▼                 ▼
┌──────────┐    ┌──────────────┐
│  Stats   │    │  Retry-After │
│ Record   │    │  + Headers   │
└──────────┘    └──────────────┘

📡 API Reference

Core

from ratelimit import (
    # Algorithms
    TokenBucket, FixedWindow, SlidingWindowLog,
    SlidingWindowCounter, LeakyBucket, GCRA, ConcurrencyLimiter,
    # Backend
    MemoryBackend,
    # Core types
    RateLimiter, RateLimitResult, RateLimitConfig, Algorithm,
    # Decorator
    rate_limit, RateLimitExceeded,
    # Composition
    RateLimitGroup, RateLimitChain,
    # Context managers
    RateLimitContext, ConcurrencyContext,
    # Utilities
    StatsCollector, CircuitBreaker, PenaltyTracker,
    RateEstimator, QuotaManager,
    # Factory & Presets
    create_limiter, get_preset, list_presets,
)

RateLimitResult

result = await limiter.acquire("key")

result.allowed        # bool: was the request allowed?
result.remaining      # int: remaining requests in window
result.limit          # int: total limit
result.reset_at       # float: Unix timestamp when limit resets
result.retry_after    # float: seconds to wait before retrying
result.reset_in       # float: seconds until reset (computed property)
result.to_headers()   # dict: standard HTTP rate limit headers

RateLimiter Methods

# Try to acquire (non-blocking)
result = await limiter.acquire("key", cost=1)

# Check without consuming
result = await limiter.peek("key")

# Reset a key
await limiter.reset("key")

# Wait and acquire (blocking with timeout)
result = await limiter.wait_and_acquire("key", cost=1, timeout=30.0)

# Event callbacks
@limiter.on_limited
def handle_limited(key, result):
    log.warning(f"Rate limited: {key}")

@limiter.on_allowed
def handle_allowed(key, result):
    stats.record(key)

🔧 How It Works

  1. Key Extraction -- Each request is identified by a key (user ID, IP, API key, or custom)
  2. Algorithm Selection -- The configured algorithm determines how requests are counted/tracked
  3. Backend Query -- The algorithm queries the storage backend for current state
  4. Decision -- The algorithm decides allow/deny based on its specific logic
  5. State Update -- On allow, the backend state is updated (decrement tokens, add timestamp, etc.)
  6. Result -- A RateLimitResult is returned with allowed/denied, remaining count, reset time, and retry-after
  7. Headers -- Results can be converted to standard HTTP headers for API responses
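Under heavy simplification, the whole flow fits in a short stdlib-only sketch. This is a hypothetical miniature, not the library's code: the algorithm is a hard-coded fixed window (step 2 is skipped), the "backend" is a plain dict, and all names are invented for illustration:

```python
import time

backend: dict = {}   # in-memory "backend": key -> (count, window_start)

def acquire(request: dict, limit: int = 100, window: float = 60.0) -> dict:
    key = f"user:{request['user_id']}"           # 1. key extraction
    now = time.time()
    count, start = backend.get(key, (0, now))    # 3. backend query
    if now - start >= window:
        count, start = 0, now                    # window rolled over
    allowed = count < limit                      # 4. decision
    if allowed:
        backend[key] = (count + 1, start)        # 5. state update
    return {                                     # 6. result
        "allowed": allowed,
        "remaining": max(0, limit - count - (1 if allowed else 0)),
        "reset_at": start + window,
    }

result = acquire({"user_id": "123"})
headers = {                                      # 7. HTTP headers
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": str(result["remaining"]),
    "X-RateLimit-Reset": str(int(result["reset_at"])),
}
print(result["allowed"], headers["X-RateLimit-Remaining"])  # → True 99
```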

🛠️ CLI Benchmarking

# Benchmark an algorithm
python -m ratelimit.cli bench -a token_bucket -r 100 -w 1 -n 1000

# Output:
# Algorithm:         token_bucket
# Limit:             100 / 1.0s
# Total requests:    1000
# Allowed:           100
# Denied:            900
# Elapsed:           0.0123s
# Throughput:        81300.81 req/s

# List all algorithms
python -m ratelimit.cli list

โ“ Troubleshooting

Which Algorithm Should I Use?

| Use Case | Recommended | Why |
|----------|-------------|-----|
| General API | Token Bucket or GCRA | Both handle bursts well with O(1) memory |
| Login protection | Sliding Window Log | Exact counting prevents boundary attacks |
| Webhook delivery | Leaky Bucket | Smooth constant-rate output |
| Simple counters | Fixed Window | Simplest to understand and debug |
| Connection pooling | Concurrency Limiter | Caps parallelism, not rate |
| Production at scale | GCRA | Battle-tested at Stripe/Shopify |

Memory Concerns

  • Token Bucket, Fixed Window, GCRA, Leaky Bucket: O(1) memory per key
  • Sliding Window Log: O(n) where n is requests in the window -- use Counter variant for large volumes
  • Concurrency Limiter: O(n) where n is concurrent operations

Async vs Sync

All APIs are async-first. For sync code, use the decorator which handles the event loop automatically:

@rate_limit(limiter)
def sync_function():  # Works with sync functions too
    pass

🧪 Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run all 416 tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=ratelimit --cov-report=term-missing

# Run specific algorithm tests
pytest tests/test_algorithms/test_token_bucket.py -v
pytest tests/test_algorithms/test_gcra.py -v

# Run integration tests
pytest tests/test_integration/ -v

License

MIT
