⏱️ ratelimit
Multi-algorithm rate limiter with pluggable backends
Choose the right rate limiting algorithm for your use case -- 7 algorithms, one unified async API
Token Bucket + Sliding Window + Fixed Window + Leaky Bucket + GCRA + Concurrency Limiter
Why This Exists
Every API needs rate limiting, but no single algorithm fits all cases. Token Bucket allows bursts, Leaky Bucket smooths traffic, Sliding Window avoids boundary issues, GCRA powers Stripe and Shopify at scale, and Concurrency Limiter caps parallelism. Most libraries force you into one algorithm. If your needs change, you rewrite.
ratelimit gives you seven algorithms behind a single acquire/peek/reset interface. Swap algorithms without touching your application code. Add multi-tier limits with groups and chains. Get production-ready presets for common scenarios like login protection, API tiers, and webhook delivery. All async-first, all zero dependencies.
- 7 algorithms -- pick the right one for your use case, swap anytime without code changes
- Async-first -- native async/await API designed for modern Python applications
- Production presets -- one-line setup for login protection, API tiers, webhook delivery, and more
- Zero dependencies -- pure Python, no external packages required
Stop implementing rate limiting from scratch. Start choosing the right algorithm.
Features
| Category | Feature | Description |
|---|---|---|
| Algorithms | Token Bucket | Smooth rate limiting with configurable burst |
| Algorithms | Fixed Window | Simple time-window counters |
| Algorithms | Sliding Window Log | Exact request counting with per-request timestamps |
| Algorithms | Sliding Window Counter | Balanced accuracy/memory with weighted window overlap |
| Algorithms | Leaky Bucket | Constant-rate output for traffic smoothing |
| Algorithms | GCRA | Generic Cell Rate Algorithm (used by Stripe, Shopify) |
| Algorithms | Concurrency Limiter | Cap parallel connections/operations |
| API | Factory Function | create_limiter(100, 60) one-line setup |
| API | Decorator | @rate_limit(limiter) for function-level limiting |
| API | Context Manager | async with RateLimitContext(...) for scoped limiting |
| API | Wait Mode | wait_and_acquire() with automatic backpressure |
| API | HTTP Headers | result.to_headers() for standard rate limit headers |
| API | Callbacks | on_limited() and on_allowed() event hooks |
| Composition | Groups | Multi-tier limits (10/sec AND 1000/hour) -- all must allow |
| Composition | Chains | Sequential rate limit evaluation |
| Composition | Weighted Limiter | Different costs per endpoint with priority reserves |
| Protection | Circuit Breaker | Automatic failure protection (closed/open/half-open) |
| Protection | Penalty Tracker | Progressive backoff for repeat offenders |
| Analytics | Stats Collector | Per-key metrics (allowed, denied, latency) |
| Analytics | Rate Estimator | Real-time request rate estimation and prediction |
| Analytics | Quota Manager | Hourly/daily/weekly/monthly usage quota tracking |
| Utilities | Key Extractors | IP, user, API key, and endpoint pattern extraction |
| Utilities | Retry Strategies | Fixed, exponential backoff, retry-after header parsing |
| Utilities | Snapshots | State serialization for debugging and persistence |
| Utilities | Algorithm Info | Introspection and recommendation engine |
| Presets | 10 Presets | api_standard, api_strict, api_generous, login_protection, webhook_delivery, search_api, upload_limit, free_tier, pro_tier, enterprise_tier |
| Tooling | CLI Benchmark | Compare algorithm performance from the command line |
| Backends | Memory Backend | In-memory storage with TTL support |
🚀 Quick Start
# 1. Install ratelimit
pip install -e .
# 2. Use in your application
python -c "
import asyncio
from ratelimit import create_limiter

async def main():
    limiter = create_limiter(100, 60)  # 100 requests per minute
    result = await limiter.acquire('user:123')
    print(f'Allowed: {result.allowed}, Remaining: {result.remaining}')

asyncio.run(main())
"
# 3. Or use presets
python -c "
import asyncio
from ratelimit import get_preset

async def main():
    limiter = get_preset('login_protection')  # 5 attempts / 15 min
    result = await limiter.acquire('user:login')
    print(f'Allowed: {result.allowed}')

asyncio.run(main())
"
📊 Algorithms
| Algorithm | Best For | Burst | Memory | Boundary Issues |
|---|---|---|---|---|
| Token Bucket | General API limiting | Yes (configurable) | O(1) | None |
| Fixed Window | Simple counters, dashboards | Boundary 2x | O(1) | Yes (2x at boundary) |
| Sliding Window Log | Exact counting, compliance | No | O(n) | None |
| Sliding Window Counter | Balanced accuracy/memory | Minimal | O(1) | Approximate |
| Leaky Bucket | Traffic smoothing, webhooks | Configurable | O(1) | None |
| GCRA | Production (Stripe/Shopify) | Yes | O(1) | None |
| Concurrency Limiter | Parallel connection caps | N/A | O(n) | N/A |
Token Bucket
Tokens refill at a constant rate. Each request consumes one token. If the bucket is empty, the request is denied. Supports burst by starting with a full bucket.
from ratelimit import create_limiter
limiter = create_limiter(100, 60, algorithm="token_bucket", burst_size=20)
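For intuition, the refill-and-consume logic can be sketched in a few lines of plain Python. This is an illustrative toy model, not the library's implementation:

```python
import time

class TokenBucketSketch:
    """Toy token bucket -- illustrative only, not the library's implementation."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # bucket size (controls burst)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full, so an initial burst is allowed
        self.updated = time.monotonic()

    def acquire(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost  # consume one token per request
            return True
        return False

bucket = TokenBucketSketch(capacity=3, refill_rate=1.0)
print([bucket.acquire() for _ in range(5)])  # burst of 3 allowed, then denied
```

Using `time.monotonic()` rather than wall-clock time keeps the refill math immune to system clock adjustments.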
Fixed Window
Simple counter per time window that resets at each boundary. Easiest to reason about, but can allow up to 2x the limit when requests cluster on both sides of a boundary.
limiter = create_limiter(100, 60, algorithm="fixed_window")
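The boundary issue is easy to demonstrate with a toy counter (illustrative, not the library's code): 100 requests just before a window boundary and 100 just after are all allowed, even though they arrive within 0.2 seconds of each other.

```python
import math

class FixedWindowSketch:
    """Toy fixed-window counter -- illustrative only."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.counts: dict[int, int] = {}  # window index -> request count

    def acquire(self, now: float) -> bool:
        idx = math.floor(now / self.window)  # which window are we in?
        if self.counts.get(idx, 0) >= self.limit:
            return False
        self.counts[idx] = self.counts.get(idx, 0) + 1
        return True

fw = FixedWindowSketch(limit=100, window=60.0)
# 100 requests at t=59.9s (end of window 0) plus 100 at t=60.1s (start of window 1):
allowed = sum(fw.acquire(59.9) for _ in range(100)) + sum(fw.acquire(60.1) for _ in range(100))
print(allowed)  # 200 -- twice the per-minute limit within 0.2 seconds
```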
Sliding Window Log
Stores timestamp of every request. Most accurate but uses O(n) memory. Best for compliance and exact counting.
limiter = create_limiter(100, 60, algorithm="sliding_window_log")
Sliding Window Counter
Approximates sliding window using weighted overlap between current and previous fixed windows. O(1) memory with good accuracy.
limiter = create_limiter(100, 60, algorithm="sliding_window_counter")
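The weighted-overlap estimate itself is one line of arithmetic, shown here as a standalone sketch (not the library's internals):

```python
def sliding_window_estimate(prev_count: int, curr_count: int,
                            elapsed: float, window: float) -> float:
    """Weight the previous window's count by how much of it still overlaps
    the sliding window, then add the current window's count."""
    overlap = (window - elapsed) / window
    return prev_count * overlap + curr_count

# 30s into a 60s window: the previous window still covers half the sliding window
print(sliding_window_estimate(prev_count=80, curr_count=40, elapsed=30, window=60))
# 80 * 0.5 + 40 = 80.0 estimated requests in the last 60s
```

The estimate assumes requests were spread evenly across the previous window, which is why the algorithm is approximate rather than exact.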
Leaky Bucket
Requests enter a bucket that drains at a constant rate. Produces smooth, constant-rate output.
limiter = create_limiter(10, 1, algorithm="leaky_bucket")
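A toy model of the drain-and-fill logic (illustrative only, not the library's implementation): the bucket's level drops at a constant rate, and a request is rejected when adding it would overflow the bucket.

```python
import time

class LeakyBucketSketch:
    """Toy leaky bucket -- illustrative only."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity    # queue depth before requests are rejected
        self.leak_rate = leak_rate  # requests drained per second (the output rate)
        self.level = 0.0
        self.updated = time.monotonic()

    def acquire(self) -> bool:
        now = time.monotonic()
        # Drain at a constant rate since the last check
        self.level = max(0.0, self.level - (now - self.updated) * self.leak_rate)
        self.updated = now
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False

bucket = LeakyBucketSketch(capacity=2, leak_rate=1.0)
print([bucket.acquire() for _ in range(4)])  # in a tight loop: [True, True, False, False]
```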
GCRA (Generic Cell Rate Algorithm)
Used by Stripe and Shopify. Elegant single-value algorithm that tracks the next allowed request time. Best all-rounder for production.
limiter = create_limiter(100, 60, algorithm="gcra")
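The core idea, a single stored timestamp per key, can be sketched as follows. This is an illustrative GCRA variant; the library's internals may differ:

```python
import time

class GCRASketch:
    """Toy GCRA -- illustrative variant, not the library's implementation.
    One stored value per key: the theoretical arrival time (TAT)."""

    def __init__(self, rate: float, burst: int):
        self.interval = 1.0 / rate                    # ideal spacing between requests
        self.tolerance = (burst - 1) * self.interval  # how far ahead of schedule we allow
        self.tat = 0.0                                # next "on-schedule" arrival time

    def acquire(self) -> bool:
        now = time.monotonic()
        tat = max(self.tat, now)
        if tat - now > self.tolerance:  # too far ahead of schedule -> deny
            return False
        self.tat = tat + self.interval  # advance the single tracked value
        return True

g = GCRASketch(rate=10.0, burst=3)
print([g.acquire() for _ in range(6)])  # in a tight loop: a burst of 3, then denials
```

Because only one float is stored per key, GCRA is O(1) in memory yet behaves like a smooth sliding limit, which is why it scales so well.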
Concurrency Limiter
Caps the number of concurrent operations rather than request rate. Perfect for database connection pools or parallel API calls.
from ratelimit import ConcurrencyLimiter, MemoryBackend, RateLimiter, RateLimitConfig
config = RateLimitConfig(max_requests=10, window_seconds=1)
limiter = RateLimiter(ConcurrencyLimiter(MemoryBackend(), config))
📖 Usage Patterns
Decorator
from ratelimit import rate_limit, create_limiter
limiter = create_limiter(100, 60)
@rate_limit(limiter, key=lambda user_id: f"user:{user_id}")
async def get_data(user_id: str):
    return await fetch_data(user_id)

# With wait mode (blocks instead of raising)
@rate_limit(limiter, wait=True, timeout=30.0)
async def get_data_wait(user_id: str):
    return await fetch_data(user_id)
Context Manager
from ratelimit import RateLimitContext, ConcurrencyContext
# Rate limiting context
async with RateLimitContext(limiter, "user:123") as result:
    if result.allowed:
        process_request()

# Concurrency context (auto-release on exit)
async with ConcurrencyContext(concurrency_limiter, "user:123"):
    await long_running_task()
Multi-Tier Rate Limiting
from ratelimit import create_limiter, RateLimitGroup
per_second = create_limiter(10, 1, key_prefix="sec")
per_minute = create_limiter(100, 60, key_prefix="min")
per_hour = create_limiter(1000, 3600, key_prefix="hour")
group = RateLimitGroup(per_second, per_minute, per_hour)
result = await group.acquire("user:123") # All three must allow
Presets
from ratelimit import get_preset, list_presets
# See all available presets
print(list_presets())
# Use presets
limiter = get_preset("api_standard") # 100 req/min, 20 burst
limiter = get_preset("api_strict") # 30 req/min, no burst
limiter = get_preset("api_generous") # 1000 req/min, 200 burst
limiter = get_preset("login_protection") # 5 attempts / 15 min
limiter = get_preset("webhook_delivery") # 10 req/sec, smoothed
limiter = get_preset("search_api") # 10/sec AND 60/min (dual)
limiter = get_preset("upload_limit") # 10 uploads / hour
limiter = get_preset("free_tier") # 100 req/hour
limiter = get_preset("pro_tier") # 5000 req/hour, 100 burst
limiter = get_preset("enterprise_tier") # 50000 req/hour, 500 burst
Circuit Breaker
from ratelimit import CircuitBreaker
breaker = CircuitBreaker(
    failure_threshold=5,     # Open after 5 failures
    recovery_timeout=30.0,   # Try again after 30 seconds
    half_open_max_calls=3,   # Allow 3 test calls in half-open
)

if breaker.allow_request():
    try:
        result = await external_api_call()
        breaker.record_success()
    except Exception:
        breaker.record_failure()
Penalty Tracker
from ratelimit import PenaltyTracker
tracker = PenaltyTracker(
    base_penalty=60.0,   # 1 minute base penalty
    multiplier=2.0,      # Double each time
    max_penalty=3600.0,  # Cap at 1 hour
)

# Record violation
tracker.record_violation("abuser:ip")

# Check if penalized
penalty = tracker.get_penalty("abuser:ip")
if penalty > 0:
    print(f"Penalized for {penalty:.0f} more seconds")
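With those settings, the penalty schedule is just base * multiplier^(n-1), capped at the maximum. A standalone sketch of the arithmetic (illustrative, not the library's code):

```python
def progressive_penalty(violations: int, base: float = 60.0,
                        multiplier: float = 2.0, cap: float = 3600.0) -> float:
    """Penalty doubles with each violation: 60s, 120s, 240s, ... capped at 1 hour."""
    return min(base * multiplier ** (violations - 1), cap)

print([progressive_penalty(n) for n in (1, 2, 3, 7)])  # [60.0, 120.0, 240.0, 3600.0]
```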
HTTP Headers
result = await limiter.acquire("user:123")
# Standard rate limit headers
headers = result.to_headers()
# {
# "X-RateLimit-Limit": "100",
# "X-RateLimit-Remaining": "99",
# "X-RateLimit-Reset": "1711468800",
# "Retry-After": "60" (only when denied)
# }
Statistics
from ratelimit import StatsCollector
stats = StatsCollector()
stats.record(key="user:123", allowed=True, latency_ms=1.2)
stats.record(key="user:123", allowed=False, latency_ms=0.8)
summary = stats.get_summary("user:123")
# {"total": 2, "allowed": 1, "denied": 1, "avg_latency_ms": 1.0}
Quota Manager
from ratelimit import QuotaManager
quota = QuotaManager()
quota.set_quota("user:123", hourly=1000, daily=10000, monthly=100000)
result = quota.check("user:123")
print(f"Hourly: {result.hourly_remaining}, Daily: {result.daily_remaining}")
Algorithm Recommendation
from ratelimit.info import recommend_algorithm, list_algorithms
# Get recommendation based on requirements
info = recommend_algorithm(needs_burst=True, memory_constrained=True)
# => GCRA - best all-rounder for production
# List all algorithms with descriptions
for algo in list_algorithms():
    print(f"{algo.algorithm.value}: {algo.name} - {algo.best_for}")
🏗️ Architecture
ratelimit/
├── core.py          # RateLimiter, RateLimitResult, RateLimitConfig, Backend ABC
├── algorithms/
│   ├── token_bucket.py    # Token Bucket algorithm
│   ├── fixed_window.py    # Fixed Window counter
│   ├── sliding_window.py  # Sliding Window (Log + Counter)
│   ├── leaky_bucket.py    # Leaky Bucket algorithm
│   ├── gcra.py            # Generic Cell Rate Algorithm
│   └── concurrency.py     # Concurrency Limiter
├── backends/
│   └── memory.py          # In-memory storage backend with TTL
├── factory.py       # create_limiter() one-line factory
├── decorator.py     # @rate_limit decorator (sync + async)
├── context.py       # RateLimitContext, ConcurrencyContext
├── groups.py        # RateLimitGroup, RateLimitChain
├── presets.py       # 10 pre-configured policies
├── circuit.py       # CircuitBreaker (closed/open/half-open)
├── penalty.py       # PenaltyTracker with exponential backoff
├── stats.py         # StatsCollector for per-key metrics
├── estimator.py     # RateEstimator for traffic prediction
├── quota.py         # QuotaManager (hourly/daily/weekly/monthly)
├── events.py        # Event system with async-compatible emitter
├── keys.py          # Key extraction (IP, user, API key, endpoint)
├── retry.py         # Retry strategies (fixed, exponential, retry-after)
├── snapshot.py      # State serialization and debugging
├── weighted.py      # Weighted limiter with priority reserves
├── info.py          # Algorithm introspection and recommender
├── cli.py           # CLI benchmarking tool
└── middleware/      # Framework middleware (extensible)
Request Flow
Request
   │
   ▼
┌──────────────┐
│ Key Extract  │  (IP, user, API key, endpoint)
└──────┬───────┘
       │
       ▼
┌──────────────┐     ┌──────────────┐
│   Penalty    │────▶│   Circuit    │
│   Tracker    │     │   Breaker    │
└──────┬───────┘     └──────┬───────┘
       │                    │
       ▼                    ▼
┌──────────────────────────────────┐
│      RateLimiter.acquire()       │
│  ┌────────────────────────────┐  │
│  │      Algorithm Engine      │  │
│  │    (Token Bucket, GCRA,    │  │
│  │    Sliding Window, etc.)   │  │
│  └─────────────┬──────────────┘  │
│                │                 │
│  ┌─────────────▼──────────────┐  │
│  │       Memory Backend       │  │
│  │     (get/set/increment)    │  │
│  └────────────────────────────┘  │
└────────────────┬─────────────────┘
                 │
         ┌───────┴────────┐
         ▼                ▼
      Allowed           Denied
         │                │
         ▼                ▼
   ┌──────────┐    ┌──────────────┐
   │  Stats   │    │ Retry-After  │
   │  Record  │    │  + Headers   │
   └──────────┘    └──────────────┘
💡 API Reference
Core
from ratelimit import (
    # Algorithms
    TokenBucket, FixedWindow, SlidingWindowLog,
    SlidingWindowCounter, LeakyBucket, GCRA, ConcurrencyLimiter,
    # Backend
    MemoryBackend,
    # Core types
    RateLimiter, RateLimitResult, RateLimitConfig, Algorithm,
    # Decorator
    rate_limit, RateLimitExceeded,
    # Composition
    RateLimitGroup, RateLimitChain,
    # Context managers
    RateLimitContext, ConcurrencyContext,
    # Utilities
    StatsCollector, CircuitBreaker, PenaltyTracker,
    RateEstimator, QuotaManager,
    # Factory & Presets
    create_limiter, get_preset, list_presets,
)
RateLimitResult
result = await limiter.acquire("key")
result.allowed # bool: was the request allowed?
result.remaining # int: remaining requests in window
result.limit # int: total limit
result.reset_at # float: Unix timestamp when limit resets
result.retry_after # float: seconds to wait before retrying
result.reset_in # float: seconds until reset (computed property)
result.to_headers() # dict: standard HTTP rate limit headers
RateLimiter Methods
# Try to acquire (non-blocking)
result = await limiter.acquire("key", cost=1)
# Check without consuming
result = await limiter.peek("key")
# Reset a key
await limiter.reset("key")
# Wait and acquire (blocking with timeout)
result = await limiter.wait_and_acquire("key", cost=1, timeout=30.0)
# Event callbacks
@limiter.on_limited
def handle_limited(key, result):
    log.warning(f"Rate limited: {key}")

@limiter.on_allowed
def handle_allowed(key, result):
    stats.record(key)
🧠 How It Works
- Key Extraction -- Each request is identified by a key (user ID, IP, API key, or custom)
- Algorithm Selection -- The configured algorithm determines how requests are counted/tracked
- Backend Query -- The algorithm queries the storage backend for current state
- Decision -- The algorithm decides allow/deny based on its specific logic
- State Update -- On allow, the backend state is updated (decrement tokens, add timestamp, etc.)
- Result -- A RateLimitResult is returned with allowed/denied, remaining count, reset time, and retry-after
- Headers -- Results can be converted to standard HTTP headers for API responses
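The seven steps can be sketched end-to-end in plain Python. Everything below (the names, the fixed-window choice, the dict backend) is illustrative, not the library's internals:

```python
import time
from dataclasses import dataclass

@dataclass
class Result:
    allowed: bool
    remaining: int
    limit: int
    reset_at: float

    def to_headers(self) -> dict:
        # Step 7: convert the decision into standard HTTP rate limit headers
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.remaining),
            "X-RateLimit-Reset": str(int(self.reset_at)),
        }
        if not self.allowed:
            headers["Retry-After"] = str(max(0, int(self.reset_at - time.time())))
        return headers

backend: dict = {}  # steps 3/5: in-memory backend holding (window_start, count) per key

def acquire(key: str, limit: int = 100, window: float = 60.0) -> Result:
    now = time.time()
    window_start, count = backend.get(key, (now, 0))  # step 3: backend query
    if now - window_start >= window:                  # step 2: fixed-window logic
        window_start, count = now, 0
    allowed = count < limit                           # step 4: decision
    if allowed:
        backend[key] = (window_start, count + 1)      # step 5: update only on allow
    return Result(allowed, max(0, limit - count - 1), limit, window_start + window)

result = acquire("user:123")  # step 1: the key identifies the caller
print(result.allowed, result.remaining)  # True 99
```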
🛠️ CLI Benchmarking
# Benchmark an algorithm
python -m ratelimit.cli bench -a token_bucket -r 100 -w 1 -n 1000
# Output:
# Algorithm: token_bucket
# Limit: 100 / 1.0s
# Total requests: 1000
# Allowed: 100
# Denied: 900
# Elapsed: 0.0123s
# Throughput: 81300.81 req/s
# List all algorithms
python -m ratelimit.cli list
❓ Troubleshooting
Which Algorithm Should I Use?
| Use Case | Recommended | Why |
|---|---|---|
| General API | Token Bucket or GCRA | Both handle bursts well with O(1) memory |
| Login protection | Sliding Window Log | Exact counting prevents boundary attacks |
| Webhook delivery | Leaky Bucket | Smooth constant-rate output |
| Simple counters | Fixed Window | Simplest to understand and debug |
| Connection pooling | Concurrency Limiter | Caps parallelism, not rate |
| Production at scale | GCRA | Battle-tested at Stripe/Shopify |
Memory Concerns
- Token Bucket, Fixed Window, GCRA, Leaky Bucket: O(1) memory per key
- Sliding Window Log: O(n) where n is requests in the window -- use Counter variant for large volumes
- Concurrency Limiter: O(n) where n is concurrent operations
Async vs Sync
All APIs are async-first. For sync code, use the decorator which handles the event loop automatically:
@rate_limit(limiter)
def sync_function():  # Works with sync functions too
    pass
🧪 Testing
# Install dev dependencies
pip install -e ".[dev]"
# Run all 416 tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=ratelimit --cov-report=term-missing
# Run specific algorithm tests
pytest tests/test_algorithms/test_token_bucket.py -v
pytest tests/test_algorithms/test_gcra.py -v
# Run integration tests
pytest tests/test_integration/ -v
License
MIT