
โฑ๏ธ ratelimit

Multi-algorithm rate limiter with pluggable backends



Choose the right rate limiting algorithm for your use case -- 7 algorithms, one unified async API

Token Bucket + Fixed Window + Sliding Window (Log & Counter) + Leaky Bucket + GCRA + Concurrency Limiter

Quick Start | Features | Algorithms | Architecture


Why This Exists

Every API needs rate limiting, but no single algorithm fits all cases. Token Bucket allows bursts, Leaky Bucket smooths traffic, Sliding Window avoids boundary issues, GCRA powers Stripe and Shopify at scale, and Concurrency Limiter caps parallelism. Most libraries force you into one algorithm. If your needs change, you rewrite.

ratelimit gives you seven algorithms behind a single acquire/peek/reset interface. Swap algorithms without touching your application code. Add multi-tier limits with groups and chains. Get production-ready presets for common scenarios like login protection, API tiers, and webhook delivery. All async-first, all zero dependencies.

  • 7 algorithms -- pick the right one for your use case, swap anytime without code changes
  • Async-first -- native async/await API designed for modern Python applications
  • Production presets -- one-line setup for login protection, API tiers, webhook delivery, and more
  • Zero dependencies -- pure Python, no external packages required

Stop implementing rate limiting from scratch. Start choosing the right algorithm.


Features

| Category | Feature | Description |
|----------|---------|-------------|
| Algorithms | Token Bucket | Smooth rate limiting with configurable burst |
| Algorithms | Fixed Window | Simple time-window counters |
| Algorithms | Sliding Window Log | Exact request counting with per-request timestamps |
| Algorithms | Sliding Window Counter | Balanced accuracy/memory with weighted window overlap |
| Algorithms | Leaky Bucket | Constant-rate output for traffic smoothing |
| Algorithms | GCRA | Generic Cell Rate Algorithm (used by Stripe, Shopify) |
| Algorithms | Concurrency Limiter | Cap parallel connections/operations |
| API | Factory Function | `create_limiter(100, 60)` one-line setup |
| API | Decorator | `@rate_limit(limiter)` for function-level limiting |
| API | Context Manager | `async with RateLimitContext(...)` for scoped limiting |
| API | Wait Mode | `wait_and_acquire()` with automatic backpressure |
| API | HTTP Headers | `result.to_headers()` for standard rate limit headers |
| API | Callbacks | `on_limited()` and `on_allowed()` event hooks |
| Composition | Groups | Multi-tier limits (10/sec AND 1000/hour) -- all must allow |
| Composition | Chains | Sequential rate limit evaluation |
| Composition | Weighted Limiter | Different costs per endpoint with priority reserves |
| Protection | Circuit Breaker | Automatic failure protection (closed/open/half-open) |
| Protection | Penalty Tracker | Progressive backoff for repeat offenders |
| Analytics | Stats Collector | Per-key metrics (allowed, denied, latency) |
| Analytics | Rate Estimator | Real-time request rate estimation and prediction |
| Analytics | Quota Manager | Hourly/daily/weekly/monthly usage quota tracking |
| Utilities | Key Extractors | IP, user, API key, and endpoint pattern extraction |
| Utilities | Retry Strategies | Fixed, exponential backoff, retry-after header parsing |
| Utilities | Snapshots | State serialization for debugging and persistence |
| Utilities | Algorithm Info | Introspection and recommendation engine |
| Presets | 10 Presets | api_standard, api_strict, api_generous, login_protection, webhook_delivery, search_api, upload_limit, free_tier, pro_tier, enterprise_tier |
| Tooling | CLI Benchmark | Compare algorithm performance from the command line |
| Backends | Memory Backend | In-memory storage with TTL support |

🚀 Quick Start

# 1. Install ratelimit
pip install -e .

# 2. Use in your application
python -c "
import asyncio
from ratelimit import create_limiter

async def main():
    limiter = create_limiter(100, 60)  # 100 requests per minute
    result = await limiter.acquire('user:123')
    print(f'Allowed: {result.allowed}, Remaining: {result.remaining}')

asyncio.run(main())
"

# 3. Or use presets
python -c "
import asyncio
from ratelimit import get_preset

async def main():
    limiter = get_preset('login_protection')  # 5 attempts / 15 min
    result = await limiter.acquire('user:login')
    print(f'Allowed: {result.allowed}')

asyncio.run(main())
"

📊 Algorithms

| Algorithm | Best For | Burst | Memory | Boundary Issues |
|-----------|----------|-------|--------|-----------------|
| Token Bucket | General API limiting | Yes (configurable) | O(1) | None |
| Fixed Window | Simple counters, dashboards | 2x at boundary | O(1) | Yes (2x at boundary) |
| Sliding Window Log | Exact counting, compliance | No | O(n) | None |
| Sliding Window Counter | Balanced accuracy/memory | Minimal | O(1) | Approximate |
| Leaky Bucket | Traffic smoothing, webhooks | Configurable | O(1) | None |
| GCRA | Production (Stripe/Shopify) | Yes | O(1) | None |
| Concurrency Limiter | Parallel connection caps | N/A | O(n) | N/A |

Token Bucket

Tokens refill at a constant rate. Each request consumes one token. If the bucket is empty, the request is denied. Supports burst by starting with a full bucket.

from ratelimit import create_limiter

limiter = create_limiter(100, 60, algorithm="token_bucket", burst_size=20)
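The mechanics fit in a few lines of stdlib-only Python. This is an illustrative sketch of the algorithm, not the library's implementation; `TokenBucketSketch` and its injected `now` parameter are invented for the example:

```python
class TokenBucketSketch:
    """Illustrative token bucket; not the library's implementation."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # bucket size = maximum burst
        self.tokens = capacity    # start full, so an initial burst passes
        self.updated = 0.0

    def acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucketSketch(rate=100 / 60, capacity=20)   # ~100/min, burst 20
burst = sum(bucket.acquire(now=0.0) for _ in range(25))  # 25 requests at once
print(burst, bucket.acquire(now=1.2))  # → 20 True (1.2 s refills two tokens)
```

Note how the burst of 25 is capped at the bucket capacity, while a request arriving slightly later succeeds because refill is continuous rather than window-based.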

Fixed Window

Simple counter per time window. Resets at window boundaries. Can allow 2x the limit at window boundaries.

limiter = create_limiter(100, 60, algorithm="fixed_window")
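The 2x boundary effect is easy to demonstrate with a stdlib-only sketch (illustrative only; `fixed_window_allow` is a made-up helper, not the library's code):

```python
def fixed_window_allow(counts: dict, now: float, limit: int, window: float) -> bool:
    bucket = int(now // window)                 # which window this instant is in
    counts[bucket] = counts.get(bucket, 0) + 1  # O(1): a single counter per window
    return counts[bucket] <= limit

counts: dict = {}
# 100 requests at t=59.9s and 100 more at t=60.1s: the boundary resets the
# counter, so all 200 pass within 0.2 seconds of each other.
allowed = sum(fixed_window_allow(counts, t, limit=100, window=60.0)
              for t in [59.9] * 100 + [60.1] * 100)
print(allowed)  # → 200
```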

Sliding Window Log

Stores timestamp of every request. Most accurate but uses O(n) memory. Best for compliance and exact counting.

limiter = create_limiter(100, 60, algorithm="sliding_window_log")
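A minimal sketch of the idea (not the library's implementation): keep one timestamp per request in a deque, evict entries older than the window, and allow only while the log holds fewer than `limit`:

```python
from collections import deque

def sliding_log_allow(log: deque, now: float, limit: int, window: float) -> bool:
    while log and log[0] <= now - window:  # evict timestamps outside the window
        log.popleft()
    if len(log) < limit:
        log.append(now)                    # O(n): one timestamp per request
        return True
    return False

log: deque = deque()
# The same boundary burst that slips past Fixed Window: only 100 of 200 pass.
allowed = sum(sliding_log_allow(log, t, limit=100, window=60.0)
              for t in [59.9] * 100 + [60.1] * 100)
print(allowed)  # → 100
```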

Sliding Window Counter

Approximates sliding window using weighted overlap between current and previous fixed windows. O(1) memory with good accuracy.

limiter = create_limiter(100, 60, algorithm="sliding_window_counter")
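The weighted estimate can be sketched directly (illustrative formula only; `sliding_counter_estimate` is a made-up name): the previous window's count is scaled by how much of it still overlaps the sliding window.

```python
def sliding_counter_estimate(prev_count: int, curr_count: int,
                             now: float, window: float) -> float:
    # Weight the previous window by its remaining overlap with the
    # sliding window that ends at `now`.
    elapsed = (now % window) / window   # fraction of the current window used
    return prev_count * (1.0 - elapsed) + curr_count

# 80 requests last minute, 30 so far this minute, 15 s into the window:
# 75% of the previous window still overlaps -> 80 * 0.75 + 30 = 90.
estimate = sliding_counter_estimate(80, 30, now=75.0, window=60.0)
print(estimate)  # → 90.0
```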

Leaky Bucket

Requests enter a bucket that drains at a constant rate. Produces smooth, constant-rate output.

limiter = create_limiter(10, 1, algorithm="leaky_bucket")
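A stdlib-only sketch of the drain behavior (not the library's implementation; `LeakyBucketSketch` and its injected `now` are invented for the example):

```python
class LeakyBucketSketch:
    """Illustrative leaky bucket; not the library's implementation."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # drain rate: requests leaving per second
        self.capacity = capacity  # bucket depth before requests are shed
        self.level = 0.0
        self.updated = 0.0

    def acquire(self, now: float) -> bool:
        # Drain for the time elapsed since the last call.
        self.level = max(0.0, self.level - (now - self.updated) * self.rate)
        self.updated = now
        if self.level + 1 <= self.capacity:
            self.level += 1       # request enters the bucket
            return True
        return False              # bucket full: excess is shed, output stays smooth

bucket = LeakyBucketSketch(rate=10, capacity=10)
burst = sum(bucket.acquire(now=0.0) for _ in range(50))  # 50 at once: 10 fit
print(burst, bucket.acquire(now=0.1))  # → 10 True (one slot drains in 0.1 s)
```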

GCRA (Generic Cell Rate Algorithm)

Used by Stripe and Shopify. Elegant single-value algorithm that tracks the next allowed request time. Best all-rounder for production.

limiter = create_limiter(100, 60, algorithm="gcra")
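The "single value" is the theoretical arrival time (TAT) of the next conforming request. A stdlib-only sketch of the idea (not the library's implementation; `GCRASketch` and its parameters are invented for the example):

```python
class GCRASketch:
    """Illustrative GCRA; not the library's implementation."""

    def __init__(self, limit: int, period: float, burst: int = 1):
        self.interval = period / limit                # ideal gap between requests
        self.tolerance = self.interval * (burst - 1)  # how far ahead a burst may run
        self.tat = 0.0  # theoretical arrival time of the next conforming request

    def acquire(self, now: float) -> bool:
        tat = max(self.tat, now)
        if tat - now > self.tolerance:
            return False                  # arrived too far ahead of schedule
        self.tat = tat + self.interval    # single stored value: O(1) per key
        return True

g = GCRASketch(limit=10, period=10.0, burst=5)     # 10 req / 10 s, burst of 5
print(sum(g.acquire(now=0.0) for _ in range(10)))  # → 5: burst passes, rest denied
```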

Concurrency Limiter

Caps the number of concurrent operations rather than request rate. Perfect for database connection pools or parallel API calls.

from ratelimit import ConcurrencyLimiter, MemoryBackend, RateLimiter, RateLimitConfig

config = RateLimitConfig(max_requests=10, window_seconds=1)
limiter = RateLimiter(ConcurrencyLimiter(MemoryBackend(), config))

📋 Usage Patterns

Decorator

from ratelimit import rate_limit, create_limiter

limiter = create_limiter(100, 60)

@rate_limit(limiter, key=lambda user_id: f"user:{user_id}")
async def get_data(user_id: str):
    return await fetch_data(user_id)

# With wait mode (blocks instead of raising)
@rate_limit(limiter, wait=True, timeout=30.0)
async def get_data_wait(user_id: str):
    return await fetch_data(user_id)

Context Manager

from ratelimit import RateLimitContext, ConcurrencyContext

# Rate limiting context
async with RateLimitContext(limiter, "user:123") as result:
    if result.allowed:
        process_request()

# Concurrency context (auto-release on exit)
async with ConcurrencyContext(concurrency_limiter, "user:123"):
    await long_running_task()

Multi-Tier Rate Limiting

from ratelimit import create_limiter, RateLimitGroup

per_second = create_limiter(10, 1, key_prefix="sec")
per_minute = create_limiter(100, 60, key_prefix="min")
per_hour = create_limiter(1000, 3600, key_prefix="hour")

group = RateLimitGroup(per_second, per_minute, per_hour)
result = await group.acquire("user:123")  # All three must allow

Presets

from ratelimit import get_preset, list_presets

# See all available presets
print(list_presets())

# Use presets
limiter = get_preset("api_standard")       # 100 req/min, 20 burst
limiter = get_preset("api_strict")         # 30 req/min, no burst
limiter = get_preset("api_generous")       # 1000 req/min, 200 burst
limiter = get_preset("login_protection")   # 5 attempts / 15 min
limiter = get_preset("webhook_delivery")   # 10 req/sec, smoothed
limiter = get_preset("search_api")         # 10/sec AND 60/min (dual)
limiter = get_preset("upload_limit")       # 10 uploads / hour
limiter = get_preset("free_tier")          # 100 req/hour
limiter = get_preset("pro_tier")           # 5000 req/hour, 100 burst
limiter = get_preset("enterprise_tier")    # 50000 req/hour, 500 burst

Circuit Breaker

from ratelimit import CircuitBreaker

breaker = CircuitBreaker(
    failure_threshold=5,      # Open after 5 failures
    recovery_timeout=30.0,    # Try again after 30 seconds
    half_open_max_calls=3,    # Allow 3 test calls in half-open
)

if breaker.allow_request():
    try:
        result = await external_api_call()
        breaker.record_success()
    except Exception:
        breaker.record_failure()

Penalty Tracker

from ratelimit import PenaltyTracker

tracker = PenaltyTracker(
    base_penalty=60.0,        # 1 minute base penalty
    multiplier=2.0,           # Double each time
    max_penalty=3600.0,       # Cap at 1 hour
)

# Record violation
tracker.record_violation("abuser:ip")

# Check if penalized
penalty = tracker.get_penalty("abuser:ip")
if penalty > 0:
    print(f"Penalized for {penalty:.0f} more seconds")

HTTP Headers

result = await limiter.acquire("user:123")

# Standard rate limit headers
headers = result.to_headers()
# {
#   "X-RateLimit-Limit": "100",
#   "X-RateLimit-Remaining": "99",
#   "X-RateLimit-Reset": "1711468800",
#   "Retry-After": "60"  (only when denied)
# }

Statistics

from ratelimit import StatsCollector

stats = StatsCollector()
stats.record(key="user:123", allowed=True, latency_ms=1.2)
stats.record(key="user:123", allowed=False, latency_ms=0.8)

summary = stats.get_summary("user:123")
# {"total": 2, "allowed": 1, "denied": 1, "avg_latency_ms": 1.0}

Quota Manager

from ratelimit import QuotaManager

quota = QuotaManager()
quota.set_quota("user:123", hourly=1000, daily=10000, monthly=100000)

result = quota.check("user:123")
print(f"Hourly: {result.hourly_remaining}, Daily: {result.daily_remaining}")

Algorithm Recommendation

from ratelimit.info import recommend_algorithm, list_algorithms

# Get recommendation based on requirements
info = recommend_algorithm(needs_burst=True, memory_constrained=True)
# => GCRA - best all-rounder for production

# List all algorithms with descriptions
for algo in list_algorithms():
    print(f"{algo.algorithm.value}: {algo.name} - {algo.best_for}")

๐Ÿ—๏ธ Architecture

ratelimit/
├── core.py               # RateLimiter, RateLimitResult, RateLimitConfig, Backend ABC
├── algorithms/
│   ├── token_bucket.py   # Token Bucket algorithm
│   ├── fixed_window.py   # Fixed Window counter
│   ├── sliding_window.py # Sliding Window (Log + Counter)
│   ├── leaky_bucket.py   # Leaky Bucket algorithm
│   ├── gcra.py           # Generic Cell Rate Algorithm
│   └── concurrency.py    # Concurrency Limiter
├── backends/
│   └── memory.py         # In-memory storage backend with TTL
├── factory.py            # create_limiter() one-line factory
├── decorator.py          # @rate_limit decorator (sync + async)
├── context.py            # RateLimitContext, ConcurrencyContext
├── groups.py             # RateLimitGroup, RateLimitChain
├── presets.py            # 10 pre-configured policies
├── circuit.py            # CircuitBreaker (closed/open/half-open)
├── penalty.py            # PenaltyTracker with exponential backoff
├── stats.py              # StatsCollector for per-key metrics
├── estimator.py          # RateEstimator for traffic prediction
├── quota.py              # QuotaManager (hourly/daily/weekly/monthly)
├── events.py             # Event system with async-compatible emitter
├── keys.py               # Key extraction (IP, user, API key, endpoint)
├── retry.py              # Retry strategies (fixed, exponential, retry-after)
├── snapshot.py           # State serialization and debugging
├── weighted.py           # Weighted limiter with priority reserves
├── info.py               # Algorithm introspection and recommender
├── cli.py                # CLI benchmarking tool
└── middleware/           # Framework middleware (extensible)

Request Flow

    Request
      │
      ▼
┌──────────────┐
│  Key Extract │  (IP, user, API key, endpoint)
└──────┬───────┘
       │
       ▼
┌──────────────┐    ┌──────────────┐
│   Penalty    │───▶│   Circuit    │
│   Tracker    │    │   Breaker    │
└──────┬───────┘    └──────┬───────┘
       │                   │
       ▼                   ▼
┌──────────────────────────────┐
│     RateLimiter.acquire()    │
│  ┌────────────────────────┐  │
│  │   Algorithm Engine     │  │
│  │  (Token Bucket, GCRA,  │  │
│  │   Sliding Window, etc) │  │
│  └───────────┬────────────┘  │
│              │               │
│  ┌───────────▼────────────┐  │
│  │   Memory Backend       │  │
│  │   (get/set/increment)  │  │
│  └────────────────────────┘  │
└──────────────┬───────────────┘
               │
      ┌────────┼────────┐
      ▼                 ▼
  Allowed            Denied
      │                 │
      ▼                 ▼
┌──────────┐    ┌──────────────┐
│  Stats   │    │  Retry-After │
│ Record   │    │  + Headers   │
└──────────┘    └──────────────┘

📡 API Reference

Core

from ratelimit import (
    # Algorithms
    TokenBucket, FixedWindow, SlidingWindowLog,
    SlidingWindowCounter, LeakyBucket, GCRA, ConcurrencyLimiter,
    # Backend
    MemoryBackend,
    # Core types
    RateLimiter, RateLimitResult, RateLimitConfig, Algorithm,
    # Decorator
    rate_limit, RateLimitExceeded,
    # Composition
    RateLimitGroup, RateLimitChain,
    # Context managers
    RateLimitContext, ConcurrencyContext,
    # Utilities
    StatsCollector, CircuitBreaker, PenaltyTracker,
    RateEstimator, QuotaManager,
    # Factory & Presets
    create_limiter, get_preset, list_presets,
)

RateLimitResult

result = await limiter.acquire("key")

result.allowed        # bool: was the request allowed?
result.remaining      # int: remaining requests in window
result.limit          # int: total limit
result.reset_at       # float: Unix timestamp when limit resets
result.retry_after    # float: seconds to wait before retrying
result.reset_in       # float: seconds until reset (computed property)
result.to_headers()   # dict: standard HTTP rate limit headers

RateLimiter Methods

# Try to acquire (non-blocking)
result = await limiter.acquire("key", cost=1)

# Check without consuming
result = await limiter.peek("key")

# Reset a key
await limiter.reset("key")

# Wait and acquire (blocking with timeout)
result = await limiter.wait_and_acquire("key", cost=1, timeout=30.0)

# Event callbacks
@limiter.on_limited
def handle_limited(key, result):
    log.warning(f"Rate limited: {key}")

@limiter.on_allowed
def handle_allowed(key, result):
    stats.record(key)

🔧 How It Works

  1. Key Extraction -- Each request is identified by a key (user ID, IP, API key, or custom)
  2. Algorithm Selection -- The configured algorithm determines how requests are counted/tracked
  3. Backend Query -- The algorithm queries the storage backend for current state
  4. Decision -- The algorithm decides allow/deny based on its specific logic
  5. State Update -- On allow, the backend state is updated (decrement tokens, add timestamp, etc.)
  6. Result -- A RateLimitResult is returned with allowed/denied, remaining count, reset time, and retry-after
  7. Headers -- Results can be converted to standard HTTP headers for API responses
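Under heavy simplification, the whole flow fits in a short stdlib-only sketch. This is a hypothetical miniature, not the library's code: the algorithm is a hard-coded fixed window (step 2 is skipped), the "backend" is a plain dict, and all names are invented for illustration:

```python
import time

backend: dict = {}   # in-memory "backend": key -> (count, window_start)

def acquire(request: dict, limit: int = 100, window: float = 60.0) -> dict:
    key = f"user:{request['user_id']}"           # 1. key extraction
    now = time.time()
    count, start = backend.get(key, (0, now))    # 3. backend query
    if now - start >= window:
        count, start = 0, now                    # window rolled over
    allowed = count < limit                      # 4. decision
    if allowed:
        backend[key] = (count + 1, start)        # 5. state update
    return {                                     # 6. result
        "allowed": allowed,
        "remaining": max(0, limit - count - (1 if allowed else 0)),
        "reset_at": start + window,
    }

result = acquire({"user_id": "123"})
headers = {                                      # 7. HTTP headers
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": str(result["remaining"]),
    "X-RateLimit-Reset": str(int(result["reset_at"])),
}
print(result["allowed"], headers["X-RateLimit-Remaining"])  # → True 99
```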

🛠️ CLI Benchmarking

# Benchmark an algorithm
python -m ratelimit.cli bench -a token_bucket -r 100 -w 1 -n 1000

# Output:
# Algorithm:         token_bucket
# Limit:             100 / 1.0s
# Total requests:    1000
# Allowed:           100
# Denied:            900
# Elapsed:           0.0123s
# Throughput:        81300.81 req/s

# List all algorithms
python -m ratelimit.cli list

โ“ Troubleshooting

Which Algorithm Should I Use?

| Use Case | Recommended | Why |
|----------|-------------|-----|
| General API | Token Bucket or GCRA | Both handle bursts well with O(1) memory |
| Login protection | Sliding Window Log | Exact counting prevents boundary attacks |
| Webhook delivery | Leaky Bucket | Smooth constant-rate output |
| Simple counters | Fixed Window | Simplest to understand and debug |
| Connection pooling | Concurrency Limiter | Caps parallelism, not rate |
| Production at scale | GCRA | Battle-tested at Stripe/Shopify |

Memory Concerns

  • Token Bucket, Fixed Window, GCRA, Leaky Bucket: O(1) memory per key
  • Sliding Window Log: O(n) where n is requests in the window -- use Counter variant for large volumes
  • Concurrency Limiter: O(n) where n is concurrent operations

Async vs Sync

All APIs are async-first. For sync code, use the decorator which handles the event loop automatically:

@rate_limit(limiter)
def sync_function():  # Works with sync functions too
    pass

🧪 Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run all 416 tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=ratelimit --cov-report=term-missing

# Run specific algorithm tests
pytest tests/test_algorithms/test_token_bucket.py -v
pytest tests/test_algorithms/test_gcra.py -v

# Run integration tests
pytest tests/test_integration/ -v

License

MIT
