
Adaptive hedged request library for Python. Learns per-host latency via DDSketch, fires backup requests at estimated p90, caps hedge rate with token bucket.

Project description

hedge-python

English | 简体中文 | 日本語


Python port of bhope/hedge: adaptive hedged requests for tail-latency optimisation.

hedge-python learns per-host latency distributions with DDSketch, races a backup request when the primary exceeds its estimated p90, and caps the hedge rate with a token bucket to prevent load amplification during outages. Zero configuration required. First-class support for httpx, aiohttp, and gRPC (unary + server-streaming).

Inspired by Dean & Barroso, The Tail at Scale (CACM 2013).


Why hedging?

A small fraction of slow responses dominates user-perceived latency. Hedging fires a duplicate request after the primary blows past its expected deadline — whichever finishes first wins, the other is cancelled.
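The mechanism can be sketched in a few lines of asyncio, independent of this library (the `hedged` helper, the deadline, and the fake `fetch` below are illustrative, not the package's API):

```python
import asyncio

async def hedged(fetch, hedge_delay: float):
    """Race a primary call against a backup fired after hedge_delay seconds."""
    primary = asyncio.ensure_future(fetch())
    # Give the primary a head start; hedge only if it is still pending.
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result()
    backup = asyncio.ensure_future(fetch())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for loser in pending:  # whichever finishes first wins; cancel the other
        loser.cancel()
    return done.pop().result()

async def demo():
    calls = 0
    async def fetch():
        nonlocal calls
        calls += 1
        # the first attempt is a straggler; the hedge returns quickly
        await asyncio.sleep(1.0 if calls == 1 else 0.01)
        return f"attempt-{calls}"
    return await hedged(fetch, hedge_delay=0.05)
```

Running `asyncio.run(demo())` returns the backup's result after roughly the hedge delay plus one fast round trip, instead of waiting out the full straggler.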

Result on a benchmark with 5% straggler requests (10× slower):

Multi-framework benchmark

| Framework | Configuration    | p50 (ms) | p90 (ms) | p95 (ms) | p99 (ms) | p99.9 (ms) | Overhead |
|-----------|------------------|---------:|---------:|---------:|---------:|-----------:|---------:|
| httpx     | No hedging       | 5.8      | 10.3     | 12.2     | 51.3     | 78.3       | 0.0%     |
| httpx     | Adaptive (hedge) | 6.2      | 10.5     | 12.1     | 18.8     | 22.2       | 7.0%     |
| aiohttp   | No hedging       | 6.3      | 10.7     | 13.0     | 52.4     | 79.0       | 0.0%     |
| aiohttp   | Adaptive (hedge) | 6.5      | 11.3     | 13.8     | 20.5     | 25.1       | 4.6%     |
| grpc      | No hedging       | 6.5      | 10.8     | 12.7     | 59.9     | 82.0       | 0.0%     |
| grpc      | Adaptive (hedge) | 6.9      | 11.6     | 13.7     | 20.4     | 23.5       | 5.6%     |

Across all three frameworks, p99 latency drops by 60–66% at the cost of ~5–7% extra backend traffic. Reproduce with make bench-multi && make bench-plot.


Quick Start

# Install with your preferred framework
# (quote the extras so shells like zsh don't expand the brackets)
pip install "hedge-python[httpx]"
pip install "hedge-python[aiohttp]"
pip install "hedge-python[grpc]"
pip install "hedge-python[all]"   # all frameworks

httpx

import asyncio
import httpx
from hedge import HedgeConfig
from hedge.transport import HedgedHttpxTransport

async def main():
    transport = HedgedHttpxTransport(config=HedgeConfig())
    async with httpx.AsyncClient(transport=transport) as client:
        resp = await client.get("https://api.example.com/data")
        print(resp.status_code)

asyncio.run(main())

aiohttp

import asyncio
from hedge import HedgeConfig
from hedge.transport import HedgedAiohttpSession

async def main():
    async with HedgedAiohttpSession(config=HedgeConfig()) as session:
        resp = await session.get("https://api.example.com/data")
        data = await resp.json()
        print(data)

asyncio.run(main())

gRPC (Unary)

import grpc.aio
from hedge import HedgeConfig
from hedge.interceptor import HedgedUnaryInterceptor

async def make_channel():
    return grpc.aio.insecure_channel(
        "localhost:50051",
        interceptors=[HedgedUnaryInterceptor(config=HedgeConfig(estimated_rps=500))],
    )

gRPC (Server Streaming — LLM inference, log tailing, …)

import grpc.aio
from hedge import HedgeConfig
from hedge.interceptor import HedgedServerStreamInterceptor

async def make_channel():
    return grpc.aio.insecure_channel(
        "localhost:50051",
        interceptors=[HedgedServerStreamInterceptor(config=HedgeConfig())],
    )

For server streaming, the hedge signal is time-to-first-message (TTFM): if the primary stream doesn't yield its first chunk within the estimated p90, a backup stream is started. Whichever yields first wins and continues streaming; the loser is cancelled at the wire level.
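The TTFM race can be sketched with plain async iterators (names and timings below are illustrative; the real interceptor cancels at the gRPC wire level rather than just the local task):

```python
import asyncio

async def first_chunk_race(open_stream, ttfm_deadline: float):
    """Return (first_chunk, rest_of_stream) from whichever stream yields first.

    open_stream() returns an async iterator; a backup stream is opened only
    if the primary's first chunk misses ttfm_deadline seconds.
    """
    async def first(stream):
        it = stream.__aiter__()
        chunk = await it.__anext__()
        return chunk, it

    primary = asyncio.ensure_future(first(open_stream()))
    done, _ = await asyncio.wait({primary}, timeout=ttfm_deadline)
    if done:
        return primary.result()
    backup = asyncio.ensure_future(first(open_stream()))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for loser in pending:  # the loser is cancelled mid-stream
        loser.cancel()
    return done.pop().result()

async def demo():
    opened = 0
    def open_stream():
        nonlocal opened
        opened += 1
        sid, slow = opened, opened == 1
        async def gen():
            await asyncio.sleep(0.5 if slow else 0.01)  # time to first message
            for i in range(3):
                yield f"s{sid}-c{i}"
        return gen()
    chunk, rest = await first_chunk_race(open_stream, ttfm_deadline=0.05)
    return [chunk] + [c async for c in rest]
```

Here the primary stream (`s1`) stalls before its first chunk, so the backup (`s2`) wins the race and continues streaming.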

Runnable examples for each framework live in examples/ — the gRPC ones are fully self-contained (they spin up a local server with simulated stragglers so you can see hedging in action without any external dependency). See examples/README.md for the index.


How It Works

1. DDSketch quantile estimator

Each target host gets a WindowedSketch — a pair of DDSketches that rotate every 30 seconds. DDSketch uses logarithmic bucket mapping to provide relative-error guarantees: any quantile estimate is within ±1% of the true value, regardless of the underlying distribution.
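The core idea behind DDSketch's guarantee fits in a few lines. This is a minimal illustration of logarithmic bucketing, not the library's internal code (`TinySketch` is a name I made up):

```python
import math
from collections import Counter

class TinySketch:
    """Log-bucketed quantile sketch with relative error alpha (DDSketch idea)."""

    def __init__(self, alpha: float = 0.01):
        self.gamma = (1 + alpha) / (1 - alpha)  # bucket boundaries grow geometrically
        self.buckets = Counter()
        self.count = 0

    def add(self, value: float):
        # positive values only; bucket k covers (gamma^(k-1), gamma^k]
        key = math.ceil(math.log(value, self.gamma))
        self.buckets[key] += 1
        self.count += 1

    def quantile(self, q: float):
        rank = q * (self.count - 1)
        seen = 0
        for key in sorted(self.buckets):
            seen += self.buckets[key]
            if seen > rank:
                # bucket midpoint: within a factor of (1 +/- alpha) of any
                # true value inside the bucket, whatever the distribution
                return 2 * self.gamma ** key / (self.gamma + 1)
```

Because bucket widths scale with value, the error is relative rather than absolute: a p90 of 10 ms and a p99 of 1 s are estimated with the same 1% accuracy.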

2. Adaptive trigger

On each request, the transport queries the sketch for the configured percentile (default p90). If the primary hasn't responded by that deadline, a backup request is fired. Whichever response arrives first is returned; the loser is cancelled (including the underlying gRPC Call for streams).

              ┌─ primary  ─────────── ✓ (fast) ──→ return
request ──────┤
              └─ hedge fires after p90 ─── ✗ (cancelled)

3. Token bucket budget

Hedges are rate-limited by a token bucket that refills at estimated_rps × budget_percent / 100 tokens per second. During genuine outages the bucket drains and hedging stops automatically — preventing the load-doubling spiral that would deepen the incident.

gRPC implementation note

The gRPC intercept_unary_unary continuation returns a Call object almost immediately; the real RTT is spent in the subsequent await call. We wrap both steps in a single asyncio task so the hedge timer reflects true end-to-end RPC latency. Cancelling a loser invokes call.cancel() first (notifying the server) then task.cancel() (cleaning up the coroutine).
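The timing subtlety and the two-step cancellation can be illustrated with a stand-in for the Call object (`FakeCall` mimics the shape of a grpc.aio Call; it is not the real class):

```python
import asyncio

class FakeCall:
    """Stand-in for a grpc.aio Call: the object exists immediately,
    but the response only arrives after the full RTT."""

    def __init__(self, rtt: float):
        self.rtt = rtt
        self.cancelled = False

    def cancel(self):
        self.cancelled = True  # wire-level: the server is told to stop

    def __await__(self):
        return asyncio.sleep(self.rtt, result="response").__await__()

async def demo():
    call = FakeCall(rtt=1.0)

    async def rpc():
        # wrapping the await in one task means the hedge timer sees true RTT,
        # not just the near-instant creation of the Call object
        return await call

    task = asyncio.ensure_future(rpc())
    await asyncio.sleep(0.01)  # pretend a faster hedge already won
    call.cancel()              # 1) notify the server
    task.cancel()              # 2) unwind the local coroutine
    try:
        await task
    except asyncio.CancelledError:
        pass
    return call.cancelled and task.cancelled()
```

Cancelling in this order ensures the server stops work promptly even if local task teardown is delayed.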


Configuration

All knobs live on HedgeConfig:

| Parameter       | Type          | Default | Description                                               |
|-----------------|---------------|---------|-----------------------------------------------------------|
| percentile      | float         | 0.90    | Sketch quantile used as the hedge trigger                 |
| max_hedges      | int           | 1       | Maximum concurrent hedge requests per call                |
| budget_percent  | float         | 10.0    | Max hedge rate as a percentage of total traffic           |
| estimated_rps   | float         | 100.0   | Expected requests per second; sets token bucket capacity  |
| min_delay       | float         | 0.001   | Floor on the hedge delay (seconds)                        |
| warmup_requests | int           | 20      | Number of initial requests that use the fixed warmup delay |
| warmup_delay    | float         | 0.01    | Fixed hedge delay during warmup (seconds)                 |
| window_duration | float         | 30.0    | Sketch window rotation interval (seconds)                 |
| stats           | Stats \| None | None    | Inject a custom Stats instance for observability          |

Tip — estimated_rps: pick a value close to your real RPS so the token bucket capacity (rps × budget_percent / 100) is meaningful. If unsure, start at the default 100.0 and watch hedge_rate / budget_exhausted in the stats snapshot.
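For example, at the defaults the bucket refills at 10 tokens per second:

```python
estimated_rps = 100.0   # HedgeConfig default
budget_percent = 10.0   # HedgeConfig default
# refill rate = rps * budget_percent / 100 (tokens per second)
refill_per_second = estimated_rps * budget_percent / 100
```

So if your real traffic is 1000 rps but estimated_rps stays at 100, hedging is capped at roughly 1% of traffic instead of the intended 10%.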


Observability

from hedge import HedgeConfig, Stats
from hedge.transport import HedgedHttpxTransport

stats = Stats()
transport = HedgedHttpxTransport(config=HedgeConfig(stats=stats))

# ... after running some traffic ...
snap = stats.snapshot()
print(f"total={snap.total_requests} hedged={snap.hedged_requests}")
print(f"hedge_wins={snap.hedge_wins} primary_wins={snap.primary_wins}")
print(f"budget_exhausted={snap.budget_exhausted}")
print(f"hedge_rate={stats.hedge_rate():.2%}")

Stats is fully thread-safe and can be shared across multiple transports/interceptors to aggregate metrics.


Benchmarks & charts

Two benchmark suites ship with the project:

| Command            | What it does                                                      | Output                              |
|--------------------|-------------------------------------------------------------------|-------------------------------------|
| make bench-compare | httpx only: No hedging vs Static 10ms vs Static 50ms vs Adaptive  | benchmark/results.csv               |
| make bench-multi   | httpx vs aiohttp vs gRPC, No hedging vs Adaptive                  | benchmark/results_multi.csv         |
| make bench-plot    | Render both CSVs into charts                                      | eval.png, eval_multi_framework.png  |

Each suite runs 500 requests against a simulated lognormal latency (mean=5ms, stddev=2ms) with 5% straggler probability (10× spike).
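That workload can be approximated as follows (the mu/sigma derivation from the stated mean and stddev is mine; the benchmark's own code lives in tests/benchmark/):

```python
import math
import random

MEAN_MS, STD_MS = 5.0, 2.0
# convert the lognormal's mean/stddev into the underlying normal's mu/sigma
SIGMA = math.sqrt(math.log(1 + (STD_MS / MEAN_MS) ** 2))
MU = math.log(MEAN_MS) - SIGMA ** 2 / 2

def sample_latency_ms(rng: random.Random) -> float:
    latency = rng.lognormvariate(MU, SIGMA)
    if rng.random() < 0.05:  # 5% straggler probability
        latency *= 10.0      # 10x spike
    return latency
```

The mixture mean works out to about 0.95 × 5 + 0.05 × 50 ≈ 7.25 ms, while the p99 lands deep in straggler territory — exactly the regime hedging targets.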


Development

# Install uv (if not already)
curl -LsSf https://astral.sh/uv/install.sh | sh

make install            # install all extras with uv
make lint               # ruff check
make typecheck          # mypy
make test               # all tests
make test-unit          # unit tests only
make test-integration   # integration tests (requires httpx / aiohttp / grpcio)
make coverage           # coverage report
make bench-multi        # multi-framework benchmark
make bench-plot         # render charts
make ci                 # lint + typecheck + test + coverage

Testing

  • Unit tests (tests/unit/): DDSketch, token bucket, scheduler, stats, options, lazy import shims, gRPC interceptor branches (with fake continuations).
  • Integration tests (tests/integration/): real httpx transport, real aiohttp session, real local gRPC server with .proto + generated pb2.
  • Benchmarks (tests/benchmark/): DDSketch microbench, token bucket microbench, four-config comparison, three-framework comparison.

Current coverage: 97% (122 tests, ~7 seconds).


Project Structure

hedge-python/
├── src/hedge/
│   ├── __init__.py          # Public API
│   ├── _options.py          # HedgeConfig dataclass
│   ├── _stats.py            # Thread-safe Stats + StatsSnapshot
│   ├── sketch/
│   │   ├── _ddsketch.py     # DDSketch quantile estimator
│   │   └── _windowed.py     # Sliding-window DDSketch pair
│   ├── budget/
│   │   └── _token_bucket.py # Token bucket rate limiter
│   ├── transport/
│   │   ├── _base.py         # Shared HedgeScheduler logic
│   │   ├── _httpx.py        # httpx AsyncBaseTransport adapter
│   │   └── _aiohttp.py      # aiohttp session wrapper
│   └── interceptor/
│       └── _grpc.py         # gRPC unary + server-stream interceptors
├── tests/
│   ├── unit/                # 7 unit-test files
│   ├── integration/
│   │   ├── proto/           # .proto + generated pb2 / pb2_grpc
│   │   ├── test_httpx_transport.py
│   │   ├── test_aiohttp_session.py
│   │   └── test_grpc_interceptor.py
│   └── benchmark/
│       ├── test_bench_ddsketch.py
│       ├── test_bench_token_bucket.py
│       ├── test_bench_hedge_comparison.py    # httpx 4-config
│       └── test_bench_multi_framework.py     # 3-framework comparison
├── benchmark/
│   ├── plot.py              # CSV → matplotlib charts
│   ├── results.csv          # produced by bench-compare
│   └── results_multi.csv    # produced by bench-multi
├── eval.png                 # single-framework chart
├── eval_multi_framework.png # cross-framework chart
├── pyproject.toml
├── Makefile
└── .github/workflows/ci.yml

References

  • Dean, J. & Barroso, L. A., "The Tail at Scale", Communications of the ACM, 2013.

Changelog

See CHANGELOG.md for the full release history.

License

hedge-python is released under the MIT License.



Download files

Download the file for your platform.

Source Distribution

hedge_python-0.1.0.tar.gz (445.1 kB)

Uploaded Source

Built Distribution


hedge_python-0.1.0-py3-none-any.whl (22.0 kB)

Uploaded Python 3

File details

Details for the file hedge_python-0.1.0.tar.gz.

File metadata

  • Download URL: hedge_python-0.1.0.tar.gz
  • Upload date:
  • Size: 445.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hedge_python-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7d7ed373836d76bb54d684331ad86c9a5d8eb947f7db1bd895bae712311ae680
MD5 ef2528dcb33176ea81cb4cebeb1ac99f
BLAKE2b-256 4a00e99af93944b704e59ff0b9cb26a21fd9dfb82ab1709a09a3d465a68b5bb5


Provenance

The following attestation bundles were made for hedge_python-0.1.0.tar.gz:

Publisher: release.yml on sunhailin-Leo/hedge-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hedge_python-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: hedge_python-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 22.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hedge_python-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7c65972b563671e19017c83a85049dcf3f01928671f148fc3a163ed303392ec5
MD5 08567a3a40a702007c6778d909f413cd
BLAKE2b-256 5df7599178eb585f3b1d6b755dc823349ed12634f493d5f965fac588fc209376


Provenance

The following attestation bundles were made for hedge_python-0.1.0-py3-none-any.whl:

Publisher: release.yml on sunhailin-Leo/hedge-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
