Adaptive hedged request library for Python. Learns per-host latency via DDSketch, fires backup requests at estimated p90, caps hedge rate with token bucket.

hedge-python

Python port of bhope/hedge: adaptive hedged requests for tail-latency optimisation.

hedge-python learns per-host latency distributions with DDSketch, races a backup request when the primary exceeds its estimated p90, and caps the hedge rate with a token bucket to prevent load amplification during outages. Zero configuration required. First-class support for httpx, aiohttp, niquests, tornado, and gRPC (unary + server-streaming). Works out of the box with OpenAI's Python SDK.

Inspired by Dean & Barroso, The Tail at Scale (CACM 2013).


Why hedging?

A small fraction of slow responses dominates user-perceived latency. Hedging fires a duplicate request after the primary blows past its expected deadline — whichever finishes first wins, the other is cancelled.

Result on a benchmark with 5% straggler requests (10× slower):

Multi-framework benchmark (latency in ms)

Framework  Configuration     p50  p90   p95   p99   p999  Overhead
httpx      No hedging        5.8  10.3  12.2  51.3  78.3  0.0%
httpx      Adaptive (hedge)  6.2  10.5  12.1  18.8  22.2  7.0%
aiohttp    No hedging        6.3  10.7  13.0  52.4  79.0  0.0%
aiohttp    Adaptive (hedge)  6.5  11.3  13.8  20.5  25.1  4.6%
grpc       No hedging        6.5  10.8  12.7  59.9  82.0  0.0%
grpc       Adaptive (hedge)  6.9  11.6  13.7  20.4  23.5  5.6%

Across all three frameworks, p99 latency drops by roughly 61–66% at the cost of about 5–7% extra backend traffic. Reproduce with make bench-multi && make bench-plot.
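As a quick check of the headline figure, the relative p99 drop can be recomputed from the table rows. This is only arithmetic on the numbers above; the dictionary layout and the function name are illustrative, and the units are assumed to be milliseconds.

```python
# p99 latencies copied from the benchmark table: (no hedging, adaptive).
# Units are assumed to be milliseconds.
P99 = {
    "httpx": (51.3, 18.8),
    "aiohttp": (52.4, 20.5),
    "grpc": (59.9, 20.4),
}


def p99_reduction(baseline: float, hedged: float) -> float:
    """Fractional drop in p99 relative to the unhedged baseline."""
    return (baseline - hedged) / baseline


reductions = {name: p99_reduction(*pair) for name, pair in P99.items()}

if __name__ == "__main__":
    for name, value in reductions.items():
        print(f"{name}: p99 reduced by {value:.1%}")
```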


Quick Start

# Install with your preferred framework
# (quotes keep shells such as zsh from expanding the square brackets)
pip install "hedge-python[httpx]"
pip install "hedge-python[aiohttp]"
pip install "hedge-python[niquests]"
pip install "hedge-python[tornado]"
pip install "hedge-python[grpc]"
pip install "hedge-python[all]"   # all frameworks

httpx

import asyncio
import httpx
from hedge import HedgeConfig
from hedge.transport import HedgedHttpxTransport

async def main():
    transport = HedgedHttpxTransport(config=HedgeConfig())
    async with httpx.AsyncClient(transport=transport) as client:
        resp = await client.get("https://api.example.com/data")
        print(resp.status_code)

asyncio.run(main())

aiohttp

import asyncio
from hedge import HedgeConfig
from hedge.transport import HedgedAiohttpSession

async def main():
    async with HedgedAiohttpSession(config=HedgeConfig()) as session:
        resp = await session.get("https://api.example.com/data")
        data = await resp.json()
        print(data)

asyncio.run(main())

gRPC (Unary)

import grpc.aio
from hedge import HedgeConfig
from hedge.interceptor import HedgedUnaryInterceptor

async def make_channel():
    return grpc.aio.insecure_channel(
        "localhost:50051",
        interceptors=[HedgedUnaryInterceptor(config=HedgeConfig(estimated_rps=500))],
    )

gRPC (Server Streaming — LLM inference, log tailing, …)

import grpc.aio
from hedge import HedgeConfig
from hedge.interceptor import HedgedServerStreamInterceptor

async def make_channel():
    return grpc.aio.insecure_channel(
        "localhost:50051",
        interceptors=[HedgedServerStreamInterceptor(config=HedgeConfig())],
    )

niquests

import asyncio
from hedge import HedgeConfig
from hedge.transport import HedgedNiquestsSession

async def main():
    async with HedgedNiquestsSession(config=HedgeConfig()) as session:
        resp = await session.get("https://api.example.com/data")
        print(resp.status_code)

asyncio.run(main())

tornado

import asyncio
from hedge import HedgeConfig
from hedge.transport import HedgedTornadoClient

async def main():
    async with HedgedTornadoClient(config=HedgeConfig()) as client:
        resp = await client.fetch("https://api.example.com/data")
        print(resp.code)

asyncio.run(main())

OpenAI SDK

Since the OpenAI Python SDK uses httpx under the hood, you can inject HedgedHttpxTransport directly via the http_client parameter:

import httpx
from openai import AsyncOpenAI
from hedge import HedgeConfig
from hedge.transport import HedgedHttpxTransport

transport = HedgedHttpxTransport(config=HedgeConfig(percentile=0.95))
client = AsyncOpenAI(
    api_key="sk-...",
    http_client=httpx.AsyncClient(transport=transport),
)

Note: OpenAI's core APIs (Chat Completions, Embeddings, etc.) use POST, so they are not hedged by default — avoiding double billing. Only GET endpoints (e.g. model listing) are hedged. See examples/openai_hedged.py for a full example.

For gRPC server streaming, the hedge signal is time-to-first-message (TTFM): if the primary stream doesn't yield its first chunk within the configured percentile estimate (p90 by default), a backup stream is started. Whichever stream yields first wins and continues streaming; the loser is cancelled at the wire level.
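The TTFM race could look roughly like the following with plain async generators. This is a minimal sketch, not the library's interceptor: hedged_stream, open_stream and ttfm_deadline are hypothetical names, the fixed deadline stands in for the sketch's percentile estimate, and every stream is assumed to yield at least one message.

```python
import asyncio
from typing import AsyncGenerator, AsyncIterator, Callable


async def _first(stream: AsyncGenerator[str, None]) -> str:
    """Wait for the first message of a stream."""
    return await stream.__anext__()


async def hedged_stream(
    open_stream: Callable[[], AsyncGenerator[str, None]],
    ttfm_deadline: float,
) -> AsyncIterator[str]:
    """Race a backup stream if the primary yields nothing within ttfm_deadline."""
    streams = [open_stream()]
    firsts = [asyncio.ensure_future(_first(streams[0]))]
    done, _ = await asyncio.wait(firsts, timeout=ttfm_deadline)
    if not done:
        # Primary missed the time-to-first-message deadline: open a backup.
        streams.append(open_stream())
        firsts.append(asyncio.ensure_future(_first(streams[1])))
        done, _ = await asyncio.wait(firsts, return_when=asyncio.FIRST_COMPLETED)

    win = 0 if firsts[0] in done else 1
    for i, task in enumerate(firsts):
        if i != win:
            task.cancel()               # stop waiting on the loser
            await asyncio.wait([task])  # let the cancellation land
            await streams[i].aclose()   # then close the losing stream

    yield firsts[win].result()
    async for chunk in streams[win]:    # the winner keeps streaming
        yield chunk


async def demo() -> list:
    delays = iter([0.5, 0.01])  # primary is slow to start, backup is fast

    async def open_stream() -> AsyncGenerator[str, None]:
        ttfm = next(delays)
        await asyncio.sleep(ttfm)
        for i in range(3):
            yield f"ttfm={ttfm} chunk={i}"

    return [chunk async for chunk in hedged_stream(open_stream, ttfm_deadline=0.05)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Only the first message is raced here; once a winner is chosen, the rest of its stream is forwarded unchanged. A real gRPC stream would also need its underlying call cancelled, which this sketch does not model.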

Runnable examples for each framework live in examples/ — the gRPC ones are fully self-contained (they spin up a local server with simulated stragglers so you can see hedging in action without any external dependency). See examples/README.md for the index.


How It Works

1. DDSketch quantile estimator

Each target host gets a WindowedSketch: a pair of DDSketches that rotate every window_duration seconds (30 by default). DDSketch uses logarithmic bucket mapping to provide relative-error guarantees: any quantile estimate is within ±1% of the true value, regardless of the underlying distribution.
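To make the logarithmic bucket mapping concrete, here is a minimal sketch of a DDSketch-style estimator for positive values. TinyDDSketch is a hypothetical name and this is not the code in sketch/_ddsketch.py; it assumes a relative accuracy of 1% and leaves out window rotation, merging, and bucket-count limits.

```python
import math
from collections import Counter


class TinyDDSketch:
    """Illustrative log-bucketed quantile sketch for positive values."""

    def __init__(self, relative_accuracy: float = 0.01) -> None:
        self.alpha = relative_accuracy
        # gamma is chosen so that each bucket spans a (1 +/- alpha) band.
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self._log_gamma = math.log(self.gamma)
        self.buckets = Counter()
        self.count = 0

    def add(self, value: float) -> None:
        if value <= 0:
            raise ValueError("this sketch only handles positive values")
        # Bucket i covers the interval (gamma**(i-1), gamma**i].
        index = math.ceil(math.log(value) / self._log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def quantile(self, q: float) -> float:
        if self.count == 0:
            raise ValueError("empty sketch")
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # This representative is within alpha (relative) of every
                # value that can fall into bucket `index`.
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise AssertionError("unreachable: ranks are below count")


if __name__ == "__main__":
    sketch = TinyDDSketch()
    for latency_ms in range(1, 1001):
        sketch.add(float(latency_ms))
    print(sketch.quantile(0.90))
```

Memory grows with the number of distinct buckets rather than the number of samples, which is what makes a per-host sketch cheap to keep around.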

2. Adaptive trigger

On each request, the transport queries the sketch for the configured percentile (default p90). If the primary hasn't responded by that deadline, a backup request is fired. Whichever response arrives first is returned; the loser is cancelled (including the underlying gRPC Call for streams).

              ┌─ primary  ─────────── ✓ (fast) ──→ return
request ──────┤
              └─ hedge fires after p90 ─── ✗ (cancelled)
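The race in the diagram can be sketched with plain asyncio. This is a minimal illustration under stated assumptions, not the library's scheduler: hedged_call and send are hypothetical names, at most one hedge is fired, and the delay is passed in rather than read from a sketch.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def hedged_call(send: Callable[[], Awaitable[T]], hedge_delay: float) -> T:
    """Run send(); if it is still pending after hedge_delay seconds, race a backup."""
    tasks = [asyncio.ensure_future(send())]
    try:
        done, _ = await asyncio.wait(tasks, timeout=hedge_delay)
        if not done:
            # The primary blew past the deadline: fire one hedge and race both.
            tasks.append(asyncio.ensure_future(send()))
            done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        winner = tasks[0] if tasks[0] in done else tasks[1]
        return winner.result()
    finally:
        # Cancel whichever request is still in flight (the loser).
        for task in tasks:
            if not task.done():
                task.cancel()


async def demo():
    latencies = iter([0.5, 0.01])  # the primary is a straggler, the backup is fast
    calls = []

    async def send() -> float:
        delay = next(latencies)
        calls.append(delay)
        await asyncio.sleep(delay)
        return delay

    result = await hedged_call(send, hedge_delay=0.05)
    return result, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the first task to finish raised an exception, this sketch re-raises it instead of waiting for the other request; a production implementation would need to decide how failures interact with hedges.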

3. Token bucket budget

Hedges are rate-limited by a token bucket that refills at estimated_rps × budget_percent / 100 tokens per second. During genuine outages the bucket drains and hedging stops automatically — preventing the load-doubling spiral that would deepen the incident.
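A token bucket with that refill rate can be sketched in a few lines. The class and function names below are hypothetical, and the injectable clock exists only to make the example deterministic; the rate formula, and the capacity being set to the same value, follow the description in this README. With the defaults, 100 × 10 / 100 gives 10 hedge tokens per second.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket: one token is spent per hedge."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens that can accumulate
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self) -> bool:
        """Return True and spend a token if a hedge is allowed right now."""
        now = self._clock()
        refill = (now - self._last) * self.rate
        self._tokens = min(self.capacity, self._tokens + refill)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False  # budget exhausted: skip the hedge


def hedge_budget(estimated_rps: float = 100.0, budget_percent: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> TokenBucket:
    """Build a bucket refilling at estimated_rps * budget_percent / 100 tokens/s."""
    rate = estimated_rps * budget_percent / 100
    return TokenBucket(rate=rate, capacity=rate, clock=clock)


if __name__ == "__main__":
    bucket = hedge_budget()
    allowed = sum(bucket.try_acquire() for _ in range(50))
    print(f"hedges allowed in a burst of 50: {allowed}")
```

When every request is slow, as in an outage, each one tries to hedge, the bucket empties within a burst, and later requests go out unhedged until tokens trickle back.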

gRPC implementation note

The gRPC intercept_unary_unary continuation returns a Call object almost immediately; the real RTT is spent in the subsequent await call. We wrap both steps in a single asyncio task so the hedge timer reflects true end-to-end RPC latency. Cancelling a loser invokes call.cancel() first (notifying the server) then task.cancel() (cleaning up the coroutine).
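The two-step shape described in this note can be illustrated without a gRPC server. In the sketch below, FakeCall is a hypothetical stand-in for the call object returned by the continuation, and run_rpc is an illustrative helper; none of this is the library's interceptor code.

```python
import asyncio


class FakeCall:
    """Hypothetical stand-in for a gRPC call object: awaitable, with cancel()."""

    def __init__(self, rtt: float, response: str) -> None:
        self._rtt = rtt
        self._response = response
        self.cancelled = False

    def cancel(self) -> bool:
        self.cancelled = True  # a real call would notify the server here
        return True

    def __await__(self):
        return self._wait().__await__()

    async def _wait(self) -> str:
        await asyncio.sleep(self._rtt)  # the real round trip is spent here
        return self._response


async def run_rpc(continuation, calls: list) -> str:
    """Wrap both steps in one coroutine so a timer sees end-to-end latency."""
    call = await continuation()  # step 1: returns almost immediately
    calls.append(call)           # keep a handle so a loser can be cancelled
    return await call            # step 2: wait for the actual response


async def demo():
    async def slow() -> FakeCall:
        return FakeCall(0.5, "primary")

    async def fast() -> FakeCall:
        return FakeCall(0.01, "backup")

    primary_calls, backup_calls = [], []
    primary = asyncio.ensure_future(run_rpc(slow, primary_calls))
    await asyncio.sleep(0.05)    # stands in for the hedge delay
    backup = asyncio.ensure_future(run_rpc(fast, backup_calls))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task, calls in ((primary, primary_calls), (backup, backup_calls)):
        if task in pending:
            for call in calls:
                call.cancel()    # first: cancel the call itself
            task.cancel()        # then: clean up the wrapping coroutine
    return done.pop().result(), primary_calls[0].cancelled, backup_calls[0].cancelled


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Timing only the continuation would measure the near-instant step 1, so a hedge timer built on it would never see a slow RPC; wrapping both awaits in one task avoids that.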


Configuration

All knobs live on HedgeConfig:

Parameter        Type          Default  Description
percentile       float         0.90     Sketch quantile used as hedge trigger
max_hedges       int           1        Maximum concurrent hedge requests per call
budget_percent   float         10.0     Max hedge rate as percent of total traffic
estimated_rps    float         100.0    Expected requests per second; sets token bucket capacity
min_delay        float         0.001    Floor on the hedge delay in seconds
warmup_requests  int           20       Number of initial requests using fixed delay
warmup_delay     float         0.01     Fixed hedge delay during warmup in seconds
window_duration  float         30.0     Sketch window rotation interval in seconds
stats            Stats | None  None     Inject a custom Stats for observability
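
One plausible way the warmup and floor parameters in the table combine into a hedge delay is sketched below. DelayParams and hedge_delay are hypothetical names, and the exact rules (a strict less-than on the warmup count, the floor applying only after warmup) are assumptions rather than documented behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayParams:
    """Hypothetical subset of the config fields, using the documented defaults."""
    min_delay: float = 0.001       # seconds
    warmup_requests: int = 20
    warmup_delay: float = 0.01     # seconds


def hedge_delay(observed_requests: int, sketch_quantile: float,
                params: DelayParams = DelayParams()) -> float:
    """Pick the delay (seconds) before a backup request is fired."""
    if observed_requests < params.warmup_requests:
        # Too few samples for a trustworthy estimate: use the fixed delay.
        return params.warmup_delay
    # After warmup, follow the sketch but never go below the floor.
    return max(params.min_delay, sketch_quantile)


if __name__ == "__main__":
    print(hedge_delay(5, 0.2))        # still warming up
    print(hedge_delay(100, 0.0123))   # sketch estimate is used
    print(hedge_delay(100, 0.0002))   # clamped to the floor
```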

Tip — estimated_rps: pick a value close to your real RPS so the token bucket capacity (rps × budget_percent / 100) is meaningful. If unsure, start at the default 100.0 and watch hedge_rate / budget_exhausted in the stats snapshot.


Observability

from hedge import HedgeConfig, Stats
from hedge.transport import HedgedHttpxTransport

stats = Stats()
transport = HedgedHttpxTransport(config=HedgeConfig(stats=stats))

# ... after running some traffic ...
snap = stats.snapshot()
print(f"total={snap.total_requests} hedged={snap.hedged_requests}")
print(f"hedge_wins={snap.hedge_wins} primary_wins={snap.primary_wins}")
print(f"budget_exhausted={snap.budget_exhausted}")
print(f"hedge_rate={stats.hedge_rate():.2%}")

Stats is fully thread-safe and can be shared across multiple transports/interceptors to aggregate metrics.


Benchmarks & charts

Two benchmark suites and a plotting target ship with the project:

Command             What it does                                                        Output
make bench-compare  httpx only: No hedging vs Static 10ms vs Static 50ms vs Adaptive  benchmark/results.csv
make bench-multi    httpx vs aiohttp vs gRPC, No hedging vs Adaptive                    benchmark/results_multi.csv
make bench-plot     Render both CSVs into charts                                        eval.png, eval_multi_framework.png

Each suite runs 500 requests against a simulated lognormal latency (mean=5ms, stddev=2ms) with 5% straggler probability (10× spike).
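
A latency model of that shape can be sketched with the standard library. The function names and the seed are illustrative, and the sketch assumes that mean=5ms and stddev=2ms describe the latency distribution itself (not the underlying normal), so they are converted to lognormal parameters first; the project's benchmark code may do this differently.

```python
import math
import random


def lognormal_params(mean: float, stddev: float) -> tuple:
    """Convert a desired mean/stddev into the (mu, sigma) of a lognormal."""
    sigma_sq = math.log(1 + (stddev / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2
    return mu, math.sqrt(sigma_sq)


def simulate_latencies(n: int = 500, mean_ms: float = 5.0, stddev_ms: float = 2.0,
                       straggler_prob: float = 0.05, straggler_factor: float = 10.0,
                       seed: int = 0) -> list:
    """Draw n latencies in ms; a small fraction are stragglers (10x slower)."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean_ms, stddev_ms)
    latencies = []
    for _ in range(n):
        latency = rng.lognormvariate(mu, sigma)
        if rng.random() < straggler_prob:
            latency *= straggler_factor  # simulated straggler spike
        latencies.append(latency)
    return latencies


if __name__ == "__main__":
    samples = sorted(simulate_latencies())
    print(f"p50={samples[len(samples) // 2]:.1f}ms max={samples[-1]:.1f}ms")
```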


Development

# Install uv (if not already)
curl -LsSf https://astral.sh/uv/install.sh | sh

make install            # install all extras with uv
make lint               # ruff check
make typecheck          # mypy
make test               # all tests
make test-unit          # unit tests only
make test-integration   # integration tests (requires httpx / aiohttp / grpcio)
make coverage           # coverage report
make bench-multi        # multi-framework benchmark
make bench-plot         # render charts
make ci                 # lint + typecheck + test + coverage

Testing

  • Unit tests (tests/unit/): DDSketch, token bucket, scheduler, stats, options, lazy import shims, gRPC interceptor branches (with fake continuations).
  • Integration tests (tests/integration/): real httpx transport, real aiohttp session, real local gRPC server with .proto + generated pb2.
  • Benchmarks (tests/benchmark/): DDSketch microbench, token bucket microbench, four-config comparison, three-framework comparison.

Current coverage: 97% (150 tests, ~7 seconds).


Project Structure

hedge-python/
├── src/hedge/
│   ├── __init__.py          # Public API
│   ├── _options.py          # HedgeConfig dataclass
│   ├── _stats.py            # Thread-safe Stats + StatsSnapshot
│   ├── sketch/
│   │   ├── _ddsketch.py     # DDSketch quantile estimator
│   │   └── _windowed.py     # Sliding-window DDSketch pair
│   ├── budget/
│   │   └── _token_bucket.py # Token bucket rate limiter
│   ├── transport/
│   │   ├── _base.py         # Shared HedgeScheduler logic
│   │   ├── _httpx.py        # httpx AsyncBaseTransport adapter
│   │   ├── _aiohttp.py      # aiohttp session wrapper
│   │   ├── _niquests.py     # niquests session wrapper
│   │   └── _tornado.py      # tornado AsyncHTTPClient wrapper
│   └── interceptor/
│       └── _grpc.py         # gRPC unary + server-stream interceptors
├── tests/
│   ├── unit/                # 7 unit-test files
│   ├── integration/
│   │   ├── proto/           # .proto + generated pb2 / pb2_grpc
│   │   ├── test_httpx_transport.py
│   │   ├── test_aiohttp_session.py
│   │   └── test_grpc_interceptor.py
│   └── benchmark/
│       ├── test_bench_ddsketch.py
│       ├── test_bench_token_bucket.py
│       ├── test_bench_hedge_comparison.py    # httpx 4-config
│       └── test_bench_multi_framework.py     # 3-framework comparison
├── benchmark/
│   ├── plot.py              # CSV → matplotlib charts
│   ├── results.csv          # produced by bench-compare
│   └── results_multi.csv    # produced by bench-multi
├── eval.png                 # single-framework chart
├── eval_multi_framework.png # cross-framework chart
├── pyproject.toml
├── Makefile
└── .github/workflows/ci.yml

Changelog

See CHANGELOG.md for the full release history.

License

hedge-python is released under the MIT License.
