Adaptive hedged request library for Python. Learns per-host latency via DDSketch, fires backup requests at estimated p90, caps hedge rate with token bucket.
Project description
hedge-python
Python port of bhope/hedge — adaptive hedged requests for tail-latency optimisation.
hedge-python learns per-host latency distributions with
DDSketch, races a backup request when the
primary exceeds its estimated p90, and caps the hedge rate with a token bucket
to prevent load amplification during outages. Zero configuration required.
First-class support for httpx, aiohttp, and gRPC (unary +
server-streaming).
Inspired by Dean & Barroso, The Tail at Scale (CACM 2013).
Why hedging?
A small fraction of slow responses dominates user-perceived latency. Hedging fires a duplicate request after the primary blows past its expected deadline — whichever finishes first wins, the other is cancelled.
Result on a benchmark with 5% straggler requests (10× slower):
| Framework | Configuration | p50 | p90 | p95 | p99 | p999 | Overhead |
|---|---|---|---|---|---|---|---|
| httpx | No hedging | 5.8 | 10.3 | 12.2 | 51.3 | 78.3 | 0.0% |
| httpx | Adaptive (hedge) | 6.2 | 10.5 | 12.1 | 18.8 | 22.2 | 7.0% |
| aiohttp | No hedging | 6.3 | 10.7 | 13.0 | 52.4 | 79.0 | 0.0% |
| aiohttp | Adaptive (hedge) | 6.5 | 11.3 | 13.8 | 20.5 | 25.1 | 4.6% |
| grpc | No hedging | 6.5 | 10.8 | 12.7 | 59.9 | 82.0 | 0.0% |
| grpc | Adaptive (hedge) | 6.9 | 11.6 | 13.7 | 20.4 | 23.5 | 5.6% |
Across all three frameworks, p99 latency drops by 60–66% at the cost of
~5–7% extra backend traffic. Reproduce with make bench-multi && make bench-plot.
Quick Start
# Install with your preferred framework
pip install hedge-python[httpx]
pip install hedge-python[aiohttp]
pip install hedge-python[grpc]
pip install hedge-python[all] # all frameworks
httpx
import asyncio
import httpx
from hedge import HedgeConfig
from hedge.transport import HedgedHttpxTransport
async def main():
transport = HedgedHttpxTransport(config=HedgeConfig())
async with httpx.AsyncClient(transport=transport) as client:
resp = await client.get("https://api.example.com/data")
print(resp.status_code)
asyncio.run(main())
aiohttp
import asyncio
from hedge import HedgeConfig
from hedge.transport import HedgedAiohttpSession
async def main():
async with HedgedAiohttpSession(config=HedgeConfig()) as session:
resp = await session.get("https://api.example.com/data")
data = await resp.json()
print(data)
asyncio.run(main())
gRPC (Unary)
import grpc.aio
from hedge import HedgeConfig
from hedge.interceptor import HedgedUnaryInterceptor
async def make_channel():
return grpc.aio.insecure_channel(
"localhost:50051",
interceptors=[HedgedUnaryInterceptor(config=HedgeConfig(estimated_rps=500))],
)
gRPC (Server Streaming — LLM inference, log tailing, …)
import grpc.aio
from hedge import HedgeConfig
from hedge.interceptor import HedgedServerStreamInterceptor
async def make_channel():
return grpc.aio.insecure_channel(
"localhost:50051",
interceptors=[HedgedServerStreamInterceptor(config=HedgeConfig())],
)
For server streaming, the hedge signal is time-to-first-message (TTFM): if the primary stream doesn't yield its first chunk within the estimated p90, a backup stream is started. Whichever yields first wins and continues streaming; the loser is cancelled at the wire level.
Runnable examples for each framework live in
examples/— the gRPC ones are fully self-contained (they spin up a local server with simulated stragglers so you can see hedging in action without any external dependency). Seeexamples/README.mdfor the index.
How It Works
1. DDSketch quantile estimator
Each target host gets a WindowedSketch — a pair of DDSketches that rotate
every 30 seconds. DDSketch uses logarithmic bucket mapping to provide
relative-error guarantees: any quantile estimate is within ±1% of the
true value, regardless of the underlying distribution.
2. Adaptive trigger
On each request, the transport queries the sketch for the configured
percentile (default p90). If the primary hasn't responded by that deadline,
a backup request is fired. Whichever response arrives first is returned;
the loser is cancelled (including the underlying gRPC Call for streams).
┌─ primary ─────────── ✓ (fast) ──→ return
request ──────┤
└─ hedge fires after p90 ─── ✗ (cancelled)
3. Token bucket budget
Hedges are rate-limited by a token bucket that refills at
estimated_rps × budget_percent / 100 tokens per second. During genuine
outages the bucket drains and hedging stops automatically — preventing the
load-doubling spiral that would deepen the incident.
gRPC implementation note
The gRPC intercept_unary_unary continuation returns a Call object almost
immediately; the real RTT is spent in the subsequent await call. We wrap
both steps in a single asyncio task so the hedge timer reflects true
end-to-end RPC latency. Cancelling a loser invokes call.cancel() first
(notifying the server) then task.cancel() (cleaning up the coroutine).
Configuration
All knobs live on HedgeConfig:
| Parameter | Type | Default | Description |
|---|---|---|---|
percentile |
float |
0.90 |
Sketch quantile used as hedge trigger |
max_hedges |
int |
1 |
Maximum concurrent hedge requests per call |
budget_percent |
float |
10.0 |
Max hedge rate as percent of total traffic |
estimated_rps |
float |
100.0 |
Expected requests per second; sets token bucket capacity |
min_delay |
float |
0.001 |
Floor on the hedge delay in seconds |
warmup_requests |
int |
20 |
Number of initial requests using fixed delay |
warmup_delay |
float |
0.01 |
Fixed hedge delay during warmup in seconds |
window_duration |
float |
30.0 |
Sketch window rotation interval in seconds |
stats |
Stats | None |
None |
Inject a custom Stats for observability |
Tip —
estimated_rps: pick a value close to your real RPS so the token bucket capacity (rps × budget_percent / 100) is meaningful. If unsure, start at the default100.0and watchhedge_rate/budget_exhaustedin the stats snapshot.
Observability
from hedge import HedgeConfig, Stats
from hedge.transport import HedgedHttpxTransport
stats = Stats()
transport = HedgedHttpxTransport(config=HedgeConfig(stats=stats))
# ... after running some traffic ...
snap = stats.snapshot()
print(f"total={snap.total_requests} hedged={snap.hedged_requests}")
print(f"hedge_wins={snap.hedge_wins} primary_wins={snap.primary_wins}")
print(f"budget_exhausted={snap.budget_exhausted}")
print(f"hedge_rate={stats.hedge_rate():.2%}")
Stats is fully thread-safe and can be shared across multiple
transports/interceptors to aggregate metrics.
Benchmarks & charts
Two benchmark suites ship with the project:
| Command | What it does | Output |
|---|---|---|
make bench-compare |
httpx only: No hedging vs Static 10ms vs Static 50ms vs Adaptive | benchmark/results.csv |
make bench-multi |
httpx vs aiohttp vs gRPC, No hedging vs Adaptive | benchmark/results_multi.csv |
make bench-plot |
Render both CSVs into charts | eval.png, eval_multi_framework.png |
Each suite runs 500 requests against a simulated lognormal latency
(mean=5ms, stddev=2ms) with 5% straggler probability (10× spike).
Development
# Install uv (if not already)
curl -LsSf https://astral.sh/uv/install.sh | sh
make install # install all extras with uv
make lint # ruff check
make typecheck # mypy
make test # all tests
make test-unit # unit tests only
make test-integration # integration tests (requires httpx / aiohttp / grpcio)
make coverage # coverage report (current: 96%)
make bench-multi # multi-framework benchmark
make bench-plot # render charts
make ci # lint + typecheck + test + coverage
Testing
- Unit tests (
tests/unit/): DDSketch, token bucket, scheduler, stats, options, lazy import shims, gRPC interceptor branches (with fake continuations). - Integration tests (
tests/integration/): real httpx transport, real aiohttp session, real local gRPC server with.proto+ generated pb2. - Benchmarks (
tests/benchmark/): DDSketch microbench, token bucket microbench, four-config comparison, three-framework comparison.
Current coverage: 97% (122 tests, ~7 seconds).
Project Structure
hedge-python/
├── src/hedge/
│ ├── __init__.py # Public API
│ ├── _options.py # HedgeConfig dataclass
│ ├── _stats.py # Thread-safe Stats + StatsSnapshot
│ ├── sketch/
│ │ ├── _ddsketch.py # DDSketch quantile estimator
│ │ └── _windowed.py # Sliding-window DDSketch pair
│ ├── budget/
│ │ └── _token_bucket.py # Token bucket rate limiter
│ ├── transport/
│ │ ├── _base.py # Shared HedgeScheduler logic
│ │ ├── _httpx.py # httpx AsyncBaseTransport adapter
│ │ └── _aiohttp.py # aiohttp session wrapper
│ └── interceptor/
│ └── _grpc.py # gRPC unary + server-stream interceptors
├── tests/
│ ├── unit/ # 7 unit-test files
│ ├── integration/
│ │ ├── proto/ # .proto + generated pb2 / pb2_grpc
│ │ ├── test_httpx_transport.py
│ │ ├── test_aiohttp_session.py
│ │ └── test_grpc_interceptor.py
│ └── benchmark/
│ ├── test_bench_ddsketch.py
│ ├── test_bench_token_bucket.py
│ ├── test_bench_hedge_comparison.py # httpx 4-config
│ └── test_bench_multi_framework.py # 3-framework comparison
├── benchmark/
│ ├── plot.py # CSV → matplotlib charts
│ ├── results.csv # produced by bench-compare
│ └── results_multi.csv # produced by bench-multi
├── eval.png # single-framework chart
├── eval_multi_framework.png # cross-framework chart
├── pyproject.toml
├── Makefile
└── .github/workflows/ci.yml
References
- Jeffrey Dean and Luiz André Barroso. "The Tail at Scale." Communications of the ACM, 56(2):74–80, February 2013.
- Charles Masson, Jee E. Rim, and Homin K. Lee. "DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees." Proceedings of the VLDB Endowment, 12(12):2195–2205, 2019.
Changelog
See CHANGELOG.md for the full release history.
License
hedge-python is released under the MIT License.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file hedge_python-0.1.0.tar.gz.
File metadata
- Download URL: hedge_python-0.1.0.tar.gz
- Upload date:
- Size: 445.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7d7ed373836d76bb54d684331ad86c9a5d8eb947f7db1bd895bae712311ae680
|
|
| MD5 |
ef2528dcb33176ea81cb4cebeb1ac99f
|
|
| BLAKE2b-256 |
4a00e99af93944b704e59ff0b9cb26a21fd9dfb82ab1709a09a3d465a68b5bb5
|
Provenance
The following attestation bundles were made for hedge_python-0.1.0.tar.gz:
Publisher:
release.yml on sunhailin-Leo/hedge-python
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
hedge_python-0.1.0.tar.gz -
Subject digest:
7d7ed373836d76bb54d684331ad86c9a5d8eb947f7db1bd895bae712311ae680 - Sigstore transparency entry: 1364573616
- Sigstore integration time:
-
Permalink:
sunhailin-Leo/hedge-python@ee9d79837d12c397f1455cd0832cd3611a63008a -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/sunhailin-Leo
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@ee9d79837d12c397f1455cd0832cd3611a63008a -
Trigger Event:
push
-
Statement type:
File details
Details for the file hedge_python-0.1.0-py3-none-any.whl.
File metadata
- Download URL: hedge_python-0.1.0-py3-none-any.whl
- Upload date:
- Size: 22.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7c65972b563671e19017c83a85049dcf3f01928671f148fc3a163ed303392ec5
|
|
| MD5 |
08567a3a40a702007c6778d909f413cd
|
|
| BLAKE2b-256 |
5df7599178eb585f3b1d6b755dc823349ed12634f493d5f965fac588fc209376
|
Provenance
The following attestation bundles were made for hedge_python-0.1.0-py3-none-any.whl:
Publisher:
release.yml on sunhailin-Leo/hedge-python
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
hedge_python-0.1.0-py3-none-any.whl -
Subject digest:
7c65972b563671e19017c83a85049dcf3f01928671f148fc3a163ed303392ec5 - Sigstore transparency entry: 1364573703
- Sigstore integration time:
-
Permalink:
sunhailin-Leo/hedge-python@ee9d79837d12c397f1455cd0832cd3611a63008a -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/sunhailin-Leo
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@ee9d79837d12c397f1455cd0832cd3611a63008a -
Trigger Event:
push
-
Statement type: