cachebench
Prompt-cache observability for LLM APIs. Per-call hit ratios, cost saved, regression alerts, miss-aware retry. Anthropic, OpenAI, and AWS Bedrock.
pip install cachebench
Why
Prompt caching saves 50–90% of input-token cost on Anthropic and OpenAI, but the per-request hit rate is invisible from the SDK, and misses are silent. A single deploy that appends a timestamp to a system prompt can quietly halve your cache hit rate and double your bill, and you'll find out from the invoice. On Anthropic, back-to-back requests can silently miss roughly 40% of the time during the cache's eventual-consistency window; OpenAI's cache mechanics differ across models.
cachebench wraps your client call and tells you, per request, what hit and what didn't.
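The timestamp failure mode comes down to the cache key: providers key on the exact bytes of the cacheable prefix, so any per-request value poisons it. A minimal sketch of a stable prefix hash (the hashing scheme below is illustrative; cachebench's actual prefix_id derivation may differ):

```python
import hashlib
import time

def prefix_id(system_prompt: str) -> str:
    # Illustrative stable hash of the cacheable prefix; the real
    # prefix_id scheme in cachebench may differ.
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:12]

base = "You are a support agent. Answer from the docs below."
stable = prefix_id(base)  # identical across requests: cacheable
poisoned = prefix_id(f"{base} Generated at {time.time()}")  # new hash every call: guaranteed miss
```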
Quick start
from anthropic import Anthropic
from cachebench import CacheTracker, Provider
client = Anthropic()
tracker = CacheTracker(provider=Provider.ANTHROPIC, miss_alert_threshold=0.6)
create = tracker.wrap(client.messages.create)
response = create(
    model="claude-sonnet-4-20250514",
    max_tokens=200,
    system=[{"type": "text", "text": "...", "cache_control": {"type": "ephemeral"}}],
    messages=[{"role": "user", "content": "Hello"}],
)
print(tracker.aggregate())
# {'calls': 1, 'hit_ratio': 0.94, 'cost_saved_usd': 0.012, ...}
Features
- Per-call attribution. Every wrapped call records cache_read_tokens, cache_creation_tokens, hit_ratio, cost_saved_usd, and prefix_id (a stable hash of the cacheable prefix).
- Regression alerts. Configurable threshold; the default fires on stderr when a request with a cacheable prefix hits below 60%. Pass on_miss_alert= to forward alerts to your own logger / Slack / PagerDuty.
- Miss-aware retry. Opt in with CachePolicy.miss_aware() to retry once on a silent miss after a configurable delay (works around Anthropic's documented eventual-consistency window).
- Per-prefix grouping. tracker.by_prefix() shows the hit rate per stable prefix, which tells you instantly which system prompt regressed.
- Multi-provider. Anthropic, OpenAI (via prompt_tokens_details.cached_tokens), and Bedrock (the AnthropicBedrock client).
- Async-aware. tracker.wrap detects coroutines and wraps both sync and async paths.
Recipes
Alert to Slack on regression
import requests
def to_slack(m):
    requests.post(SLACK_URL, json={"text": f"Cache regression: {m.prefix_id} ratio={m.hit_ratio:.2f}"})
tracker = CacheTracker(provider=Provider.ANTHROPIC, on_miss_alert=to_slack)
Retry around the Anthropic 40% miss bug
from cachebench import CachePolicy
tracker = CacheTracker(
    provider=Provider.ANTHROPIC,
    policy=CachePolicy.miss_aware(delay_ms=2000, max_retries=1),
)
Find which prefix regressed
for prefix_id, stats in tracker.by_prefix().items():
    if stats["hit_ratio"] < 0.5:
        print(f"REGRESSED: {prefix_id} {stats}")
What it doesn't do
- Not a proxy. Not a router. Not a cache itself — it observes the provider's cache, doesn't store responses.
- Not a billing dashboard. Exports metrics; aggregation/UI is your job.
- Doesn't modify prompts (no auto-injecting cache breakpoints — see other tools for that).
Pricing data
DEFAULT_PRICING ships with current Anthropic/OpenAI/Bedrock rates (USD per million tokens). Pricing changes; pass pricing= to override:
tracker = CacheTracker(
    provider=Provider.ANTHROPIC,
    pricing={"input": 3.00, "cache_read": 0.30, "cache_write_5m": 3.75, "cache_write_1h": 6.00, "output": 15.00},
)
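With rates in USD per million tokens, the saving attributed to a call is just the cached tokens billed at the cache-read rate instead of the full input rate. A sketch of that arithmetic (cost_saved_usd here is a standalone illustration, not the tracker's internal function):

```python
def cost_saved_usd(cache_read_tokens: int, pricing: dict) -> float:
    # Rates are USD per million tokens; each cached token is billed at
    # pricing["cache_read"] instead of pricing["input"].
    return cache_read_tokens * (pricing["input"] - pricing["cache_read"]) / 1_000_000

pricing = {"input": 3.00, "cache_read": 0.30}
saving = cost_saved_usd(180_000, pricing)  # 180k cached tokens at $2.70/M saved: about $0.49
```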
License
MIT
File details
Details for the file cachebench-0.1.0.tar.gz.
File metadata
- Download URL: cachebench-0.1.0.tar.gz
- Upload date:
- Size: 7.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1dd193458dca86664dea54b1373f8553edd61b4f92335369ad2d75fde0fcf64d |
| MD5 | 04c802edfbf257ba93e331a3da0fffbf |
| BLAKE2b-256 | f288c4798cffbea5a987e0559aae6aa3c5fd71e49a861adc2e8dcfdee9b14f92 |
File details
Details for the file cachebench-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cachebench-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c16a23b3c30779c32fc3a54aa7c40bdfe8c38709ac69a70e946f1abbc47b721d |
| MD5 | 43cc682eb8b13d476dc326c040bc1e0b |
| BLAKE2b-256 | 4e0f5b665e571680e971bb7e5a84006b92807015d7a0dacce22be2e93965d203 |