Non-invasive prompt cache instrumentation for LLM API apps
CacheLens
Non-invasive prompt cache instrumentation for LLM API apps. Wrap your client in one line. Get terminal reports, JSON exports, and OTEL metrics.
Prompt caching gives steep discounts on cached tokens — but nothing tells you whether your app is actually getting cache hits, or why not. CacheLens wraps your Anthropic, Gemini, or OpenAI client and reports cache hit rate, cost, savings, and the money you're leaving on the table, broken down by prompt layer.
See docs/architecture.md for the full design.
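To make the reported quantities concrete, here is a minimal sketch of how hit rate, cost, savings, and money left on the table could be derived from per-call token counts. The `Usage` dataclass and `summarize` function are hypothetical names for illustration, not CacheLens types. The sketch ignores output tokens and any cache-write surcharge a provider may bill.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one API call (hypothetical shape, not a CacheLens type)."""
    uncached_input: int  # input tokens billed at the full input rate
    cached_input: int    # input tokens served from the prompt cache


def summarize(calls, input_price, cache_read_price):
    """Aggregate hit rate, input cost, and savings. Prices are USD per 1M tokens."""
    cached = sum(c.cached_input for c in calls)
    uncached = sum(c.uncached_input for c in calls)
    total = cached + uncached
    per_token_discount = (input_price - cache_read_price) / 1_000_000
    return {
        "hit_rate": cached / total if total else 0.0,
        "input_cost": (uncached * input_price + cached * cache_read_price) / 1_000_000,
        "savings": cached * per_token_discount,
        # Upper bound: the saving if every uncached token had been a cache hit.
        "left_on_table": uncached * per_token_discount,
    }


calls = [
    Usage(uncached_input=1000, cached_input=0),   # first call: cold cache
    Usage(uncached_input=200, cached_input=800),  # second call: prefix reused
]
summary = summarize(calls, input_price=1.25, cache_read_price=0.125)
print(summary)
```

The "left on the table" figure is deliberately an upper bound: some input tokens (the newest user turn, for example) can never be cached, so the realistic headroom is smaller.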
Install
pip install cachelens # core + rich
pip install cachelens[anthropic] # + Anthropic SDK
pip install cachelens[gemini] # + Gemini SDK
pip install cachelens[openai] # + OpenAI SDK
pip install cachelens[otel] # + OpenTelemetry
pip install cachelens[all] # everything
Quickstart
import anthropic
from cache_lens import wrap
client = wrap(anthropic.Anthropic())
# ... use client exactly as before; report prints on exit
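For intuition about what "non-invasive" can mean here, the sketch below shows one generic way a wrapper can observe calls without changing how the client is used: a proxy that forwards attribute access and records each call's result. This is an assumption about the general technique, not CacheLens's actual implementation. `RecordingProxy`, `wrap_sketch`, `FakeMessages`, and `FakeClient` are hypothetical names.

```python
import atexit


class RecordingProxy:
    """Forward every attribute to the wrapped object and record call results."""

    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if callable(attr):
            def recorded(*args, **kwargs):
                result = attr(*args, **kwargs)
                self._log.append((name, kwargs, result))
                return result
            return recorded
        if hasattr(attr, "__dict__"):
            # Nested namespaces (such as client.messages) are proxied as well.
            return RecordingProxy(attr, self._log)
        return attr


def wrap_sketch(client):
    """Return a proxied client plus the shared call log; summarize at exit."""
    log = []
    atexit.register(lambda: print(f"{len(log)} call(s) recorded"))
    return RecordingProxy(client, log), log


class FakeMessages:
    """Stand-in for an SDK namespace, so the sketch runs without a network."""

    def create(self, **kwargs):
        return {"usage": {"input_tokens": 12, "cached_tokens": 0}}


class FakeClient:
    def __init__(self):
        self.messages = FakeMessages()


client, log = wrap_sketch(FakeClient())
response = client.messages.create(model="demo", messages=[])
```

The calling code is identical to the unwrapped version, which is the property the one-line `wrap(...)` call relies on.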
Explicit session boundary with exports:
from cache_lens import CacheLens
with CacheLens(client, json_export="report.json", otel=True) as session:
    agent.run(...)  # your code, unchanged
report = session.report
Suppress the terminal report in CI with CACHE_LENS_TERMINAL=0.
Custom pricing
CacheLens ships a default price table, but you can override or extend it without forking — handy when a new model lands. User entries merge over the defaults:
# in-memory dict (native format, USD per 1M tokens)
wrap(client, pricing={"openai": {"gpt-5": {"input": 1.25, "output": 10.0, "cache_read": 0.125}}})
# or a JSON file (native or LiteLLM model_prices_and_context_window.json format)
wrap(client, pricing="pricing.json")
Or point at a file process-wide with CACHE_LENS_PRICING=/path/to/pricing.json.
A bad pricing file falls back to defaults rather than breaking the run. See
docs/architecture.md for the full design.
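The merge-and-fall-back behaviour described above can be pictured with a short sketch. It assumes the native format is a nested provider, model, price-field mapping (as in the in-memory example) and that user entries override individual price fields. Both `merge_pricing` and `load_pricing` are hypothetical helpers, not CacheLens functions, and the `demo` provider and model names are placeholders.

```python
import copy
import json


def merge_pricing(defaults, overrides):
    """Return a copy of defaults with user entries merged over them."""
    merged = copy.deepcopy(defaults)
    for provider, models in overrides.items():
        for model, prices in models.items():
            merged.setdefault(provider, {}).setdefault(model, {}).update(prices)
    return merged


def load_pricing(path, defaults):
    """Merge a JSON pricing file over defaults; on a bad file, keep the defaults."""
    try:
        with open(path, encoding="utf-8") as fh:
            overrides = json.load(fh)
        return merge_pricing(defaults, overrides)
    except (OSError, ValueError, AttributeError, TypeError):
        # Missing file, invalid JSON, or an unexpected shape: do not break the run.
        return copy.deepcopy(defaults)


defaults = {"demo": {"model-a": {"input": 1.0, "output": 2.0}}}
merged = merge_pricing(
    defaults,
    {"demo": {"model-a": {"input": 0.5}, "model-b": {"input": 3.0}}},
)
fallback = load_pricing("/nonexistent/pricing.json", defaults)
```

Here `merged` keeps the default `output` price for `model-a`, overrides its `input` price, and gains `model-b`, while `fallback` equals the untouched defaults.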
How this helps you develop LLM applications
Prompt caching only pays off if your prompt's prefix is stable and byte-identical across calls — but most agent loops accumulate per-turn state (timestamps, counters, mutating progress trackers) that silently breaks the prefix without anyone noticing. The API still works fine, the bill just stays high. CacheLens turns that invisible problem into a concrete, iterable workflow during development:
- Wrap your client once and run your normal dev/test loop — no changes to your app logic required.
- Read the layer table to see whether your prompt is actually splitting into stable layers (system prompt, schema/context, conversation) or collapsing into one big conversation blob. The latter is a strong signal that something near the top of your prompt changes every turn.
- Use the tips as a diagnosis, not just a metric. "No stable prompt prefix detected" tells you why your hit rate is 0% and what to fix (move static content first, make it byte-identical); "X% of input tokens are uncached" tells you how much headroom restructuring is worth before you spend time on it.
- Re-run after each change and compare Savings, Cached/Hit Rate, and whether the tips changed. This is the feedback loop that tells you whether a refactor (e.g. splitting prompt-building into a stable prefix and a volatile trailer) actually moved the needle, before you ever look at a billing dashboard.
See examples/queryargus.md for a real before/after walkthrough of this loop on a 30-turn Gemini agent.
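The "stable prefix, volatile trailer" refactor mentioned above can be illustrated with a small, self-contained sketch. The prompt text and the function names (`build_volatile_first`, `build_stable_first`, `shared_prefix_len`) are invented for illustration. The comparison is character-based, whereas real provider caches operate on tokens and typically apply minimum prefix lengths, so treat the numbers as a rough proxy only.

```python
import os

SYSTEM = "You are a careful research assistant.\n"
TOOLS = "Tools: search(query), fetch(url)\n"


def build_volatile_first(turn, history):
    # Per-turn state at the top: every call starts with different bytes.
    return f"Turn {turn}\n" + SYSTEM + TOOLS + history


def build_stable_first(turn, history):
    # Static content first, per-turn state moved into a trailer.
    return SYSTEM + TOOLS + history + f"Turn {turn}\n"


def shared_prefix_len(prompts):
    """Number of leading characters shared by all prompts."""
    return len(os.path.commonprefix(prompts))


histories = ["user: hi\n", "user: hi\nassistant: hello\n"]

before = shared_prefix_len(
    [build_volatile_first(i + 1, h) for i, h in enumerate(histories)]
)
after = shared_prefix_len(
    [build_stable_first(i + 1, h) for i, h in enumerate(histories)]
)
print(f"shared prefix before: {before} chars, after: {after} chars")
```

With the counter at the top, the two prompts share only the literal `Turn ` before diverging. With the counter in the trailer, they share the system prompt, the tool description, and the first turn of history.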
Status
v1.0. Implemented: wrapper interception with request capture, provider
extraction + capture (Anthropic + Gemini + OpenAI), content-based layer
classification (longest-common-prefix → named system_prompt / context /
conversation layers, cross-referenced against actual cache reads),
terminal/JSON/OTEL outputs, overridable pricing, tests.
Pending: cache-lens run CLI injection, streaming support, and cross-run
static/semi-static separation (see docs/architecture.md).
Develop
pip install -e .[dev]
pytest