Non-invasive prompt cache instrumentation for LLM API apps
cache-lens
Non-invasive prompt cache instrumentation for LLM API apps. Wrap your client in one line. Get terminal reports, JSON exports, and OTEL metrics.
Prompt caching gives steep discounts on cached tokens — but nothing tells you whether your app is actually getting cache hits, or why not. cache-lens wraps your Anthropic, Gemini, or OpenAI client and reports cache hit rate, cost, savings, and the money you're leaving on the table, broken down by prompt layer.
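To make the reported numbers concrete, here is a minimal sketch of how a token-weighted cache hit rate, input cost, and savings could be computed from per-request token counts. It is not cache-lens's implementation: the `Usage` class and `cache_summary` function are hypothetical names, the prices are the illustrative figures used in the Custom pricing example, and cache-write surcharges (which some providers bill) are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one request (hypothetical shape, not a cache-lens type)."""
    uncached_input: int  # input tokens billed at the full input rate
    cache_read: int      # input tokens served from the prompt cache


def cache_summary(usages, input_price, cache_read_price):
    """Summarise cache behaviour; prices are USD per 1M tokens."""
    uncached = sum(u.uncached_input for u in usages)
    cached = sum(u.cache_read for u in usages)
    total = uncached + cached
    # Assumption: hit rate is weighted by tokens, not by request count.
    hit_rate = cached / total if total else 0.0
    input_cost = (uncached * input_price + cached * cache_read_price) / 1_000_000
    # Savings: what the cached tokens would have cost extra at the full rate.
    savings = cached * (input_price - cache_read_price) / 1_000_000
    return {"hit_rate": hit_rate, "input_cost": input_cost, "savings": savings}


usages = [
    Usage(uncached_input=10_000, cache_read=0),     # first call: cold cache
    Usage(uncached_input=2_000, cache_read=8_000),  # second call: prefix reused
]
summary = cache_summary(usages, input_price=1.25, cache_read_price=0.125)
print(summary)
```

With these inputs, 8,000 of 20,000 input tokens come from the cache, so the token-weighted hit rate is 40%.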
See SPEC.md for the full design.
Install
pip install cache-lens # core + rich
pip install "cache-lens[anthropic]" # + Anthropic SDK
pip install "cache-lens[gemini]" # + Gemini SDK
pip install "cache-lens[openai]" # + OpenAI SDK
pip install "cache-lens[otel]" # + OpenTelemetry
pip install "cache-lens[all]" # everything (quotes keep shells like zsh from globbing the brackets)
Quickstart
import anthropic
from cache_lens import wrap
client = wrap(anthropic.Anthropic())
# ... use client exactly as before; report prints on exit
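For intuition about what "non-invasive" wrapping can look like, the sketch below shows a generic recording proxy: it forwards every attribute access to the wrapped object and notes each call and its result on the way through. This is only an illustration of the pattern, not how `wrap` is implemented; `RecordingProxy`, `FakeClient`, and `FakeMessages` are hypothetical stand-ins so the example runs without any SDK installed.

```python
class RecordingProxy:
    """Forward attribute access to a target and record calls made through it."""

    def __init__(self, target, calls, path=""):
        self._target = target
        self._calls = calls  # shared list that collects one dict per call
        self._path = path    # dotted path from the root client, e.g. "messages"

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        attr = getattr(self._target, name)
        path = f"{self._path}.{name}" if self._path else name
        if callable(attr):
            def recorded(*args, **kwargs):
                result = attr(*args, **kwargs)
                self._calls.append({"method": path, "kwargs": kwargs, "result": result})
                return result
            return recorded
        if hasattr(attr, "__dict__"):
            # Nested namespaces (client.messages, ...) get wrapped as well.
            return RecordingProxy(attr, self._calls, path)
        return attr


class FakeMessages:
    """Stand-in for an SDK resource; returns a canned usage payload."""

    def create(self, **kwargs):
        return {"usage": {"input_tokens": 120, "cache_read_input_tokens": 100}}


class FakeClient:
    def __init__(self):
        self.messages = FakeMessages()


calls = []
client = RecordingProxy(FakeClient(), calls)
response = client.messages.create(model="example-model", messages=[])
print(calls[0]["method"], response["usage"])
```

The calling code is unchanged apart from the one line that builds the proxy, which is the property the quickstart relies on.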
Explicit session boundary with exports:
from cache_lens import CacheLens
with CacheLens(client, json_export="report.json", otel=True) as session:
    agent.run(...)  # your code, unchanged
report = session.report
Suppress the terminal report in CI with CACHE_LENS_TERMINAL=0.
Custom pricing
cache-lens ships a default price table, but you can override or extend it without forking — handy when a new model lands. User entries merge over the defaults:
# in-memory dict (native format, USD per 1M tokens)
wrap(client, pricing={"openai": {"gpt-5": {"input": 1.25, "output": 10.0, "cache_read": 0.125}}})
# or a JSON file (native or LiteLLM model_prices_and_context_window.json format)
wrap(client, pricing="pricing.json")
Or point at a file process-wide with CACHE_LENS_PRICING=/path/to/pricing.json.
A bad pricing file falls back to defaults rather than breaking the run. See
SPEC.md §12.
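The merge and fallback behaviour described above can be sketched as follows. This is an assumption-laden illustration rather than the library's code: `merge_pricing` and `load_pricing` are hypothetical names, the default table is made up, and the sketch assumes overrides merge field by field within a model (SPEC.md §12 is the authority on the real rules).

```python
import copy
import json


def merge_pricing(defaults, overrides):
    """Return a new table where override entries win over the defaults."""
    merged = copy.deepcopy(defaults)
    for provider, models in overrides.items():
        provider_table = merged.setdefault(provider, {})
        for model, prices in models.items():
            # Assumption: merge per price field, so partial overrides are allowed.
            provider_table.setdefault(model, {}).update(prices)
    return merged


def load_pricing(path, defaults):
    """Load overrides from a JSON file; fall back to the defaults on any problem."""
    try:
        with open(path, encoding="utf-8") as f:
            overrides = json.load(f)
        return merge_pricing(defaults, overrides)
    except (OSError, ValueError, AttributeError, TypeError):
        # Missing file, invalid JSON, or an unexpected shape: keep the run alive.
        return copy.deepcopy(defaults)


# Illustrative defaults only; these are not cache-lens's shipped prices.
defaults = {"example-provider": {"model-a": {"input": 3.0, "output": 15.0, "cache_read": 0.3}}}
overrides = {
    "openai": {"gpt-5": {"input": 1.25, "output": 10.0, "cache_read": 0.125}},
    "example-provider": {"model-a": {"cache_read": 0.25}},
}

merged = merge_pricing(defaults, overrides)
fallback = load_pricing("does-not-exist.json", defaults)
print(merged)
```

Copying the defaults before merging keeps the shipped table untouched, so a later session without overrides still sees the original prices.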
Status
v1.0. Implemented: wrapper interception with request capture, provider
extraction + capture (Anthropic + Gemini + OpenAI), content-based layer
classification (longest-common-prefix → named system_prompt / context /
conversation layers, cross-referenced against actual cache reads),
terminal/JSON/OTEL outputs, overridable pricing, tests.
Pending: cache-lens run CLI injection, streaming support, and cross-run
static/semi-static separation (see docs/architecture.md).
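The longest-common-prefix idea behind the layer classification mentioned under Status can be illustrated with a small sketch. It treats each request as a list of content blocks and finds the prefix shared by all of them, which is the part a prompt cache could reuse. The function names and the block strings are hypothetical; the actual classification into named layers is described in SPEC.md and is not reproduced here.

```python
def common_prefix_len(a, b):
    """Number of leading items that two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def stable_prefix(requests):
    """Return the blocks that every request shares as a prefix."""
    if not requests:
        return []
    prefix = list(requests[0])
    for req in requests[1:]:
        prefix = prefix[:common_prefix_len(prefix, req)]
    return prefix


requests = [
    ["SYSTEM: you are a helpful agent", "DOC: handbook v1", "USER: hi"],
    ["SYSTEM: you are a helpful agent", "DOC: handbook v1", "USER: hi",
     "ASSISTANT: hello", "USER: next"],
    ["SYSTEM: you are a helpful agent", "DOC: handbook v2", "USER: hi"],
]

shared_all = stable_prefix(requests)            # only the system prompt survives
shared_first_two = stable_prefix(requests[:2])  # system prompt, document, first turn
print(len(shared_all), len(shared_first_two))
```

A block that changes early in the prompt (here, the handbook version) cuts the shared prefix short, which is the kind of cache miss a per-layer breakdown is meant to surface.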
Develop
pip install -e ".[dev]"
pytest
File details
Details for the file cachelens-1.0.0.tar.gz.
File metadata
- Download URL: cachelens-1.0.0.tar.gz
- Upload date:
- Size: 25.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.7.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 472c3cd818fcf1e6325559689c21becc39ff252648c03a334f3f5044f5fa2acd |
| MD5 | 2b3db3a7e3a3ddb7c31bde15dfa23606 |
| BLAKE2b-256 | 11a2cc167acfc5cabef82a02ffaed50269cf81019e9cabffeaf47d99332ea1ee |
File details
Details for the file cachelens-1.0.0-py3-none-any.whl.
File metadata
- Download URL: cachelens-1.0.0-py3-none-any.whl
- Upload date:
- Size: 20.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.7.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bbec23375ce270416605363c59e627120fa11271a29ff7410d46060adc574e85 |
| MD5 | 790d212aff8a9efef888b038e4b2d36c |
| BLAKE2b-256 | 3ddbfeb2c3ee8f754d81f151cee74f11d4130c213cddda75a8861c39d957e1f5 |