Non-invasive prompt cache instrumentation for LLM API apps

CacheLens

Non-invasive prompt cache instrumentation for LLM API apps. Wrap your client in one line. Get terminal reports, JSON exports, and OTEL metrics.

Prompt caching gives steep discounts on cached tokens, but nothing tells you whether your app is actually getting cache hits, or why it isn't. CacheLens wraps your Anthropic, Gemini, or OpenAI client and reports cache hit rate, cost, savings, and the money you're leaving on the table, broken down by prompt layer.
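
As a rough illustration of the quantities involved (a sketch, not CacheLens's actual implementation), hit rate, savings, and missed savings can be derived from token counts and per-million-token prices. The function and field names here are hypothetical, the token counts are made up, and the prices reuse the illustrative gpt-5 entry from the Custom pricing section. Cache-write surcharges, which some providers charge, are ignored.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hit_rate: float     # cached input tokens / total input tokens
    cost_usd: float     # input cost actually paid, with caching applied
    savings_usd: float  # discount already earned on cached tokens
    missed_usd: float   # upper bound: extra discount if every token had been cached


def cache_stats(input_tokens: int, cached_tokens: int,
                input_price: float, cache_read_price: float) -> CacheStats:
    """Prices are USD per 1M tokens; cached_tokens is a subset of input_tokens."""
    uncached = input_tokens - cached_tokens
    discount_per_token = (input_price - cache_read_price) / 1_000_000
    cost = (uncached * input_price + cached_tokens * cache_read_price) / 1_000_000
    return CacheStats(
        hit_rate=cached_tokens / input_tokens if input_tokens else 0.0,
        cost_usd=cost,
        savings_usd=cached_tokens * discount_per_token,
        missed_usd=uncached * discount_per_token,
    )


stats = cache_stats(input_tokens=100_000, cached_tokens=60_000,
                    input_price=1.25, cache_read_price=0.125)
print(stats)
```

The "missed" figure is only an upper bound, since the volatile tail of a prompt can never be served from cache.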

See docs/architecture.md for the full design.

Install

pip install cachelens                  # core + rich
pip install "cachelens[anthropic]"     # + Anthropic SDK
pip install "cachelens[gemini]"        # + Gemini SDK
pip install "cachelens[openai]"        # + OpenAI SDK
pip install "cachelens[otel]"          # + OpenTelemetry
pip install "cachelens[all]"           # everything (quotes keep zsh from globbing the brackets)

Quickstart

import anthropic
from cache_lens import wrap

client = wrap(anthropic.Anthropic())
# ... use client exactly as before; report prints on exit

Explicit session boundary with exports:

from cache_lens import CacheLens

with CacheLens(client, json_export="report.json", otel=True) as session:
    agent.run(...)        # your code, unchanged
report = session.report

Suppress the terminal report in CI with CACHE_LENS_TERMINAL=0.
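
If the variable can't be set in the CI configuration itself, one option is to set it from Python, for example in a test bootstrap file. This is a minimal sketch; it assumes CacheLens reads the variable some time after this code runs, which this README doesn't specify.

```python
import os

# Assumption: the variable only needs to be in the environment before the
# report would be printed. Setting it before wrapping the client is the
# conservative choice. setdefault() keeps any value already set by the shell.
os.environ.setdefault("CACHE_LENS_TERMINAL", "0")
```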

Custom pricing

CacheLens ships a default price table, but you can override or extend it without forking, which is handy when a new model lands. User entries merge over the defaults:

# in-memory dict (native format, USD per 1M tokens)
wrap(client, pricing={"openai": {"gpt-5": {"input": 1.25, "output": 10.0, "cache_read": 0.125}}})

# or a JSON file (native or LiteLLM model_prices_and_context_window.json format)
wrap(client, pricing="pricing.json")

Or point at a file process-wide with CACHE_LENS_PRICING=/path/to/pricing.json. A bad pricing file falls back to defaults rather than breaking the run. See docs/architecture.md for the full design.
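
To make "merge over the defaults" concrete, here is a minimal sketch of one plausible merge, where a user entry replaces or adds a whole model under its provider. The `merge_pricing` helper and the `example-model` default are hypothetical and not part of CacheLens; the real merge granularity (per model or per field) may differ.

```python
import json


def merge_pricing(defaults: dict, overrides: dict) -> dict:
    """Return a new table with override entries layered over the defaults."""
    merged = {provider: dict(models) for provider, models in defaults.items()}
    for provider, models in overrides.items():
        merged.setdefault(provider, {}).update(models)
    return merged


# Hypothetical stand-in for the built-in table; the numbers are placeholders.
defaults = {"openai": {"example-model": {"input": 1.0, "output": 2.0, "cache_read": 0.5}}}
# Native format from the example above: USD per 1M tokens.
overrides = {"openai": {"gpt-5": {"input": 1.25, "output": 10.0, "cache_read": 0.125}}}

merged = merge_pricing(defaults, overrides)
print(sorted(merged["openai"]))  # the default model and the user-supplied one

# The same override serialized as JSON, ready to be saved as pricing.json.
print(json.dumps(overrides, indent=2))
```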

How this helps you develop LLM applications

Prompt caching only pays off if your prompt's prefix is stable and byte-identical across calls. Most agent loops, however, accumulate per-turn state (timestamps, counters, mutating progress trackers) that silently breaks the prefix. The API still works fine; the bill just stays high. CacheLens turns that invisible problem into a concrete, iterable workflow during development:

  1. Wrap your client once and run your normal dev/test loop. No changes to your app logic are required.
  2. Read the layer table to see whether your prompt actually splits into stable layers (system prompt, schema/context, conversation) or collapses into one big conversation blob. The latter is a strong signal that something near the top of your prompt changes every turn.
  3. Use the tips as a diagnosis, not just a metric. "No stable prompt prefix detected" tells you why your hit rate is 0% and what to fix (move static content first, make it byte-identical); "X% of input tokens are uncached" tells you how much headroom restructuring is worth before you spend time on it.
  4. Re-run after each change and compare Savings, Cached/Hit Rate, and whether the tips changed. This feedback loop tells you whether a refactor (e.g. splitting prompt-building into a stable prefix and a volatile trailer) actually moved the needle, before you ever look at a billing dashboard.
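
The kind of refactor mentioned in step 4 can be sketched in a few lines. The prompt text and function names below are made up for illustration, and character-level comparison is only a stand-in for the token-level matching that providers actually perform.

```python
import os.path

SYSTEM = "You are a careful SQL assistant.\n"
SCHEMA = "Tables: users(id, name), orders(id, user_id, total)\n"


def build_prompt_volatile_first(turn: int, timestamp: str) -> str:
    # Per-turn state at the top: the prefix changes on every call.
    return f"[{timestamp}] turn {turn}\n" + SYSTEM + SCHEMA


def build_prompt_stable_first(turn: int, timestamp: str) -> str:
    # Static content first, per-turn state in a trailer at the end.
    return SYSTEM + SCHEMA + f"[{timestamp}] turn {turn}\n"


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the leading run of characters that both prompts share."""
    return len(os.path.commonprefix([a, b]))


before = shared_prefix_len(build_prompt_volatile_first(1, "12:00:01"),
                           build_prompt_volatile_first(2, "12:00:07"))
after = shared_prefix_len(build_prompt_stable_first(1, "12:00:01"),
                          build_prompt_stable_first(2, "12:00:07"))
print(f"shared prefix before: {before} chars, after: {after} chars")
```

With the timestamp first, the two calls share only the first few characters of the timestamp itself; with the static content first, the whole system prompt and schema are shared and become a candidate for caching.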

See examples/queryargus.md for a real before/after walkthrough of this loop on a 30-turn Gemini agent.

Status

v1.0. Implemented: wrapper interception with request capture; provider extraction and capture (Anthropic, Gemini, OpenAI); content-based layer classification (longest common prefix mapped to named system_prompt / context / conversation layers, cross-referenced against actual cache reads); terminal, JSON, and OTEL outputs; overridable pricing; tests. Pending: cache-lens run CLI injection, streaming support, and cross-run static/semi-static separation (see docs/architecture.md).
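
The longest-common-prefix idea can be illustrated with a simplified two-layer version: whatever leading tokens every request shares counts as stable, and the rest is volatile. This is a sketch under stated assumptions, not the library's classifier. All function names are hypothetical, whitespace splitting stands in for a real tokenizer, and the split into named system_prompt / context layers is omitted.

```python
from itertools import takewhile


def common_prefix_len(requests: list[list[str]]) -> int:
    """Number of leading tokens shared by every request."""
    if not requests:
        return 0
    columns = zip(*requests)  # i-th token of each request, up to the shortest
    return sum(1 for _ in takewhile(lambda col: len(set(col)) == 1, columns))


def split_layers(requests: list[list[str]]) -> list[dict]:
    """Per-request token counts for the stable prefix and the volatile rest."""
    n = common_prefix_len(requests)
    return [{"stable": n, "volatile": len(r) - n} for r in requests]


def uncached_stable_tokens(stable: int, cached: int) -> int:
    """Stable tokens the provider did not report as cache reads."""
    return max(stable - cached, 0)


requests = [
    "SYS rules SCHEMA users orders Q: count users".split(),
    "SYS rules SCHEMA users orders Q: list orders A: ok".split(),
]
layers = split_layers(requests)
print(layers)

# Cross-reference with a hypothetical provider-reported cached-token count.
print(uncached_stable_tokens(layers[1]["stable"], cached=0))
```

A non-zero result from the last call means a stable prefix exists but was not served from cache, which is the gap the tips are meant to surface.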

Develop

pip install -e ".[dev]"
pytest
