Local SQLite cache for OpenAI and Anthropic API calls. One env var, 60-80% cheaper dev loops.
llm-cache-proxy
One env var. 60–80% cheaper dev loops. A localhost proxy that caches identical OpenAI and Anthropic API calls on disk and replays them for free. Works with every existing tool.
Why
You're iterating on a prompt. You run the same call 40 times tweaking wording. That's 40× the spend on identical requests.
Or: you have a long-running script that re-fetches the same tool definitions every run during development. Or: your test suite calls the API.
llm-cache-proxy sits on localhost, speaks the OpenAI and Anthropic REST
protocols, and caches every successful response in a single SQLite file.
Identical requests (same method, path, body) get served from disk —
no network call, no spend.
It works with every tool because you only change one env var per provider:
export OPENAI_BASE_URL=http://127.0.0.1:9001/openai/v1
export ANTHROPIC_BASE_URL=http://127.0.0.1:9001/anthropic
Cursor, Claude Code, your scripts, your tests, the OpenAI Python SDK, the Anthropic SDK — they all start using the cache automatically.
Install & run
pip install llm-cache-proxy
llm-cache-proxy
# or
uvx llm-cache-proxy
Default port: 9001. Default cache: ~/.cache/llm-cache-proxy/cache.db.
Use it
In whatever shell launches your tool / script:
# OpenAI
export OPENAI_BASE_URL=http://127.0.0.1:9001/openai/v1
# Anthropic
export ANTHROPIC_BASE_URL=http://127.0.0.1:9001/anthropic
# now run anything — Cursor, your script, pytest, etc.
Responses include an X-LLM-Cache: HIT|MISS header so you can see what
happened.
Bypass the cache for a single request:
curl -H "x-llm-cache-bypass: 1" http://127.0.0.1:9001/openai/v1/chat/completions ...
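If you'd rather check the cache header from a script than from curl, here is a minimal sketch using only the standard library. The helper names build_chat_request and cache_status are made up for this example, and the Bearer auth header is the usual OpenAI convention rather than something this proxy requires.

```python
import json
import urllib.request

# Default host/port from this README; adjust if you changed LLM_CACHE_PORT.
PROXY_BASE = "http://127.0.0.1:9001/openai/v1"


def build_chat_request(payload, api_key, bypass=False):
    """Build a POST for the proxy's OpenAI-style chat completions route.

    Hypothetical helper: it only constructs the request, it does not send it.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if bypass:
        # Per-request bypass header described above.
        headers["x-llm-cache-bypass"] = "1"
    return urllib.request.Request(
        f"{PROXY_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def cache_status(response_headers):
    """Return the X-LLM-Cache value ('HIT' or 'MISS'), or None if absent."""
    return response_headers.get("X-LLM-Cache")
```

To send it, pass the request to urllib.request.urlopen and call cache_status(resp.headers) on the response. With the proxy running, the first identical call should report MISS and a repeat should report HIT.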
See what you saved
curl http://127.0.0.1:9001/stats
{
  "hits": 312,
  "misses": 87,
  "bytes_served_from_cache": 4182404,
  "entries": 87,
  "cached_response_bytes": 1205211,
  "by_model": {"gpt-4o": 41, "claude-sonnet-4": 46}
}
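To turn those counters into a single number, a small sketch like the one below works. fetch_stats and hit_rate are hypothetical helper names, and the result is the share of requests served from cache, which is not the same thing as dollars saved.

```python
import json
import urllib.request


def fetch_stats(base_url="http://127.0.0.1:9001"):
    """Fetch the /stats JSON from a running proxy (default address assumed)."""
    with urllib.request.urlopen(f"{base_url}/stats") as resp:
        return json.load(resp)


def hit_rate(stats):
    """Fraction of requests answered from cache; 0.0 when nothing was seen."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0


# Using the sample numbers shown above:
sample = {"hits": 312, "misses": 87}
print(f"{hit_rate(sample):.1%}")  # prints 78.2%
```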
Clear the cache:
curl -X DELETE http://127.0.0.1:9001/cache
curl -X DELETE http://127.0.0.1:9001/stats
Config (all optional)
| Env var | Default | Description |
|---|---|---|
| LLM_CACHE_PORT | 9001 | Listen port. |
| LLM_CACHE_HOST | 127.0.0.1 | Listen host. |
| LLM_CACHE_DIR | ~/.cache/llm-cache-proxy | Where to put the SQLite file. |
| LLM_CACHE_TTL | 0 | TTL in seconds (0 = forever). |
| LLM_CACHE_TIMEOUT | 300 | Upstream request timeout. |
| OPENAI_UPSTREAM | https://api.openai.com | Override the upstream. |
| ANTHROPIC_UPSTREAM | https://api.anthropic.com | Override the upstream. |
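The TTL rule (0 = forever, otherwise seconds) can be sketched as below. This is an illustration of the documented semantics, not the proxy's actual code: ttl_from_env and is_fresh are hypothetical names, and treating an entry as expired exactly at the TTL boundary is an assumption.

```python
import os
import time


def ttl_from_env(env=os.environ):
    """Read LLM_CACHE_TTL, falling back to the documented default of 0."""
    return int(env.get("LLM_CACHE_TTL", "0"))


def is_fresh(stored_at, ttl_seconds, now=None):
    """Return True if an entry stored at `stored_at` (epoch seconds) is usable.

    ttl_seconds == 0 means entries never expire. The strict `<` comparison
    at the boundary is an assumption made for this sketch.
    """
    if ttl_seconds == 0:
        return True
    now = time.time() if now is None else now
    return (now - stored_at) < ttl_seconds
```

For example, with LLM_CACHE_TTL=3600 an entry stored two hours ago would count as stale under this sketch, while with the default of 0 it would be served indefinitely.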
Per-request:
- Header x-llm-cache-bypass: 1 (skip both read and write for this call).
- Header x-llm-cache-extra-key: <string> (add an extra dimension to the cache key, e.g., a user id or a session id).
How the cache key is built
sha256(method + "|" + path + "|" + body + optional extra_key)
Method + path + body is enough to make identical requests collide
deterministically. Headers are not included in the default key (so API
key rotation doesn't invalidate the cache) — set x-llm-cache-extra-key
if you want extra dimensions.
Only 2xx responses are cached. Errors always go through.
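The key formula above can be sketched in a few lines. This is a reading of the formula, not the proxy's source: cache_key is a hypothetical name, the body is assumed to be hashed byte-for-byte, and appending the extra key directly after the body (with no separator) is an assumption taken from the formula as written.

```python
import hashlib
import json


def cache_key(method: str, path: str, body: bytes, extra_key: str = "") -> str:
    """sha256 over method|path|body, plus an optional extra key (assumed layout)."""
    material = (
        method.encode("utf-8")
        + b"|"
        + path.encode("utf-8")
        + b"|"
        + body
        + extra_key.encode("utf-8")
    )
    return hashlib.sha256(material).hexdigest()


# Same fields, different key order: the serialized bytes differ, so the keys differ.
body_a = json.dumps({"model": "gpt-4o", "temperature": 0}).encode("utf-8")
body_b = json.dumps({"temperature": 0, "model": "gpt-4o"}).encode("utf-8")
key_a = cache_key("POST", "/openai/v1/chat/completions", body_a)
key_b = cache_key("POST", "/openai/v1/chat/completions", body_b)
```

If the body really is hashed as raw bytes, anything that changes the serialized request (key order, whitespace, a timestamp in the prompt) produces a miss, so keep request construction deterministic when you want hits.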
Caveats
- Streaming responses: when the upstream returns text/event-stream, the full SSE body is captured and replayed verbatim on a cache hit. That works, but you lose the per-token streaming feel.
- Tool / function-calling responses cache fine: the whole completion object is one entry.
- Don't expose this proxy to the public internet: it has no auth, and your API key flows through it.
Companion projects
- mcp-rec — VCR for MCP servers (similar idea, MCP layer).
- ai-first-scraper — clean Markdown for LLM agents.
License
MIT © yubinkim444