PromptCache

Semantic similarity cache for LLM responses (Redis backend, TTL, cost tracking).
Reduce your LLM API costs by 30–70% with semantic caching.
PromptCache reuses LLM responses for semantically similar prompts, not just exact string matches.
If two users ask:

- "Explain Redis in simple terms"
- "Can you explain Redis simply?"

you shouldn't pay twice. PromptCache makes sure you don't.
The Problem
If you're using OpenAI or any LLM API in production, you're likely paying repeatedly for:

- The same question phrased differently
- Similar support requests across users
- Slight variations in prompts
- Background job retries
- RAG pipelines returning near-identical queries

Traditional caching only works for exact matches. LLMs need semantic caching.
What PromptCache Does
- Embeds your prompt into a vector
- Searches Redis for similar past prompts
- If similarity ≥ threshold → returns the cached response
- Otherwise → calls the LLM and stores the result

```
User Prompt
   ↓
Embed → Redis Vector Search
   ↓
Hit?  → Return cached answer
Miss? → Call LLM → Store result
```
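The decision logic boils down to a nearest-neighbour lookup plus a similarity threshold. Here is a toy, self-contained illustration of that idea (not the library's internal code; a real deployment uses an embedding model and Redis vector search instead of this in-memory list, and `embed` / `llm_call` are supplied by the caller):

```python
import math

_store = []  # toy cache: list of (vector, response) pairs

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def get_or_set(prompt, embed, llm_call, threshold=0.92):
    vector = embed(prompt)                                        # 1. embed the prompt
    best = max(_store, key=lambda e: cosine(vector, e[0]), default=None)
    if best is not None and cosine(vector, best[0]) >= threshold:
        return best[1]                                            # 2. semantic hit: reuse
    response = llm_call(prompt)                                   # 3. miss: pay once
    _store.append((vector, response))                             # 4. remember for next time
    return response
```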
10-Second Example
```python
from promptcache import SemanticCache
from promptcache.backends.redis_vector import RedisVectorBackend
from promptcache.embedders.openai import OpenAIEmbedder
from promptcache.types import CacheMeta

embedder = OpenAIEmbedder(model="text-embedding-3-small")

backend = RedisVectorBackend(
    url="redis://localhost:6379/0",
    dim=embedder.dim,
)

cache = SemanticCache(
    backend=backend,
    embedder=embedder,
    namespace="support-bot",
    threshold=0.92,
)

meta = CacheMeta(
    model="gpt-4.1-mini",
    system_prompt="You are a helpful support assistant.",
)

# my_llm_call is your own function that calls the LLM and returns its response object
result = cache.get_or_set(
    prompt="How do I reset my password?",
    llm_call=my_llm_call,
    extract_text=lambda r: r.output_text,
    meta=meta,
)

print(result.cache_hit)  # True or False
```
That's it.
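The example above assumes you provide `my_llm_call` yourself. Here is a minimal sketch using the OpenAI Python SDK's Responses API, whose response objects expose `output_text` (matching the `extract_text` lambda above); the exact signature PromptCache invokes `llm_call` with is an assumption here:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_llm_call(prompt: str):
    # Return the raw response object; PromptCache extracts the answer text
    # via the extract_text callable passed to get_or_set.
    return client.responses.create(
        model="gpt-4.1-mini",
        instructions="You are a helpful support assistant.",
        input=prompt,
    )
```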
Example Impact

In a SaaS support assistant:

- 62% cache hit rate
- 48% reduction in token usage
- 44% reduction in API spend

Your mileage depends on your workload, but high-volume, repetitive systems benefit the most.
Production-Ready Design
PromptCache isolates cache entries by:

- namespace
- model
- system_prompt
- tools_schema
- embedder

This prevents cross-context contamination (see the sketch below).
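For example, a minimal sketch reusing the `embedder` and `backend` from the quick-start above (that entries are keyed on the full namespace-plus-metadata combination is inferred from the list above, so treat the exact behaviour as an assumption):

```python
from promptcache import SemanticCache
from promptcache.types import CacheMeta

billing_cache = SemanticCache(
    backend=backend,
    embedder=embedder,
    namespace="billing-bot",   # separate namespace from "support-bot"
    threshold=0.92,
)

gpt4_meta = CacheMeta(model="gpt-4.1", system_prompt="You are a billing assistant.")
mini_meta = CacheMeta(model="gpt-4.1-mini", system_prompt="You are a billing assistant.")

# An entry written under (namespace="billing-bot", model="gpt-4.1", ...) should never
# be returned for lookups made with mini_meta or from the "support-bot" namespace,
# even when the prompts are semantically identical.
```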
Additional features:

- ✅ Redis HNSW vector search (cosine similarity)
- ✅ TTL support
- ✅ Hit-rate statistics
- ✅ Optional cost tracking
- ✅ In-memory backend (for testing)
- ✅ Framework-agnostic (no LangChain dependency)
Installation
```bash
pip install promptcache-ai
```

Optional OpenAI embedder:

```bash
pip install "promptcache-ai[openai]"
```
Redis Setup
PromptCache requires Redis Stack (RediSearch with vector support).
Run locally:

```bash
docker run -d \
  --name redis-stack \
  -p 6379:6379 \
  redis/redis-stack:latest
```

Verify:

```bash
redis-cli MODULE LIST
```

You should see a `search` entry in the module list.
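You can also run the same check from Python with the `redis` client (a quick sanity check, not part of PromptCache):

```python
import redis

r = redis.Redis.from_url("redis://localhost:6379/0", decode_responses=True)

# Redis Stack should report a "search" module (RediSearch with vector support).
print(r.module_list())
print(r.ping())  # True if the server is reachable
```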
Stats
Measure impact:

```python
print(cache.stats())
```

Example output:

```python
{
    "hits": 1240,
    "misses": 860,
    "total": 2100,
    "hit_rate_percent": 59.05
}
```
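If you also track what an uncached request costs you, the hit count translates directly into an estimated saving. A hypothetical sketch using the `cache` from the quick-start (the per-call dollar figure is an assumption, not a PromptCache feature):

```python
stats = cache.stats()

# Hypothetical average cost of one uncached LLM call, in USD.
AVG_COST_PER_CALL = 0.004

saved = stats["hits"] * AVG_COST_PER_CALL
print(f"Hit rate: {stats['hit_rate_percent']}%")
print(f"Estimated spend avoided: ${saved:.2f}")
```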
When It Helps Most
- Customer support bots
- Internal copilots
- FAQ systems
- Knowledge assistants
- Deterministic / low-temperature tasks
- High-volume, similar prompts
When It May Not Help
- Highly personalized prompts
- Creative, high-temperature tasks
- Frequently changing context
Testing
Run unit tests:
```bash
pytest
```

Run Redis integration tests:

```bash
export REDIS_URL="redis://localhost:6379/0"
pytest
```
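If you write your own integration tests against this package, a common pattern is to skip Redis-backed tests unless `REDIS_URL` is set, mirroring the command above. A generic pytest sketch (not part of PromptCache's own test suite):

```python
import os
import pytest

# Skip Redis-backed tests when REDIS_URL is not set.
requires_redis = pytest.mark.skipif(
    "REDIS_URL" not in os.environ,
    reason="set REDIS_URL to run Redis integration tests",
)

@requires_redis
def test_redis_url_is_reachable():
    import redis
    r = redis.Redis.from_url(os.environ["REDIS_URL"])
    assert r.ping()
```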
Project details
Download files
- Source Distribution: promptcache_ai-0.1.0.tar.gz
- Built Distribution: promptcache_ai-0.1.0-py3-none-any.whl
File details
Details for the file promptcache_ai-0.1.0.tar.gz.
File metadata
- Download URL: promptcache_ai-0.1.0.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 938e867302d695d896b583cf605e3d1dbf8daa7f373231f1090815bac4f71681 |
| MD5 | fa9471dce3771c95907518d1c8fd8e9f |
| BLAKE2b-256 | 8687eb53f0557e1c1b6cf37b8f499317d200e3887ab8f793b4998fa3e6eafa41 |
Provenance
The following attestation bundles were made for promptcache_ai-0.1.0.tar.gz:
Publisher: ci.yml on tase-nikol/promptcache

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: promptcache_ai-0.1.0.tar.gz
- Subject digest: 938e867302d695d896b583cf605e3d1dbf8daa7f373231f1090815bac4f71681
- Sigstore transparency entry: 973266120
- Sigstore integration time:
- Permalink: tase-nikol/promptcache@708c3d5a59ccd8cf8135893eef3908022c1aca58
- Branch / Tag: refs/tags/v.0.1.3
- Owner: https://github.com/tase-nikol
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@708c3d5a59ccd8cf8135893eef3908022c1aca58
- Trigger Event: push
File details
Details for the file promptcache_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: promptcache_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c49306ec78581192c3ea8b1505d648b715bd0912b620a6ba0444030d4737cbcf |
| MD5 | a8cbe1249401370bb537d7667ae5c2ed |
| BLAKE2b-256 | f2f212141054caaa3e6a576272b9fc961fcfc508a40d32d8d36dfa3b21435927 |
Provenance
The following attestation bundles were made for promptcache_ai-0.1.0-py3-none-any.whl:
Publisher: ci.yml on tase-nikol/promptcache

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: promptcache_ai-0.1.0-py3-none-any.whl
- Subject digest: c49306ec78581192c3ea8b1505d648b715bd0912b620a6ba0444030d4737cbcf
- Sigstore transparency entry: 973266146
- Sigstore integration time:
- Permalink: tase-nikol/promptcache@708c3d5a59ccd8cf8135893eef3908022c1aca58
- Branch / Tag: refs/tags/v.0.1.3
- Owner: https://github.com/tase-nikol
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@708c3d5a59ccd8cf8135893eef3908022c1aca58
- Trigger Event: push