
Semantic similarity cache for LLM responses (Redis backend, TTL, cost tracking).


PromptCache

Reduce your LLM API costs by 30–70% with semantic caching.

PromptCache reuses LLM responses for semantically similar prompts, not just exact string matches.

If two users ask:

  • "Explain Redis in simple terms"

  • "Can you explain Redis simply?"

you shouldn't pay twice.

PromptCache makes sure you don't.


The Problem

If you're using OpenAI or any LLM API in production, you're likely paying repeatedly for:

  • The same question phrased differently

  • Similar support requests across users

  • Slight variations in prompts

  • Background job retries

  • RAG pipelines returning near-identical queries

Traditional caching only works for exact matches.

LLMs need semantic caching.


What PromptCache Does

  1. Embeds your prompt into a vector

  2. Searches Redis for similar past prompts

  3. If similarity ≥ threshold → returns cached response

  4. Otherwise → calls the LLM and stores the result

User Prompt
    │
    ▼
 Embed ──▶ Redis Vector Search
    │
 Hit?  ──▶ Return cached answer
 Miss? ──▶ Call LLM ──▶ Store result
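
Under the hood, the lookup path is roughly the following (illustrative pseudocode; these method names are assumptions, not the library's actual internals):

def get_or_set(prompt, llm_call, extract_text, meta):
    vector = embedder.embed(prompt)             # 1. embed the prompt
    match = backend.search(vector, meta)        # 2. nearest stored prompt
    if match is not None and match.score >= threshold:
        return Result(match.text, cache_hit=True)   # 3. similar enough: hit
    response = llm_call(prompt)                 # 4. miss: call the LLM
    text = extract_text(response)
    backend.store(vector, text, meta)           #    ...and store for next time
    return Result(text, cache_hit=False)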

10-Second Example

from promptcache import SemanticCache
from promptcache.backends.redis_vector import RedisVectorBackend
from promptcache.embedders.openai import OpenAIEmbedder
from promptcache.types import CacheMeta

embedder = OpenAIEmbedder(model="text-embedding-3-small")

backend = RedisVectorBackend(
    url="redis://localhost:6379/0",
    dim=embedder.dim,
)

cache = SemanticCache(
    backend=backend,
    embedder=embedder,
    namespace="support-bot",
    threshold=0.92,  # minimum cosine similarity for a cache hit
)

meta = CacheMeta(
    model="gpt-4.1-mini",
    system_prompt="You are a helpful support assistant.",
)

# my_llm_call is your own callable: it takes the prompt and returns
# your provider's response object (see the sketch below).
result = cache.get_or_set(
    prompt="How do I reset my password?",
    llm_call=my_llm_call,
    extract_text=lambda r: r.output_text,
    meta=meta,
)

print(result.cache_hit)  # True or False

That's it.
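
For reference, my_llm_call can be any callable that takes the prompt and returns your provider's response object. A minimal sketch using the official openai package (the Responses API call here is an assumption about your setup, not part of PromptCache):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_llm_call(prompt: str):
    # Return the raw response object; extract_text above pulls the
    # text out of it via r.output_text.
    return client.responses.create(model="gpt-4.1-mini", input=prompt)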


Example Impact

In a SaaS support assistant:

  • 62% cache hit rate

  • 48% reduction in token usage

  • 44% reduction in API spend

Your mileage will vary with your workload, but high-volume, repetitive systems benefit the most.


Production-Ready Design

PromptCache isolates cache entries by:

  • namespace

  • model

  • system_prompt

  • tools_schema

  • embedder

This prevents cross-context contamination.
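
For example, two caches sharing the same Redis instance but using different namespaces never serve each other's entries (reusing the objects from the example above):

billing_cache = SemanticCache(
    backend=backend,
    embedder=embedder,
    namespace="billing-bot",  # isolated from "support-bot"
    threshold=0.92,
)
# A response cached under "support-bot" is never returned here,
# even for a semantically identical prompt.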

Additional features:

  • ✅ Redis HNSW vector search (cosine similarity)

  • ✅ TTL support

  • ✅ Hit-rate statistics

  • ✅ Optional cost tracking

  • ✅ In-memory backend (for testing)

  • ✅ Framework-agnostic (no LangChain dependency)
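
TTL support means cached entries can expire automatically. A sketch of how that might be configured (the ttl_seconds parameter name is an assumption; check the API reference for the actual one):

cache = SemanticCache(
    backend=backend,
    embedder=embedder,
    namespace="support-bot",
    threshold=0.92,
    ttl_seconds=3600,  # hypothetical parameter: expire entries after one hour
)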


Installation

pip install promptcache

Optional OpenAI embedder:

pip install promptcache[openai]

Redis Setup

PromptCache requires Redis Stack (RediSearch with vector support).

Run locally:

docker run -d \
  --name redis-stack \
  -p 6379:6379 \
  redis/redis-stack:latest

Verify:

redis-cli MODULE LIST

You should see:

search

Stats

Measure impact:

print(cache.stats())

Example:

{
    "hits": 1240,
    "misses": 860,
    "total": 2100,
    "hit_rate_percent": 59.05
}

When It Helps Most

  • Customer support bots

  • Internal copilots

  • FAQ systems

  • Knowledge assistants

  • Deterministic / low-temperature tasks

  • High-volume similar prompts


When It May Not Help

  • Highly personalized prompts

  • Creative high-temperature tasks

  • Frequently changing context


Testing

Run unit tests:

pytest

Run Redis integration tests:

export REDIS_URL="redis://localhost:6379/0"
pytest
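
The in-memory backend (listed above) keeps unit tests fast and network-free. A minimal sketch; the import path and the fake embedder interface below are assumptions, so check the package for the real names:

from promptcache import SemanticCache
from promptcache.types import CacheMeta
from promptcache.backends.memory import InMemoryBackend  # import path is an assumption

class FakeEmbedder:
    # Deterministic stand-in so tests need no API calls (illustrative interface).
    dim = 3

    def embed(self, text):
        return [1.0, 0.0, 0.0]  # every prompt maps to the same vector

def test_second_call_is_a_hit():
    cache = SemanticCache(
        backend=InMemoryBackend(),
        embedder=FakeEmbedder(),
        namespace="tests",
        threshold=0.9,
    )
    meta = CacheMeta(model="test-model", system_prompt="test")
    calls = []

    def fake_llm(prompt):
        calls.append(prompt)
        return "answer"

    cache.get_or_set(prompt="hello", llm_call=fake_llm,
                     extract_text=lambda r: r, meta=meta)
    result = cache.get_or_set(prompt="hello again", llm_call=fake_llm,
                              extract_text=lambda r: r, meta=meta)

    assert result.cache_hit
    assert len(calls) == 1  # the LLM was called only once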
