
Semantic similarity cache for LLM responses (Redis backend, TTL, cost tracking).


PromptCache

Reduce your LLM API costs by 30–70% with semantic caching.

PromptCache reuses LLM responses for semantically similar prompts, not just exact string matches.

If two users ask:

  • "Explain Redis in simple terms"

  • "Can you explain Redis simply?"

you shouldn't pay twice.

PromptCache makes sure you don't.




The Problem

If you're using OpenAI or any LLM API in production, you're likely paying repeatedly for:

  • The same question phrased differently

  • Similar support requests across users

  • Slight variations in prompts

  • Background job retries

  • RAG pipelines returning near-identical queries

Traditional caching only works for exact matches.

LLMs need semantic caching.


What PromptCache Does

  1. Embeds your prompt into a vector

  2. Searches Redis for similar past prompts

  3. If similarity ≥ threshold → returns cached response

  4. Otherwise → calls the LLM and stores the result

User Prompt
    ↓
Embed → Redis Vector Search
    ↓
Hit?  → Return cached answer
Miss? → Call LLM → Store result
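
At its core, the hit-or-miss decision is a cosine-similarity comparison against previously stored vectors. A self-contained toy sketch of that check (illustrative only, not the library's actual code):

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_match(query_vec, store, threshold=0.92):
    # store: list of (vector, cached_response) pairs
    scored = ((cosine(query_vec, vec), resp) for vec, resp in store)
    score, resp = max(scored, key=lambda t: t[0], default=(0.0, None))
    return resp if score >= threshold else None  # response on hit, None on miss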

10-Second Example

from promptcache import SemanticCache
from promptcache.backends.redis_vector import RedisVectorBackend
from promptcache.embedders.openai import OpenAIEmbedder
from promptcache.types import CacheMeta

embedder = OpenAIEmbedder(model="text-embedding-3-small")

backend = RedisVectorBackend(
    url="redis://localhost:6379/0",
    dim=embedder.dim,
)

cache = SemanticCache(
    backend=backend,
    embedder=embedder,
    namespace="support-bot",
    threshold=0.92,
)

meta = CacheMeta(
    model="gpt-4.1-mini",
    system_prompt="You are a helpful support assistant.",
)

result = cache.get_or_set(
    prompt="How do I reset my password?",
    llm_call=my_llm_call,
    extract_text=lambda r: r.output_text,
    meta=meta,
)

print(result.cache_hit)  # True or False
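
The example assumes a my_llm_call callable that you provide. Its exact signature isn't documented here; a minimal sketch, assuming the cache invokes it with the prompt string on a miss and that you're using the OpenAI Responses API (whose result exposes output_text, matching extract_text above):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_llm_call(prompt: str):
    # Assumption: PromptCache calls this with the prompt on a cache miss
    # and stores whatever extract_text pulls from the returned object.
    return client.responses.create(
        model="gpt-4.1-mini",
        instructions="You are a helpful support assistant.",
        input=prompt,
    )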

That's it.


Example Impact

In a SaaS support assistant:

  • 62% cache hit rate

  • 48% reduction in token usage

  • 44% reduction in API spend

Your mileage depends on your workload, but high-volume, repetitive systems benefit the most.


Production-Ready Design

PromptCache isolates cache entries by:

  • namespace

  • model

  • system_prompt

  • tools_schema

  • embedder

This prevents cross-context contamination.
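
Conceptually, that isolation amounts to scoping every cache entry under a fingerprint of its full context. A hypothetical sketch of such a fingerprint (not the library's internals):

import hashlib
import json

def scope_key(namespace, model, system_prompt, tools_schema, embedder_id):
    # Hash every field that changes what a cached answer means, so
    # entries from different contexts can never collide.
    payload = json.dumps(
        [namespace, model, system_prompt, tools_schema, embedder_id],
        sort_keys=True, default=str,
    )
    return hashlib.sha256(payload.encode()).hexdigest()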

Additional features:

  • ✅ Redis HNSW vector search (cosine similarity)

  • ✅ TTL support

  • ✅ Hit-rate statistics

  • ✅ Optional cost tracking

  • ✅ In-memory backend (for testing)

  • ✅ Framework-agnostic (no LangChain dependency)


Installation

pip install promptcache-ai

Optional OpenAI embedder:

pip install "promptcache-ai[openai]"

Redis Setup

PromptCache requires Redis Stack (RediSearch with vector support).

Run locally:

docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest

Verify:

redis-cli MODULE LIST

You should see:

search

Stats

Measure impact:

print(cache.stats())

Example:

{
    "hits": 1240,
    "misses": 860,
    "total": 2100,
    "hit_rate_percent": 59.05
}
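
You can turn these counters into a rough savings estimate. A back-of-the-envelope sketch; the token count and price below are placeholder assumptions, not values the library reports:

stats = cache.stats()

# Placeholder assumptions for your workload and model pricing.
AVG_TOKENS_PER_REQUEST = 900
PRICE_PER_1K_TOKENS = 0.0006

tokens_saved = stats["hits"] * AVG_TOKENS_PER_REQUEST
dollars_saved = tokens_saved / 1000 * PRICE_PER_1K_TOKENS
print(f"Saved ~{tokens_saved:,} tokens (~${dollars_saved:.2f})")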

When It Helps Most

  • Customer support bots

  • Internal copilots

  • FAQ systems

  • Knowledge assistants

  • Deterministic / low-temperature tasks

  • High-volume similar prompts


When It May Not Help

  • Highly personalized prompts

  • Creative high-temperature tasks

  • Frequently changing context


Testing

Run unit tests:

pytest

Run Redis integration tests:

export REDIS_URL="redis://localhost:6379/0"
pytest
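
For unit tests that shouldn't touch Redis, the in-memory backend is useful. A sketch of such a test; the backend import path, constructor, and embedder interface below are assumptions, so check the package for the actual names:

# test_cache.py -- illustrative sketch; names marked "assumed" are not
# verified against the package's public API.
import hashlib

from promptcache import SemanticCache
from promptcache.backends.memory import InMemoryBackend  # assumed path
from promptcache.types import CacheMeta

class FakeEmbedder:
    """Deterministic toy embedder so tests never call a real API."""
    dim = 8

    def embed(self, text: str) -> list[float]:  # assumed interface
        digest = hashlib.sha256(text.lower().encode()).digest()
        return [b / 255 for b in digest[: self.dim]]

def test_repeat_prompt_hits_cache():
    cache = SemanticCache(
        backend=InMemoryBackend(dim=FakeEmbedder.dim),  # assumed signature
        embedder=FakeEmbedder(),
        namespace="tests",
        threshold=0.99,
    )
    meta = CacheMeta(model="test-model", system_prompt="test")
    calls = []

    def fake_llm(prompt):
        calls.append(prompt)
        return "canned answer"

    first = cache.get_or_set(prompt="hi", llm_call=fake_llm,
                             extract_text=lambda r: r, meta=meta)
    second = cache.get_or_set(prompt="hi", llm_call=fake_llm,
                              extract_text=lambda r: r, meta=meta)
    assert not first.cache_hit and second.cache_hit
    assert len(calls) == 1  # the LLM was only called once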
