PromptCache

Semantic similarity cache for LLM responses (Redis backend, TTL, cost tracking).

Reduce your LLM API costs by 30–70% with semantic caching.

PromptCache reuses LLM responses for semantically similar prompts, not just exact string matches.

If two users ask:

  • "Explain Redis in simple terms"

  • "Can you explain Redis simply?"

You shouldn't pay twice.

PromptCache makes sure you don't.


The Problem

If you're using OpenAI or any LLM API in production, you're likely paying repeatedly for:

  • The same question phrased differently

  • Similar support requests across users

  • Slight variations in prompts

  • Background job retries

  • RAG pipelines issuing near-identical queries

Traditional caching only works for exact matches.

LLMs need semantic caching.


What PromptCache Does

  1. Embeds your prompt into a vector

  2. Searches Redis for similar past prompts

  3. If similarity ≥ threshold → returns cached response

  4. Otherwise → calls the LLM and stores the result

User Prompt → Embed → Redis Vector Search
  ├─ Hit?  → Return cached answer
  └─ Miss? → Call LLM → Store result
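
In code, that flow is roughly the following (a minimal sketch: embed, index.nearest, and index.add are illustrative stand-ins, not PromptCache's actual internals):

def get_or_set_sketch(prompt, llm_call, embed, index, threshold=0.92):
    # Illustrative only: the real cache also handles TTL, namespacing,
    # and hit/miss statistics.
    vec = embed(prompt)                       # 1. embed the prompt
    match = index.nearest(vec)                # 2. vector search (cosine)
    if match is not None and match.similarity >= threshold:
        return match.response                 # 3. hit: reuse cached answer
    response = llm_call(prompt)               # 4. miss: call the LLM...
    index.add(vec, response)                  #    ...and store the result
    return response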

10-Second Example

from promptcache import SemanticCache
from promptcache.backends.redis_vector import RedisVectorBackend
from promptcache.embedders.openai import OpenAIEmbedder
from promptcache.types import CacheMeta

embedder = OpenAIEmbedder(model="text-embedding-3-small")

backend = RedisVectorBackend(
    url="redis://localhost:6379/0",
    dim=embedder.dim,
)

cache = SemanticCache(
    backend=backend,
    embedder=embedder,
    namespace="support-bot",
    threshold=0.92,
)

meta = CacheMeta(
    model="gpt-4.1-mini",
    system_prompt="You are a helpful support assistant.",
)

result = cache.get_or_set(
    prompt="How do I reset my password?",
    llm_call=my_llm_call,
    extract_text=lambda r: r.output_text,
    meta=meta,
)

print(result.cache_hit)  # True or False

That's it.
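
Here, my_llm_call is any callable that takes the prompt and returns your provider's response object. A minimal sketch against the OpenAI Responses API (the client setup is an assumption; adapt it to your provider):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_llm_call(prompt: str):
    # The returned response exposes output_text, which the extract_text
    # lambda above pulls out for caching.
    return client.responses.create(model="gpt-4.1-mini", input=prompt)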


Example Impact

In a SaaS support assistant:

  • 62% cache hit rate

  • 48% reduction in token usage

  • 44% reduction in API spend

Your mileage will vary with workload, but high-volume, repetitive systems benefit the most.


Production-Ready Design

PromptCache isolates cache entries by:

  • namespace

  • model

  • system_prompt

  • tools_schema

  • embedder

This prevents cross-context contamination.
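
To picture how that isolation can work, here is an illustrative sketch (not the library's actual key scheme) that derives a scope prefix from the context fields:

import hashlib
import json

def scope_key(namespace, model, system_prompt, tools_schema, embedder_name):
    # A prompt is only matched against entries sharing all of these
    # fields; hashing them yields a stable scope prefix for storage keys.
    payload = json.dumps(
        [namespace, model, system_prompt, tools_schema, embedder_name],
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]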

Additional features:

  • ✅ Redis HNSW vector search (cosine similarity)

  • ✅ TTL support

  • ✅ Hit-rate statistics

  • ✅ Optional cost tracking

  • ✅ In-memory backend (for testing)

  • ✅ Framework-agnostic (no LangChain dependency)


Installation

pip install promptcache-ai

Optional OpenAI embedder:

pip install promptcache-ai[openai]

Redis Setup

PromptCache requires Redis Stack (RediSearch with vector support).

Run locally:

docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest

Verify:

redis-cli MODULE LIST

You should see:

search
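
To double-check vector support specifically, you can create and drop a throwaway HNSW index by hand (purely a sanity check, unrelated to the index PromptCache itself uses):

redis-cli FT.CREATE smoke_idx SCHEMA vec VECTOR HNSW 6 TYPE FLOAT32 DIM 4 DISTANCE_METRIC COSINE
redis-cli FT.DROPINDEX smoke_idx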

Stats

Measure impact:

print(cache.stats())

Example:

{
    "hits": 1240,
    "misses": 860,
    "total": 2100,
    "hit_rate_percent": 59.05
}
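
Since every hit is an LLM call you skipped, a rough savings estimate falls straight out of these numbers (the per-call cost below is a placeholder; substitute your own average):

stats = cache.stats()

AVG_COST_PER_CALL = 0.002  # USD; hypothetical average for your workload
print(f"~${stats['hits'] * AVG_COST_PER_CALL:.2f} saved "
      f"across {stats['total']} requests")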

When It Helps Most

  • Customer support bots

  • Internal copilots

  • FAQ systems

  • Knowledge assistants

  • Deterministic / low-temperature tasks

  • High-volume similar prompts


When It May Not Help

  • Highly personalized prompts

  • Creative high-temperature tasks

  • Frequently changing context


Testing

Run unit tests:

pytest

Run Redis integration tests:

export REDIS_URL="redis://localhost:6379/0"
pytest
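
For fast unit tests, the in-memory backend avoids Redis entirely. A sketch of such a test follows; the backend's import path, the embedder stub, and the constructor signatures are assumptions about this release, so check them against your installed version:

# test_cache_smoke.py
from promptcache import SemanticCache
from promptcache.types import CacheMeta
from promptcache.backends.memory import InMemoryBackend  # assumed path

class FakeEmbedder:
    # Deterministic stand-in so identical prompts embed identically.
    dim = 3
    def embed(self, text):  # method name assumed
        return [float(len(text)), 1.0, 0.0]

def test_repeated_prompt_hits_cache():
    cache = SemanticCache(
        backend=InMemoryBackend(dim=FakeEmbedder.dim),  # signature assumed
        embedder=FakeEmbedder(),
        namespace="tests",
        threshold=0.99,
    )
    meta = CacheMeta(model="test-model", system_prompt="test")
    call = lambda: cache.get_or_set(
        prompt="ping",
        llm_call=lambda p: "pong",
        extract_text=lambda r: r,
        meta=meta,
    )
    assert call().cache_hit is False  # first call misses
    assert call().cache_hit is True   # identical prompt then hits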
