
llm-cache

SQLite-backed LLM response cache. Exact match + fuzzy match. Decorator API. Zero mandatory server dependencies.

Why llm-cache?

Every LLM API call costs money and takes time. In development and testing, you hit the same prompts over and over. In production, users ask the same questions. llm-cache stores responses in a local SQLite database and serves them instantly — no Redis, no server, no external service.

Two match layers:

  • Exact match — SHA-256 hash of the normalised prompt. O(1) lookup.
  • Fuzzy match — token overlap similarity for near-identical prompts. Catches rephrasings without embeddings.
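The two layers can be sketched roughly like this. The normalisation step and the similarity metric shown are illustrative assumptions, not the library's exact implementation:

```python
import hashlib

def exact_key(prompt: str) -> str:
    # Assumed normalisation: lowercase and collapse whitespace before hashing
    normalised = " ".join(prompt.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def token_overlap(a: str, b: str) -> float:
    # One plausible fuzzy metric: Jaccard similarity over word tokens
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0

# Same key after normalisation -> exact hit
print(exact_key("What is RAG?") == exact_key("what  is RAG?"))  # True

# A rephrasing shares most tokens -> fuzzy hit if above the threshold
print(token_overlap("summarise this report", "please summarise this report"))  # 0.75
```

Exact matching stays O(1) because the hash is the primary key; the fuzzy layer only runs on a miss.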

Install

pip install llm-cache

Quick start

from llm_cache import LLMCache
from anthropic import Anthropic  # example client; any LLM SDK works

client = Anthropic()
cache = LLMCache()  # stores in .llm_cache.db

# Decorator — wraps any LLM call function
@cache.cached(model="claude-opus-4-5")
def ask(prompt: str) -> str:
    return client.messages.create(
        model="claude-opus-4-5",
        messages=[{"role": "user", "content": prompt}],
    ).content[0].text

ask("What is RAG?")   # hits API first time
ask("What is RAG?")   # served from cache instantly — no API call
ask("What is rag?")   # fuzzy match — also served from cache

Direct API

from llm_cache import LLMCache, MatchType

cache = LLMCache("myapp.db", fuzzy_threshold=0.85, ttl_seconds=86400)

# Get
response, match = cache.get("What is RAG?")
if match == MatchType.MISS:
    response = call_api("What is RAG?")
    cache.set("What is RAG?", response, model="claude-opus-4-5", tags=["qa"])

# Match types
# MatchType.EXACT  — identical prompt
# MatchType.FUZZY  — near-match above threshold
# MatchType.MISS   — not cached

# Stats
print(cache.stats())
# CacheStats(entries=142, hit_rate=73%, tokens_saved≈48,200)

# Management
cache.delete(tags=["qa"])              # delete by tag
cache.delete(older_than_seconds=3600)  # expire old entries
cache.delete(model="claude-opus-4-5")  # delete by model
cache.clear()                          # wipe everything

Configuration

cache = LLMCache(
    db_path=".llm_cache.db",    # SQLite file path
    fuzzy_threshold=0.85,        # 0-1, how similar prompts must be for fuzzy hit
    ttl_seconds=86400,           # auto-expire entries after N seconds (None = never)
    max_entries=10000,           # evict oldest when cache exceeds this size
)
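How `ttl_seconds` and `max_entries` interact can be illustrated with plain SQLite. This is a hypothetical schema and a pair of maintenance queries; the library's actual table layout may differ:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, response TEXT, created_at REAL)")

now = time.time()
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?)",
    [("a", "r1", now - 100_000),  # older than the TTL below
     ("b", "r2", now - 30),
     ("c", "r3", now - 20),
     ("d", "r4", now - 10)],
)

ttl_seconds, max_entries = 86_400, 2

# TTL expiry: drop anything older than ttl_seconds
conn.execute("DELETE FROM cache WHERE created_at < ?", (now - ttl_seconds,))

# Size cap: evict the oldest rows beyond max_entries
conn.execute(
    "DELETE FROM cache WHERE key NOT IN "
    "(SELECT key FROM cache ORDER BY created_at DESC LIMIT ?)",
    (max_entries,),
)

print([k for (k,) in conn.execute("SELECT key FROM cache ORDER BY created_at")])  # ['c', 'd']
```

TTL removes the stale entry first, then the size cap keeps only the newest `max_entries` rows.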

Cost savings estimate

stats = cache.stats()
print(f"Tokens saved:  {stats.estimated_tokens_saved:,}")
print(f"Estimated saving: ~£{stats.cost_savings_estimate:.2f}")
print(f"Hit rate: {stats.hit_rate:.0%}")
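The estimate is necessarily rough, since the saving depends on the model's pricing. A back-of-envelope version, using an assumed blended rate (illustrative only, not a published price):

```python
tokens_saved = 48_200          # e.g. from cache.stats()
price_per_million_gbp = 12.0   # assumed blended input/output rate per million tokens

saving_gbp = tokens_saved / 1_000_000 * price_per_million_gbp
print(f"~£{saving_gbp:.2f}")  # ~£0.58
```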

Context manager

with LLMCache("session.db") as cache:
    result, _ = cache.get("my prompt")

Linda Oraegbunam | LinkedIn | Twitter | GitHub
