Prevent LLM cache poisoning with Confidence Gap Analysis — intent-aware thresholds, zero false serves

Cache Guard

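The Problem

"Reset my password" and "Reset my admin password" can score 0.92 cosine similarity (the exact value depends on the embedding model). A naive cache with a static threshold at or below 0.92 serves the answer cached for one query in response to the other. This is cache poisoning, and it happens silently in production.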

The Fix

Instead of trusting the top match score alone, also check the gap between the top two matches:

Large gap (top match far ahead) -> safe to serve from cache
Small gap (top two neck-and-neck) -> bypass to the LLM
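The rule above can be sketched in plain Python. This is a minimal illustration, not Cache Guard's internal code: it assumes cosine similarity over raw vectors and a decision that requires both a similarity threshold and a minimum gap, using the actionable-tier defaults (0.90 and 0.15) from the Intent-Aware Thresholds table. The names `cosine` and `gap_decision` are hypothetical, and treating a lone entry's gap as its full score is an assumption of this sketch.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def gap_decision(query_vec, cached_vecs, threshold=0.90, min_gap=0.15):
    """Return (serve, best_index, reason) for a hypothetical gap-based lookup."""
    if not cached_vecs:
        return False, None, "empty cache"
    # Score every cached vector, best first
    scores = sorted(
        ((cosine(query_vec, v), i) for i, v in enumerate(cached_vecs)), reverse=True
    )
    top_score, top_idx = scores[0]
    # Assumption: with a single entry there is no runner-up, so the gap is the full score
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    gap = top_score - runner_up
    if top_score < threshold:
        return False, None, f"score {top_score:.2f} < threshold {threshold:.2f}"
    if gap < min_gap:
        return False, None, f"gap {gap:.2f} < min_gap {min_gap:.2f} (ambiguous)"
    return True, top_idx, "unambiguous match"
```

Two near-identical cached entries that both score about 0.995 against a query produce a gap of zero, so the lookup bypasses to the LLM even though the top score alone would have passed.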

Features

Confidence Gap Analysis — serve cached responses only when the match is unambiguous
Intent-aware thresholds — informational (loose), actionable (medium), transactional (strict)
TTL by risk tier — 30 / 14 / 7 days based on query intent
Model staleness check — reject entries generated by an older model version
Semantic drift revalidation — sample entries periodically to detect drift
Framework-agnostic — works with any LLM and any embedding model
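The TTL and model-staleness checks can be sketched together. This is an illustration under stated assumptions, not the library's code: each entry is assumed to store its intent tier, the model version that produced it, and a UTC creation time, and staleness is assumed to be an exact string match on the model version. `Entry`, `TTL_DAYS`, and `is_servable` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical TTLs per intent tier, mirroring the 30 / 14 / 7 days listed above
TTL_DAYS = {"informational": 30, "actionable": 14, "transactional": 7}

@dataclass
class Entry:
    """Hypothetical cache entry; not Cache Guard's CacheEntry class."""
    response: str
    intent: str
    model_version: str
    created_at: datetime

def is_servable(entry, current_model, now=None):
    """Return (ok, reason): reject entries from another model or past their tier's TTL."""
    now = now or datetime.now(timezone.utc)
    # Staleness: the entry was generated by a different model version
    if entry.model_version != current_model:
        return False, f"stale model {entry.model_version} != {current_model}"
    # Expiry: riskier tiers get shorter lifetimes
    age = now - entry.created_at
    if age > timedelta(days=TTL_DAYS[entry.intent]):
        return False, f"expired: {age.days}d > {TTL_DAYS[entry.intent]}d TTL"
    return True, "fresh"
```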

Installation

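pip install llm-cache-guard

The PyPI distribution is named llm-cache-guard; the Python import name is cache_guard.

With optional dependencies:

pip install "llm-cache-guard[demo]"    # Streamlit demo + Plotly charts
pip install "llm-cache-guard[dev]"     # pytest
pip install "llm-cache-guard[all]"     # Everything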

Quick Start

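from cache_guard import SafeCache

# your_embed_fn is a placeholder for your own embedding function
# (any callable that maps a string to an embedding vector)
cache = SafeCache()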

# Add a cached response
cache.add(
    query="How do I reset my password?",
    embedding=your_embed_fn("How do I reset my password?"),
    response="Go to Settings > Security > Reset.",
)

# Lookup with confidence gap analysis
result = cache.lookup(
    query="Reset my password",
    embedding=your_embed_fn("Reset my password"),
)

if result.hit:
    print(result.response)  # Serve from cache
else:
    print(result.reason)    # "gap 0.03 < min_gap 0.08 (ambiguous)"
    # -> call the LLM instead
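To try the Quick Start without an embedding API, a deterministic toy embedder can stand in for `your_embed_fn`. This is a hypothetical helper for local experiments only (a hashed bag of words): its similarity scores will not resemble a real embedding model's, and it assumes `SafeCache` accepts a plain list of floats, which is what the OpenAI embeddings call in the next section returns.

```python
import hashlib
import math
import re

def toy_embed(text, dim=64):
    """Hashed bag-of-words embedding, L2-normalised. For local experiments only."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Stable hash (unlike the built-in hash(), which is randomised per process)
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Assigning `your_embed_fn = toy_embed` is enough to run the snippet above end to end; swap in a real model before judging hit rates.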

With Any LLM + Embedding Model

Cache Guard manages the cache decision — you bring your own LLM and embeddings:

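from openai import OpenAI
from cache_guard import SafeCache

# Pin one model snapshot and use it for both the cache and the LLM call,
# so model_version matches the model that actually produced each answer
MODEL = "gpt-4o-2024-08-06"

client = OpenAI()  # reads OPENAI_API_KEY from the environment
cache = SafeCache(model_version=MODEL)

def embed(text):
    r = client.embeddings.create(model="text-embedding-3-small", input=text)
    return r.data[0].embedding

user_query = "How do I reset my password?"
query_embedding = embed(user_query)  # embed once, reuse for lookup and add

# Check cache first
result = cache.lookup(user_query, query_embedding)

if result.hit:
    answer = result.response
else:
    # Bypass: ask the LLM, then store the fresh answer
    answer = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": user_query}]
    ).choices[0].message.content
    cache.add(user_query, query_embedding, answer)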

Intent-Aware Thresholds

Different query types carry different risk levels:

Intent	Similarity threshold	Min gap	TTL	Example
Informational	0.78	0.08	30d	"What are your hours?"
Actionable	0.90	0.15	14d	"Reset my password"
Transactional	0.95	0.20	7d	"Pay my bill"
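To see why the tiers matter, the table can be encoded as data and one match judged under each tier. The numbers are the defaults from the table; the dict layout, the `tier_allows` name, and the inclusive `>=` comparisons are assumptions of this sketch.

```python
# Defaults copied from the table above; the dict layout itself is hypothetical
TIERS = {
    "informational": {"threshold": 0.78, "min_gap": 0.08, "ttl_days": 30},
    "actionable":    {"threshold": 0.90, "min_gap": 0.15, "ttl_days": 14},
    "transactional": {"threshold": 0.95, "min_gap": 0.20, "ttl_days": 7},
}

def tier_allows(intent, top_score, second_score):
    """True if a match with these top-two scores clears the tier's threshold and gap."""
    cfg = TIERS[intent]
    gap = top_score - second_score
    return top_score >= cfg["threshold"] and gap >= cfg["min_gap"]

# One match (top 0.93, runner-up 0.80, gap 0.13) judged under each tier
for intent in TIERS:
    print(intent, tier_allows(intent, 0.93, 0.80))
```

The same match is served only for an informational query: its gap of 0.13 is too small for the actionable tier, and its score of 0.93 is below the transactional threshold.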

from cache_guard import SafeCache, IntentClassifier

# Use built-in keyword heuristic
cache = SafeCache()  # auto-classifies queries

# Or plug in your own ML classifier
classifier = IntentClassifier(custom_fn=your_ml_model.predict)
cache = SafeCache(classifier=classifier)
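The built-in classifier is described only as a keyword heuristic. The sketch below shows what such a classifier might look like; the keyword lists and the `classify` function are hypothetical, not Cache Guard's. It checks the strictest tier first, so a query that mentions both "reset" and "pay" lands in the transactional tier. A function of this shape (query string in, tier name out) is the kind of thing one might try as `custom_fn`, but check the signature `IntentClassifier` actually expects.

```python
import re

# Hypothetical keyword lists, ordered from strictest tier to loosest
KEYWORDS = [
    ("transactional", {"pay", "payment", "transfer", "refund", "purchase", "buy"}),
    ("actionable", {"reset", "change", "update", "delete", "enable", "disable"}),
]

def classify(query):
    """Return the strictest tier whose keywords appear in the query."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    for intent, words in KEYWORDS:
        if tokens & words:
            return intent
    return "informational"  # default: lowest-risk tier
```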

Custom Thresholds

from cache_guard import SafeCache
from cache_guard.types import IntentConfig

cache = SafeCache(thresholds={
    "informational": IntentConfig(threshold=0.80, min_gap=0.10, ttl_days=30),
    "actionable":    IntentConfig(threshold=0.92, min_gap=0.18, ttl_days=14),
    "transactional": IntentConfig(threshold=0.97, min_gap=0.25, ttl_days=3),
})
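When overriding thresholds it is easy to make a riskier tier accidentally looser than a safer one. A small check like the following can catch that before deployment. It uses a hypothetical stand-in dataclass with the three fields the example passes to `IntentConfig`, so it runs without the library; `Config`, `ORDER`, and `check_monotonic` are illustrative names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    """Stand-in with the threshold / min_gap / ttl_days fields used above."""
    threshold: float
    min_gap: float
    ttl_days: int

ORDER = ["informational", "actionable", "transactional"]  # loosest to strictest

def check_monotonic(thresholds):
    """List every place where a riskier tier is not at least as strict as the one before."""
    problems = []
    for looser, stricter in zip(ORDER, ORDER[1:]):
        a, b = thresholds[looser], thresholds[stricter]
        if b.threshold < a.threshold:
            problems.append(f"{stricter}.threshold < {looser}.threshold")
        if b.min_gap < a.min_gap:
            problems.append(f"{stricter}.min_gap < {looser}.min_gap")
        if b.ttl_days > a.ttl_days:
            problems.append(f"{stricter}.ttl_days > {looser}.ttl_days")
    return problems

custom = {
    "informational": Config(0.80, 0.10, 30),
    "actionable":    Config(0.92, 0.18, 14),
    "transactional": Config(0.97, 0.25, 3),
}
print(check_monotonic(custom) or "ok")
```

The custom values shown in the README pass the check, so it prints `ok`.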

Health Monitoring

stats = cache.get_stats()
print(f"Hit rate:    {stats['hit_rate']:.1%}")
print(f"Bypass rate: {stats['bypass_rate']:.1%}")
print(f"Entries:     {stats['entries']}")

# Periodic drift check
sample = cache.revalidate(sample_rate=0.05)
print(f"Sampled {sample['sampled']} of {sample['total']} entries")
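The README does not spell out how `revalidate` detects drift. One plausible approach, sketched here as an assumption rather than the library's method, is to re-embed a random sample of cached queries with the current embedding function and flag entries whose stored embedding no longer matches the fresh one. `revalidate_sample`, the 0.95 cutoff, and the dict-based entry layout are hypothetical; only the `sampled` and `total` keys mirror the README's output.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def revalidate_sample(entries, embed_fn, sample_rate=0.05, min_similarity=0.95, rng=None):
    """Re-embed a random sample of entries and report those that drifted.

    entries: list of dicts with "query" and "embedding" keys (hypothetical layout).
    """
    rng = rng or random.Random()
    if not entries:
        return {"sampled": 0, "total": 0, "drifted": []}
    # Sample at least one entry, never more than exist
    k = min(len(entries), max(1, round(len(entries) * sample_rate)))
    sample = rng.sample(entries, k)
    drifted = [
        e["query"] for e in sample
        if cosine(e["embedding"], embed_fn(e["query"])) < min_similarity
    ]
    return {"sampled": k, "total": len(entries), "drifted": drifted}
```

Drifted entries are candidates for eviction or regeneration; a rising drift count after an embedding-model upgrade is the signal this kind of sampling is meant to surface.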

Examples

See the examples/ folder for complete working demos:

Example	Description
`01_basic_usage.py`	Add entries, look them up, inspect stats
`02_poisoning_demo.py`	Naive cache vs SafeCache — poisoning comparison
`03_intent_thresholds.py`	Risk-based thresholds by query type
`04_streamlit_demo.py`	Interactive demo with live charts (DevConf talk)

Running Examples

git clone https://github.com/shrinidhi-mahishi/cache-guard.git
cd cache-guard
python -m venv venv && source venv/bin/activate
pip install -e ".[all]"

python examples/01_basic_usage.py
streamlit run examples/04_streamlit_demo.py

API Reference

SafeCache

Method	Description
`lookup(query, embedding)`	Confidence-gap lookup -> `LookupResult`
`add(query, embedding, response, model_version=None)`	Add entry -> `CacheEntry`
`revalidate(sample_rate=0.01)`	Sample entries for drift checking
`get_stats()`	Hit rate, bypass rate, entries count
`clear()`	Remove all entries and reset counters

IntentClassifier

Method	Description
`classify(query)`	Return intent tier (informational / actionable / transactional)

QuerySimulator

Method	Description
`get_queries(shuffle=True)`	Generate synthetic queries with poison injections
`get_cluster_centroids()`	Return embedding centroids for analysis

Configuration

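from cache_guard import SafeCache, IntentClassifier

# DEFAULT_THRESHOLDS stands for the built-in per-intent configs
# (the Intent-Aware Thresholds table); pass your own dict of IntentConfig to override
cache = SafeCache(
    thresholds=DEFAULT_THRESHOLDS,  # Per-intent configs
    classifier=IntentClassifier(),  # Or a custom classifier
    model_version="gpt-4o",         # For staleness checks
)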

License

MIT

Release history

This version: 0.1.0 (released Mar 7, 2026)

Download files

Source Distribution

llm_cache_guard-0.1.0.tar.gz (13.2 kB)

Uploaded Mar 7, 2026 Source

Built Distribution

llm_cache_guard-0.1.0-py3-none-any.whl (11.6 kB)

Uploaded Mar 7, 2026 Python 3

File details

Details for the file llm_cache_guard-0.1.0.tar.gz.

File metadata

Download URL: llm_cache_guard-0.1.0.tar.gz
Upload date: Mar 7, 2026
Size: 13.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for llm_cache_guard-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`c4109d5d84b5a65e9d1d606d85945ec6dc8d39defe051ec8cf0d660355b16dae`
MD5	`d8c7f61e18559dc2c027aab324a72fc3`
BLAKE2b-256	`dcb1cd5ebb4cd7eb6987d7b1872a8421bbc0a73e85eb18aca54e7f8679f9dd7e`

File details

Details for the file llm_cache_guard-0.1.0-py3-none-any.whl.

File metadata

Download URL: llm_cache_guard-0.1.0-py3-none-any.whl
Upload date: Mar 7, 2026
Size: 11.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for llm_cache_guard-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c3e5e32e22a44da1eeadb5bb542af189a4a275b114d870819b48886ef6db3414`
MD5	`89cdbe484a9a77955fdbd04b11527d7a`
BLAKE2b-256	`01db0eb01404e4e47eefc9228d3ee1aea3999dae78c6b7b83356acebfe002f61`
