Maintainers

Aaronhans

cogcache

Semantic LLM answer cache — reuse paraphrased queries, cut latency and token cost.

cogcache caches LLM responses by semantic similarity instead of exact key match. When a paraphrased query arrives, it returns the previous answer in milliseconds — zero LLM tokens spent.
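To make the idea concrete, here is a minimal, self-contained sketch of threshold-based semantic matching. It is not cogcache's implementation: `toy_embed`, `cosine`, and `ToySemanticCache` are hypothetical names, and the bag-of-words embedding only stands in for a real embedding model, which is why the toy threshold (0.6) is much lower than the 0.92 used elsewhere on this page.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    """Linear-scan semantic cache; a real backend would use a vector index."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def lookup(self, query: str) -> Optional[str]:
        vec = toy_embed(query)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None

    def query(self, query: str, llm_fn: Callable[[str], str]) -> str:
        cached = self.lookup(query)
        if cached is not None:
            return cached              # hit: the LLM is never called
        answer = llm_fn(query)         # miss: call the LLM and remember the answer
        self.entries.append((toy_embed(query), answer))
        return answer


calls = []

def fake_llm(q: str) -> str:
    calls.append(q)
    return f"answer to: {q}"

toy_cache = ToySemanticCache(threshold=0.6)
first = toy_cache.query("What is semantic caching?", fake_llm)
second = toy_cache.query("What does semantic caching mean?", fake_llm)  # served from cache
```

The two queries share three of their words, so their toy similarity is about 0.67 and the second call reuses the first answer. With real embeddings the same comparison runs on dense vectors, and the threshold controls how close a paraphrase must be.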

"What is semantic caching?"      → LLM call (4.2s, 320 tokens)
"What does semantic caching mean?" → Cache HIT (0.5ms, 0 tokens)   ← 99% savings

Install

pip install cogcache                     # core library
pip install "cogcache[redis]"            # + Redis Stack backend (HNSW vector search)
pip install "cogcache[prometheus]"       # + Prometheus metrics sink
pip install "cogcache[openai-judge]"     # + LLM-as-Judge quality scoring
pip install "cogcache[langchain]"        # + LangChain BaseCache adapter
pip install "cogcache[all]"              # everything
# Quoting the extras keeps shells such as zsh from treating [..] as a glob pattern.

Quick start

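from cogcache import CogniCache
from openai import OpenAI

cache = CogniCache(similarity_threshold=0.92)
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_llm(query: str) -> str:
    # Your real LLM call here (OpenAI, Anthropic, DashScope, ...).
    # Replace the ... with your own model name and messages.
    return openai_client.chat.completions.create(...).choices[0].message.content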

# First call → LLM
answer = cache.query("What is gradient descent?", llm_fn=my_llm)

# Second call → cache hit, zero LLM cost
answer = cache.query("Explain gradient descent.", llm_fn=my_llm)

As a decorator

@cache.cached(threshold=0.90)
def ask_llm(query: str) -> str:
    return my_llm(query)

ask_llm("What is X?")   # LLM call
ask_llm("Tell me X.")    # cache hit
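For readers curious how a decorator like this can sit on top of a `query(text, llm_fn=...)` style API, the following is a rough sketch under stated assumptions. `ExactCache` and this `cached` factory are hypothetical and are not taken from cogcache. The stand-in cache matches exact strings only, so it has no `threshold` argument and a paraphrase would not hit.

```python
import functools
from typing import Callable


class ExactCache:
    """Hypothetical stand-in exposing query(text, llm_fn=...); exact-match only."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def query(self, text: str, llm_fn: Callable[[str], str]) -> str:
        if text not in self._data:
            self._data[text] = llm_fn(text)
        return self._data[text]


def cached(cache: ExactCache) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator factory: route every call of the wrapped function through cache.query."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(fn)
        def wrapper(text: str) -> str:
            return cache.query(text, llm_fn=fn)
        return wrapper
    return decorator


llm_calls = []

@cached(ExactCache())
def ask(text: str) -> str:
    llm_calls.append(text)          # stands in for a real LLM request
    return text.upper()

ask("What is X?")   # runs the function body
ask("What is X?")   # served from the cache, body not executed
```

The wrapped function keeps its name and docstring through `functools.wraps`, which matters when the decorated function is introspected by frameworks or logs.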

With LangChain

from cogcache.integrations.langchain import CogniCacheLangChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(cache=CogniCacheLangChain(cache))

Features

Feature	Default
Cosine-similarity semantic matching	✅
Pluggable stores: `MemoryStore` / `RedisStore` (Redis Stack HNSW)	✅
TTL eviction on read & write paths	✅
LLM-as-Judge with "write strict, hit lenient" policy	optional
Prometheus + JSON metrics sink	optional
Route / intent isolation (multi-tenant safe)	✅
Fail-open on backend failures	✅
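As an illustration of what "TTL eviction on read & write paths" can mean, here is a small sketch of a store that drops stale entries lazily on reads and sweeps them on writes. `TTLStore` is a hypothetical class written for this example, not cogcache's `MemoryStore`. The injectable `clock` is an assumption that makes the behavior easy to demonstrate.

```python
import time
from typing import Callable, Optional


class TTLStore:
    """Hypothetical in-memory store that drops expired entries on read and on write."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl                # seconds; a negative value disables expiry
        self._clock = clock
        self._items: dict[str, tuple[float, str]] = {}

    def _expired(self, stored_at: float) -> bool:
        return self.ttl >= 0 and self._clock() - stored_at > self.ttl

    def get(self, key: str) -> Optional[str]:
        item = self._items.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._expired(stored_at):
            del self._items[key]      # read path: evict the stale entry lazily
            return None
        return value

    def set(self, key: str, value: str) -> None:
        # Write path: sweep every expired entry before inserting the new one.
        stale = [k for k, (t, _) in self._items.items() if self._expired(t)]
        for k in stale:
            del self._items[k]
        self._items[key] = (self._clock(), value)

    def __len__(self) -> int:
        return len(self._items)


now = [0.0]                           # fake clock so the demo is deterministic
store = TTLStore(ttl=10, clock=lambda: now[0])
store.set("a", "1")
now[0] = 5.0
store.set("b", "2")
now[0] = 11.0                         # "a" is 11 s old, "b" is 6 s old
a_after_expiry = store.get("a")       # stale, evicted on read
b_still_fresh = store.get("b")
```

Evicting on both paths means a stale answer is never served, and entries that are never read again still get cleaned up the next time something is written.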

Configuration

CogniCache(
    redis_url="redis://localhost:6379/0",   # None = in-memory
    similarity_threshold=0.92,               # 0.85–0.95 typical
    max_cache_size=10_000,
    ttl=3600,                                # -1 for no expiry
    vector_dim=512,                          # match your embedder
    enable_judge=True,                       # LLM Judge quality gate
    write_min_quality=0.8,
    judge_on_hit=False,                      # async hit-time warning
    embed_fn=my_custom_embedder,             # or use the default
    metrics=MetricsCollector(),              # observability hook
)

See tuning guide for threshold selection, embedding model comparison, and Prometheus alert thresholds.
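One plausible reading of the "write strict, hit lenient" policy behind `enable_judge`, `write_min_quality`, and `judge_on_hit` is sketched below. This is an interpretation, not the library's code: `JudgedCache` and `length_judge` are hypothetical, the cache is exact-match for brevity, and the hit-time check runs synchronously here although the configuration comment describes it as asynchronous.

```python
import logging
from typing import Callable

logger = logging.getLogger("judge_sketch")

Judge = Callable[[str, str], float]   # (query, answer) -> quality score in [0, 1]


class JudgedCache:
    """Hypothetical exact-match cache with a quality gate on writes only."""

    def __init__(self, judge: Judge, write_min_quality: float = 0.8,
                 judge_on_hit: bool = False) -> None:
        self.judge = judge
        self.write_min_quality = write_min_quality
        self.judge_on_hit = judge_on_hit
        self._data: dict[str, str] = {}

    def query(self, query: str, llm_fn: Callable[[str], str]) -> str:
        cached = self._data.get(query)
        if cached is not None:
            # Hit lenient: always serve the cached answer; a low score only warns.
            if self.judge_on_hit and self.judge(query, cached) < self.write_min_quality:
                logger.warning("low-quality cache hit for %r", query)
            return cached
        answer = llm_fn(query)
        # Write strict: store the answer only if the judge rates it highly enough.
        if self.judge(query, answer) >= self.write_min_quality:
            self._data[query] = answer
        return answer


def length_judge(query: str, answer: str) -> float:
    """Toy judge: treats longer answers as higher quality."""
    return 0.9 if len(answer) >= 20 else 0.1


judge_calls = []

def fake_llm(q: str) -> str:
    judge_calls.append(q)
    return "short" if "bad" in q else "a sufficiently detailed answer"

jc = JudgedCache(judge=length_judge)
jc.query("good question", fake_llm)
jc.query("good question", fake_llm)   # cached: the first answer passed the gate
jc.query("bad question", fake_llm)
jc.query("bad question", fake_llm)    # not cached: the answer failed the gate
```

Gating only the write keeps the hit path fast, since a cached answer has already passed the judge once.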

When to use cogcache

✅ High-QPS chatbots where users phrase the same question different ways
✅ RAG systems with repetitive paraphrased queries
✅ Multi-tenant LLM APIs where you bill per token
✅ Demo / dev environments where you want to skip LLM calls on repeat

❌ Personalized answers (use route=user_id isolation if you must)
❌ Real-time data (weather, prices) — set short TTL or skip caching
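The route isolation mentioned above can be pictured as one cache partition per route key. The sketch below is an assumption about the general technique, not cogcache's API: `RoutedCache` is hypothetical, matching is exact-string for brevity, and passing `route` as a keyword argument to `query` is a guess based only on the `route=user_id` hint.

```python
from collections import defaultdict
from typing import Callable


class RoutedCache:
    """Hypothetical exact-match cache partitioned by a route key (e.g. a user ID)."""

    def __init__(self) -> None:
        self._by_route: dict[str, dict[str, str]] = defaultdict(dict)

    def query(self, query: str, llm_fn: Callable[[str], str], route: str = "default") -> str:
        bucket = self._by_route[route]    # lookups never cross route boundaries
        if query not in bucket:
            bucket[query] = llm_fn(query)
        return bucket[query]


route_calls = []

def personalised_llm(q: str) -> str:
    route_calls.append(q)
    return f"answer #{len(route_calls)}"

rc = RoutedCache()
alice_first = rc.query("What is my balance?", personalised_llm, route="alice")
bob_first = rc.query("What is my balance?", personalised_llm, route="bob")
alice_again = rc.query("What is my balance?", personalised_llm, route="alice")
```

Because each route has its own partition, an identical question from another user triggers a fresh LLM call instead of returning someone else's answer.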

Production readiness

✅ Thread-safe MemoryStore and MetricsCollector
✅ Fail-open: Redis disconnect / Judge crash never breaks your request path
✅ 49 unit tests, run with pytest -q
✅ Used in production in the AI_Cost_Optimization reference deployment
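The fail-open behavior listed above can be sketched as follows: any error from the cache backend is logged and treated as a miss, so the request still reaches the LLM. The names `Store`, `fail_open_query`, and `BrokenStore` are hypothetical and exist only for this example. The real library may structure this differently.

```python
import logging
from typing import Callable, Optional, Protocol

logger = logging.getLogger("fail_open_sketch")


class Store(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


def fail_open_query(store: Store, query: str, llm_fn: Callable[[str], str]) -> str:
    """Serve from the store when possible; never let a store error reach the caller."""
    try:
        cached = store.get(query)
    except Exception:
        logger.warning("cache read failed, falling back to the LLM", exc_info=True)
        cached = None
    if cached is not None:
        return cached
    answer = llm_fn(query)
    try:
        store.set(query, answer)
    except Exception:
        logger.warning("cache write failed, answer not stored", exc_info=True)
    return answer


class BrokenStore:
    """Simulates a backend outage: every operation raises."""

    def get(self, key: str) -> Optional[str]:
        raise ConnectionError("backend unavailable")

    def set(self, key: str, value: str) -> None:
        raise ConnectionError("backend unavailable")


result = fail_open_query(BrokenStore(), "ping", lambda q: "pong")
```

The trade-off is that an outage silently turns every request into an LLM call, so it is worth alerting on the warning logs or on a drop in hit rate.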

Try it live

For a complete demo with a FastAPI backend, admin dashboard, Prometheus exporter, and Docker Compose setup, see the cogcache-playground repo:

git clone https://github.com/AaronharveyHan/cogcache-playground.git
cd cogcache-playground && docker compose up
# Open http://localhost:8000/admin

License

MIT — see LICENSE.

Release history

This version

0.2.0

May 23, 2026

0.0.0

May 21, 2026

Download files

Source Distribution

cogcache-0.2.0.tar.gz (41.4 kB)

Uploaded May 23, 2026 Source

Built Distribution

cogcache-0.2.0-py3-none-any.whl (35.0 kB)

Uploaded May 23, 2026 Python 3

File details

Details for the file cogcache-0.2.0.tar.gz.

File metadata

Download URL: cogcache-0.2.0.tar.gz
Upload date: May 23, 2026
Size: 41.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cogcache-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8b6c15be13f3312681db2e70206a26e01154b1067e7f9638af897841d50cb81e`
MD5	`dfe6b671785ce10443f55a4997679467`
BLAKE2b-256	`114e7d9466acf8564ab1cc8743bf2c8812021484511348a123698debea3f1c5d`
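A downloaded file can be checked against the SHA256 digest above with the standard library. The sketch below uses only `hashlib`, `pathlib`, and `tempfile`. The helper names `sha256_of` and `verify` are hypothetical, and the demo hashes a throwaway file containing `abc` so that it runs without downloading anything.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True when the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file; the digest below is the standard SHA256 of b"abc".
ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    sample_ok = verify(sample, ABC_SHA256)
    tampered_ok = verify(sample, "0" * 64)
```

To check the real source distribution, call `verify(Path("cogcache-0.2.0.tar.gz"), "8b6c15be13f3312681db2e70206a26e01154b1067e7f9638af897841d50cb81e")` in the directory where the file was saved. pip can enforce the same check at install time through its hash-checking mode.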

Provenance

The following attestation bundles were made for cogcache-0.2.0.tar.gz:

Publisher: release.yml on AaronharveyHan/AI_Cost_Optimization

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cogcache-0.2.0.tar.gz
- Subject digest: 8b6c15be13f3312681db2e70206a26e01154b1067e7f9638af897841d50cb81e
- Sigstore transparency entry: 1609989930
- Sigstore integration time: May 23, 2026
Source repository:
- Permalink: AaronharveyHan/AI_Cost_Optimization@c353db6f4e7182706ea6a62eb960db6f13f2a06c
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/AaronharveyHan
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c353db6f4e7182706ea6a62eb960db6f13f2a06c
- Trigger Event: push

File details

Details for the file cogcache-0.2.0-py3-none-any.whl.

File metadata

Download URL: cogcache-0.2.0-py3-none-any.whl
Upload date: May 23, 2026
Size: 35.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cogcache-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3c0d4e5796969c0fbd4e506bc5ee250e6805e4ac85d0e6113826b7a62165b5b6`
MD5	`5fa5244ea63a4590bfdb63796fa83ae8`
BLAKE2b-256	`9d84f5d8c1d9b59c68c72d1309b9543ebb666c153aae763077ed8b7feda583e8`

Provenance

The following attestation bundles were made for cogcache-0.2.0-py3-none-any.whl:

Publisher: release.yml on AaronharveyHan/AI_Cost_Optimization

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cogcache-0.2.0-py3-none-any.whl
- Subject digest: 3c0d4e5796969c0fbd4e506bc5ee250e6805e4ac85d0e6113826b7a62165b5b6
- Sigstore transparency entry: 1609990098
- Sigstore integration time: May 23, 2026
Source repository:
- Permalink: AaronharveyHan/AI_Cost_Optimization@c353db6f4e7182706ea6a62eb960db6f13f2a06c
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/AaronharveyHan
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c353db6f4e7182706ea6a62eb960db6f13f2a06c
- Trigger Event: push
