FAISS-backed hybrid semantic caching for LLM apps: text normalization, similarity lookup, LRU eviction, and disk persistence.
Hybrid Semantic Cache
FAISS-backed semantic caching middleware for LLM applications: normalize noisy user queries, retrieve semantically similar cached answers locally, and only call the cloud LLM on a real miss — cutting latency and API cost.
user query ──▶ normalize ──▶ semantic lookup (FAISS) ──┬─▶ HIT → cached answer (ms)
                                                       └─▶ MISS → your LLM → cache.add()
Features
- Semantic lookup: cosine similarity over sentence-transformer embeddings (FAISS IndexFlatIP/IndexIDMap2), with a configurable threshold.
- LRU eviction: hard max_records capacity; least-recently-used entries are evicted from both the vector index and the metadata store.
- Disk persistence: pass persist_dir and the cache survives restarts.
- Pluggable embeddings: inject your own encode_fn (custom models, prefixing rules such as e5's "query: ", GPU batching) or let the library lazily load a SentenceTransformer for you.
- Indonesian text normalization: normalize_text() rewrites slang/typos ("gmn cr ganti pw?") into standard text before embedding, raising hit rates.
- Hit tracking: per-entry hits counter and last_accessed timestamps.
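As a rough illustration of the normalization feature, the sketch below expands slang tokens from a small lookup table before the text would be embedded. The table entries and the function name normalize_slang are assumptions made for this example; the library's normalize_text() may apply different rules and a much larger vocabulary.

```python
import re

# Hypothetical slang table for illustration only; not the library's vocabulary.
SLANG = {
    "gmn": "bagaimana",
    "cr": "cara",
    "pw": "password",
}


def normalize_slang(text: str) -> str:
    """Lowercase, collapse whitespace, and expand known slang tokens."""
    text = re.sub(r"\s+", " ", text.strip().lower())
    # Replace whole alphanumeric tokens only, leaving punctuation in place.
    return re.sub(r"[a-z0-9]+", lambda m: SLANG.get(m.group(0), m.group(0)), text)


print(normalize_slang("Gmn  cr ganti PW?"))  # bagaimana cara ganti password?
```

Mapping variants of the same question to one canonical string means they embed close together, which is the mechanism behind the higher hit rate.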
Installation
pip install hybrid-semantic-cache
# with the FastAPI + Gemini demo app:
pip install "hybrid-semantic-cache[demo]"
Requires Python 3.9+.
Quick Start
from hybrid_semantic_cache import HybridSemanticCache, normalize_text

cache = HybridSemanticCache(
    threshold=0.80,         # min cosine similarity for a hit
    max_records=1000,       # LRU eviction beyond this
    persist_dir="./cache",  # optional: survive restarts
)

query = normalize_text("gmn cr ganti pw email yak?")
hit = cache.search(query)
if hit is not None:
    print(f"cache hit ({hit.score:.2f}):", hit.response)
else:
    answer = call_your_llm(query)  # placeholder for your own LLM call (any provider)
    cache.add(query, answer)
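The hit-or-miss branch above can be wrapped in a small helper so call sites stay short. This is a minimal sketch: answer_with_cache, ExactMatchCache, and fake_llm are hypothetical names introduced here, and ExactMatchCache is only an exact-match stand-in that mimics the search()/add() shape of the Quick Start so the example runs without the library installed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Hit:
    response: str
    score: float


class ExactMatchCache:
    """Stand-in exposing the same search()/add() shape as the Quick Start."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def search(self, query: str) -> Optional[Hit]:
        answer = self._store.get(query)
        return Hit(answer, 1.0) if answer is not None else None

    def add(self, query: str, answer: str) -> None:
        self._store[query] = answer


def answer_with_cache(cache, query: str, llm: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (answer, was_hit); the LLM is only called on a miss."""
    hit = cache.search(query)
    if hit is not None:
        return hit.response, True
    answer = llm(query)
    cache.add(query, answer)
    return answer, False


calls: List[str] = []


def fake_llm(q: str) -> str:
    calls.append(q)  # record each call so the saving is visible
    return f"answer to: {q}"


cache = ExactMatchCache()
first = answer_with_cache(cache, "reset password", fake_llm)
second = answer_with_cache(cache, "reset password", fake_llm)
print(first[1], second[1], len(calls))  # False True 1
```

Passing the LLM as a callable keeps the helper provider-agnostic, in the same spirit as the "any provider" note in the Quick Start.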
Custom embeddings (e.g. multilingual-e5)
E5-family models need a "query: " prefix and benefit from normalization —
inject your own encoder and the library never loads a second model:
import numpy as np
from sentence_transformers import SentenceTransformer

from hybrid_semantic_cache import HybridSemanticCache

model = SentenceTransformer("intfloat/multilingual-e5-base")

def encode(text: str) -> np.ndarray:
    # Add the "query: " prefix E5 expects and return a unit-length embedding.
    return model.encode(f"query: {text}", normalize_embeddings=True)

cache = HybridSemanticCache(encode_fn=encode, dimension=768, threshold=0.92)
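Unit-length embeddings matter because an inner-product index such as IndexFlatIP then scores by cosine similarity directly. The sketch below checks that equivalence in plain Python and applies a threshold to the best score. It is a brute-force stand-in for the FAISS lookup, and the helper names are assumptions made for this example.

```python
import math
from typing import List, Optional, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norms


def best_match(
    query: Sequence[float],
    stored_unit_vectors: Sequence[Sequence[float]],
    threshold: float,
) -> Optional[Tuple[int, float]]:
    """Return (index, score) of the best stored vector, or None below threshold."""
    q = l2_normalize(query)
    best_idx, best_score = -1, -1.0
    for i, vec in enumerate(stored_unit_vectors):
        score = inner_product(q, vec)  # equals cosine, since both are unit length
        if score > best_score:
            best_idx, best_score = i, score
    return (best_idx, best_score) if best_score >= threshold else None


a, b = [3.0, 4.0], [4.0, 3.0]
print(round(cosine(a, b), 2))                                     # 0.96
print(round(inner_product(l2_normalize(a), l2_normalize(b)), 2))  # 0.96

stored = [l2_normalize(b), l2_normalize([0.0, 1.0])]
print(best_match(a, stored, threshold=0.97))  # None: best score is about 0.96
```

A higher threshold, such as the 0.92 used above, simply rejects more near-misses; the right value depends on the embedding model and should be tuned on real queries.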
Metadata, stats, and management
entry_id = cache.add(prompt, response, metadata={"sources": [...], "agent": "rag"})
hit = cache.search(prompt) # hit.metadata, hit.hits, hit.score
cache.stats() # {'records': ..., 'capacity': ..., 'total_hits': ...}
cache.remove(entry_id)
cache.clear()
len(cache)
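To make the eviction and hit-tracking behaviour concrete, here is a minimal sketch of an LRU metadata store built on OrderedDict. The class and method names (LRUStore, touch) are hypothetical, and it covers only the metadata side; per the feature list, the real cache also removes the evicted vector from the FAISS index.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entry:
    response: str
    hits: int = 0
    last_accessed: float = field(default_factory=time.time)


class LRUStore:
    """Keeps at most max_records entries, evicting the least recently used."""

    def __init__(self, max_records: int) -> None:
        if max_records < 1:
            raise ValueError("max_records must be at least 1")
        self.max_records = max_records
        self._entries: "OrderedDict[int, Entry]" = OrderedDict()
        self._next_id = 0

    def add(self, response: str) -> int:
        entry_id = self._next_id
        self._next_id += 1
        self._entries[entry_id] = Entry(response)
        while len(self._entries) > self.max_records:
            self._entries.popitem(last=False)  # oldest entry sits at the front
        return entry_id

    def touch(self, entry_id: int) -> Entry:
        """Record a hit and mark the entry as most recently used."""
        entry = self._entries[entry_id]
        entry.hits += 1
        entry.last_accessed = time.time()
        self._entries.move_to_end(entry_id)
        return entry

    def stats(self) -> Dict[str, int]:
        return {
            "records": len(self._entries),
            "capacity": self.max_records,
            "total_hits": sum(e.hits for e in self._entries.values()),
        }

    def __contains__(self, entry_id: int) -> bool:
        return entry_id in self._entries

    def __len__(self) -> int:
        return len(self._entries)


store = LRUStore(max_records=2)
a = store.add("answer A")
b = store.add("answer B")
store.touch(a)             # A becomes the most recently used entry
c = store.add("answer C")  # over capacity, so B is evicted
print(a in store, b in store, c in store)  # True False True
print(store.stats())  # {'records': 2, 'capacity': 2, 'total_hits': 1}
```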
Low-level building blocks
from hybrid_semantic_cache import VectorStore, TextEmbedder
store = VectorStore(dimension=384)
store.add_to_index(vectors, [{"question": q, "answer": a}, ...])
score, meta = store.search(query_vector)
store.save("cache.index", "metadata.json")
store = VectorStore.load("cache.index", "metadata.json")
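For intuition about the metadata half of save()/load(), the sketch below round-trips a list of records through a JSON file. It is an assumption-laden illustration, not the library's on-disk format: save_metadata and load_metadata are hypothetical helpers, and writing to a temporary file before renaming is a design choice of this example, not something the source documents.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def save_metadata(path: Path, records: List[Dict[str, str]]) -> None:
    """Write records as JSON, replacing the target file in one step."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")
    os.replace(tmp, path)  # avoids leaving a half-written file behind


def load_metadata(path: Path) -> List[Dict[str, str]]:
    return json.loads(path.read_text(encoding="utf-8"))


records = [{"question": "bagaimana cara ganti password?", "answer": "Open Settings."}]
with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "metadata.json"
    save_metadata(target, records)
    restored = load_metadata(target)

print(restored == records)  # True
```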
Demo app
A self-contained FastAPI service showing the full normalize → cache → LLM
fallback flow (uses Gemini when GEMINI_API_KEY is set, a stub otherwise):
pip install "hybrid-semantic-cache[demo]"
uvicorn hybrid_semantic_cache.main:app
# POST {"message": "..."} to http://127.0.0.1:8000/chat
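A client call against the demo endpoint could look like the sketch below, which uses only the standard library. The URL and the {"message": ...} body come from the comment above; build_chat_request is a hypothetical helper, and the response schema is not documented here, so the example prints the raw body without parsing specific fields.

```python
import json
import urllib.request


def build_chat_request(
    message: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a JSON POST request for the demo /chat endpoint."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("gmn cr ganti pw?")
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8000/chat

# With the demo server running, send the request like this:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```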
Development
git clone https://github.com/shencell/hybrid-semantic-cache
cd hybrid-semantic-cache
pip install -e ".[dev]"
pytest
License
MIT
File details
Details for the file hybrid_semantic_cache-0.2.0.tar.gz.
File metadata
- Download URL: hybrid_semantic_cache-0.2.0.tar.gz
- Upload date:
- Size: 15.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a31ae8bbcfc084d15619e47a3c1c3a2d7afcae79608864fb69d6025e9c654ab0 |
| MD5 | c1e595506df3207a31930d2ad0d71b40 |
| BLAKE2b-256 | 7fcee489bea8871cfd78c826422e2c608cae282559b4854ab52a13594b89b01b |
Provenance
The following attestation bundles were made for hybrid_semantic_cache-0.2.0.tar.gz:
Publisher: publish.yml on shencell/hybrid-semantic-cache
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hybrid_semantic_cache-0.2.0.tar.gz
- Subject digest: a31ae8bbcfc084d15619e47a3c1c3a2d7afcae79608864fb69d6025e9c654ab0
- Sigstore transparency entry: 1798850431
- Sigstore integration time:
- Permalink: shencell/hybrid-semantic-cache@89f558eab97aabee4ceccb3883852aa6a22a155a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/shencell
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@89f558eab97aabee4ceccb3883852aa6a22a155a
- Trigger Event: push
File details
Details for the file hybrid_semantic_cache-0.2.0-py3-none-any.whl.
File metadata
- Download URL: hybrid_semantic_cache-0.2.0-py3-none-any.whl
- Upload date:
- Size: 14.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70f38544141816ecfa029e23207f635ec7894030abd3cfbb339e184688dc38fc |
| MD5 | 8b145c15b4e7a1c91dcc376170044bd9 |
| BLAKE2b-256 | 9b0e48cfa906ac51c6e4658934910f7ac1c0ad8185917f0bea7c72f84b5f9eeb |
Provenance
The following attestation bundles were made for hybrid_semantic_cache-0.2.0-py3-none-any.whl:
Publisher: publish.yml on shencell/hybrid-semantic-cache
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hybrid_semantic_cache-0.2.0-py3-none-any.whl
- Subject digest: 70f38544141816ecfa029e23207f635ec7894030abd3cfbb339e184688dc38fc
- Sigstore transparency entry: 1798850625
- Sigstore integration time:
- Permalink: shencell/hybrid-semantic-cache@89f558eab97aabee4ceccb3883852aa6a22a155a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/shencell
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@89f558eab97aabee4ceccb3883852aa6a22a155a
- Trigger Event: push