FAISS-backed hybrid semantic caching for LLM apps: text normalization, similarity lookup, LRU eviction, and disk persistence.

Hybrid Semantic Cache

FAISS-backed semantic caching middleware for LLM applications: normalize noisy user queries, retrieve semantically similar cached answers locally, and only call the cloud LLM on a real miss — cutting latency and API cost.

user query ──▶ normalize ──▶ semantic lookup (FAISS) ──┬─▶ HIT  → cached answer (ms)
                                                       └─▶ MISS → your LLM → cache.add()
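
The flow in the diagram can be sketched as one small helper. This is a minimal illustration, not the library's code: it assumes only the `search()` / `add()` calls and the `hit.response` attribute used in the Quick Start, while `answer_with_cache`, `llm`, and `normalize` are hypothetical names introduced here.

```python
from typing import Any, Callable, Optional, Protocol


class SemanticCache(Protocol):
    """Minimal interface this sketch assumes: search() returns a hit or None."""

    def search(self, query: str) -> Optional[Any]: ...

    def add(self, query: str, response: str) -> Any: ...


def answer_with_cache(
    cache: SemanticCache,
    raw_query: str,
    llm: Callable[[str], str],
    normalize: Callable[[str], str] = str.strip,  # placeholder normalizer
) -> tuple[str, bool]:
    """Return (answer, was_hit); `llm` is called only on a cache miss."""
    query = normalize(raw_query)
    hit = cache.search(query)
    if hit is not None:
        return hit.response, True   # HIT: served from the cache
    answer = llm(query)             # MISS: fall back to the model
    cache.add(query, answer)        # store it for the next similar query
    return answer, False
```

Returning the hit flag alongside the answer makes it easy to log hit rates at the call site without reaching into the cache.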

Features

  • Semantic lookup — cosine similarity over sentence-transformer embeddings (FAISS IndexFlatIP / IndexIDMap2), with a configurable threshold.
  • LRU eviction — hard max_records capacity; least-recently-used entries are evicted from both the vector index and the metadata store.
  • Disk persistence — pass persist_dir and the cache survives restarts.
  • Pluggable embeddings — inject your own encode_fn (custom models, prefixing rules such as e5's "query: ", GPU batching) or let the library lazily load a SentenceTransformer for you.
  • Indonesian text normalization — normalize_text() rewrites slang/typos ("gmn cr ganti pw?") into standard text before embedding, raising hit rates.
  • Hit tracking — per-entry hits counter and last_accessed timestamps.
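
To make the lookup, eviction, and hit-tracking features concrete, here is a toy, pure-Python stand-in. It is a sketch under stated assumptions: a brute-force cosine scan replaces the FAISS index, an `OrderedDict` tracks recency, and the class and method names (`ToyLRUSemanticCache`, `cosine`) are hypothetical rather than part of the library.

```python
import math
from collections import OrderedDict
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyLRUSemanticCache:
    """Brute-force illustration only; the real library uses a FAISS index."""

    def __init__(self, threshold: float = 0.80, max_records: int = 1000) -> None:
        self.threshold = threshold
        self.max_records = max_records
        # key -> [vector, response, hits]; dict order doubles as recency order
        self._entries: "OrderedDict[str, list]" = OrderedDict()

    def add(self, key: str, vector: Sequence[float], response: str) -> None:
        self._entries[key] = [list(vector), response, 0]
        self._entries.move_to_end(key)             # newest is most recent
        while len(self._entries) > self.max_records:
            self._entries.popitem(last=False)      # evict least recently used

    def search(self, vector: Sequence[float]) -> Optional[tuple]:
        best_key, best_score = None, -1.0
        for key, (vec, _, _) in self._entries.items():
            score = cosine(vector, vec)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is None or best_score < self.threshold:
            return None                            # miss: nothing similar enough
        entry = self._entries[best_key]
        entry[2] += 1                              # per-entry hit counter
        self._entries.move_to_end(best_key)        # a hit refreshes recency
        return entry[1], best_score, entry[2]
```

Only hits refresh recency in this sketch, so entries that are never matched drift toward the eviction end of the ordering.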

Installation

pip install hybrid-semantic-cache
# with the FastAPI + Gemini demo app:
pip install "hybrid-semantic-cache[demo]"

Requires Python 3.9+.

Quick Start

from hybrid_semantic_cache import HybridSemanticCache, normalize_text

cache = HybridSemanticCache(
    threshold=0.80,        # min cosine similarity for a hit
    max_records=1000,      # LRU eviction beyond this
    persist_dir="./cache", # optional: survive restarts
)

query = normalize_text("gmn cr ganti pw email yak?")

hit = cache.search(query)
if hit is not None:
    print(f"cache hit ({hit.score:.2f}):", hit.response)
else:
    answer = call_your_llm(query)          # any provider
    cache.add(query, answer)
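
The rules behind `normalize_text()` are not documented on this page. As a rough illustration of the idea only, a dictionary-based normalizer could look like the sketch below; the `SLANG` table and `toy_normalize` are hypothetical, and the library's real rules may differ.

```python
import re

# Hypothetical slang table for illustration; not the library's actual rules.
SLANG = {
    "gmn": "bagaimana",
    "cr": "cara",
    "pw": "password",
    "yak": "ya",
}

# A token is either a run of word characters or a single punctuation mark.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def toy_normalize(text: str) -> str:
    """Lowercase, expand known slang tokens, and collapse whitespace."""
    tokens = _TOKEN.findall(text.lower())
    words = [SLANG.get(tok, tok) for tok in tokens]
    joined = " ".join(words)
    # Re-attach common punctuation to the preceding word.
    return re.sub(r"\s+([?!.,])", r"\1", joined)


print(toy_normalize("gmn cr ganti pw email yak?"))
# bagaimana cara ganti password email ya?
```

Mapping differently spelled variants of a question onto the same string before embedding is what lets them land on the same cache entry.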

Custom embeddings (e.g. multilingual-e5)

E5-family models need a "query: " prefix and benefit from normalization — inject your own encoder and the library never loads a second model:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")

def encode(text: str) -> np.ndarray:
    return model.encode(f"query: {text}", normalize_embeddings=True)

cache = HybridSemanticCache(encode_fn=encode, dimension=768, threshold=0.92)
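
A note on why `normalize_embeddings=True` matters here: `IndexFlatIP` scores by inner product, and the inner product equals cosine similarity only when both vectors have unit length. Whether the library normalizes injected vectors itself is not stated on this page, so normalizing inside `encode_fn` is the safe assumption. The pure-Python sketch below (helper names are illustrative) shows the equivalence:

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]

raw = inner_product(a, b)                                   # depends on vector length
unit = inner_product(l2_normalize(a), l2_normalize(b))      # pure cosine similarity

print(raw)             # 24.0
print(round(unit, 2))  # 0.96
```

Without normalization the raw score grows with vector magnitude, so a fixed threshold such as 0.92 would not mean the same thing across inputs.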

Metadata, stats, and management

entry_id = cache.add(prompt, response, metadata={"sources": [...], "agent": "rag"})
hit = cache.search(prompt)     # hit.metadata, hit.hits, hit.score
cache.stats()                  # {'records': ..., 'capacity': ..., 'total_hits': ...}
cache.remove(entry_id)
cache.clear()
len(cache)
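
The dictionary returned by `cache.stats()` can be turned into a couple of monitoring ratios. This sketch assumes the three keys shown above hold plain integers with the obvious meanings; `summarize_stats` is a hypothetical helper, not part of the library.

```python
from typing import Dict


def summarize_stats(stats: Dict[str, int]) -> Dict[str, float]:
    """Derive fill and reuse ratios from a stats() dictionary."""
    records = stats["records"]
    capacity = stats["capacity"]
    total_hits = stats["total_hits"]
    return {
        # How close the cache is to triggering LRU eviction.
        "fill_ratio": records / capacity if capacity else 0.0,
        # Average number of times each stored answer has been reused.
        "hits_per_record": total_hits / records if records else 0.0,
    }


print(summarize_stats({"records": 250, "capacity": 1000, "total_hits": 500}))
# {'fill_ratio': 0.25, 'hits_per_record': 2.0}
```

A fill ratio pinned at 1.0 together with a low hits-per-record value may suggest the capacity is too small for the traffic, or the threshold too strict.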

Low-level building blocks

from hybrid_semantic_cache import VectorStore, TextEmbedder

store = VectorStore(dimension=384)
store.add_to_index(vectors, [{"question": q, "answer": a}, ...])
score, meta = store.search(query_vector)
store.save("cache.index", "metadata.json")
store = VectorStore.load("cache.index", "metadata.json")
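
The `save()` / `load()` pair above takes a separate JSON path for the metadata. How the library writes that file is not shown here; the sketch below only illustrates one common approach for the JSON half, writing to a temporary file and swapping it into place so a crash cannot leave a half-written file. `save_metadata` and `load_metadata` are hypothetical names.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def save_metadata(path: str, records: List[Dict[str, Any]]) -> None:
    """Write records as JSON to a temp file, then atomically replace `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(records, fh, ensure_ascii=False)
        os.replace(tmp_path, path)  # atomic within a single filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)     # do not leave temp files behind
        raise


def load_metadata(path: str) -> List[Dict[str, Any]]:
    """Return the saved records, or an empty list if nothing was saved yet."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Creating the temporary file in the target directory keeps the final `os.replace` on one filesystem, which is what makes the swap atomic.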

Demo app

A self-contained FastAPI service showing the full normalize → cache → LLM fallback flow (uses Gemini when GEMINI_API_KEY is set, a stub otherwise):

pip install "hybrid-semantic-cache[demo]"
uvicorn hybrid_semantic_cache.main:app
# POST {"message": "..."} to http://127.0.0.1:8000/chat
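
A standard-library client for the endpoint above might look like the following sketch. It assumes only what the comment states (a JSON body with a `message` field, posted to `/chat` on the default uvicorn address); the response schema is not documented here, so the raw body is returned as text. `build_chat_request` and `send_chat` are hypothetical helpers.

```python
import json
import urllib.request

CHAT_URL = "http://127.0.0.1:8000/chat"  # default uvicorn host and port


def build_chat_request(message: str, url: str = CHAT_URL) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, timeout: float = 30.0) -> str:
    """Send the message and return the raw response body as text."""
    request = build_chat_request(message)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the demo server running, calling `send_chat` twice with similar messages should show the second response arriving noticeably faster if it was served from the cache.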

Development

git clone https://github.com/shencell/hybrid-semantic-cache
cd hybrid-semantic-cache
pip install -e ".[dev]"
pytest

License

MIT

Download files

Source Distribution

hybrid_semantic_cache-0.2.0.tar.gz (15.5 kB)

Uploaded Source

Built Distribution

hybrid_semantic_cache-0.2.0-py3-none-any.whl (14.2 kB)

Uploaded Python 3

File details

Details for the file hybrid_semantic_cache-0.2.0.tar.gz.

File metadata

  • Download URL: hybrid_semantic_cache-0.2.0.tar.gz
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hybrid_semantic_cache-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       a31ae8bbcfc084d15619e47a3c1c3a2d7afcae79608864fb69d6025e9c654ab0
MD5          c1e595506df3207a31930d2ad0d71b40
BLAKE2b-256  7fcee489bea8871cfd78c826422e2c608cae282559b4854ab52a13594b89b01b

Provenance

The following attestation bundles were made for hybrid_semantic_cache-0.2.0.tar.gz:

Publisher: publish.yml on shencell/hybrid-semantic-cache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hybrid_semantic_cache-0.2.0-py3-none-any.whl.

File hashes

Hashes for hybrid_semantic_cache-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       70f38544141816ecfa029e23207f635ec7894030abd3cfbb339e184688dc38fc
MD5          8b145c15b4e7a1c91dcc376170044bd9
BLAKE2b-256  9b0e48cfa906ac51c6e4658934910f7ac1c0ad8185917f0bea7c72f84b5f9eeb

Provenance

The following attestation bundles were made for hybrid_semantic_cache-0.2.0-py3-none-any.whl:

Publisher: publish.yml on shencell/hybrid-semantic-cache
