FAISS-backed hybrid semantic caching for LLM apps: text normalization, similarity lookup, LRU eviction, and disk persistence.

Maintainers

Shencell

Project description

Hybrid Semantic Cache

FAISS-backed semantic caching middleware for LLM applications: normalize noisy user queries, retrieve semantically similar cached answers locally, and only call the cloud LLM on a real miss — cutting latency and API cost.

user query ──▶ normalize ──▶ semantic lookup (FAISS) ──┬─▶ HIT  → cached answer (ms)
                                                       └─▶ MISS → your LLM → cache.add()
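
As a rough picture of the normalize stage in the flow above, slang rewriting can be thought of as a token-level dictionary lookup. The sketch below is only an illustration: `toy_normalize` and the `SLANG` table are hypothetical names and data, and the library's actual normalize_text() rules are not shown in this README.

```python
import re

# Hypothetical slang table for illustration only; not the library's real data.
SLANG = {
    "gmn": "bagaimana",
    "cr": "cara",
    "pw": "password",
    "yak": "ya",
}


def toy_normalize(text: str) -> str:
    """Lowercase the text and replace known slang tokens, leaving punctuation as-is."""
    lowered = text.lower()
    # Rewrite each word-like token through the table; unknown tokens pass through.
    rewritten = re.sub(r"\w+", lambda m: SLANG.get(m.group(0), m.group(0)), lowered)
    # Collapse any repeated whitespace into single spaces.
    return " ".join(rewritten.split())


print(toy_normalize("gmn cr ganti pw email yak?"))
```

Mapping noisy variants onto one canonical spelling before embedding means near-duplicate questions land closer together in vector space, which is why the README ties normalization to hit rate.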

Features

Semantic lookup — cosine similarity over sentence-transformer embeddings (FAISS IndexFlatIP / IndexIDMap2), with a configurable threshold.
LRU eviction — hard max_records capacity; least-recently-used entries are evicted from both the vector index and the metadata store.
Disk persistence — pass persist_dir and the cache survives restarts.
Pluggable embeddings — inject your own encode_fn (custom models, prefixing rules such as e5's "query: ", GPU batching) or let the library lazily load a SentenceTransformer for you.
Indonesian text normalization — normalize_text() rewrites slang/typos ("gmn cr ganti pw?") into standard text before embedding, raising hit rates.
Hit tracking — per-entry hits counter and last_accessed timestamps.
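
To make the lookup and eviction behaviour listed above concrete, here is a minimal pure-Python sketch of a thresholded cosine lookup combined with LRU eviction. It is a toy under stated assumptions, not the library's implementation: `ToyLRUSemanticCache` and `cosine` are hypothetical names, the search is a linear scan rather than a FAISS index, and vectors are plain lists.

```python
import math
from collections import OrderedDict
from typing import Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; equals the inner product when both vectors are unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyLRUSemanticCache:
    """Hypothetical toy cache: linear-scan lookup with a threshold and LRU eviction."""

    def __init__(self, threshold: float = 0.8, max_records: int = 2) -> None:
        self.threshold = threshold
        self.max_records = max_records
        # Insertion order doubles as recency order: oldest first, newest last.
        self._entries: "OrderedDict[str, Tuple[Sequence[float], str]]" = OrderedDict()

    def add(self, key: str, vector: Sequence[float], response: str) -> None:
        self._entries[key] = (vector, response)
        self._entries.move_to_end(key)
        # Evict least-recently-used entries once capacity is exceeded.
        while len(self._entries) > self.max_records:
            self._entries.popitem(last=False)

    def search(self, vector: Sequence[float]) -> Optional[Tuple[float, str]]:
        best_key, best_score = None, -1.0
        for key, (stored, _) in self._entries.items():
            score = cosine(vector, stored)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is None or best_score < self.threshold:
            return None  # miss: nothing similar enough
        self._entries.move_to_end(best_key)  # a hit refreshes recency
        return best_score, self._entries[best_key][1]


cache = ToyLRUSemanticCache(threshold=0.8, max_records=2)
cache.add("a", [1.0, 0.0], "answer A")
cache.add("b", [0.0, 1.0], "answer B")
print(cache.search([0.9, 0.1]))   # close to "a": a hit, and "a" becomes most recent
cache.add("c", [1.0, 1.0], "answer C")  # capacity exceeded: "b" is evicted
print(cache.search([0.0, 1.0]))   # best remaining match is below the threshold
```

Because a hit moves the entry to the most-recent position, frequently matched answers survive while stale ones fall off the front of the ordering.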

Installation

pip install hybrid-semantic-cache
# with the FastAPI + Gemini demo app:
pip install "hybrid-semantic-cache[demo]"

Requires Python 3.9+.

Quick Start

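from hybrid_semantic_cache import HybridSemanticCache, normalize_text

def call_your_llm(prompt: str) -> str:
    # Placeholder: replace with a call to your LLM provider of choice.
    raise NotImplementedError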

cache = HybridSemanticCache(
    threshold=0.80,        # min cosine similarity for a hit
    max_records=1000,      # LRU eviction beyond this
    persist_dir="./cache", # optional: survive restarts
)

query = normalize_text("gmn cr ganti pw email yak?")

hit = cache.search(query)
if hit is not None:
    print(f"cache hit ({hit.score:.2f}):", hit.response)
else:
    answer = call_your_llm(query)          # any provider
    cache.add(query, answer)
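
The hit-or-miss branch above can be folded into a small helper so call sites stay short. This is a sketch, not part of the library: `answer_with_cache` is a hypothetical name, and it relies only on the search(), add(), and hit.response members used in the Quick Start.

```python
from typing import Any, Callable, Tuple


def answer_with_cache(
    cache: Any, query: str, llm: Callable[[str], str]
) -> Tuple[str, bool]:
    """Return (answer, was_cached), calling `llm` only on a cache miss."""
    hit = cache.search(query)
    if hit is not None:
        return hit.response, True
    answer = llm(query)
    cache.add(query, answer)  # store the fresh answer for later lookups
    return answer, False
```

Returning the was_cached flag alongside the answer makes it easy to log hit rates at the call site without reaching into the cache.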

Custom embeddings (e.g. multilingual-e5)

E5-family models expect a "query: " prefix on their input and work best with L2-normalized embeddings (normalize_embeddings=True below). Inject your own encoder and the library never loads a second model:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")

def encode(text: str) -> np.ndarray:
    return model.encode(f"query: {text}", normalize_embeddings=True)

cache = HybridSemanticCache(encode_fn=encode, dimension=768, threshold=0.92)

Metadata, stats, and management

entry_id = cache.add(prompt, response, metadata={"sources": [...], "agent": "rag"})
hit = cache.search(prompt)     # hit.metadata, hit.hits, hit.score
cache.stats()                  # {'records': ..., 'capacity': ..., 'total_hits': ...}
cache.remove(entry_id)
cache.clear()
len(cache)
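
The dictionary returned by cache.stats() can be turned into a couple of derived numbers for dashboards or logs. The helper below is a sketch: `summarize_stats` is a hypothetical name, and it assumes the three keys shown above hold plain numeric values.

```python
from typing import Dict


def summarize_stats(stats: Dict[str, int]) -> Dict[str, float]:
    """Derive fill ratio and average hits per record from a stats() dict."""
    records = stats.get("records", 0)
    capacity = stats.get("capacity", 0)
    total_hits = stats.get("total_hits", 0)
    return {
        # How full the cache is relative to max_records.
        "fill_ratio": records / capacity if capacity else 0.0,
        # How often each stored entry has been reused, on average.
        "hits_per_record": total_hits / records if records else 0.0,
    }


print(summarize_stats({"records": 250, "capacity": 1000, "total_hits": 500}))
```

A fill ratio near 1.0 combined with a low hits-per-record figure would suggest entries are being evicted before they are reused, which may be a reason to raise max_records or revisit the threshold.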

Low-level building blocks

from hybrid_semantic_cache import VectorStore, TextEmbedder

store = VectorStore(dimension=384)
store.add_to_index(vectors, [{"question": q, "answer": a}, ...])
score, meta = store.search(query_vector)
store.save("cache.index", "metadata.json")
store = VectorStore.load("cache.index", "metadata.json")

Demo app

A self-contained FastAPI service showing the full normalize → cache → LLM fallback flow (uses Gemini when GEMINI_API_KEY is set, a stub otherwise):

pip install "hybrid-semantic-cache[demo]"
uvicorn hybrid_semantic_cache.main:app
# POST {"message": "..."} to http://127.0.0.1:8000/chat
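
For calling the demo endpoint from Python instead of a shell, a standard-library request can be built as sketched below. `build_chat_request` is a hypothetical helper; only the /chat path, the default host and port, and the {"message": ...} body come from the instructions above, and the response format is not documented here.

```python
import json
import urllib.request


def build_chat_request(
    message: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a JSON POST request for the demo app's /chat endpoint."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With the demo server running, send the request like this:
#   with urllib.request.urlopen(build_chat_request("gmn cr ganti pw?")) as resp:
#       print(resp.read().decode("utf-8"))
```

Sending the same message twice is a quick way to see the cache at work, since the second call should be served without reaching the LLM.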

Development

git clone https://github.com/shencell/hybrid-semantic-cache
cd hybrid-semantic-cache
pip install -e ".[dev]"
pytest

License

MIT

Release history

0.2.0 (this version): Jun 12, 2026
0.1.1: Jun 1, 2026
0.1.0: Jun 1, 2026

Download files

Source Distribution

hybrid_semantic_cache-0.2.0.tar.gz (15.5 kB), uploaded Jun 12, 2026, Source

Built Distribution

hybrid_semantic_cache-0.2.0-py3-none-any.whl (14.2 kB), uploaded Jun 12, 2026, Python 3

File details

Details for the file hybrid_semantic_cache-0.2.0.tar.gz.

File metadata

Download URL: hybrid_semantic_cache-0.2.0.tar.gz
Upload date: Jun 12, 2026
Size: 15.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hybrid_semantic_cache-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a31ae8bbcfc084d15619e47a3c1c3a2d7afcae79608864fb69d6025e9c654ab0`
MD5	`c1e595506df3207a31930d2ad0d71b40`
BLAKE2b-256	`7fcee489bea8871cfd78c826422e2c608cae282559b4854ab52a13594b89b01b`

Provenance

The following attestation bundles were made for hybrid_semantic_cache-0.2.0.tar.gz:

Publisher: publish.yml on shencell/hybrid-semantic-cache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hybrid_semantic_cache-0.2.0.tar.gz
- Subject digest: a31ae8bbcfc084d15619e47a3c1c3a2d7afcae79608864fb69d6025e9c654ab0
- Sigstore transparency entry: 1798850431
- Sigstore integration time: Jun 12, 2026
Source repository:
- Permalink: shencell/hybrid-semantic-cache@89f558eab97aabee4ceccb3883852aa6a22a155a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/shencell
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@89f558eab97aabee4ceccb3883852aa6a22a155a
- Trigger Event: push

File details

Details for the file hybrid_semantic_cache-0.2.0-py3-none-any.whl.

File metadata

Download URL: hybrid_semantic_cache-0.2.0-py3-none-any.whl
Upload date: Jun 12, 2026
Size: 14.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hybrid_semantic_cache-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`70f38544141816ecfa029e23207f635ec7894030abd3cfbb339e184688dc38fc`
MD5	`8b145c15b4e7a1c91dcc376170044bd9`
BLAKE2b-256	`9b0e48cfa906ac51c6e4658934910f7ac1c0ad8185917f0bea7c72f84b5f9eeb`

Provenance

The following attestation bundles were made for hybrid_semantic_cache-0.2.0-py3-none-any.whl:

Publisher: publish.yml on shencell/hybrid-semantic-cache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hybrid_semantic_cache-0.2.0-py3-none-any.whl
- Subject digest: 70f38544141816ecfa029e23207f635ec7894030abd3cfbb339e184688dc38fc
- Sigstore transparency entry: 1798850625
- Sigstore integration time: Jun 12, 2026
Source repository:
- Permalink: shencell/hybrid-semantic-cache@89f558eab97aabee4ceccb3883852aa6a22a155a
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/shencell
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@89f558eab97aabee4ceccb3883852aa6a22a155a
- Trigger Event: push
