# llm-cachex

Drop-in caching + retrieval layer for LLM applications (RAG, agents, chatbots).

Stop paying for repeated LLM calls: llm-cachex automatically reuses responses using exact and semantic caching, with zero changes to your business logic.
## Why llm-cachex?
Most LLM apps repeatedly call the model for:

- Slightly rephrased questions
- Agent/tool loops
- Chat history variations

Each repeated call wastes latency and money. llm-cachex fixes that automatically.
## Features

- Exact cache (Redis-backed)
- Semantic cache (FAISS + embeddings)
- Hybrid retrieval (BM25 + vector search)
- Cross-encoder reranking for high-quality matches
- Agent and tool support
- Memory-aware context support
- Token and cost tracking
- Plug-and-play decorator API
## Architecture

```text
User Query
    │
    ▼
llm_cache decorator
    ├── Exact Cache (Redis)
    ├── Semantic Engine
    │     ├── FAISS (vector)
    │     ├── BM25 (lexical)
    │     └── CrossEncoder (rerank)
    └── LLM / Agent
```
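The lookup order above can be sketched in plain Python. Everything here (the function name, the `semantic_lookup` and `call_llm` stand-ins) is illustrative, not the library's actual internals:

```python
def cached_call(prompt, exact_cache, semantic_lookup, call_llm):
    """Illustrative lookup order: exact cache -> semantic engine -> LLM."""
    # 1. Exact cache: a straight key lookup (Redis-backed in llm-cachex)
    if prompt in exact_cache:
        return exact_cache[prompt]
    # 2. Semantic engine: vector + lexical retrieval with reranking
    hit = semantic_lookup(prompt)
    if hit is not None:
        return hit
    # 3. Miss: call the LLM/agent, then store the answer for next time
    answer = call_llm(prompt)
    exact_cache[prompt] = answer
    return answer
```

Only step 3 costs money; steps 1 and 2 return in milliseconds.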
## Installation

```bash
pip install -e .
```

(For now, install locally; PyPI support is coming soon.)
## Quick Start

```python
from llm_cachex import llm_cache, CacheConfig

@llm_cache(CacheConfig())
def ask_llm(prompt):
    return llm(prompt)  # `llm` is your own model client call

print(ask_llm("What is AI?"))  # LLM call (cache miss)
print(ask_llm("Explain AI"))   # Semantic cache hit
```
## Agent Example

Works seamlessly with tools:

```python
@llm_cache(CacheConfig())
def agent(raw_query, full_prompt):
    if "calculate" in raw_query:
        # Demo only: eval on user input is unsafe in production
        return str(eval(raw_query.replace("calculate", "").strip()))
    if "search" in raw_query:
        return f"[TOOL SEARCH RESULT] {raw_query}"
    return llm(full_prompt)
```
## Semantic Cache (what makes this powerful)

Unlike exact-match caching, the semantic cache recognizes paraphrases. After answering:

> "What is AI?"

a later query such as:

> "Explain artificial intelligence"

returns the cached answer with no LLM call.
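A toy sketch of the idea, using a bag-of-words cosine in place of real embeddings. llm-cachex uses FAISS plus embedding models for this; `cosine` and `semantic_hit` below are hypothetical names, shown only so the threshold mechanic is runnable without a model download:

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Toy similarity: cosine over word counts (stand-in for embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_hit(query, cache, threshold=0.7):
    """Return the cached answer of the closest stored prompt, if it
    clears the similarity threshold; otherwise None (cache miss)."""
    best = max(cache, key=lambda p: cosine(query, p), default=None)
    if best is not None and cosine(query, best) >= threshold:
        return cache[best]
    return None
```

Real embeddings make this work for paraphrases that share no surface words at all, which a word-count cosine cannot do.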
## Configuration

```python
CacheConfig(
    enable_exact=True,
    enable_semantic=True,
    similarity_threshold=0.7,
    top_k=3,
    model_name="gpt-4o-mini",
    enable_metrics=True,
    enable_token_cost=True,
)
```
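One way to read `top_k` together with the reranker: cheap retrieval keeps the k best-scored candidates, then the cross-encoder orders that short list. The sketch below is illustrative; the two scoring callables stand in for FAISS/BM25 and the cross-encoder:

```python
def rerank_lookup(query, entries, retrieve_score, rerank_score, top_k=3):
    """Two-stage match: keep top_k by cheap score, pick best by rerank score."""
    # First stage: cheap retrieval keeps only the top_k candidates
    candidates = sorted(entries, key=lambda e: retrieve_score(query, e),
                        reverse=True)[:top_k]
    if not candidates:
        return None
    # Second stage: the (expensive) reranker picks the final match
    return max(candidates, key=lambda e: rerank_score(query, e))
```

A larger `top_k` gives the reranker more candidates to choose from, at the cost of more rerank calls per query.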
## Metrics

```python
from llm_cachex import metrics

print(metrics.summary())
```

Example output:

```python
{
    'hits': 2,
    'misses': 1,
    'hit_rate': 66.67,
    'avg_llm_latency_ms': 2000,
    'avg_cache_latency_ms': 30,
    'total_cost_rupees': 0.01
}
```
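The sample numbers above fit together like this (plain arithmetic, shown only to make the fields concrete):

```python
hits, misses = 2, 1
hit_rate = round(100 * hits / (hits + misses), 2)  # matches 66.67 above

# Approximate latency saved by the two cache hits, using the averages:
avg_llm_ms, avg_cache_ms = 2000, 30
saved_ms = hits * (avg_llm_ms - avg_cache_ms)      # 2 * 1970 = 3940 ms
```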
## Examples

Run the demos:

```bash
python examples/basic.py
python examples/rag_demo.py
python examples/agent_demo.py
python examples/strict_test.py
```
## Project Structure

```text
llm_cachex/
├── api/         # decorator layer
├── core/        # cache, metrics, memory
├── semantic/    # hybrid search + reranker
├── embedding/   # embeddings
├── index/       # FAISS index
├── similarity/  # similarity utils
└── utils/       # helpers
```
## Roadmap

- Async support
- Streaming support
- Batch inference
- Multi-model caching
- Pluggable vector DBs (Chroma / Pinecone)
- Observability dashboard
## Contributing

PRs are welcome. Open an issue for discussion.
## License

MIT License
## Author

Himanshu Singh
## If this helps you

Give the repo a star; it helps the project grow.