llmcachex-ai
Drop-in semantic + exact caching layer for LLM applications (RAG, agents, chatbots)
Cut LLM cost by up to 80% and reduce latency by avoiding repeated model calls through intelligent caching.
Installation

```shell
pip install llmcachex-ai
```
Why llmcachex-ai?
Most LLM applications repeatedly call the model for:
- Slightly rephrased queries
- Agent/tool loops
- Chat history variations
This leads to higher latency and unnecessary cost.
llmcachex-ai avoids these repeated calls automatically by caching responses with a combination of exact and semantic matching.
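The exact half of that idea can be sketched in a few lines: key each response by a hash of the normalized prompt, so byte-identical (or trivially re-cased) queries never reach the model again. This is an illustrative toy, not the library's implementation; `cache_key`, `ask`, and `fake_llm` are hypothetical names for the example.

```python
import hashlib

cache = {}

def cache_key(prompt: str) -> str:
    # Normalize before hashing so trivial whitespace/case changes still hit.
    return hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()

def ask(prompt: str, llm) -> str:
    key = cache_key(prompt)
    if key not in cache:       # miss: call the model once and store the result
        cache[key] = llm(prompt)
    return cache[key]          # hit: reuse the stored response

calls = []
def fake_llm(p):
    calls.append(p)
    return f"answer to: {p}"

ask("What is AI?", fake_llm)
ask("  what is ai?  ", fake_llm)  # normalizes to the same key: cache hit
print(len(calls))                 # 1 -- the model ran only once
```

Semantic matching goes one step further and also catches rephrasings that hash to different keys, as shown later.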
Features
- Exact cache (Redis-backed)
- Semantic cache (FAISS + embeddings)
- Hybrid retrieval (BM25 + vector search)
- Cross-encoder reranking (high-quality matches)
- Works with agents and tools
- Memory-aware context support
- Token usage and cost tracking
- Plug-and-play decorator API
How It Works

```
User Query
    ↓
llm_cache decorator
 ├── Exact Cache (Redis)
 ├── Semantic Engine
 │    ├── FAISS (vector)
 │    ├── BM25 (lexical)
 │    └── CrossEncoder (rerank)
 └── LLM / Agent
```
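The flow above is a lookup cascade: try the exact cache, then the semantic engine, and only then call the model. A simplified sketch of that control flow (not the package's actual code; `semantic_search` here is a hypothetical stand-in for the FAISS/BM25/reranker stack):

```python
def cached_lookup(query, exact_cache, semantic_search, llm, threshold=0.7):
    # 1. Exact cache (Redis in the real library; a plain dict here).
    if query in exact_cache:
        return exact_cache[query], "exact_hit"
    # 2. Semantic engine: stand-in returning (cached_response, score) or None.
    match = semantic_search(query)
    if match is not None and match[1] >= threshold:
        return match[0], "semantic_hit"
    # 3. Fall through to the model and populate the cache.
    response = llm(query)
    exact_cache[query] = response
    return response, "miss"

store = {}
no_match = lambda q: None
print(cached_lookup("What is AI?", store, no_match, lambda q: "AI is..."))  # miss
print(cached_lookup("What is AI?", store, no_match, lambda q: "unused"))    # exact hit
print(cached_lookup("Explain AI", store,
                    lambda q: ("AI is...", 0.82), lambda q: "unused"))      # semantic hit
```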
Quick Start

```python
from llm_cachex import llm_cache, CacheConfig

@llm_cache(CacheConfig())
def ask_llm(prompt):
    return llm(prompt)

print(ask_llm("What is AI?"))  # LLM call
print(ask_llm("Explain AI"))   # Semantic cache hit
```
Agent Example

Works seamlessly with tools:

```python
@llm_cache(CacheConfig())
def agent(raw_query, full_prompt):
    if "calculate" in raw_query:
        return str(eval(raw_query.replace("calculate", "").strip()))
    if "search" in raw_query:
        return f"[TOOL SEARCH RESULT] {raw_query}"
    return llm(full_prompt)
```
Semantic Cache (Why it's powerful)

"What is AI?"
"Explain artificial intelligence"

Both return the same cached response; no additional LLM call is required.
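This works because paraphrases land near each other in embedding space, so their cosine similarity clears the threshold while unrelated queries do not. A toy illustration with hypothetical 3-dimensional vectors (real embedding models produce hundreds of dimensions; the numbers below are made up for the example):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Hypothetical embeddings: the two AI questions point the same way,
# the unrelated query points elsewhere.
emb = {
    "What is AI?":                     [0.92, 0.31, 0.10],
    "Explain artificial intelligence": [0.88, 0.40, 0.14],
    "Best pizza in Rome":              [0.05, 0.20, 0.95],
}

query = emb["What is AI?"]
for text, vec in emb.items():
    verdict = "HIT" if cosine(query, vec) >= 0.7 else "MISS"  # default threshold
    print(f"{text!r}: {cosine(query, vec):.2f} -> {verdict}")
```

Raising `similarity_threshold` trades hit rate for stricter matches; lowering it catches looser paraphrases at the risk of serving a stale answer to a genuinely different question.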
โ๏ธ Configuration
CacheConfig(
enable_exact=True,
enable_semantic=True,
similarity_threshold=0.7,
top_k=3,
model_name="gpt-4o-mini",
enable_metrics=True,
enable_token_cost=True
)
Metrics

```python
from llm_cachex import metrics

print(metrics.summary())
```

Example output:

```
{
    "hits": 2,
    "misses": 1,
    "hit_rate": 66.67,
    "avg_llm_latency_ms": 2000,
    "avg_cache_latency_ms": 30,
    "total_cost_rupees": 0.01
}
```
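The `hit_rate` in that summary is just hits over total requests, and the two latency figures let you estimate the effective per-request latency (assuming every request is either a cache hit at ~30 ms or a model call at ~2000 ms, as in the example):

```python
hits, misses = 2, 1
hit_rate = round(100 * hits / (hits + misses), 2)
print(hit_rate)  # 66.67, matching the summary above

# Effective latency under the same numbers: hits cost ~30 ms, misses ~2000 ms.
effective_ms = (hits * 30 + misses * 2000) / (hits + misses)
print(round(effective_ms, 1))  # 686.7 ms per request instead of 2000 ms
```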
Use Cases
- RAG pipelines
- AI agents & tool execution
- Chatbots with memory
- Cost optimization for LLM APIs
- High-frequency query systems
Performance Impact

Typical improvements:
- 2–10x latency reduction
- 50–80% cost savings
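Those ranges follow directly from the hit rate. Assuming a cache hit costs essentially nothing per token, cost savings roughly equal the hit rate, and with the example latencies from the metrics section (~30 ms cache, ~2000 ms LLM) the speedup at a given hit rate is:

```python
def speedup(hit_rate, llm_ms=2000.0, cache_ms=30.0):
    # Average latency is a mix of fast hits and slow misses.
    avg_ms = hit_rate * cache_ms + (1 - hit_rate) * llm_ms
    return llm_ms / avg_ms

for h in (0.5, 0.8):
    print(f"hit rate {h:.0%}: {speedup(h):.1f}x faster")
```

With these numbers, a 50% hit rate gives roughly a 2x speedup and 80% about 4.7x; larger factors come from slower models or faster cache backends.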
Project Structure

```
llm_cachex/
 ├── api/         # decorator layer
 ├── core/        # cache, metrics, memory
 ├── semantic/    # hybrid search + reranker
 ├── embedding/   # embeddings
 ├── index/       # FAISS index
 ├── similarity/  # similarity utils
 └── utils/       # helpers
```
Roadmap
- Async support
- Streaming support
- Batch inference
- Multi-model caching
- Pluggable vector DBs (Chroma / Pinecone)
- Observability dashboard
Contributing
Contributions are welcome. Open an issue to discuss ideas or submit a PR.
License
MIT License
Author
Himanshu Singh
Support

If this project helps you, consider giving it a star on GitHub.
Download files
File details

Details for the file llmcachex_ai-0.1.2.tar.gz.

File metadata

- Download URL: llmcachex_ai-0.1.2.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 8e23340602bae3f7c5ae9c9ea9e1078a1ef8baec6e87a0957d31864001ac06de |
| MD5 | 29e714bf06c5ac6b594c7f81f0c9051c |
| BLAKE2b-256 | 1e10984f0af4f93d4be00b503d69a6e60bab2729505e9e0117f40e27f7cb4f7d |
File details

Details for the file llmcachex_ai-0.1.2-py3-none-any.whl.

File metadata

- Download URL: llmcachex_ai-0.1.2-py3-none-any.whl
- Upload date:
- Size: 15.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | a0421112afdd8682ae0c81180ed86913e752d7b48d08226376198dbdb9535191 |
| MD5 | 30d9ec1a7da5befa16e590ac1aba792c |
| BLAKE2b-256 | 679fa7d0633ee3af505c68d3b2d5bdf00621d0c2ae856e4eed2ec3e74f3839fe |