Drop-in semantic + exact caching layer for LLM applications
## Project description
# llmcachex-ai

```
pip install llmcachex-ai
```
Drop-in caching + retrieval layer for LLM applications (RAG, agents, chatbots)
Stop paying for repeated LLM calls. Automatically reuse responses using exact + semantic caching with zero changes to your business logic.
## Installation

```
pip install llmcachex-ai
```
## Why llmcachex-ai?
Most LLM applications repeatedly call the model for:
- Slightly rephrased questions
- Agent/tool loops
- Chat history variations
This leads to higher latency and unnecessary cost.

llmcachex-ai solves this automatically by caching intelligently.
## Features

- Exact cache (Redis-backed)
- Semantic cache (FAISS + embeddings)
- Hybrid retrieval (BM25 + vector search)
- Cross-encoder reranking
- Agent- and tool-compatible
- Memory-aware context support
- Token + cost tracking
- Plug-and-play decorator API
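To make the exact-cache idea concrete, here is a minimal in-process sketch: the key is a hash over prompt + model, so byte-identical calls are served from the store. The names `exact_cache_key` and `cached_call` are illustrative, not the library's API, and a plain dict stands in for Redis.

```python
import hashlib
import json

def exact_cache_key(prompt: str, model: str) -> str:
    """Deterministic key: identical prompt + model always hash to the same key."""
    payload = json.dumps({"prompt": prompt, "model": model}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# A dict stands in for the Redis-backed exact cache in this sketch.
cache: dict[str, str] = {}

def cached_call(prompt: str, model: str, llm) -> str:
    key = exact_cache_key(prompt, model)
    if key not in cache:           # miss: call the model once and store the answer
        cache[key] = llm(prompt)
    return cache[key]              # hit: reuse the stored answer, no model call
```

With Redis the dict lookups become `GET`/`SET` calls, which also gives you expiry and sharing across processes.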
## How It Works

```
User Query
    │
llm_cache decorator
    ├── Exact Cache (Redis)
    ├── Semantic Engine
    │    ├── FAISS (vector)
    │    ├── BM25 (lexical)
    │    └── CrossEncoder (rerank)
    └── LLM / Agent
```
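The cascade above can be sketched in rough Python (illustrative only; the real decorator wires these stages together for you). Cheapest check first, the LLM only on a full miss:

```python
from typing import Callable, Optional

def lookup(
    query: str,
    exact_cache: dict,
    semantic_get: Callable[[str], Optional[str]],
    llm: Callable[[str], str],
) -> str:
    # 1. Exact cache: identical query text is the cheapest possible hit.
    if query in exact_cache:
        return exact_cache[query]
    # 2. Semantic engine: a similarly-worded, previously answered query.
    answer = semantic_get(query)
    if answer is not None:
        return answer
    # 3. Miss on both: call the LLM/agent and remember the response.
    answer = llm(query)
    exact_cache[query] = answer
    return answer
```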
## Quick Start

```python
from llmcachex_ai import llm_cache, CacheConfig

@llm_cache(CacheConfig())
def ask_llm(prompt):
    return llm(prompt)  # `llm` is your own model-call function

print(ask_llm("What is AI?"))   # LLM call
print(ask_llm("Explain AI"))    # semantic cache hit
```
## Agent Example

Works seamlessly with tools:

```python
@llm_cache(CacheConfig())
def agent(raw_query, full_prompt):
    if "calculate" in raw_query:
        # Demo only: eval on untrusted input is unsafe in production.
        return str(eval(raw_query.replace("calculate", "").strip()))
    if "search" in raw_query:
        return f"[TOOL SEARCH RESULT] {raw_query}"
    return llm(full_prompt)
```
## Semantic Cache (why it's powerful)

Unlike basic exact-match caching, semantically equivalent queries share one entry:

```
"What is AI?"
"Explain artificial intelligence"
```

Both return the same cached response; no LLM call is needed.
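Under the hood, a semantic cache embeds each query and reuses a stored answer whenever cosine similarity clears a threshold. A minimal sketch, assuming you supply an `embed` function that maps text to a vector (the class and method names here are illustrative, not this package's API; the real engine uses a FAISS index instead of a linear scan):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

class ToySemanticCache:
    """Linear-scan semantic cache; real systems use an ANN index such as FAISS."""

    def __init__(self, embed, threshold=0.7):
        self.embed = embed            # callable: text -> vector
        self.threshold = threshold    # minimum similarity that counts as a hit
        self.entries = []             # list of (vector, cached_response)

    def put(self, query, response):
        self.entries.append((self.embed(query), response))

    def get(self, query):
        qv = self.embed(query)
        best_response, best_sim = None, self.threshold
        for vec, response in self.entries:
            sim = cosine(qv, vec)
            if sim >= best_sim:       # keep the closest entry above threshold
                best_response, best_sim = response, sim
        return best_response          # None when nothing clears the threshold
```

"What is AI?" and "Explain artificial intelligence" embed close together, so the second query returns the first query's cached answer; an unrelated query falls below the threshold and misses.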
## Configuration

```python
CacheConfig(
    enable_exact=True,
    enable_semantic=True,
    similarity_threshold=0.7,
    top_k=3,
    model_name="gpt-4o-mini",
    enable_metrics=True,
    enable_token_cost=True,
)
```
## Metrics

```python
from llmcachex_ai import metrics

print(metrics.summary())
```

Example output:

```json
{
    "hits": 2,
    "misses": 1,
    "hit_rate": 66.67,
    "avg_llm_latency_ms": 2000,
    "avg_cache_latency_ms": 30,
    "total_cost_rupees": 0.01
}
```
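For reference, `hit_rate` is `hits / (hits + misses) * 100`, and the latency fields are simple means. A sketch of how such a summary could be assembled from raw counters (illustrative, not the library's internals):

```python
def summarize(hits, misses, llm_latencies_ms, cache_latencies_ms):
    """Aggregate raw counters into a metrics summary like the one shown above."""
    total = hits + misses
    return {
        "hits": hits,
        "misses": misses,
        # Percentage of requests served from cache; guard against division by zero.
        "hit_rate": round(100.0 * hits / total, 2) if total else 0.0,
        "avg_llm_latency_ms": (sum(llm_latencies_ms) / len(llm_latencies_ms)
                               if llm_latencies_ms else 0.0),
        "avg_cache_latency_ms": (sum(cache_latencies_ms) / len(cache_latencies_ms)
                                 if cache_latencies_ms else 0.0),
    }
```

With 2 hits and 1 miss this yields the 66.67% hit rate shown in the example output.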
## Project Structure

```
llm_cachex/
├── api/         # decorator layer
├── core/        # cache, metrics, memory
├── semantic/    # hybrid search + reranker
├── embedding/   # embeddings
├── index/       # FAISS index
├── similarity/  # similarity utils
└── utils/       # helpers
```
## Roadmap
- Async support
- Streaming support
- Batch inference
- Multi-model caching
- Pluggable vector DBs (Chroma / Pinecone)
- Observability dashboard
## Contributing
PRs welcome. Open an issue to discuss ideas.
## License
MIT License
## Author
Himanshu Singh
## If this helps you

Give it a star; it helps the project grow.
## Project details
## Download files
## File details

Details for the file `llmcachex_ai-0.1.1.tar.gz`.

### File metadata

- Download URL: llmcachex_ai-0.1.1.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `7f7d2429cc98298afd5036da506ea4fd77c5dcc3c8bd50e4361c55011c9655ad` |
| MD5 | `695f6a3b8649f6cff0799cae28de27dd` |
| BLAKE2b-256 | `48378bd6ba01f1dda1997b8cad82a8e2049fdc195ee40ebab38c3629ee37cbf6` |
## File details

Details for the file `llmcachex_ai-0.1.1-py3-none-any.whl`.

### File metadata

- Download URL: llmcachex_ai-0.1.1-py3-none-any.whl
- Upload date:
- Size: 15.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d5649c1b5f12c600b2db891e1fd0775302a832debe9fc5785e7073231423657b` |
| MD5 | `97ee25596f9e0cb2c395fcad281657af` |
| BLAKE2b-256 | `8f2f91a6a5bd7830ae41ddf741246d9880d0fc48c1d561ff8721cae3dc7526f5` |