Enterprise-grade caching framework for LLM responses and embeddings
CacheFuse
Dramatically reduce LLM API costs and latency with intelligent caching
Why CacheFuse?
CacheFuse transforms expensive LLM applications into lightning-fast, cost-effective systems through intelligent caching.
Massive Cost Savings
- 60-90% API cost reduction in typical applications
- 100-1000x faster responses for cached queries (<3ms vs 2-5 seconds)
- Smart invalidation prevents stale results
Enterprise-Ready Features
- Deterministic cache keys - Same inputs always produce same cache keys
- Stampede protection - Concurrent requests handled intelligently
- Multi-backend support - SQLite (local) or Redis (distributed)
- Privacy-compliant - Hash-only mode with optional redaction hooks
- Production monitoring - Hit rates, latency metrics, and CLI tools
Developer-First Design
- Drop-in decorators - Add @llm or @embed to existing functions
- Zero configuration - Works out of the box with sensible defaults
- Flexible invalidation - TTL, tags, and template versioning
- Thread-safe - Handles concurrency without race conditions
Installation
Production
pip install cachefuse
Development
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"
Optional Dependencies
pip install cachefuse[redis] # For Redis backend support
Quickstart
Basic LLM Caching
from cachefuse.api.cache import Cache
from cachefuse.api.decorators import llm
import openai
# Initialize cache (works out of the box)
cache = Cache.from_env()
@llm(cache=cache, ttl="7d", tag="summarize-v1", template_version="1")
def summarize(text: str, model: str = "gpt-4o-mini") -> str:
    response = openai.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Summarize: {text}"}]
    )
    return response.choices[0].message.content
# First call: API request (slow + costs money)
summary1 = summarize("CacheFuse speeds up LLM applications") # ~2-5 seconds
# Second call: Cache hit (fast + free)
summary2 = summarize("CacheFuse speeds up LLM applications") # ~3ms
print(f"Results identical: {summary1 == summary2}") # True
print(f"Cache stats: {cache.stats()}") # Hit rate, latency, savings
Embedding Caching
from cachefuse.api.decorators import embed
@embed(cache=cache, ttl="30d", tag="embeddings-v1")
def get_embeddings(texts: list[str], model: str = "text-embedding-ada-002") -> list[list[float]]:
    response = openai.embeddings.create(
        model=model,
        input=texts
    )
    # One embedding vector per input text
    return [item.embedding for item in response.data]
# Expensive embedding calls cached automatically
vectors = get_embeddings(["Hello world", "Goodbye world"])
CLI Management
# View cache statistics
cachefuse stats
# Clear specific tags
cachefuse purge --tag summarize-v1
# Compact SQLite database
cachefuse vacuum
# View help
cachefuse --help
Real-World Example
# RAG application with caching
@llm(cache=cache, ttl="1h", tag="rag-v1", template_version="2")
def answer_question(question: str, context: str, model: str = "gpt-4") -> str:
    return openai.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer based on the context provided."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
        ]
    ).choices[0].message.content
# Same questions with same context = instant responses + no API costs
answer = answer_question("What is CacheFuse?", "CacheFuse is a caching framework...")
Architecture
CacheFuse is built on a clean, modular architecture designed for enterprise-scale applications:
┌─────────────────────────────────────────────────────────┐
│                      @llm / @embed                      │
│                        Decorators                       │
└─────────────────┬───────────────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────────────┐
│                      Cache Facade                       │
│  • Deterministic fingerprinting                         │
│  • Stampede protection (per-key locks)                  │
│  • Metrics collection                                   │
│  • Privacy mode handling                                │
└─────────────────┬───────────────────────────────────────┘
                  │
    ┌─────────────▼──────────────┐
    │          Backends          │
    ├────────────┬───────────────┤
    │   SQLite   │     Redis     │
    │  (local)   │ (distributed) │
    └────────────┴───────────────┘
Key Components
- Decorators - Simple @llm and @embed decorators for drop-in caching
- Cache Facade - Intelligent cache management with fingerprinting and concurrency control
- Multi-Backend - SQLite for local development, Redis for production scale
- Metrics System - Real-time performance tracking and cost analysis
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| CF_BACKEND | sqlite | Backend type (sqlite or redis) |
| CF_SQLITE_PATH | ~/.cache/cachefuse/cache.db | SQLite database file path |
| CF_REDIS_URL | - | Redis connection string (e.g., redis://localhost:6379/0) |
| CF_MODE | normal | Privacy mode (normal or hash_only) |
| CF_LOCK_TIMEOUT | 30 | Per-key lock timeout in seconds |
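The table above maps naturally onto a small settings object. The following is a minimal sketch of that mapping, not CacheFuse's actual loader: the EnvSettings dataclass and the load_settings helper are hypothetical names, and only the variable names, allowed values, and defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container mirroring the documented CF_* variables."""
    backend: str
    sqlite_path: str
    redis_url: Optional[str]
    mode: str
    lock_timeout: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> EnvSettings:
    """Read CF_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    backend = env.get("CF_BACKEND", "sqlite")
    if backend not in ("sqlite", "redis"):
        raise ValueError(f"unsupported CF_BACKEND: {backend!r}")
    mode = env.get("CF_MODE", "normal")
    if mode not in ("normal", "hash_only"):
        raise ValueError(f"unsupported CF_MODE: {mode!r}")
    return EnvSettings(
        backend=backend,
        sqlite_path=os.path.expanduser(
            env.get("CF_SQLITE_PATH", "~/.cache/cachefuse/cache.db")
        ),
        redis_url=env.get("CF_REDIS_URL"),  # the table lists no default
        mode=mode,
        lock_timeout=float(env.get("CF_LOCK_TIMEOUT", "30")),
    )


# With no variables set, every field takes its documented default.
defaults = load_settings({})
print(defaults.backend, defaults.mode, defaults.lock_timeout)  # sqlite normal 30.0
```

Passing a plain dict instead of os.environ keeps the sketch easy to test without touching the real environment.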
Configuration Methods
# Method 1: Environment-based (recommended)
from cachefuse.api.cache import Cache
cache = Cache.from_env()
# Method 2: Explicit configuration
from cachefuse.config import CacheConfig
config = CacheConfig(
    backend="redis",
    redis_url="redis://localhost:6379/0",
    mode="hash_only"
)
cache = Cache.from_config(config)
Storage Backends
SQLite Backend (Default)
Perfect for local development, single-machine deployments, and applications requiring file-based persistence.
Features:
- Single-file storage with WAL mode for optimal performance
- Built-in ACID transactions
- Automatic schema migration
- Vacuum support for space reclamation
- Zero external dependencies
# Automatic (default)
cache = Cache.from_env()
# Explicit configuration
cache = Cache.from_config(CacheConfig(
    backend="sqlite",
    sqlite_path="/custom/path/cache.db"
))
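To make the "single-file storage with WAL mode" idea concrete, here is a small standard-library sketch of a file-backed key-value store with optional expiry. It is an illustration only: the table layout and the open_store, put, and get helpers are assumptions, not CacheFuse's actual schema or API.

```python
import os
import sqlite3
import tempfile
import time
from typing import Optional


def open_store(path: str) -> sqlite3.Connection:
    """Open a single-file store in WAL mode (table layout is illustrative)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL)"
    )
    return conn


def put(conn: sqlite3.Connection, key: str, value: str, ttl_seconds: float = 0) -> None:
    """Insert or overwrite an entry; ttl_seconds=0 means no expiration."""
    expires_at = time.time() + ttl_seconds if ttl_seconds > 0 else None
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO entries VALUES (?, ?, ?)",
            (key, value, expires_at),
        )


def get(conn: sqlite3.Connection, key: str) -> Optional[str]:
    """Return the stored value, or None if it is missing or expired."""
    row = conn.execute(
        "SELECT value, expires_at FROM entries WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        return None
    value, expires_at = row
    if expires_at is not None and expires_at <= time.time():
        return None
    return value


db_path = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = open_store(db_path)
put(conn, "k1", "cached response", ttl_seconds=60)
print(get(conn, "k1"))  # cached response
```

WAL mode lets readers proceed while a writer is active, which is one reason it suits a cache that is read far more often than it is written.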
Redis Backend
Ideal for distributed applications, horizontal scaling, and shared cache scenarios.
Features:
- Distributed caching across multiple instances
- Built-in TTL expiration
- Atomic operations with Redis transactions
- Tag-based bulk operations using sets
- High availability and clustering support
cache = Cache.from_config(CacheConfig(
    backend="redis",
    redis_url="redis://localhost:6379/0"
))
Redis Key Layout:
- cf:entry:<key> - Cache entry data
- cf:tag:<tag> - Set of keys with specific tag
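The two key families can be pictured with an in-memory stand-in that needs no Redis server. This is a sketch of how entry keys and tag sets might cooperate during a tag purge; the TagIndexedStore class is hypothetical, and the Redis commands CacheFuse actually issues are not shown in this README.

```python
from collections import defaultdict

ENTRY_PREFIX = "cf:entry:"
TAG_PREFIX = "cf:tag:"


class TagIndexedStore:
    """In-memory model of the documented key layout (illustrative only)."""

    def __init__(self):
        self.entries = {}                 # "cf:entry:<key>" -> value
        self.tag_sets = defaultdict(set)  # "cf:tag:<tag>" -> set of entry keys

    def set(self, key, value, tags=()):
        entry_key = ENTRY_PREFIX + key
        self.entries[entry_key] = value
        for tag in tags:
            self.tag_sets[TAG_PREFIX + tag].add(entry_key)

    def get(self, key):
        return self.entries.get(ENTRY_PREFIX + key)

    def purge_tag(self, tag):
        """Delete every entry listed in the tag's set; return how many."""
        members = self.tag_sets.pop(TAG_PREFIX + tag, set())
        for entry_key in members:
            self.entries.pop(entry_key, None)
        # Drop the purged keys from any other tag sets that listed them.
        for other in self.tag_sets.values():
            other -= members
        return len(members)


store = TagIndexedStore()
store.set("a", "summary A", tags=["summarize-v1"])
store.set("b", "answer B", tags=["rag", "qa-v1"])
removed = store.purge_tag("rag")
print(removed, store.get("a"), store.get("b"))  # 1 summary A None
```

Keeping a set per tag means a purge only touches the keys that carry the tag, rather than scanning every entry.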
Advanced Features
TTL (Time-To-Live)
Flexible expiration control with human-readable formats:
@llm(cache=cache, ttl="7d") # 7 days
@llm(cache=cache, ttl="2h") # 2 hours
@llm(cache=cache, ttl="30m") # 30 minutes
@llm(cache=cache, ttl="300s") # 300 seconds
@llm(cache=cache, ttl=0) # No expiration
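The README does not show how these strings are interpreted. A minimal sketch of one way to parse them, assuming only the four suffixes shown above (s, m, h, d) plus the integer 0; parse_ttl is a hypothetical helper, not part of CacheFuse's documented API:

```python
import re
from typing import Union

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TTL_PATTERN = re.compile(r"(\d+)([smhd])")


def parse_ttl(ttl: Union[int, str]) -> int:
    """Convert a TTL such as "7d" or 300 into seconds (0 means no expiry)."""
    if isinstance(ttl, int):
        if ttl < 0:
            raise ValueError("ttl must be non-negative")
        return ttl
    match = _TTL_PATTERN.fullmatch(ttl.strip())
    if match is None:
        raise ValueError(f"unrecognized ttl: {ttl!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


print(parse_ttl("7d"), parse_ttl("2h"), parse_ttl("30m"), parse_ttl("300s"), parse_ttl(0))
# 604800 7200 1800 300 0
```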
Tags & Bulk Invalidation
Group related cache entries for easy management:
# Tag entries by version, feature, or use case
@llm(cache=cache, ttl="1h", tag="summarize-v2")
def summarize_v2(text: str) -> str: ...
@llm(cache=cache, ttl="1h", tags=["rag", "qa-v1"])
def answer_question(question: str, context: str) -> str: ...
# Bulk invalidation
cache.purge_tag("summarize-v2") # Clear all v2 summaries
# CLI bulk operations
cachefuse purge --tag rag # Clear all RAG cache entries
cachefuse purge --tag qa-v1 # Clear v1 Q&A entries
Template Versioning
Automatic cache invalidation when prompts change:
# Version 1
@llm(cache=cache, ttl="1d", template_version="1")
def analyze_sentiment(text: str) -> str:
    return f"Analyze sentiment: {text}"
# Version 2 - automatically uses different cache keys
@llm(cache=cache, ttl="1d", template_version="2")
def analyze_sentiment(text: str) -> str:
    return f"Analyze sentiment with context: {text}"
Deterministic Cache Keys
Cache keys are generated from:
- Function type (llm or embed)
- Model parameters (model name, temperature, etc.)
- Template version
- Input hash (SHA256 of processed input)
- Provider info (optional)
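How these fields are combined is not documented here. One plausible sketch, assuming a canonical JSON encoding hashed with SHA-256; make_cache_key and its field names are illustrative and should not be read as CacheFuse internals:

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def make_cache_key(
    kind: str,
    model_params: Mapping[str, Any],
    template_version: str,
    processed_input: str,
    provider: Optional[str] = None,
) -> str:
    """Build a stable key from the fields listed above (hypothetical scheme)."""
    input_hash = hashlib.sha256(processed_input.encode("utf-8")).hexdigest()
    payload = {
        "kind": kind,
        "model_params": dict(model_params),
        "template_version": template_version,
        "input_hash": input_hash,
        "provider": provider,
    }
    # sort_keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_a = make_cache_key("llm", {"model": "gpt-4o-mini", "temperature": 0}, "1", "Summarize: hello")
key_b = make_cache_key("llm", {"temperature": 0, "model": "gpt-4o-mini"}, "1", "Summarize: hello")
key_c = make_cache_key("llm", {"model": "gpt-4o-mini", "temperature": 0}, "2", "Summarize: hello")
print(key_a == key_b, key_a == key_c)  # True False
```

Because the template version is part of the hashed payload, bumping it changes every key, which is what makes version-based invalidation work without deleting anything.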
Privacy & Security
Hash-Only Mode
For privacy-sensitive applications, store only hashes instead of raw content:
from cachefuse.config import CacheConfig
# Enable privacy mode
config = CacheConfig(backend="sqlite", mode="hash_only")
cache = Cache.from_config(config)
@llm(cache=cache, ttl="1h")
def process_sensitive_data(user_input: str) -> str:
    # Raw input never stored, only hash-based cache keys
    return llm_provider_call(user_input)
Content Redaction
Automatically redact sensitive information before hashing:
def redactor(text: str) -> str:
    # Custom redaction logic
    return text.replace("SECRET_TOKEN", "[REDACTED]").replace("PASSWORD", "[REDACTED]")
cache = Cache(backend=cache._backend, config=config, redactor=redactor)
# Both calls hit the same cache (identical after redaction)
result1 = process_data("User SECRET_TOKEN abc123")
result2 = process_data("User [REDACTED] abc123") # Cache hit!
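Why the two calls above share a cache entry can be shown without CacheFuse at all: if redaction runs before hashing, both inputs reduce to the same string and therefore the same digest. In this sketch, input_fingerprint is a hypothetical helper and SHA-256 over the redacted text is an assumption about the order of operations.

```python
import hashlib
from typing import Callable, Optional


def redactor(text: str) -> str:
    """Same redaction logic as the example above."""
    return text.replace("SECRET_TOKEN", "[REDACTED]").replace("PASSWORD", "[REDACTED]")


def input_fingerprint(text: str, redact: Optional[Callable[[str], str]] = None) -> str:
    """Hash the input after optional redaction (hypothetical helper)."""
    cleaned = redact(text) if redact is not None else text
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()


raw = "User SECRET_TOKEN abc123"
already_redacted = "User [REDACTED] abc123"

# With redaction, both inputs hash identically; without it, they differ.
print(input_fingerprint(raw, redactor) == input_fingerprint(already_redacted, redactor))  # True
print(input_fingerprint(raw) == input_fingerprint(already_redacted))  # False
```

In hash-only mode only digests like these would need to be stored, so the raw text never has to reach the backend.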
Security Features
- No sensitive data storage in hash-only mode
- Deterministic redaction ensures consistent cache hits
- Configurable redaction functions for custom privacy needs
- Thread-safe operations prevent race conditions
Performance Monitoring
Real-Time Metrics
Track cache performance and cost savings:
stats = cache.stats()
print(f"""
Cache Performance:
Entries: {stats['entries']}
Total Calls: {stats['total_calls']}
Cache Hits: {stats['hits']}
Hit Rate: {stats['hit_rate']:.2%}
Avg Latency: {stats['avg_latency_ms']:.1f}ms
Cost Saved: ${stats['cost_saved']:.2f}
""")
CLI Monitoring
# Detailed performance stats
cachefuse stats
# Output:
# entries: 150
# total_calls: 1000
# hits: 850
# hit_rate: 0.85
# avg_latency_ms: 2.3
# cost_saved: 127.50
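The hit rate in the sample output is simply hits divided by total calls (850 / 1000 = 0.85). A small sketch of counters that would produce such figures; CacheCounters is a hypothetical class, and averaging latency over all calls rather than hits only is an assumption:

```python
class CacheCounters:
    """Hypothetical counters behind a stats()-style report."""

    def __init__(self):
        self.total_calls = 0
        self.hits = 0
        self.total_latency_ms = 0.0

    def record(self, hit: bool, latency_ms: float) -> None:
        """Register one lookup and how long it took."""
        self.total_calls += 1
        self.hits += int(hit)
        self.total_latency_ms += latency_ms

    def snapshot(self) -> dict:
        """Return derived metrics, guarding against division by zero."""
        calls = self.total_calls
        return {
            "total_calls": calls,
            "hits": self.hits,
            "hit_rate": self.hits / calls if calls else 0.0,
            "avg_latency_ms": self.total_latency_ms / calls if calls else 0.0,
        }


counters = CacheCounters()
for i in range(1000):
    counters.record(hit=i < 850, latency_ms=2.0)  # 850 hits, 150 misses
print(counters.snapshot()["hit_rate"])  # 0.85
```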
Production Monitoring
# Log metrics for monitoring systems
import logging
logger = logging.getLogger("cachefuse.metrics")
stats = cache.stats()
logger.info("cache_metrics", extra={
    "hit_rate": stats["hit_rate"],
    "avg_latency": stats["avg_latency_ms"],
    "cost_saved": stats["cost_saved"]
})
Concurrency & Reliability
Stampede Protection
Prevents duplicate expensive operations when multiple requests arrive simultaneously:
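# 100 concurrent requests for the same uncached item
# Result: only 1 API call; the other 99 requests are served from the cache
# summarize() is synchronous, so the calls are issued from a thread pool
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(summarize, ["same input"] * 100))
# All results identical, massive cost/latency savings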
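The mechanism behind this is usually a per-key lock with a second cache check once the lock is held. The sketch below shows that pattern with in-process threading locks; SingleFlightCache is a hypothetical class, and it models neither CacheFuse's file locks nor the CF_LOCK_TIMEOUT setting.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SingleFlightCache:
    """Per-key locking so only one caller computes a missing value."""

    def __init__(self):
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_compute(self, key, compute):
        if key in self._values:          # fast path: already cached
            return self._values[key]
        with self._lock_for(key):
            if key in self._values:      # re-check: another thread filled it
                return self._values[key]
            value = compute()
            self._values[key] = value
            return value


calls = {"count": 0}


def slow_provider():
    """Stand-in for an expensive LLM call."""
    calls["count"] += 1
    time.sleep(0.05)
    return "summary"


flight_cache = SingleFlightCache()
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(
        pool.map(lambda _: flight_cache.get_or_compute("same input", slow_provider), range(100))
    )
print(calls["count"], len(results), set(results))  # 1 100 {'summary'}
```

The re-check inside the lock is the important step: without it, every thread that was waiting would recompute the value as soon as it acquired the lock.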
Thread Safety
- Per-key file locks prevent race conditions
- ACID transactions ensure data consistency
- Atomic operations for concurrent access
- Lock timeout handling prevents deadlocks
Reliability Features
- Graceful degradation when cache unavailable
- Automatic retry logic for transient failures
- Connection pooling for Redis backend
- WAL mode for SQLite performance
Testing & Development
Running Tests
# Install development dependencies
uv pip install -e ".[dev]"
# Run unit tests (fast)
uv run pytest -q -m "not integration" --cov=cachefuse
# Run integration tests (requires Redis for some tests)
uv run pytest -q -m integration
# Run all tests
uv run pytest --cov=cachefuse
Performance Benchmarks
- Cache hit latency: < 3ms (SQLite), < 1ms (Redis)
- Stampede protection: 1 provider call regardless of concurrency
- Memory overhead: ~50MB typical usage
- Storage efficiency: Configurable compression and cleanup
Examples & Demos
# RAG application demo
uv run python -m cachefuse.examples.rag_demo
# Embedding caching demo
uv run python -m cachefuse.examples.embed_demo
Roadmap
v0.2.0 - Advanced Caching
- Semantic similarity caching
- Batch operations API
- Enhanced metrics (p95/p99 latencies)
v0.3.0 - Enterprise Features
- Prometheus metrics export
- Distributed locking with Redis
- Advanced compression algorithms
v0.4.0 - Provider Integration
- Native OpenAI SDK integration
- Anthropic Claude SDK support
- Automatic cost tracking by provider
Future Releases
- Web dashboard for cache management
- Circuit breaker patterns
- Multi-tier caching strategies
Performance Comparison
| Scenario | Without CacheFuse | With CacheFuse | Improvement |
|---|---|---|---|
| Repeated queries | 2-5 seconds | < 3ms | 100-1000x faster |
| API costs | $0.02 per call | $0.00 (cached) | 90%+ savings |
| Concurrency | N × API calls | 1 API call | Perfect deduplication |
| Memory usage | Negligible | ~50MB | Minimal overhead |
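The cost row depends entirely on the hit rate: cached calls cost nothing, misses cost the full price. A short worked example using illustrative numbers (1000 calls, 850 hits, and the table's $0.02 per call); estimate_spend is a hypothetical helper and real per-call prices vary by model and token count:

```python
def estimate_spend(total_calls: int, hits: int, cost_per_call: float) -> dict:
    """Compare provider spend with and without a cache.

    Assumes every miss costs cost_per_call and every hit costs nothing.
    """
    if not 0 <= hits <= total_calls:
        raise ValueError("hits must be between 0 and total_calls")
    without_cache = total_calls * cost_per_call
    with_cache = (total_calls - hits) * cost_per_call
    saved = without_cache - with_cache
    return {
        "without_cache": without_cache,
        "with_cache": with_cache,
        "saved": saved,
        "saved_fraction": saved / without_cache if without_cache else 0.0,
    }


report = estimate_spend(total_calls=1000, hits=850, cost_per_call=0.02)
print(
    f"${report['without_cache']:.2f} -> ${report['with_cache']:.2f} "
    f"({report['saved_fraction']:.0%} saved)"
)  # $20.00 -> $3.00 (85% saved)
```

Under these assumptions the fraction saved equals the hit rate, so the savings an application sees depend on how often its inputs repeat.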
Development Setup
# Clone the repository
git clone https://github.com/Yasserelhaddar/CacheFuse.git
cd CacheFuse
# Set up development environment
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"
# Run tests
uv run pytest
Areas for Contribution
- Bug fixes and stability improvements
- Performance optimizations
- Documentation and examples
- New backend implementations
- Test coverage improvements
License
MIT License - see LICENSE file for details.
Built with ❤️ for the AI community
Star ⭐ this repo if CacheFuse helps you build better LLM applications!
Download files
File details
Details for the file cachefuse-0.1.0.tar.gz.
File metadata
- Download URL: cachefuse-0.1.0.tar.gz
- Upload date:
- Size: 27.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2c0dfea9b764fea466cc51aefbe8a51be8af8118f0e8c36a329f72f33b1b88cf |
| MD5 | 1efdb51fb63fdd14fe56709b1dd9e62b |
| BLAKE2b-256 | bf24d7be9bff8b82266a4b8379831986cea21480f7106d2d19960e7b2c1afee8 |
File details
Details for the file cachefuse-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cachefuse-0.1.0-py3-none-any.whl
- Upload date:
- Size: 29.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14269a4a05851a08d3bbb0ccf337f56ebdbd94c53c1b43a7d2b2de4dc4525cae |
| MD5 | e672930213205efde1f9aabff50de7f9 |
| BLAKE2b-256 | 7303cc65ec277d05ee61bdfc582a5e724225653078a182cb40d4619b55ac05dd |