Enterprise-grade caching framework for LLM responses and embeddings

These details have not been verified by PyPI

Project description

CacheFuse

Enterprise-grade caching framework for LLM responses and embeddings

Dramatically reduce LLM API costs and latency with intelligent caching

🚀 Why CacheFuse?

CacheFuse transforms expensive LLM applications into lightning-fast, cost-effective systems through intelligent caching.

💰 Massive Cost Savings

60-90% API cost reduction in typical applications
100x faster responses for cached queries (<3ms vs 2-5 seconds)
Smart invalidation prevents stale results

⚡ Enterprise-Ready Features

Deterministic cache keys - Same inputs always produce same cache keys
Stampede protection - Concurrent requests handled intelligently
Multi-backend support - SQLite (local) or Redis (distributed)
Privacy-compliant - Hash-only mode with optional redaction hooks
Production monitoring - Hit rates, latency metrics, and CLI tools

🔧 Developer-First Design

Drop-in decorators - Add @llm or @embed to existing functions
Zero configuration - Works out of the box with sensible defaults
Flexible invalidation - TTL, tags, and template versioning
Thread-safe - Handles concurrency without race conditions

📦 Installation

Production

pip install cachefuse

Development

uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"

Optional Dependencies

pip install cachefuse[redis]  # For Redis backend support

⚡ Quickstart

Basic LLM Caching

from cachefuse.api.cache import Cache
from cachefuse.api.decorators import llm
import openai

# Initialize cache (works out of the box)
cache = Cache.from_env()

@llm(cache=cache, ttl="7d", tag="summarize-v1", template_version="1")
def summarize(text: str, model: str = "gpt-4o-mini") -> str:
    response = openai.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Summarize: {text}"}]
    )
    return response.choices[0].message.content

# First call: API request (slow + costs money)
summary1 = summarize("CacheFuse speeds up LLM applications")  # ~2-5 seconds

# Second call: Cache hit (fast + free)  
summary2 = summarize("CacheFuse speeds up LLM applications")  # ~3ms

print(f"Results identical: {summary1 == summary2}")  # True
print(f"Cache stats: {cache.stats()}")  # Hit rate, latency, savings

Embedding Caching

from cachefuse.api.decorators import embed

@embed(cache=cache, ttl="30d", tag="embeddings-v1")
def get_embeddings(texts: list[str], model: str = "text-embedding-ada-002") -> list[float]:
    response = openai.embeddings.create(
        model=model,
        input=texts
    )
    return [embedding.embedding for embedding in response.data]

# Expensive embedding calls cached automatically
vectors = get_embeddings(["Hello world", "Goodbye world"])

CLI Management

# View cache statistics
cachefuse stats

# Clear specific tags  
cachefuse purge --tag summarize-v1

# Compact SQLite database
cachefuse vacuum

# View help
cachefuse --help

Real-World Example

# RAG application with caching
@llm(cache=cache, ttl="1h", tag="rag-v1", template_version="2")
def answer_question(question: str, context: str, model: str = "gpt-4") -> str:
    return openai.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer based on the context provided."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
        ]
    ).choices[0].message.content

# Same questions with same context = instant responses + no API costs
answer = answer_question("What is CacheFuse?", "CacheFuse is a caching framework...")

🏗️ Architecture

CacheFuse is built on a clean, modular architecture designed for enterprise-scale applications:

┌─────────────────────────────────────────────────────────┐
│                   @llm / @embed                         │
│                   Decorators                            │
└─────────────────┬───────────────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────────────┐
│              Cache Facade                               │
│  • Deterministic fingerprinting                        │
│  • Stampede protection (per-key locks)                 │
│  • Metrics collection                                   │
│  • Privacy mode handling                               │
└─────────────────┬───────────────────────────────────────┘
                  │
    ┌─────────────▼──────────────┐
    │        Backends            │
    ├────────────┬───────────────┤
    │   SQLite   │     Redis     │
    │  (local)   │ (distributed) │
    └────────────┴───────────────┘

Key Components

Decorators - Simple @llm and @embed decorators for drop-in caching
Cache Facade - Intelligent cache management with fingerprinting and concurrency control
Multi-Backend - SQLite for local development, Redis for production scale
Metrics System - Real-time performance tracking and cost analysis

⚙️ Configuration

Environment Variables

Variable	Default	Description
`CF_BACKEND`	`sqlite`	Backend type (`sqlite` or `redis`)
`CF_SQLITE_PATH`	`~/.cache/cachefuse/cache.db`	SQLite database file path
`CF_REDIS_URL`	-	Redis connection string (e.g., `redis://localhost:6379/0`)
`CF_MODE`	`normal`	Privacy mode (`normal` or `hash_only`)
`CF_LOCK_TIMEOUT`	`30`	Per-key lock timeout in seconds

Configuration Methods

# Method 1: Environment-based (recommended)
from cachefuse.api.cache import Cache
cache = Cache.from_env()

# Method 2: Explicit configuration
from cachefuse.config import CacheConfig
config = CacheConfig(
    backend="redis",
    redis_url="redis://localhost:6379/0",
    mode="hash_only"
)
cache = Cache.from_config(config)

🗄️ Storage Backends

SQLite Backend (Default)

Perfect for local development, single-machine deployments, and applications requiring file-based persistence.

Features:

Single-file storage with WAL mode for optimal performance
Built-in ACID transactions
Automatic schema migration
Vacuum support for space reclamation
Zero external dependencies

# Automatic (default)
cache = Cache.from_env()

# Explicit configuration
cache = Cache.from_config(CacheConfig(
    backend="sqlite",
    sqlite_path="/custom/path/cache.db"
))

Redis Backend

Ideal for distributed applications, horizontal scaling, and shared cache scenarios.

Features:

Distributed caching across multiple instances
Built-in TTL expiration
Atomic operations with Redis transactions
Tag-based bulk operations using sets
High availability and clustering support

cache = Cache.from_config(CacheConfig(
    backend="redis", 
    redis_url="redis://localhost:6379/0"
))

Redis Key Layout:

cf:entry:<key> - Cache entry data
cf:tag:<tag> - Set of keys with specific tag

🎛️ Advanced Features

TTL (Time-To-Live)

Flexible expiration control with human-readable formats:

@llm(cache=cache, ttl="7d")      # 7 days
@llm(cache=cache, ttl="2h")      # 2 hours  
@llm(cache=cache, ttl="30m")     # 30 minutes
@llm(cache=cache, ttl="300s")    # 300 seconds
@llm(cache=cache, ttl=0)         # No expiration

Tags & Bulk Invalidation

Group related cache entries for easy management:

# Tag entries by version, feature, or use case
@llm(cache=cache, ttl="1h", tag="summarize-v2")
def summarize_v2(text: str) -> str: ...

@llm(cache=cache, ttl="1h", tags=["rag", "qa-v1"])  
def answer_question(question: str, context: str) -> str: ...

# Bulk invalidation
cache.purge_tag("summarize-v2")  # Clear all v2 summaries

# CLI bulk operations
cachefuse purge --tag rag          # Clear all RAG cache entries
cachefuse purge --tag qa-v1        # Clear v1 Q&A entries

Template Versioning

Automatic cache invalidation when prompts change:

# Version 1
@llm(cache=cache, ttl="1d", template_version="1")
def analyze_sentiment(text: str) -> str:
    return f"Analyze sentiment: {text}"

# Version 2 - automatically uses different cache keys
@llm(cache=cache, ttl="1d", template_version="2") 
def analyze_sentiment(text: str) -> str:
    return f"Analyze sentiment with context: {text}"

Deterministic Cache Keys

Cache keys are generated from:

Function type (llm or embed)
Model parameters (model name, temperature, etc.)
Template version
Input hash (SHA256 of processed input)
Provider info (optional)

🔒 Privacy & Security

Hash-Only Mode

For privacy-sensitive applications, store only hashes instead of raw content:

from cachefuse.config import CacheConfig

# Enable privacy mode
config = CacheConfig(backend="sqlite", mode="hash_only")
cache = Cache.from_config(config)

@llm(cache=cache, ttl="1h")
def process_sensitive_data(user_input: str) -> str:
    # Raw input never stored, only hash-based cache keys
    return llm_provider_call(user_input)

Content Redaction

Automatically redact sensitive information before hashing:

def redactor(text: str) -> str:
    # Custom redaction logic
    return text.replace("SECRET_TOKEN", "[REDACTED]").replace("PASSWORD", "[REDACTED]")

cache = Cache(backend=cache._backend, config=config, redactor=redactor)

# Both calls hit the same cache (identical after redaction)
result1 = process_data("User SECRET_TOKEN abc123")  
result2 = process_data("User [REDACTED] abc123")     # Cache hit!

Security Features

No sensitive data storage in hash-only mode
Deterministic redaction ensures consistent cache hits
Configurable redaction functions for custom privacy needs
Thread-safe operations prevent race conditions

📊 Performance Monitoring

Real-Time Metrics

Track cache performance and cost savings:

stats = cache.stats()
print(f"""
Cache Performance:
  Entries: {stats['entries']}
  Total Calls: {stats['total_calls']}
  Cache Hits: {stats['hits']}
  Hit Rate: {stats['hit_rate']:.2%}
  Avg Latency: {stats['avg_latency_ms']:.1f}ms
  Cost Saved: ${stats['cost_saved']:.2f}
""")

CLI Monitoring

# Detailed performance stats
cachefuse stats

# Output:
# entries: 150
# total_calls: 1000  
# hits: 850
# hit_rate: 0.85
# avg_latency_ms: 2.3
# cost_saved: 127.50

Production Monitoring

# Log metrics for monitoring systems
import logging
logger = logging.getLogger("cachefuse.metrics")

stats = cache.stats()
logger.info("cache_metrics", extra={
    "hit_rate": stats["hit_rate"],
    "avg_latency": stats["avg_latency_ms"], 
    "cost_saved": stats["cost_saved"]
})

🔄 Concurrency & Reliability

Stampede Protection

Prevents duplicate expensive operations when multiple requests arrive simultaneously:

# 100 concurrent requests for same uncached item
# Result: Only 1 API call, 99 cache hits
results = await asyncio.gather(*[
    summarize("same input") for _ in range(100)
])
# All results identical, massive cost/latency savings

Thread Safety

Per-key file locks prevent race conditions
ACID transactions ensure data consistency
Atomic operations for concurrent access
Lock timeout handling prevents deadlocks

Reliability Features

Graceful degradation when cache unavailable
Automatic retry logic for transient failures
Connection pooling for Redis backend
WAL mode for SQLite performance

🧪 Testing & Development

Running Tests

# Install development dependencies
uv pip install -e ".[dev]"

# Run unit tests (fast)
uv run pytest -q -m "not integration" --cov=cachefuse

# Run integration tests (requires Redis for some tests)
uv run pytest -q -m integration

# Run all tests
uv run pytest --cov=cachefuse

Performance Benchmarks

Cache hit latency: < 3ms (SQLite), < 1ms (Redis)
Stampede protection: 1 provider call regardless of concurrency
Memory overhead: ~50MB typical usage
Storage efficiency: Configurable compression and cleanup

Examples & Demos

# RAG application demo
uv run python -m cachefuse.examples.rag_demo

# Embedding caching demo  
uv run python -m cachefuse.examples.embed_demo

🗺️ Roadmap

v0.2.0 - Advanced Caching

Semantic similarity caching
Batch operations API
Enhanced metrics (p95/p99 latencies)

v0.3.0 - Enterprise Features

Prometheus metrics export
Distributed locking with Redis
Advanced compression algorithms

v0.4.0 - Provider Integration

Native OpenAI SDK integration
Anthropic Claude SDK support
Automatic cost tracking by provider

Future Releases

Web dashboard for cache management
Circuit breaker patterns
Multi-tier caching strategies

📈 Performance Comparison

Scenario	Without CacheFuse	With CacheFuse	Improvement
Repeated queries	2-5 seconds	< 3ms	100-1000x faster
API costs	$0.02 per call	$0.00 (cached)	90%+ savings
Concurrency	N × API calls	1 API call	Perfect deduplication
Memory usage	Negligible	~50MB	Minimal overhead

Development Setup

# Clone the repository
git clone https://github.com/Yasserelhaddar/CacheFuse.git
cd CacheFuse

# Set up development environment
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Run tests
uv run pytest

Areas for Contribution

🐛 Bug fixes and stability improvements
⚡ Performance optimizations
📚 Documentation and examples
🔌 New backend implementations
🧪 Test coverage improvements

📄 License

MIT License - see LICENSE file for details.

Built with ❤️ for the AI community

Star ⭐ this repo if CacheFuse helps you build better LLM applications!

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Aug 18, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cachefuse-0.1.0.tar.gz (27.0 kB view details)

Uploaded Aug 18, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

cachefuse-0.1.0-py3-none-any.whl (29.2 kB view details)

Uploaded Aug 18, 2025 Python 3

File details

Details for the file cachefuse-0.1.0.tar.gz.

File metadata

Download URL: cachefuse-0.1.0.tar.gz
Upload date: Aug 18, 2025
Size: 27.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.12

File hashes

Hashes for cachefuse-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2c0dfea9b764fea466cc51aefbe8a51be8af8118f0e8c36a329f72f33b1b88cf`
MD5	`1efdb51fb63fdd14fe56709b1dd9e62b`
BLAKE2b-256	`bf24d7be9bff8b82266a4b8379831986cea21480f7106d2d19960e7b2c1afee8`

See more details on using hashes here.

File details

Details for the file cachefuse-0.1.0-py3-none-any.whl.

File metadata

Download URL: cachefuse-0.1.0-py3-none-any.whl
Upload date: Aug 18, 2025
Size: 29.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.12

File hashes

Hashes for cachefuse-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`14269a4a05851a08d3bbb0ccf337f56ebdbd94c53c1b43a7d2b2de4dc4525cae`
MD5	`e672930213205efde1f9aabff50de7f9`
BLAKE2b-256	`7303cc65ec277d05ee61bdfc582a5e724225653078a182cb40d4619b55ac05dd`

See more details on using hashes here.

cachefuse 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

CacheFuse

🚀 Why CacheFuse?

💰 Massive Cost Savings

⚡ Enterprise-Ready Features

🔧 Developer-First Design

📦 Installation

Production

Development

Optional Dependencies

⚡ Quickstart

Basic LLM Caching

Embedding Caching

CLI Management

Real-World Example

🏗️ Architecture

Key Components

⚙️ Configuration

Environment Variables

Configuration Methods

🗄️ Storage Backends

SQLite Backend (Default)

Redis Backend

🎛️ Advanced Features

TTL (Time-To-Live)

Tags & Bulk Invalidation

Template Versioning

Deterministic Cache Keys

🔒 Privacy & Security

Hash-Only Mode

Content Redaction

Security Features

📊 Performance Monitoring

Real-Time Metrics

CLI Monitoring

Production Monitoring

🔄 Concurrency & Reliability

Stampede Protection

Thread Safety

Reliability Features

🧪 Testing & Development

Running Tests

Performance Benchmarks

Examples & Demos

🗺️ Roadmap

v0.2.0 - Advanced Caching

v0.3.0 - Enterprise Features

v0.4.0 - Provider Integration

Future Releases

📈 Performance Comparison

Development Setup

Areas for Contribution

📄 License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes