Enterprise-grade caching framework for LLM responses and embeddings

CacheFuse

Python 3.9+ | License: MIT

Dramatically reduce LLM API costs and latency with intelligent caching


🚀 Why CacheFuse?

CacheFuse transforms expensive LLM applications into lightning-fast, cost-effective systems through intelligent caching.

💰 Massive Cost Savings

  • 60-90% API cost reduction in typical applications
  • 100-1000x faster responses for cached queries (< 3ms vs 2-5 seconds)
  • Smart invalidation prevents stale results

⚡ Enterprise-Ready Features

  • Deterministic cache keys - Same inputs always produce same cache keys
  • Stampede protection - Concurrent requests handled intelligently
  • Multi-backend support - SQLite (local) or Redis (distributed)
  • Privacy-compliant - Hash-only mode with optional redaction hooks
  • Production monitoring - Hit rates, latency metrics, and CLI tools

🔧 Developer-First Design

  • Drop-in decorators - Add @llm or @embed to existing functions
  • Zero configuration - Works out of the box with sensible defaults
  • Flexible invalidation - TTL, tags, and template versioning
  • Thread-safe - Handles concurrency without race conditions

📦 Installation

Production

pip install cachefuse

Development

uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"

Optional Dependencies

pip install cachefuse[redis]  # For Redis backend support

⚡ Quickstart

Basic LLM Caching

from cachefuse.api.cache import Cache
from cachefuse.api.decorators import llm
import openai

# Initialize cache (works out of the box)
cache = Cache.from_env()

@llm(cache=cache, ttl="7d", tag="summarize-v1", template_version="1")
def summarize(text: str, model: str = "gpt-4o-mini") -> str:
    response = openai.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Summarize: {text}"}]
    )
    return response.choices[0].message.content

# First call: API request (slow + costs money)
summary1 = summarize("CacheFuse speeds up LLM applications")  # ~2-5 seconds

# Second call: Cache hit (fast + free)  
summary2 = summarize("CacheFuse speeds up LLM applications")  # ~3ms

print(f"Results identical: {summary1 == summary2}")  # True
print(f"Cache stats: {cache.stats()}")  # Hit rate, latency, savings

Embedding Caching

from cachefuse.api.decorators import embed

@embed(cache=cache, ttl="30d", tag="embeddings-v1")
def get_embeddings(texts: list[str], model: str = "text-embedding-ada-002") -> list[list[float]]:
    response = openai.embeddings.create(
        model=model,
        input=texts
    )
    return [embedding.embedding for embedding in response.data]

# Expensive embedding calls cached automatically
vectors = get_embeddings(["Hello world", "Goodbye world"])

CLI Management

# View cache statistics
cachefuse stats

# Clear specific tags  
cachefuse purge --tag summarize-v1

# Compact SQLite database
cachefuse vacuum

# View help
cachefuse --help

Real-World Example

# RAG application with caching
@llm(cache=cache, ttl="1h", tag="rag-v1", template_version="2")
def answer_question(question: str, context: str, model: str = "gpt-4") -> str:
    return openai.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer based on the context provided."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}
        ]
    ).choices[0].message.content

# Same questions with same context = instant responses + no API costs
answer = answer_question("What is CacheFuse?", "CacheFuse is a caching framework...")

🏗️ Architecture

CacheFuse is built on a clean, modular architecture designed for enterprise-scale applications:

┌─────────────────────────────────────────────────────────┐
│                   @llm / @embed                         │
│                   Decorators                            │
└─────────────────┬───────────────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────────────┐
│              Cache Facade                               │
│  • Deterministic fingerprinting                         │
│  • Stampede protection (per-key locks)                  │
│  • Metrics collection                                   │
│  • Privacy mode handling                                │
└─────────────────┬───────────────────────────────────────┘
                  │
    ┌─────────────▼──────────────┐
    │        Backends            │
    ├────────────┬───────────────┤
    │   SQLite   │     Redis     │
    │  (local)   │ (distributed) │
    └────────────┴───────────────┘

Key Components

  • Decorators - Simple @llm and @embed decorators for drop-in caching
  • Cache Facade - Intelligent cache management with fingerprinting and concurrency control
  • Multi-Backend - SQLite for local development, Redis for production scale
  • Metrics System - Real-time performance tracking and cost analysis

⚙️ Configuration

Environment Variables

Variable          Default                       Description
CF_BACKEND        sqlite                        Backend type (sqlite or redis)
CF_SQLITE_PATH    ~/.cache/cachefuse/cache.db   SQLite database file path
CF_REDIS_URL      -                             Redis connection string (e.g., redis://localhost:6379/0)
CF_MODE           normal                        Privacy mode (normal or hash_only)
CF_LOCK_TIMEOUT   30                            Per-key lock timeout in seconds
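
As an illustration of how these variables and defaults could be resolved, here is a minimal sketch using only the standard library. `EnvSettings` and `load_settings` are hypothetical names for this example; the actual resolution happens inside `Cache.from_env()` and may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container mirroring the documented variables."""

    backend: str
    sqlite_path: str
    redis_url: Optional[str]
    mode: str
    lock_timeout: float


def load_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Read the CF_* variables, falling back to the documented defaults."""
    backend = env.get("CF_BACKEND", "sqlite")
    if backend not in ("sqlite", "redis"):
        raise ValueError(f"unsupported CF_BACKEND: {backend!r}")
    mode = env.get("CF_MODE", "normal")
    if mode not in ("normal", "hash_only"):
        raise ValueError(f"unsupported CF_MODE: {mode!r}")
    return EnvSettings(
        backend=backend,
        sqlite_path=os.path.expanduser(
            env.get("CF_SQLITE_PATH", "~/.cache/cachefuse/cache.db")
        ),
        redis_url=env.get("CF_REDIS_URL"),  # no default: stays None if unset
        mode=mode,
        lock_timeout=float(env.get("CF_LOCK_TIMEOUT", "30")),
    )


defaults = load_settings({})
redis_settings = load_settings(
    {"CF_BACKEND": "redis", "CF_REDIS_URL": "redis://localhost:6379/0"}
)
```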

Configuration Methods

# Method 1: Environment-based (recommended)
from cachefuse.api.cache import Cache
cache = Cache.from_env()

# Method 2: Explicit configuration
from cachefuse.config import CacheConfig
config = CacheConfig(
    backend="redis",
    redis_url="redis://localhost:6379/0",
    mode="hash_only"
)
cache = Cache.from_config(config)

🗄️ Storage Backends

SQLite Backend (Default)

Perfect for local development, single-machine deployments, and applications requiring file-based persistence.

Features:

  • Single-file storage with WAL mode for optimal performance
  • Built-in ACID transactions
  • Automatic schema migration
  • Vacuum support for space reclamation
  • Zero external dependencies

from cachefuse.api.cache import Cache
from cachefuse.config import CacheConfig

# Automatic (default)
cache = Cache.from_env()

# Explicit configuration
cache = Cache.from_config(CacheConfig(
    backend="sqlite",
    sqlite_path="/custom/path/cache.db"
))
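
To make the WAL, transaction, and vacuum points concrete, here is a minimal sketch using only the standard-library `sqlite3` module. The `entries` table and its columns are hypothetical and are not CacheFuse's actual schema.

```python
import os
import sqlite3
import tempfile

# Throwaway database file for the demonstration.
db_path = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = sqlite3.connect(db_path)

# Switch the journal to write-ahead logging; SQLite reports the active mode.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Hypothetical schema: one row per cache key.
conn.execute(
    "CREATE TABLE IF NOT EXISTS entries (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
)

# "with conn" commits on success and rolls back if the block raises.
with conn:
    conn.execute(
        "INSERT OR REPLACE INTO entries (key, value) VALUES (?, ?)",
        ("k1", "cached response"),
    )

row = conn.execute("SELECT value FROM entries WHERE key = ?", ("k1",)).fetchone()

# Delete the row, then reclaim the freed pages. VACUUM must run outside
# a transaction, which is why it follows the committed "with" block.
with conn:
    conn.execute("DELETE FROM entries WHERE key = ?", ("k1",))
conn.execute("VACUUM")

remaining = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
conn.close()
```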

Redis Backend

Ideal for distributed applications, horizontal scaling, and shared cache scenarios.

Features:

  • Distributed caching across multiple instances
  • Built-in TTL expiration
  • Atomic operations with Redis transactions
  • Tag-based bulk operations using sets
  • High availability and clustering support

from cachefuse.api.cache import Cache
from cachefuse.config import CacheConfig

cache = Cache.from_config(CacheConfig(
    backend="redis",
    redis_url="redis://localhost:6379/0"
))

Redis Key Layout:

  • cf:entry:<key> - Cache entry data
  • cf:tag:<tag> - Set of keys with specific tag
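
The key layout can be sketched without a Redis server. In the example below, `FakeRedis` is a hypothetical in-memory stand-in holding plain strings and sets, and `put` and `purge_tag` are illustrative helpers; against a real server the same steps would presumably map to commands such as SET, SADD, SMEMBERS, and DEL.

```python
from typing import Dict, Set

ENTRY_PREFIX = "cf:entry:"
TAG_PREFIX = "cf:tag:"


def entry_key(key: str) -> str:
    return f"{ENTRY_PREFIX}{key}"


def tag_key(tag: str) -> str:
    return f"{TAG_PREFIX}{tag}"


class FakeRedis:
    """Hypothetical stand-in: string values plus named sets."""

    def __init__(self) -> None:
        self.strings: Dict[str, str] = {}
        self.sets: Dict[str, Set[str]] = {}


def put(store: FakeRedis, key: str, value: str, tag: str) -> None:
    """Store the entry and register its key in the tag's set."""
    store.strings[entry_key(key)] = value
    store.sets.setdefault(tag_key(tag), set()).add(key)


def purge_tag(store: FakeRedis, tag: str) -> int:
    """Delete every entry registered under the tag; return how many were removed."""
    members = store.sets.pop(tag_key(tag), set())
    removed = 0
    for key in members:
        if store.strings.pop(entry_key(key), None) is not None:
            removed += 1
    return removed


store = FakeRedis()
put(store, "abc", "summary A", tag="summarize-v1")
put(store, "def", "summary B", tag="summarize-v1")
put(store, "ghi", "answer C", tag="rag-v1")

removed = purge_tag(store, "summarize-v1")
```

Keeping a set of keys per tag is what lets a purge touch only the tagged entries instead of scanning the whole keyspace.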

🎛️ Advanced Features

TTL (Time-To-Live)

Flexible expiration control with human-readable formats:

@llm(cache=cache, ttl="7d")      # 7 days
@llm(cache=cache, ttl="2h")      # 2 hours  
@llm(cache=cache, ttl="30m")     # 30 minutes
@llm(cache=cache, ttl="300s")    # 300 seconds
@llm(cache=cache, ttl=0)         # No expiration
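
For readers curious how such TTL strings translate to seconds, here is a minimal parsing sketch. `parse_ttl` is a hypothetical helper covering only the formats shown above (`s`, `m`, `h`, `d`, and `0` for no expiration); CacheFuse's own parser may accept more or behave differently.

```python
import re
from typing import Optional, Union

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TTL_PATTERN = re.compile(r"^(\d+)([smhd])$")


def parse_ttl(ttl: Union[str, int]) -> Optional[int]:
    """Return the TTL in seconds, or None when the entry never expires."""
    if isinstance(ttl, int):
        if ttl < 0:
            raise ValueError("ttl must not be negative")
        return ttl or None  # 0 means "no expiration"
    match = _TTL_PATTERN.match(ttl.strip())
    if match is None:
        raise ValueError(f"unrecognized ttl: {ttl!r}")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return seconds or None


week = parse_ttl("7d")
never = parse_ttl(0)
```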

Tags & Bulk Invalidation

Group related cache entries for easy management:

# Tag entries by version, feature, or use case
@llm(cache=cache, ttl="1h", tag="summarize-v2")
def summarize_v2(text: str) -> str: ...

@llm(cache=cache, ttl="1h", tags=["rag", "qa-v1"])  
def answer_question(question: str, context: str) -> str: ...

# Bulk invalidation (Python)
cache.purge_tag("summarize-v2")  # Clear all v2 summaries

# Bulk invalidation (CLI)
cachefuse purge --tag rag          # Clear all RAG cache entries
cachefuse purge --tag qa-v1        # Clear v1 Q&A entries

Template Versioning

Automatic cache invalidation when prompts change:

# Version 1
@llm(cache=cache, ttl="1d", template_version="1")
def analyze_sentiment(text: str) -> str:
    return f"Analyze sentiment: {text}"

# Version 2 - automatically uses different cache keys
@llm(cache=cache, ttl="1d", template_version="2") 
def analyze_sentiment(text: str) -> str:
    return f"Analyze sentiment with context: {text}"

Deterministic Cache Keys

Cache keys are generated from:

  • Function type (llm or embed)
  • Model parameters (model name, temperature, etc.)
  • Template version
  • Input hash (SHA256 of processed input)
  • Provider info (optional)
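
One way to combine these ingredients into a stable key is sketched below. `make_cache_key` and its field names are assumptions made for illustration; the point is only that canonical serialization (sorted keys, fixed separators) makes the key independent of dict ordering, while any change to the template version produces a different key.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def make_cache_key(
    kind: str,
    model_params: Dict[str, Any],
    template_version: str,
    processed_input: str,
    provider: Optional[str] = None,
) -> str:
    """Hypothetical fingerprint over the ingredients listed above."""
    input_hash = hashlib.sha256(processed_input.encode("utf-8")).hexdigest()
    payload = {
        "kind": kind,
        "model_params": model_params,
        "template_version": template_version,
        "input_hash": input_hash,
        "provider": provider,
    }
    # sort_keys and fixed separators give one canonical byte string per payload.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_a = make_cache_key(
    "llm", {"model": "gpt-4o-mini", "temperature": 0}, "1", "Summarize: hello"
)
key_b = make_cache_key(
    "llm", {"temperature": 0, "model": "gpt-4o-mini"}, "1", "Summarize: hello"
)
key_v2 = make_cache_key(
    "llm", {"model": "gpt-4o-mini", "temperature": 0}, "2", "Summarize: hello"
)
```

Here `key_a` and `key_b` are equal despite the different parameter order, and `key_v2` differs because only the template version changed.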

🔒 Privacy & Security

Hash-Only Mode

For privacy-sensitive applications, store only hashes instead of raw content:

from cachefuse.api.cache import Cache
from cachefuse.api.decorators import llm
from cachefuse.config import CacheConfig

# Enable privacy mode
config = CacheConfig(backend="sqlite", mode="hash_only")
cache = Cache.from_config(config)

@llm(cache=cache, ttl="1h")
def process_sensitive_data(user_input: str) -> str:
    # Raw input never stored, only hash-based cache keys
    return llm_provider_call(user_input)

Content Redaction

Automatically redact sensitive information before hashing:

def redactor(text: str) -> str:
    # Custom redaction logic
    return text.replace("SECRET_TOKEN", "[REDACTED]").replace("PASSWORD", "[REDACTED]")

cache = Cache(backend=cache._backend, config=config, redactor=redactor)

# process_data stands for any @llm-decorated function that uses this cache.
# Both calls hit the same cache entry (the inputs are identical after redaction)
result1 = process_data("User SECRET_TOKEN abc123")
result2 = process_data("User [REDACTED] abc123")     # Cache hit!
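
Why the two inputs collide can be shown in isolation. The sketch below reuses the `redactor` from the example and hashes the redacted text with SHA-256; `input_hash` is a hypothetical helper, and the assumption is that redaction runs before the input hash is computed.

```python
import hashlib
from typing import Callable


def redactor(text: str) -> str:
    # Same redaction logic as in the example above.
    return text.replace("SECRET_TOKEN", "[REDACTED]").replace("PASSWORD", "[REDACTED]")


def input_hash(text: str, redact: Callable[[str], str]) -> str:
    """Hash the input only after redaction has been applied (assumed order)."""
    return hashlib.sha256(redact(text).encode("utf-8")).hexdigest()


hash_raw = input_hash("User SECRET_TOKEN abc123", redactor)
hash_redacted = input_hash("User [REDACTED] abc123", redactor)
hash_other = input_hash("User SECRET_TOKEN xyz789", redactor)
```

`hash_raw` and `hash_redacted` are equal, so both calls would resolve to the same entry, while `hash_other` differs because the non-sensitive part of the input changed.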

Security Features

  • No sensitive data storage in hash-only mode
  • Deterministic redaction ensures consistent cache hits
  • Configurable redaction functions for custom privacy needs
  • Thread-safe operations prevent race conditions

📊 Performance Monitoring

Real-Time Metrics

Track cache performance and cost savings:

stats = cache.stats()
print(f"""
Cache Performance:
  Entries: {stats['entries']}
  Total Calls: {stats['total_calls']}
  Cache Hits: {stats['hits']}
  Hit Rate: {stats['hit_rate']:.2%}
  Avg Latency: {stats['avg_latency_ms']:.1f}ms
  Cost Saved: ${stats['cost_saved']:.2f}
""")

CLI Monitoring

# Detailed performance stats
cachefuse stats

# Output:
# entries: 150
# total_calls: 1000  
# hits: 850
# hit_rate: 0.85
# avg_latency_ms: 2.3
# cost_saved: 127.50
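
The derived figures in this output follow from simple counters. The sketch below assumes `hit_rate` is hits divided by total calls and `avg_latency_ms` is the mean over all recorded calls; `CallCounters` is a hypothetical class, not part of the CacheFuse API.

```python
from dataclasses import dataclass


@dataclass
class CallCounters:
    """Hypothetical counters; field names mirror the stats keys shown above."""

    total_calls: int = 0
    hits: int = 0
    total_latency_ms: float = 0.0

    def record(self, hit: bool, latency_ms: float) -> None:
        self.total_calls += 1
        self.hits += int(hit)
        self.total_latency_ms += latency_ms

    @property
    def hit_rate(self) -> float:
        # Guard against division by zero before any call is recorded.
        return self.hits / self.total_calls if self.total_calls else 0.0

    @property
    def avg_latency_ms(self) -> float:
        return self.total_latency_ms / self.total_calls if self.total_calls else 0.0


counters = CallCounters()
for i in range(1000):
    counters.record(hit=i < 850, latency_ms=2.0)  # 850 hits out of 1000 calls
```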

Production Monitoring

# Log metrics for monitoring systems
import logging
logger = logging.getLogger("cachefuse.metrics")

stats = cache.stats()
logger.info("cache_metrics", extra={
    "hit_rate": stats["hit_rate"],
    "avg_latency": stats["avg_latency_ms"], 
    "cost_saved": stats["cost_saved"]
})

🔄 Concurrency & Reliability

Stampede Protection

Prevents duplicate expensive operations when multiple requests arrive simultaneously:

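import asyncio

async def main():
    # 100 concurrent requests for the same uncached item.
    # summarize() is synchronous, so each call runs in a worker thread.
    # Result: Only 1 API call, 99 cache hits
    return await asyncio.gather(*[
        asyncio.to_thread(summarize, "same input") for _ in range(100)
    ])

results = asyncio.run(main())
# All results identical, massive cost/latency savings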
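
The idea behind stampede protection can be sketched with per-key locks and a double check. This is a simplified in-process illustration with hypothetical names (`SingleFlightCache`, `slow_provider`); it uses `threading.Lock` rather than whatever locking CacheFuse uses internally.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class SingleFlightCache:
    """Hypothetical cache where only one caller per key computes the value."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        # Creating per-key locks must itself be thread-safe.
        with self._registry_lock:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        if key in self._values:  # fast path: already cached
            return self._values[key]
        with self._lock_for(key):
            if key in self._values:  # re-check: another caller filled it while we waited
                return self._values[key]
            value = compute()
            self._values[key] = value
            return value


provider_calls = []


def slow_provider() -> str:
    provider_calls.append(1)  # count real provider invocations
    time.sleep(0.05)          # simulate a slow API call
    return "summary"


single_flight = SingleFlightCache()
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(
        pool.map(
            lambda _: single_flight.get_or_compute("same input", slow_provider),
            range(100),
        )
    )
```

The second membership check inside the lock is what turns the waiting callers into cache hits instead of duplicate provider calls.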

Thread Safety

  • Per-key file locks prevent race conditions
  • ACID transactions ensure data consistency
  • Atomic operations for concurrent access
  • Lock timeout handling prevents deadlocks

Reliability Features

  • Graceful degradation when cache unavailable
  • Automatic retry logic for transient failures
  • Connection pooling for Redis backend
  • WAL mode for SQLite performance
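
Graceful degradation can be pictured as treating cache failures as misses. The sketch below is an assumption about the general pattern, not CacheFuse's code: `UnreachableBackend` and `get_or_compute` are hypothetical, and errors from the provider call itself are deliberately left to propagate.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("cache_fallback_sketch")


class UnreachableBackend:
    """Hypothetical backend whose every operation fails."""

    def get(self, key: str) -> Any:
        raise ConnectionError("cache backend unreachable")

    def set(self, key: str, value: Any) -> None:
        raise ConnectionError("cache backend unreachable")


def get_or_compute(backend: Any, key: str, compute: Callable[[], Any]) -> Any:
    """Serve from cache when possible; never let cache errors fail the call."""
    try:
        cached = backend.get(key)
    except Exception as exc:  # cache read failed: treat it as a miss
        logger.warning("cache read failed, calling provider: %s", exc)
        cached = None
    if cached is not None:
        return cached

    value = compute()  # provider errors still propagate to the caller
    try:
        backend.set(key, value)
    except Exception as exc:  # cache write failed: return the fresh value anyway
        logger.warning("cache write failed, result not stored: %s", exc)
    return value


answer = get_or_compute(UnreachableBackend(), "k1", lambda: "fresh result")
```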

🧪 Testing & Development

Running Tests

# Install development dependencies
uv pip install -e ".[dev]"

# Run unit tests (fast)
uv run pytest -q -m "not integration" --cov=cachefuse

# Run integration tests (requires Redis for some tests)
uv run pytest -q -m integration

# Run all tests
uv run pytest --cov=cachefuse

Performance Benchmarks

  • Cache hit latency: < 3ms (SQLite), < 1ms (Redis)
  • Stampede protection: 1 provider call regardless of concurrency
  • Memory overhead: ~50MB typical usage
  • Storage efficiency: Configurable compression and cleanup

Examples & Demos

# RAG application demo
uv run python -m cachefuse.examples.rag_demo

# Embedding caching demo  
uv run python -m cachefuse.examples.embed_demo

🗺️ Roadmap

v0.2.0 - Advanced Caching

  • Semantic similarity caching
  • Batch operations API
  • Enhanced metrics (p95/p99 latencies)

v0.3.0 - Enterprise Features

  • Prometheus metrics export
  • Distributed locking with Redis
  • Advanced compression algorithms

v0.4.0 - Provider Integration

  • Native OpenAI SDK integration
  • Anthropic Claude SDK support
  • Automatic cost tracking by provider

Future Releases

  • Web dashboard for cache management
  • Circuit breaker patterns
  • Multi-tier caching strategies

📈 Performance Comparison

Scenario           Without CacheFuse   With CacheFuse    Improvement
Repeated queries   2-5 seconds         < 3ms             100-1000x faster
API costs          $0.02 per call      $0.00 (cached)    90%+ savings
Concurrency        N × API calls       1 API call        Perfect deduplication
Memory usage       Negligible          ~50MB             Minimal overhead
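
The per-call figures in the table apply to cache hits only; what an application sees overall depends on its hit rate. The following worked sketch blends the table's numbers using a hit rate of 0.85, the value from the CLI example earlier. The function names are hypothetical and the inputs are illustrative, not measurements.

```python
def expected_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency when a fraction hit_rate of calls is served from cache."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def expected_cost(calls: int, hit_rate: float, cost_per_call: float) -> float:
    """Only cache misses reach the paid API."""
    return calls * (1.0 - hit_rate) * cost_per_call


# Illustrative inputs: 3 ms hits, 2 s misses, $0.02 per API call, 1000 calls.
latency = expected_latency_ms(hit_rate=0.85, hit_ms=3.0, miss_ms=2000.0)
cost_cached = expected_cost(calls=1000, hit_rate=0.85, cost_per_call=0.02)
cost_uncached = expected_cost(calls=1000, hit_rate=0.0, cost_per_call=0.02)
```

With these inputs the average latency comes out at roughly 303 ms and the bill at about $3 instead of $20, so the overall gain is driven by the hit rate rather than by hit latency alone.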

Development Setup

# Clone the repository
git clone https://github.com/Yasserelhaddar/CacheFuse.git
cd CacheFuse

# Set up development environment
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Run tests
uv run pytest

Areas for Contribution

  • 🐛 Bug fixes and stability improvements
  • ⚡ Performance optimizations
  • 📚 Documentation and examples
  • 🔌 New backend implementations
  • 🧪 Test coverage improvements

📄 License

MIT License - see LICENSE file for details.


Built with ❤️ for the AI community

Star ⭐ this repo if CacheFuse helps you build better LLM applications!
