A drop-in, model-agnostic cache for Large Language Model API calls

LLM Cache

A drop-in, model-agnostic cache for Large Language Model API calls. Cache your OpenAI, Anthropic, and other LLM API responses to save costs and improve performance.

Author: Sherin Joseph Roy
Email: sherin.joseph2217@gmail.com
GitHub: @Sherin-SEF-AI

Features

🔐 Deterministic Hashing: SHA256-based request signature hashing
💾 Multiple Backends: SQLite (default) and Redis support
📊 Cost Tracking: Monitor API costs and savings
⚡ Streaming Support: Cache and replay streamed responses
🔧 Provider Agnostic: Works with OpenAI, Anthropic, Cohere, and more
🛡️ Encryption: Optional AES-256 encryption for sensitive data
🗜️ Compression: Zstandard compression to reduce storage
🌐 HTTP Proxy: Transparent proxy mode for existing applications
📈 Metrics: Prometheus-compatible metrics endpoint
⚙️ TTL Support: Configurable time-to-live for cache entries
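To make the deterministic hashing idea concrete, here is a minimal sketch of a request signature. It is an illustration only: the helper name `request_signature` and the choice of fields that go into the hash are assumptions, not the library's documented implementation.

```python
import hashlib
import json


def request_signature(provider: str, model: str, endpoint: str, request_data: dict) -> str:
    """Return a SHA-256 hex digest that does not depend on dict key order."""
    payload = {
        "provider": provider,
        "model": model,
        "endpoint": endpoint,
        "request": request_data,
    }
    # sort_keys plus fixed separators yields one canonical string per request
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = request_signature(
    "openai", "gpt-4", "/v1/chat/completions",
    {"messages": [{"role": "user", "content": "Hi"}], "temperature": 0},
)
b = request_signature(
    "openai", "gpt-4", "/v1/chat/completions",
    {"temperature": 0, "messages": [{"content": "Hi", "role": "user"}]},
)
print(a == b)  # True: key order does not change the signature
```

Any change to the prompt, model, or sampling parameters produces a different digest, which is what keeps distinct requests from sharing a cache entry.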

Quick Start

Installation

pip install llm-cache-pro

Basic Usage

Decorator Pattern

import openai
from llm_cache import cached_call

openai_client = openai.OpenAI()

@cached_call(provider="openai", model="gpt-4")
def ask_llm(prompt: str):
    # Your existing OpenAI call here
    return openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

# First call hits the API
response1 = ask_llm("What is Python?")
# Second call with the same prompt returns the cached response
response2 = ask_llm("What is Python?")  # Instant!

Context Manager

from llm_cache import wrap_openai
import openai

client = openai.OpenAI()

# Wrap your client with caching
with wrap_openai(client, ttl_days=7):
    # All calls are automatically cached
    response1 = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
    
    # Same request returns cached response
    response2 = client.chat.completions.create(
        model="gpt-4", 
        messages=[{"role": "user", "content": "Hello"}]
    )

Low-level API

import openai
from llm_cache import LLMCache

openai_client = openai.OpenAI()
cache = LLMCache()

def fetch_from_openai(prompt):
    # Your actual API call
    return openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

# Return the cached response for this key, or call fetch_func and store its result
response = cache.get_or_set(
    key="unique_request_hash",
    fetch_func=lambda: fetch_from_openai("What is AI?"),
    provider="openai",
    model="gpt-4",
    endpoint="/v1/chat/completions",
    request_data={"messages": [{"role": "user", "content": "What is AI?"}]}
)
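The get-or-set behaviour can be pictured with a toy in-memory stand-in. This is a sketch of the general pattern, not the library's code: `InMemoryCache` and `fake_fetch` are hypothetical names, and the real `LLMCache` persists entries and takes more arguments.

```python
from typing import Any, Callable, Dict, List


class InMemoryCache:
    """Toy stand-in that only illustrates get-or-set semantics."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def get_or_set(self, key: str, fetch_func: Callable[[], Any]) -> Any:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = fetch_func()  # runs only on a cache miss
        self._store[key] = value
        return value


calls: List[int] = []


def fake_fetch() -> dict:
    # Stands in for a paid API call; records how often it actually runs
    calls.append(1)
    return {"answer": "stub response"}


toy_cache = InMemoryCache()
first = toy_cache.get_or_set("k1", fake_fetch)
second = toy_cache.get_or_set("k1", fake_fetch)
print(len(calls), toy_cache.hits, toy_cache.misses)  # 1 1 1
```

The fetch function is passed as a callable rather than a value so that the expensive call is skipped entirely on a hit.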

HTTP Proxy Mode

Start a proxy server that intercepts and caches LLM API calls:

llm-cache serve --host 127.0.0.1 --port 8100

Then point your applications to the proxy instead of the original API:

import openai

# Use proxy instead of direct API
client = openai.OpenAI(
    base_url="http://127.0.0.1:8100",
    api_key="your-api-key"
)

# All calls are automatically cached
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

CLI Commands

View Statistics

# Basic stats
llm-cache stats

# Detailed stats with provider breakdown
llm-cache stats --verbose

List Cache Entries

# List recent entries
llm-cache list

# Filter by provider
llm-cache list --provider openai

# Filter by model
llm-cache list --model gpt-4

# Limit results
llm-cache list --limit 10

Inspect Entries

# Show entry details
llm-cache show <cache_key>

# Export entry to file
llm-cache show <cache_key> --output entry.json

Purge Cache

# Delete specific entry
llm-cache purge --key <cache_key>

# Delete expired entries
llm-cache purge --expired

# Delete entries older than 30 days
llm-cache purge --older 30

# Delete all entries for a model
llm-cache purge --model gpt-3.5-turbo

# Delete all entries (with confirmation)
llm-cache purge --all

Export Data

# Export to JSONL format
llm-cache export cache_dump.jsonl

# Export to JSON format
llm-cache export cache_dump.json --format json

# Export only OpenAI entries
llm-cache export openai_entries.jsonl --provider openai
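
An exported JSONL file can be post-processed with a few lines of Python. The sketch below assumes one JSON object per line and a `provider` field in each record. The export schema is not documented here, so treat the field name and the sample records as hypothetical.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_by_field(path: Path, field: str = "provider") -> Counter:
    """Count records in a JSONL file, grouped by one top-level field."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            counts[record.get(field, "unknown")] += 1
    return counts


# Demo with made-up records written to a temporary file
sample_records = [
    {"provider": "openai", "model": "gpt-4"},
    {"provider": "openai", "model": "gpt-3.5-turbo"},
    {"provider": "anthropic", "model": "claude-3"},
]
with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "cache_dump.jsonl"
    dump.write_text("\n".join(json.dumps(r) for r in sample_records) + "\n", encoding="utf-8")
    counts = count_by_field(dump)

print(counts["openai"], counts["anthropic"])  # 2 1
```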

Health Check

# Check system health
llm-cache doctor

Configuration

Environment Variables

# Cache settings
export LLMCACHE_TTL=30                    # Default TTL in days
export LLMCACHE_COMPRESSION=true          # Enable compression
export LLMCACHE_ENCRYPTION=false          # Enable encryption
export LLMCACHE_ENCRYPTION_KEY="secret"   # Encryption key

# Storage
export LLMCACHE_BACKEND=sqlite            # Backend (sqlite, redis)
export LLMCACHE_DATABASE_URL="..."        # Database URL

# Proxy settings
export LLMCACHE_PROXY_HOST=127.0.0.1
export LLMCACHE_PROXY_PORT=8100

# Logging
export LLMCACHE_LOG_LEVEL=INFO
export LLMCACHE_LOG_FILE=/path/to/logs
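
To check a deployment's environment before starting the cache, the variables above can be read into a typed object. This is a sketch under assumptions: `Settings` and `load_settings` are hypothetical helpers, and the fallback values mirror the examples in this README rather than the library's verified defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class Settings:
    ttl_days: int
    compression: bool
    encryption: bool
    backend: str
    proxy_host: str
    proxy_port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read LLMCACHE_* variables; fallbacks are assumed, not the library's own."""
    env = os.environ if env is None else env
    return Settings(
        ttl_days=int(env.get("LLMCACHE_TTL", "30")),
        compression=_as_bool(env.get("LLMCACHE_COMPRESSION", "true")),
        encryption=_as_bool(env.get("LLMCACHE_ENCRYPTION", "false")),
        backend=env.get("LLMCACHE_BACKEND", "sqlite"),
        proxy_host=env.get("LLMCACHE_PROXY_HOST", "127.0.0.1"),
        proxy_port=int(env.get("LLMCACHE_PROXY_PORT", "8100")),
    )


# Pass an explicit mapping so the demo does not depend on the real environment
settings = load_settings({"LLMCACHE_TTL": "7", "LLMCACHE_COMPRESSION": "false"})
print(settings.ttl_days, settings.compression, settings.backend)  # 7 False sqlite
```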

Configuration File

Create ~/.config/llm-cache/config.toml:

# Cache settings
backend = "sqlite"
default_ttl_days = 30
enable_compression = true
enable_encryption = false

# Proxy settings
proxy_host = "127.0.0.1"
proxy_port = 8100

# Pricing table (cost per 1K tokens)
# Model names containing a dot must be quoted, or TOML splits them into nested keys
[pricing_table]
openai.gpt-4 = { input = 0.03, output = 0.06 }
openai."gpt-3.5-turbo" = { input = 0.0015, output = 0.002 }
anthropic.claude-3 = { input = 0.015, output = 0.075 }
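
Since prices are given per 1K tokens, the cost of one request is the token count divided by 1000, times the rate, summed over input and output. The sketch below works through that arithmetic with the example prices. `PRICING` and `estimate_cost` are illustrative names, not part of the library.

```python
from typing import Dict, Tuple

# Example prices in USD per 1K tokens, copied from the pricing table above
PRICING: Dict[Tuple[str, str], Dict[str, float]] = {
    ("openai", "gpt-4"): {"input": 0.03, "output": 0.06},
    ("openai", "gpt-3.5-turbo"): {"input": 0.0015, "output": 0.002},
    ("anthropic", "claude-3"): {"input": 0.015, "output": 0.075},
}


def estimate_cost(provider: str, model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    price = PRICING[(provider, model)]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


cost = estimate_cost("openai", "gpt-4", input_tokens=1200, output_tokens=300)
print(round(cost, 4))  # 0.054
```

Every cache hit for that request avoids this amount, which is presumably how a savings figure accumulates.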

Advanced Usage

Streaming Support

import openai
from llm_cache import cached_call

openai_client = openai.OpenAI()

@cached_call(provider="openai", model="gpt-4")
def streaming_call(messages, stream=True):
    return openai_client.chat.completions.create(
        model="gpt-4",
        messages=messages,
        stream=stream
    )

messages = [{"role": "user", "content": "Hello"}]

# First call hits the API and collects the stream
for chunk in streaming_call(messages, stream=True):
    print(chunk)

# Subsequent identical calls replay the cached stream
for chunk in streaming_call(messages, stream=True):
    print(chunk)

Custom TTL

@cached_call(provider="openai", model="gpt-4", ttl_days=7)
def short_lived_cache(prompt):
    return openai_client.chat.completions.create(...)
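
A TTL in days reduces to a date comparison: an entry is stale once the current time reaches its creation time plus the TTL. The helper below is a sketch of that check. `is_expired` is a hypothetical name, and how the library stores timestamps is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(created_at: datetime, ttl_days: int, now: Optional[datetime] = None) -> bool:
    """Return True once ttl_days have elapsed since created_at."""
    now = now or datetime.now(timezone.utc)
    return now >= created_at + timedelta(days=ttl_days)


created = datetime(2025, 7, 1, tzinfo=timezone.utc)
print(is_expired(created, 7, now=datetime(2025, 7, 5, tzinfo=timezone.utc)))  # False
print(is_expired(created, 7, now=datetime(2025, 7, 9, tzinfo=timezone.utc)))  # True
```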

Encryption

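import os
from llm_cache import LLMCache

# Set the key before creating the cache; avoid hard-coding real keys in source
os.environ["LLMCACHE_ENCRYPTION_KEY"] = "your-secret-key"

cache = LLMCache(enable_encryption=True)
# All cached data will be encrypted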
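
Rather than a hand-typed string, a random key can be generated with the standard library. This sketch assumes the library accepts an arbitrary string as `LLMCACHE_ENCRYPTION_KEY`. The required key format is not documented here, so verify it before relying on this.

```python
import os
import secrets

# 32 random bytes rendered as URL-safe text (43 characters)
generated_key = secrets.token_urlsafe(32)

# Assumption: the variable is read when the cache is created
os.environ["LLMCACHE_ENCRYPTION_KEY"] = generated_key
print(len(generated_key))  # 43
```

Store the key somewhere durable, such as a secrets manager. Entries encrypted with a lost key cannot be read back.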

Redis Backend

from llm_cache import LLMCache

cache = LLMCache(
    backend="redis",
    database_url="redis://localhost:6379/0"
)

Metrics

When running in proxy mode, access metrics at /metrics:

curl http://localhost:8100/metrics

Example output:

# HELP llm_cache_entries_total Total number of cache entries
# TYPE llm_cache_entries_total counter
llm_cache_entries_total 42

# HELP llm_cache_hits_total Total number of cache hits
# TYPE llm_cache_hits_total counter
llm_cache_hits_total 156

# HELP llm_cache_cost_saved_usd Total cost saved in USD
# TYPE llm_cache_cost_saved_usd counter
llm_cache_cost_saved_usd 12.34
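
The text format is simple enough to parse by hand for a quick check. The sketch below reads unlabelled samples like the ones in the example output. `parse_metrics` is a hypothetical helper, and it does not handle labels or timestamps, so a proper Prometheus client is the better choice for anything beyond a spot check.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse 'name value' sample lines, skipping comments and blank lines."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


sample_output = """\
# HELP llm_cache_entries_total Total number of cache entries
# TYPE llm_cache_entries_total counter
llm_cache_entries_total 42

# HELP llm_cache_hits_total Total number of cache hits
# TYPE llm_cache_hits_total counter
llm_cache_hits_total 156

# HELP llm_cache_cost_saved_usd Total cost saved in USD
# TYPE llm_cache_cost_saved_usd counter
llm_cache_cost_saved_usd 12.34
"""

metrics = parse_metrics(sample_output)
print(metrics["llm_cache_hits_total"])  # 156.0
```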

Examples

OpenAI Integration

import openai
from llm_cache import wrap_openai

client = openai.OpenAI()

with wrap_openai(client):
    # All calls are cached
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Explain quantum computing"}],
        temperature=0.7
    )

Anthropic Integration

import anthropic
from llm_cache import cached_call

@cached_call(provider="anthropic", model="claude-3-sonnet")
def ask_claude(prompt):
    client = anthropic.Anthropic()
    return client.messages.create(
        model="claude-3-sonnet",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )

HTTP Client Integration

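import hashlib
import os

import httpx
from llm_cache import LLMCache

cache = LLMCache()
api_key = os.environ["OPENAI_API_KEY"]

def cached_api_call(prompt):
    def fetch():
        with httpx.Client() as client:
            response = client.post(
                "https://api.openai.com/v1/chat/completions",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "model": "gpt-4",
                    "messages": [{"role": "user", "content": prompt}]
                }
            )
            response.raise_for_status()  # fail on HTTP errors instead of caching them
            return response.json()

    # Built-in hash() is salted per process, so use a stable digest for the key
    prompt_digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return cache.get_or_set(
        key=f"prompt_{prompt_digest}",
        fetch_func=fetch,
        provider="openai",
        model="gpt-4",
        endpoint="/v1/chat/completions",
        request_data={"messages": [{"role": "user", "content": prompt}]}
    )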

Performance

Cache Hit Rate: Typically 60-80% for repeated queries
Cost Savings: 40-60% reduction in API costs
Latency: Cache hits return in <1ms
Storage: ~1KB per cached response (compressed)

Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Run pytest
6. Submit a pull request

License

MIT License - see LICENSE file for details.

Release history

0.1.2 (this version): Jul 23, 2025
0.1.1: Jul 23, 2025
0.1.0: Jul 23, 2025

Download files

Source Distribution

llm_cache_pro-0.1.2.tar.gz (34.7 kB)

Uploaded Jul 23, 2025 Source

Built Distribution

llm_cache_pro-0.1.2-py3-none-any.whl (28.5 kB)

Uploaded Jul 23, 2025 Python 3

File details

Details for the file llm_cache_pro-0.1.2.tar.gz.

File metadata

Download URL: llm_cache_pro-0.1.2.tar.gz
Upload date: Jul 23, 2025
Size: 34.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for llm_cache_pro-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`0c02590af4a4f574d942a13259d5c9a6dd967d52931a0c9c7c8db32896634b22`
MD5	`88b3d6dc7b758a47e221fb02fbdee18f`
BLAKE2b-256	`cae591852fd551f483c0e9a88c966efcea39be25272b69b56016fa93ac8c4de0`

File details

Details for the file llm_cache_pro-0.1.2-py3-none-any.whl.

File metadata

Download URL: llm_cache_pro-0.1.2-py3-none-any.whl
Upload date: Jul 23, 2025
Size: 28.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for llm_cache_pro-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`95a8c39836f0333813b9e7f8b7abe975e64678cc4209e4e66b9d62d3c6d56b2f`
MD5	`4d086d22f653c838437f51e8d3cd5692`
BLAKE2b-256	`1cfba9c8c3417abd8eb31c881927d270b2d0f2e2be3e381d9956346da11ebf9b`

llm-cache-pro 0.1.2
