ai-cache

A lightweight Python library for automatic caching of LLM API responses. Reduce costs and improve response times by caching repeated API calls to OpenAI, Anthropic, Google Gemini, and other LLM providers.

Overview

ai-cache transparently intercepts LLM API calls and caches responses locally using SQLite. When the same request is made again, the cached response is returned instantly without making an actual API call. This saves time, reduces costs, and enables offline development.

Features

  • Automatic caching - No code changes required, works with existing applications
  • Multi-provider support - Compatible with OpenAI, Anthropic, and Google Gemini APIs
  • Local storage - All data stored in SQLite database on your machine
  • Zero dependencies - Built using Python standard library only
  • Configurable expiration - Optional TTL (time-to-live) for cache entries
  • Cache management - Clear, invalidate, and monitor cache statistics
  • Privacy-focused - No data leaves your machine

Installation

pip install ai-cache

Quick Start

import ai_cache

# Enable caching globally
ai_cache.enable()

# Use any supported LLM API as normal
import openai

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is Python?"}]
)

# Subsequent identical calls return instantly from cache
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is Python?"}]
)

Configuration

Basic Usage

import ai_cache

# Enable with default settings (cache stored in ~/.ai-cache/)
ai_cache.enable()

# Enable with custom cache directory
ai_cache.enable(cache_dir="./my_cache")

# Enable with TTL (cache expires after 1 hour)
ai_cache.enable(ttl=3600)

# Combine options
ai_cache.enable(cache_dir="./cache", ttl=7200)

Cache Management

# Check if caching is enabled
is_active = ai_cache.is_enabled()

# Get cache statistics
stats = ai_cache.get_stats()
print(f"Cache hits: {stats['hits']}")
print(f"Cache misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']}")
print(f"Total entries: {stats['total_entries']}")

# Clear all cached entries
ai_cache.clear()

# Invalidate cache by provider
ai_cache.invalidate(provider="openai")

# Invalidate cache by model
ai_cache.invalidate(model="gpt-4")

# Invalidate specific provider and model combination
ai_cache.invalidate(provider="openai", model="gpt-4")

# Disable caching
ai_cache.disable()

Supported Providers

OpenAI

Compatible with both legacy and modern OpenAI API versions.

import ai_cache
import openai

ai_cache.enable()

# Legacy API (openai < 1.0.0)
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}]
)

# Modern API (openai >= 1.0.0)
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

Anthropic (Claude)

import ai_cache
from anthropic import Anthropic

ai_cache.enable()

client = Anthropic(api_key="your-api-key")
message = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

Google Gemini

import ai_cache
import google.generativeai as genai

ai_cache.enable()

genai.configure(api_key="your-api-key")
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content("What is machine learning?")

How It Works

Cache Key Generation

Each API request is fingerprinted using SHA256 hashing of:

  • Provider name (e.g., "openai", "anthropic")
  • Model identifier (e.g., "gpt-4", "claude-3")
  • Request parameters (messages, temperature, max_tokens, etc.)

Two requests are considered identical only if all components match exactly.
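The fingerprinting step can be sketched like this. This is an illustrative stand-in (the function name `fingerprint` and the exact key layout are assumptions, not the library's internals), but it shows why identical requests share a key while any parameter change produces a new one:

```python
import hashlib
import json

def fingerprint(provider: str, model: str, params: dict) -> str:
    """Build a deterministic SHA-256 fingerprint for one request."""
    # sort_keys ensures logically identical parameter dicts hash identically
    payload = json.dumps(
        {"provider": provider, "model": model, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

msgs = [{"role": "user", "content": "Hello"}]
key_a = fingerprint("openai", "gpt-4", {"messages": msgs})
key_b = fingerprint("openai", "gpt-4", {"messages": msgs})
key_c = fingerprint("openai", "gpt-4", {"messages": msgs, "temperature": 0.7})

assert key_a == key_b  # identical requests share one key
assert key_a != key_c  # any parameter change yields a new key
```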

Cache Storage

  • Database: SQLite database stored locally
  • Default location: ~/.ai-cache/cache.db
  • Schema: Indexed table with fingerprint, provider, model, response, and timestamps
  • Thread safety: SQLite's built-in locking serializes concurrent access
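
The schema described above might look roughly like this (column names and types are assumptions inferred from the description, not the library's actual DDL; `:memory:` is used here to keep the sketch self-contained, whereas the real default file is ~/.ai-cache/cache.db):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS cache (
        fingerprint TEXT PRIMARY KEY,  -- SHA-256 of provider + model + params
        provider    TEXT NOT NULL,
        model       TEXT NOT NULL,
        response    TEXT NOT NULL,     -- serialized API response
        created_at  REAL NOT NULL      -- Unix timestamp, used for TTL checks
    )
""")
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_provider_model ON cache (provider, model)"
)
conn.commit()
```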

Cache Expiration

When TTL is configured:

  • Entries expire after the specified number of seconds
  • Expired entries are deleted automatically on access
  • No background cleanup processes

Without TTL:

  • Entries never expire automatically
  • Manual invalidation or clearing required
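
The expire-on-access behavior can be sketched as follows. This is a simplified stand-in using the schema described above, not the library's own code:

```python
import sqlite3
import time

def lookup(conn, fingerprint, ttl):
    """Return a cached response, deleting the entry first if its TTL has lapsed."""
    row = conn.execute(
        "SELECT response, created_at FROM cache WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchone()
    if row is None:
        return None  # cache miss
    response, created_at = row
    if ttl is not None and time.time() - created_at > ttl:
        # Expired: delete lazily on access; no background cleanup runs.
        conn.execute("DELETE FROM cache WHERE fingerprint = ?", (fingerprint,))
        conn.commit()
        return None
    return response

# demo against an in-memory table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache (fingerprint TEXT PRIMARY KEY, response TEXT, created_at REAL)"
)
conn.execute("INSERT INTO cache VALUES ('stale', '{}', ?)", (time.time() - 7200,))
conn.execute("INSERT INTO cache VALUES ('fresh', '{}', ?)", (time.time(),))
conn.commit()

assert lookup(conn, "stale", ttl=3600) is None  # expired, deleted on access
assert lookup(conn, "fresh", ttl=3600) == "{}"  # fresh entry served
```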

API Interception

The library uses monkey patching to intercept API client methods:

  • Original methods are preserved and restored on disable
  • Interception happens transparently without modifying your code
  • If a provider library is not installed, its interceptor is silently skipped
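
In outline, the patching approach looks like this. The helper names (`enable_for`, `disable_for`) and the dictionary-backed cache are illustrative assumptions; the real library fingerprints requests with SHA-256 rather than `repr`:

```python
_originals = {}

def enable_for(cls, method_name, cache_get, cache_set):
    """Replace cls.method_name with a caching wrapper; keep the original for restore."""
    original = getattr(cls, method_name)
    _originals[(cls, method_name)] = original

    def wrapper(*args, **kwargs):
        # Stand-in key built from the arguments (skipping self).
        key = repr((method_name, args[1:], sorted(kwargs.items())))
        cached = cache_get(key)
        if cached is not None:
            return cached
        result = original(*args, **kwargs)
        cache_set(key, result)
        return result

    setattr(cls, method_name, wrapper)

def disable_for(cls, method_name):
    """Restore the original, unwrapped method."""
    setattr(cls, method_name, _originals.pop((cls, method_name)))

# demo with a fake client standing in for a provider SDK
class FakeClient:
    calls = 0
    def create(self, prompt):
        FakeClient.calls += 1
        return f"response to {prompt}"

store = {}
enable_for(FakeClient, "create", store.get, store.__setitem__)

client = FakeClient()
client.create("Hello")
client.create("Hello")        # identical call: served from the cache
assert FakeClient.calls == 1  # the underlying method ran only once
```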

Use Cases

  • Development and Testing: Speed up development by avoiding repeated API calls during testing
  • Prompt Engineering: Iterate on prompts without incurring costs for unchanged requests
  • Batch Processing: Run evaluations or benchmarks with automatic caching
  • Offline Development: Work with cached responses when internet is unavailable
  • Cost Optimization: Reduce API costs in production for frequently repeated queries
  • Demo Applications: Build demos that work reliably without exhausting API quotas

Performance

  • Cache hits: Sub-millisecond response times (SQLite lookup)
  • Cache misses: Original API latency + minimal overhead (~1ms for fingerprinting and storage)
  • Storage: Minimal disk usage, approximately 1-10KB per cached response
  • Memory: No in-memory cache, all data persisted to disk
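
The cache-hit figure is easy to sanity-check with a bare SQLite lookup. Note this micro-benchmark exercises only the storage layer (and an in-memory database at that), not the library itself:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (fingerprint TEXT PRIMARY KEY, response TEXT)")
conn.execute("INSERT INTO cache VALUES ('abc123', 'cached response')")
conn.commit()

start = time.perf_counter()
row = conn.execute(
    "SELECT response FROM cache WHERE fingerprint = 'abc123'"
).fetchone()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"lookup took {elapsed_ms:.4f} ms")
```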

Limitations

  • Streaming responses: Currently not supported
  • Non-deterministic APIs: Identical requests share one cache key, so with temperature > 0 the first response is cached and replayed rather than re-sampled
  • Parameter sensitivity: Small changes in parameters create new cache entries
  • Binary responses: Image generation and similar APIs may not cache correctly

API Reference

ai_cache.enable(cache_dir=None, ttl=None)

Enable LLM response caching.

Parameters:

  • cache_dir (str, optional): Directory for cache database. Default: ~/.ai-cache/
  • ttl (int, optional): Time-to-live in seconds for cache entries. Default: None (no expiration)

Returns: None

ai_cache.disable()

Disable LLM response caching and restore original API methods.

Returns: None

ai_cache.clear()

Clear all cached responses from the database.

Raises: RuntimeError if cache is not enabled.

Returns: None

ai_cache.get_stats()

Get cache statistics.

Returns: Dictionary containing:

  • hits (int): Number of cache hits
  • misses (int): Number of cache misses
  • hit_rate (str): Hit rate as percentage
  • total_entries (int): Total cached entries in database

Raises: RuntimeError if cache is not enabled.

ai_cache.is_enabled()

Check if caching is currently enabled.

Returns: bool - True if enabled, False otherwise

ai_cache.invalidate(provider=None, model=None)

Invalidate cache entries by provider and/or model.

Parameters:

  • provider (str, optional): Provider name (e.g., 'openai', 'anthropic')
  • model (str, optional): Model name (e.g., 'gpt-4', 'claude-3')

Raises: RuntimeError if cache is not enabled.

Returns: None

Troubleshooting

Issue: Cache not working

  • Ensure ai_cache.enable() is called before making API calls
  • Verify the provider library is installed (e.g., pip install openai)

Issue: Different responses on cache hit

  • The cache returns the exact stored response; verify that the request parameters match exactly
  • Temperature and random seed affect cache keys

Issue: Disk space concerns

  • Monitor cache size: ls -lh ~/.ai-cache/cache.db
  • Clear periodically: ai_cache.clear()
  • Configure TTL to auto-expire old entries

Issue: Permission errors

  • Ensure write permissions on cache directory
  • Use custom directory: ai_cache.enable(cache_dir="./cache")

Contributing

Contributions are welcome. Please submit issues and pull requests on GitHub.

License

MIT License - see LICENSE file for details.
