Universal semantic cache for AI APIs — text, image, voice. Drop-in wrapper for OpenAI/Anthropic SDKs.
cacheback
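Universal semantic cache for AI APIs. Drop-in wrapper for the OpenAI and Anthropic SDKs. Text is supported today; image and voice embedders are listed as coming soon.
Caches responses to semantically similar queries and serves cache hits locally in under 10 ms. The project cites 30-70% savings on API costs; the actual figure depends on how often your queries repeat.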
Install
pip install cacheback-ai               # core
pip install "cacheback-ai[openai]"     # + OpenAI wrapper
pip install "cacheback-ai[anthropic]"  # + Anthropic wrapper
pip install "cacheback-ai[all]"        # everything (quotes keep shells such as zsh from expanding the brackets)
Quick Start
OpenAI (drop-in, zero code change)
from cacheback import CachedOpenAI
client = CachedOpenAI(api_key="sk-...")
# First call: ~500ms (API + cache populate)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
# Second call with similar query: ~5ms (cache hit)
response2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "capital of France?"}],
)
print(response2.cacheback_hit)  # True
Anthropic
from cacheback import CachedAnthropic
client = CachedAnthropic(api_key="sk-ant-...")
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is Python?"}],
)
print(message.cacheback_hit)  # True on cache hit
Streaming
Streaming works transparently. Cache misses buffer and store the response; cache hits replay as a synthetic stream.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
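The buffer-on-miss and replay-on-hit behaviour described above can be illustrated with two plain generators. This is a minimal sketch of the idea, not cacheback's code: `buffer_stream`, `replay_as_stream`, the dictionary store, and the fixed chunk size are all hypothetical, and real SDK chunks are objects rather than strings.

```python
from typing import Dict, Iterator


def buffer_stream(upstream: Iterator[str], store: Dict[str, str], key: str) -> Iterator[str]:
    """Cache miss: pass chunks through to the caller while collecting them."""
    parts = []
    for piece in upstream:
        parts.append(piece)
        yield piece
    # Store only after the upstream stream has been fully consumed.
    store[key] = "".join(parts)


def replay_as_stream(cached_text: str, chunk_size: int = 16) -> Iterator[str]:
    """Cache hit: yield the stored text in fixed-size pieces."""
    for start in range(0, len(cached_text), chunk_size):
        yield cached_text[start:start + chunk_size]


store: Dict[str, str] = {}
live = iter(["Quantum ", "computing ", "uses qubits."])  # stand-in for API chunks

first = "".join(buffer_stream(live, store, "explain quantum computing"))
second = "".join(replay_as_stream(store["explain quantum computing"], chunk_size=8))
```

Because the response is stored only once the generator finishes, a caller that abandons the stream early leaves nothing in the store, which avoids caching a truncated answer.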
Async
import asyncio
from cacheback import AsyncCachedOpenAI, AsyncCachedAnthropic
async def main():
    # await is only valid inside a coroutine
    async_client = AsyncCachedOpenAI()
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
asyncio.run(main())
Standalone Cache
Use SemanticCache directly for any embedding-based caching:
from cacheback import SemanticCache
cache = SemanticCache(
    similarity_threshold=0.92,
    cache_ttl=86400,  # 24 hours
)
cache.populate("What is Python?", "Python is a programming language...")
result = cache.lookup("Tell me about Python")  # cache hit
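To make the populate/lookup behaviour concrete, here is a dependency-free sketch of a threshold-based semantic lookup. It is an illustration under stated assumptions, not the library's implementation: `TinySemanticCache`, `toy_embed`, and `get_or_call` are hypothetical names, the embedder is a toy word counter over a fixed vocabulary, and the search is a brute-force cosine scan rather than an HNSW index.

```python
import math
import re
from typing import Callable, List, Optional, Tuple

Vector = List[float]
VOCAB = ["capital", "france", "python", "language"]  # toy vocabulary (assumption)


def toy_embed(text: str) -> Vector:
    """Count vocabulary words; a stand-in for a real sentence embedder."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in VOCAB]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinySemanticCache:
    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.92):
        self.embed = embed
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def populate(self, query: str, response: str) -> None:
        self.entries.append((self.embed(query), response))

    def lookup(self, query: str) -> Optional[str]:
        query_vec = self.embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:  # brute-force scan
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def get_or_call(self, query: str, upstream: Callable[[str], str]) -> Tuple[str, bool]:
        cached = self.lookup(query)
        if cached is not None:
            return cached, True           # hit: no upstream call
        response = upstream(query)        # miss: call upstream, then store
        self.populate(query, response)
        return response, False


calls: List[str] = []


def fake_upstream(query: str) -> str:
    calls.append(query)
    return f"answer to: {query}"


cache = TinySemanticCache(toy_embed)
first, first_hit = cache.get_or_call("What is the capital of France?", fake_upstream)
second, second_hit = cache.get_or_call("capital of France?", fake_upstream)
```

In this toy setup the second query maps to the same vector as the first, so it is served from the cache and `fake_upstream` runs only once. A real embedder produces graded similarities, which is where the choice of threshold matters.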
Negative Cache (blocklist)
Block known-bad query patterns before they hit the API:
# Block a query pattern
client.cache.negative.add(
    "What is the airspeed of an unladen swallow?",
    reason="hallucination",
)
# Similar queries are now blocked
client.cache.negative.check("airspeed of swallows") # returns match info
# Manage the blocklist
client.cache.negative.list(limit=50)
client.cache.negative.remove(entry_id=42)
client.cache.negative.report_false_positive(entry_id=42)
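A blocklist check of this kind amounts to a nearest-match search against stored embeddings with its own threshold. The sketch below shows that idea using hand-written three-dimensional vectors in place of real embeddings. `BlockedEntry` and `check_blocklist` are hypothetical names, and the shape of the match info returned by the library is not specified in this document.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class BlockedEntry:
    entry_id: int
    vector: Sequence[float]  # embedding of the blocked query (hand-written here)
    reason: str


def check_blocklist(
    query_vec: Sequence[float],
    entries: List[BlockedEntry],
    threshold: float = 0.85,
) -> Optional[BlockedEntry]:
    """Return the most similar blocked entry at or above the threshold, else None."""
    best, best_score = None, threshold
    for entry in entries:
        score = cosine(query_vec, entry.vector)
        if score >= best_score:
            best, best_score = entry, score
    return best


entries = [BlockedEntry(entry_id=42, vector=[1.0, 0.0, 0.0], reason="hallucination")]
near = check_blocklist([0.9, 0.1, 0.0], entries)  # close to the blocked vector
far = check_blocklist([0.0, 1.0, 0.0], entries)   # orthogonal, so no match
```

A lower threshold blocks more paraphrases but also raises the chance of false positives, which is presumably what `report_false_positive` is there to correct.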
Configuration
client = CachedOpenAI(
    # Cache settings
    cache_dir="~/.cacheback",     # where to store cache data
    similarity_threshold=0.92,    # cosine similarity for cache hit (0-1)
    negative_threshold=0.85,      # threshold for negative cache
    cache_ttl=86400,              # TTL in seconds (24h default)
    cache_max_entries=100_000,    # max entries before LRU eviction
    cache_enabled=True,           # set False to disable
    on_negative_hit="raise",      # "raise" | "skip" | callable
    # OpenAI settings (passthrough)
    api_key="sk-...",
)
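The interaction between `cache_ttl` and `cache_max_entries` can be pictured with a small in-memory store that expires entries by age and evicts the least recently used one when full. This is only a sketch of those two policies: `TtlLruStore` and its injectable `clock` are hypothetical, and how cacheback applies them on top of SQLite is not described here.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TtlLruStore:
    def __init__(
        self,
        ttl: float = 86400,
        max_entries: int = 100_000,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.ttl = ttl
        self.max_entries = max_entries
        self.clock = clock
        self._items: "OrderedDict[str, tuple]" = OrderedDict()  # key -> (stored_at, value)

    def put(self, key: str, value: Any) -> None:
        self._items[key] = (self.clock(), value)
        self._items.move_to_end(key)            # newest entry is most recently used
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)     # evict least recently used

    def get(self, key: str) -> Optional[Any]:
        item = self._items.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self.clock() - stored_at > self.ttl:
            del self._items[key]                # expired: drop and report a miss
            return None
        self._items.move_to_end(key)            # a read counts as a use
        return value


now = [0.0]                                     # fake clock for the demonstration
store = TtlLruStore(ttl=10, max_entries=2, clock=lambda: now[0])
store.put("a", 1)
store.put("b", 2)
store.get("a")                                  # touch "a" so "b" is least recently used
store.put("c", 3)                               # over capacity: "b" is evicted
evicted = store.get("b")
now[0] = 11.0                                   # advance past the TTL
expired = store.get("a")
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping.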
How It Works
Query → Embed (MiniLM-L6, 384-dim) → Search HNSW index
├─ HIT (similarity ≥ 0.92) → Return cached response (<10ms)
└─ MISS → Call upstream API → Cache response → Return
- Embedder: ONNX MiniLM-L6-v2 (90MB, runs locally, no API calls)
- Index: hnswlib HNSW for fast approximate nearest neighbor search
- Store: SQLite with WAL mode for concurrent access
- Fallback: numpy brute-force if hnswlib is unavailable
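For readers unfamiliar with the Store component, the following sketch shows how WAL mode is switched on for a SQLite database using Python's standard `sqlite3` module, with a second connection reading while the first stays open. The `entries` table and its columns are assumptions made for illustration and are not cacheback's actual schema.

```python
import os
import sqlite3
import tempfile
import time

# WAL needs a file-backed database; an in-memory one keeps the "memory" journal mode.
db_path = os.path.join(tempfile.mkdtemp(), "cache.db")

writer = sqlite3.connect(db_path)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Hypothetical schema, for illustration only.
writer.execute(
    "CREATE TABLE IF NOT EXISTS entries ("
    "id INTEGER PRIMARY KEY, query TEXT NOT NULL, "
    "response TEXT NOT NULL, created_at REAL NOT NULL)"
)
writer.execute(
    "INSERT INTO entries (query, response, created_at) VALUES (?, ?, ?)",
    ("What is Python?", "Python is a programming language...", time.time()),
)
writer.commit()

# A separate connection can read committed rows while the writer remains open.
reader = sqlite3.connect(db_path)
row = reader.execute(
    "SELECT response FROM entries WHERE query = ?", ("What is Python?",)
).fetchone()

reader.close()
writer.close()
```

In WAL mode readers do not block the writer and the writer does not block readers, which suits a cache that is read far more often than it is written.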
CLI
cacheback stats # Show cache statistics
cacheback entries # List cached entries
cacheback evict # Remove expired entries
cacheback clear # Clear all entries
cacheback lookup "query" # Test a cache lookup
Custom Embedders
Register your own embedder for any modality:
from cacheback import SemanticCache
from cacheback.embedders import BaseEmbedder, register_embedder
import numpy as np
class MyEmbedder(BaseEmbedder):
    dim = 256
    modality = "custom"
    def encode(self, input_data) -> np.ndarray:
        # Your embedding logic here
        ...
register_embedder("my-embedder", MyEmbedder)
cache = SemanticCache(embedder="my-embedder")
Built-in embedders: minilm (text), clip (image, coming soon), clap (voice, coming soon).
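As an idea of what an `encode` body might contain, here is a toy feature-hashing embedder that maps text to a 256-dimensional unit vector. It is a standalone sketch that does not import cacheback: `HashingEmbedder` is a hypothetical class, it returns a plain list to stay dependency-free (the interface shown above returns `np.ndarray`), and word hashing captures token overlap rather than meaning.

```python
import hashlib
import math
import re
from typing import List


class HashingEmbedder:
    dim = 256
    modality = "custom"

    def encode(self, input_data: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"[a-z0-9]+", input_data.lower()):
            # hashlib gives a stable hash across runs, unlike the built-in hash().
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        # L2-normalize so a dot product between two outputs is their cosine similarity.
        return [x / norm for x in vec] if norm else vec


embedder = HashingEmbedder()
a = embedder.encode("capital of France")
b = embedder.encode("France capital of")  # same words, different order
```

To plug something like this into cacheback, you would presumably subclass `BaseEmbedder`, convert the result with `np.asarray`, and register it as in the snippet above.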
Comparison
| Feature | cacheback | GPTCache | LiteLLM | Redis LangCache |
|---|---|---|---|---|
| Semantic similarity | Yes | Yes | Exact only | Yes |
| OpenAI drop-in | Yes | Partial | Yes | No |
| Anthropic drop-in | Yes | No | Yes | No |
| Streaming support | Yes | No | No | No |
| Negative cache | Yes | No | No | No |
| Multimodal (planned) | Yes | No | No | No |
| Async | Yes | No | Yes | No |
| Zero config | Yes | No | No | No |
| Local (no server) | Yes | Yes | No | No |
| License | Apache 2.0 | MIT | MIT | Redis |
License
Apache 2.0 — see LICENSE.
Built by BGML.ai / Fundacja BLOOM.
File details
Details for the file cacheback_ai-0.1.0.tar.gz.
File metadata
- Download URL: cacheback_ai-0.1.0.tar.gz
- Upload date:
- Size: 30.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f5222a62a2d6d0e9b82e9654e388622149b137548c306625d9d1f1117d52333 |
| MD5 | ad8179c375e33b96c13c3e7112d5a812 |
| BLAKE2b-256 | 1f2c4f630fee31d39795129cc7b51398bba27a21a5d80f15994c4aba16ade1b4 |
File details
Details for the file cacheback_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cacheback_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 33.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2532d1cf6164bc9eca005305ebe34799269c2ef6b5110271c1d39a4eed15900 |
| MD5 | a5b8c68d93486e62a66024cd38bbe94b |
| BLAKE2b-256 | 27ace7377ca193948ec57a0391e121941fc080d413a10474a7e0b1ae055e5ef3 |