Reminiscence
Semantic cache for LLMs and multi-agent systems
Reminiscence eliminates redundant computations by matching queries semantically instead of exact strings. Perfect for LLM applications, RAG pipelines, and agent workflows.
```python
# These queries hit the same cache entry:
"Analyze Q3 sales data"
"Show me third quarter sales analysis"
"What were Q3 revenues?"
```
Why semantic caching?
Traditional caches fail for AI systems because users express the same intent differently. Reminiscence uses FastEmbed with multilingual sentence transformers to recognize equivalent queries, reducing API costs and latency.
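To make the idea concrete, here is an illustrative sketch of similarity-based matching. The toy vectors and the `SIMILARITY_THRESHOLD` cutoff are invented for the example; Reminiscence computes real embeddings with FastEmbed rather than anything shown here.

```python
# Illustrative only: why semantic lookup succeeds where exact-string lookup fails.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": paraphrases land close together, unrelated text far apart.
query      = [0.9, 0.1, 0.3]   # "Analyze Q3 sales data"
paraphrase = [0.8, 0.2, 0.3]   # "Show me third quarter sales analysis"
unrelated  = [0.1, 0.9, 0.0]   # "What is the weather today?"

SIMILARITY_THRESHOLD = 0.85    # hypothetical cutoff for a cache hit

assert cosine_similarity(query, paraphrase) > SIMILARITY_THRESHOLD  # cache hit
assert cosine_similarity(query, unrelated) < SIMILARITY_THRESHOLD   # cache miss
```

An exact-string cache would miss on every paraphrase above; a similarity cutoff treats them as the same entry.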
Quick Start
```bash
pip install reminiscence
```
```python
from reminiscence import Reminiscence

cache = Reminiscence()

result = cache.lookup(
    query="Analyze Q3 2024 sales",
    context={"agent": "analyst", "db": "prod"}
)

if result.is_hit:
    print(f"Cache hit! Similarity: {result.similarity:.2f}")
    data = result.result
else:
    # Execute and cache - repeat the query and context
    data = "expensive operation"
    cache.store(
        query="Analyze Q3 2024 sales",
        context={"agent": "analyst", "db": "prod"},
        result=data
    )
```
Decorator API
Automatic caching with hybrid matching (semantic + exact params):
```python
from reminiscence import Reminiscence

cache = Reminiscence()

@cache.cached(query="prompt", context_params=["model"])
def call_llm(prompt: str, model: str):
    return expensive_llm_call(prompt, model)

# Similar prompts with the same model hit the cache
call_llm("Explain quantum physics", "gpt-4")
call_llm("Can you explain quantum mechanics?", "gpt-4")  # Cache hit ✓

# Different model = cache miss
call_llm("Explain quantum physics", "claude-3")  # Executes
```
Key Features
- 🎯 Semantic matching - FastEmbed + cosine similarity (multilingual support)
- 🔀 Hybrid caching - Semantic similarity + exact context matching
- 🏗️ Production ready - LRU/LFU/FIFO eviction, TTL, health checks
- 📊 OpenTelemetry native - Metrics, tracing, and spans out of the box
- 🔒 Type safe - Handles DataFrames, numpy arrays, nested dicts (10MB+)
- ⚡ Zero config - Works instantly, scales to 100K+ entries with auto-indexing
- 🔄 Background tasks - Automatic cleanup scheduler and metrics export
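The hybrid-matching feature above (semantic similarity for the query, exact matching for the context) can be sketched in a few lines. This is a minimal illustration of the concept, not Reminiscence's internal data structures; the class and threshold are invented for the example.

```python
# Sketch of hybrid matching: context fields compare exactly (hash bucket),
# queries compare by embedding similarity within the bucket.
import json
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class HybridCache:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.buckets = {}  # context key -> list of (embedding, result)

    def _context_key(self, context: dict) -> str:
        # Exact match: serialize the context deterministically.
        return json.dumps(context, sort_keys=True)

    def store(self, embedding, context, result):
        self.buckets.setdefault(self._context_key(context), []).append((embedding, result))

    def lookup(self, embedding, context):
        # Only entries with an identical context are candidates;
        # within the bucket, the query matches semantically.
        for stored, result in self.buckets.get(self._context_key(context), []):
            if cosine(embedding, stored) >= self.threshold:
                return result
        return None

cache = HybridCache()
cache.store([1.0, 0.0], {"model": "gpt-4"}, "answer")
assert cache.lookup([0.99, 0.05], {"model": "gpt-4"}) == "answer"  # similar query, same context
assert cache.lookup([0.99, 0.05], {"model": "claude-3"}) is None   # different context → miss
```

The design point: a near-identical prompt should never reuse a result produced under a different model, database, or agent, so context is never fuzzy-matched.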
Configuration
```python
from reminiscence import Reminiscence, ReminiscenceConfig

# Development (in-memory, defaults)
cache = Reminiscence()

# Production (persistent, optimized)
config = ReminiscenceConfig(
    db_uri="./cache.db",
    ttl_seconds=3600,
    eviction_policy="lru",
    max_entries=50_000,
    auto_create_index=True
)
cache = Reminiscence(config)

# With OpenTelemetry
config = ReminiscenceConfig(
    otel_enabled=True,
    otel_service_name="my-service",
    otel_endpoint="http://localhost:4317"
)
cache = Reminiscence(config)

# Docker/Kubernetes (environment variables)
cache = Reminiscence(ReminiscenceConfig.load())
```
Background Tasks
Automatic cleanup and metrics export:
```python
cache = Reminiscence(ReminiscenceConfig(
    ttl_seconds=3600,
    otel_enabled=True
))

# Start background tasks
cache.start_scheduler(
    interval_seconds=1800,                # Cleanup every 30 min
    metrics_export_interval_seconds=60    # Export metrics every minute
)

# ... use cache ...

# Stop when done (or use the context manager)
cache.stop_scheduler()
```
Context Manager
```python
with Reminiscence() as cache:
    cache.start_scheduler()
    # ... use cache ...
# Automatically stops the scheduler on exit
```
Use Cases
- LLM applications - Cache similar prompts to reduce API costs (OpenAI, Anthropic, etc.)
- Multi-agent systems - Share cache across agents with context isolation
- RAG pipelines - Cache retrieved documents, embeddings, and search results
- Data analysis - Cache expensive SQL queries, pandas transformations
Observability
Built-in OpenTelemetry support for production monitoring:
```python
# Automatic metrics collection
config = ReminiscenceConfig(
    enable_metrics=True,
    otel_enabled=True
)
cache = Reminiscence(config)

# Get current stats
stats = cache.get_stats()
print(f"Cache entries: {stats['cache_entries']}")
print(f"Hit rate: {stats['hit_rate']}")
print(f"Schedulers: {stats.get('schedulers', {})}")
```
Available metrics:
- Cache hits/misses and hit rate
- Lookup and store latency
- Total entries and evictions
- Error counts by operation
- Scheduler execution stats
Compatible with Prometheus, Grafana, Datadog, New Relic, and any OTLP-compatible backend.
Health Checks
Production-ready health monitoring:
```python
health = cache.health_check()
```

Example status payload (`status` is `"healthy"` or `"unhealthy"`):

```json
{
  "status": "healthy",
  "checks": {
    "embedding": {"ok": true, "error": null},
    "database": {"ok": true, "error": null},
    "error_rate": {"ok": true, "details": "..."},
    "schedulers": {"ok": true, "details": "2/2 schedulers running"},
    "opentelemetry": {"ok": true, "details": "Enabled (...)"}
  },
  "metrics": {...},
  "timestamp": 1696512000000
}
```
Requirements
- Python 3.9+
- Core: `lancedb`, `fastembed`, `orjson`, `pyarrow`, `structlog`
- Optional: `pandas`, `polars`, `numpy` (for DataFrame/array caching)
Performance
Typical latencies on consumer hardware (M1/M2, AMD Ryzen):
- Lookup: 5-15ms (with index), 10-50ms (without)
- Store: 5-10ms
- Embedding: 20-50ms (cached in-memory after first use)
Scales to 100K+ entries with automatic vector indexing (IVF-PQ).
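For intuition, the IVF ("inverted file") part of that index works by grouping vectors under coarse centroids so a lookup only scans the nearest group instead of every entry. The toy sketch below shows just that idea; LanceDB's real IVF-PQ index also product-quantizes vectors and trains its centroids, all of which is omitted here.

```python
# Toy IVF sketch: partition vectors by nearest centroid, search one partition.
# Illustrative only - not LanceDB's implementation.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVFIndex:
    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = {i: [] for i in range(len(centroids))}  # inverted lists

    def _nearest_centroid(self, vec):
        return min(range(len(self.centroids)), key=lambda i: dist(vec, self.centroids[i]))

    def add(self, vec, payload):
        self.lists[self._nearest_centroid(vec)].append((vec, payload))

    def search(self, vec):
        # Scan only the list under the nearest centroid, not all entries.
        candidates = self.lists[self._nearest_centroid(vec)]
        if not candidates:
            return None
        return min(candidates, key=lambda item: dist(vec, item[0]))[1]

index = ToyIVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add([0.1, 0.2], "near-origin entry")
index.add([9.8, 10.1], "far entry")
assert index.search([0.0, 0.1]) == "near-origin entry"
assert index.search([10.0, 9.9]) == "far entry"
```

This is why lookups stay fast at 100K+ entries: cost grows with partition size, not total cache size.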
License
AGPL v3 - See LICENSE
Built with
- LanceDB - Vector database for embeddings
- FastEmbed - Fast embedding generation (Qdrant)
- sentence-transformers - Multilingual semantic models (paraphrase-multilingual-MiniLM-L12-v2)
- Apache Arrow - Columnar format for large payloads
- OpenTelemetry - Observability and distributed tracing
- structlog - Structured logging for production
File details
Details for the file reminiscence-0.5.0.tar.gz.
File metadata
- Download URL: reminiscence-0.5.0.tar.gz
- Upload date:
- Size: 107.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `48e358aa518328e9597b09d654fd007a3a7da90b7dc9e71a21e5661302906355` |
| MD5 | `2bb8fda3ecb327c822f60ea145bbd4c7` |
| BLAKE2b-256 | `145760ad8e8f2fa7a7cbb0086fd812b7f5f89d755fbd009621c9d330b2ce945d` |
Provenance

The following attestation bundles were made for reminiscence-0.5.0.tar.gz:

Publisher: publish.yml on demiotic/reminiscence

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: reminiscence-0.5.0.tar.gz
- Subject digest: 48e358aa518328e9597b09d654fd007a3a7da90b7dc9e71a21e5661302906355
- Sigstore transparency entry: 600982406
- Permalink: demiotic/reminiscence@8003593951ba2dd3b49ed7ade5077ce6153b0a4f
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/demiotic
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8003593951ba2dd3b49ed7ade5077ce6153b0a4f
- Trigger Event: release
File details
Details for the file reminiscence-0.5.0-py3-none-any.whl.
File metadata
- Download URL: reminiscence-0.5.0-py3-none-any.whl
- Upload date:
- Size: 93.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `89f9e3749b7a4efe83d6f2fcc5204da7668e7e892eda527c1e98e9b021ea8572` |
| MD5 | `b0f7a9e76d62867fc481f37abd80f365` |
| BLAKE2b-256 | `d4dce4ac7a2366242a1b48cdde8476c3ec30436cf006fce1e5a76c4fd936cfd3` |
Provenance

The following attestation bundles were made for reminiscence-0.5.0-py3-none-any.whl:

Publisher: publish.yml on demiotic/reminiscence

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: reminiscence-0.5.0-py3-none-any.whl
- Subject digest: 89f9e3749b7a4efe83d6f2fcc5204da7668e7e892eda527c1e98e9b021ea8572
- Sigstore transparency entry: 600982407
- Permalink: demiotic/reminiscence@8003593951ba2dd3b49ed7ade5077ce6153b0a4f
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/demiotic
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8003593951ba2dd3b49ed7ade5077ce6153b0a4f
- Trigger Event: release