
Reminiscence


Semantic cache for LLMs and multi-agent systems

Reminiscence eliminates redundant computations by matching queries semantically instead of exact strings. Perfect for LLM applications, RAG pipelines, and agent workflows.

# These queries hit the same cache entry:
"Analyze Q3 sales data"
"Show me third quarter sales analysis"
"What were Q3 revenues?"

Why semantic caching?

Traditional caches fail for AI systems because users express the same intent differently. Reminiscence uses FastEmbed with multilingual sentence transformers to recognize equivalent queries, reducing API costs and latency.
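At its core, the matching reduces to cosine similarity between embedding vectors. Independent of Reminiscence's internals, a minimal sketch (the 3-d vectors are toy stand-ins for real sentence embeddings, which have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

q1 = [0.9, 0.1, 0.2]     # "Analyze Q3 sales data"
q2 = [0.85, 0.15, 0.25]  # "Show me third quarter sales analysis"
q3 = [0.1, 0.9, 0.1]     # unrelated query

print(cosine_similarity(q1, q2))  # close to 1.0 -> treated as the same query
print(cosine_similarity(q1, q3))  # much lower -> cache miss
```

Queries whose similarity clears a threshold resolve to the same cache entry, regardless of how they are phrased.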

Quick Start

pip install reminiscence

from reminiscence import Reminiscence

cache = Reminiscence()

result = cache.lookup(
    query="Analyze Q3 2024 sales",
    context={"agent": "analyst", "db": "prod"}
)

if result.is_hit:
    print(f"Cache hit! Similarity: {result.similarity:.2f}")
    data = result.result
else:
    # Cache miss: execute, then store with the same query and context
    data = "expensive operation"
    cache.store(
        query="Analyze Q3 2024 sales",
        context={"agent": "analyst", "db": "prod"},
        result=data
    )
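The lookup-then-store flow above can be wrapped in a small get-or-compute helper. A sketch against the API shown (the helper name is ours, not part of the library):

```python
def lookup_or_compute(cache, query, context, compute):
    """Return a cached result for a semantically similar query,
    or run compute() and store its result for next time."""
    result = cache.lookup(query=query, context=context)
    if result.is_hit:
        return result.result
    data = compute()
    cache.store(query=query, context=context, result=data)
    return data
```

Usage: `data = lookup_or_compute(cache, "Analyze Q3 2024 sales", {"agent": "analyst", "db": "prod"}, run_analysis)`.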

Decorator API

Automatic caching with hybrid matching (semantic + exact params):

from reminiscence import Reminiscence

cache = Reminiscence()

@cache.cached(query="prompt", context_params=["model"])
def call_llm(prompt: str, model: str):
    return expensive_llm_call(prompt, model)

# Similar prompts with same model hit cache
call_llm("Explain quantum physics", "gpt-4")
call_llm("Can you explain quantum mechanics?", "gpt-4")  # Cache hit ✓

# Different model = cache miss
call_llm("Explain quantum physics", "claude-3")  # Executes
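Hybrid matching can be pictured as exact bucketing on the context parameters plus a semantic search within each bucket. A toy sketch, independent of the library's implementation (the word-overlap similarity stands in for real embedding similarity; all names are ours):

```python
def similarity(a, b):
    # Toy stand-in for embedding cosine similarity: word-set overlap (Jaccard).
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

class HybridCache:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.buckets = {}  # exact context key -> list of (query, result)

    def _key(self, context):
        return tuple(sorted(context.items()))  # context must match exactly

    def lookup(self, query, context):
        for stored_query, result in self.buckets.get(self._key(context), []):
            if similarity(query, stored_query) >= self.threshold:
                return result  # semantic hit within the exact-context bucket
        return None

    def store(self, query, context, result):
        self.buckets.setdefault(self._key(context), []).append((query, result))

cache = HybridCache()
cache.store("explain quantum physics", {"model": "gpt-4"}, "answer")
print(cache.lookup("explain quantum physics please", {"model": "gpt-4"}))  # hit
print(cache.lookup("explain quantum physics", {"model": "claude-3"}))      # miss: different model
```

The exact part guarantees that results never leak across models or agents; the semantic part absorbs rephrasings.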

Key Features

  • 🎯 Semantic matching - FastEmbed + cosine similarity (multilingual support)
  • 🔀 Hybrid caching - Semantic similarity + exact context matching
  • 🏗️ Production ready - LRU/LFU/FIFO eviction, TTL, health checks
  • 📊 OpenTelemetry native - Metrics, tracing, and spans out of the box
  • 🔒 Type safe - Handles DataFrames, numpy arrays, nested dicts (10MB+)
  • Zero config - Works instantly, scales to 100K+ entries with auto-indexing
  • 🔄 Background tasks - Automatic cleanup scheduler and metrics export
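The eviction policies named above are the standard ones. For intuition, a minimal LRU-with-TTL sketch, unrelated to Reminiscence's actual implementation:

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    def __init__(self, max_entries=3, ttl_seconds=3600):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.data = OrderedDict()  # key -> (value, stored_at)

    def get(self, key):
        if key not in self.data:
            return None
        value, stored_at = self.data[key]
        if time.time() - stored_at > self.ttl:
            del self.data[key]         # expired: evict on access
            return None
        self.data.move_to_end(key)     # mark as most recently used
        return value

    def put(self, key, value):
        self.data[key] = (value, time.time())
        self.data.move_to_end(key)
        if len(self.data) > self.max_entries:
            self.data.popitem(last=False)  # evict least recently used
```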

Configuration

from reminiscence import Reminiscence, ReminiscenceConfig

# Development (in-memory, defaults)
cache = Reminiscence()

# Production (persistent, optimized)
config = ReminiscenceConfig(
    db_uri="./cache.db",
    ttl_seconds=3600,
    eviction_policy="lru",
    max_entries=50_000,
    auto_create_index=True
)
cache = Reminiscence(config)

# With OpenTelemetry
config = ReminiscenceConfig(
    otel_enabled=True,
    otel_service_name="my-service",
    otel_endpoint="http://localhost:4317"
)
cache = Reminiscence(config)

# Docker/Kubernetes (environment variables)
cache = Reminiscence(ReminiscenceConfig.load())

Background Tasks

Automatic cleanup and metrics export:

cache = Reminiscence(ReminiscenceConfig(
    ttl_seconds=3600,
    otel_enabled=True
))

# Start background tasks
cache.start_scheduler(
    interval_seconds=1800,              # Cleanup every 30 min
    metrics_export_interval_seconds=60  # Export metrics every minute
)

# ... use cache ...

# Stop when done (or use context manager)
cache.stop_scheduler()
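A periodic scheduler like this is commonly built on a daemon thread with an Event for clean shutdown. A generic sketch of that pattern (not the library's code):

```python
import threading

class PeriodicTask:
    """Run fn every interval seconds on a daemon thread until stopped."""
    def __init__(self, interval, fn):
        self.interval = interval
        self.fn = fn
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait doubles as an interruptible sleep: it returns True
        # (and ends the loop) as soon as stop() sets the event.
        while not self._stop.wait(self.interval):
            self.fn()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```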

Context Manager

with Reminiscence() as cache:
    cache.start_scheduler()
    # ... use cache ...
    # Automatically stops scheduler on exit
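Tying scheduler lifetime to a `with` block is plain context-manager protocol. A sketch of the pattern (names are ours, not the library's):

```python
class ManagedCache:
    def __init__(self):
        self.scheduler_running = False

    def start_scheduler(self):
        self.scheduler_running = True

    def stop_scheduler(self):
        self.scheduler_running = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always stop background work, even if the block raised.
        self.stop_scheduler()
        return False  # don't suppress exceptions
```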

Use Cases

  • LLM applications - Cache similar prompts to reduce API costs (OpenAI, Anthropic, etc.)
  • Multi-agent systems - Share cache across agents with context isolation
  • RAG pipelines - Cache retrieved documents, embeddings, and search results
  • Data analysis - Cache expensive SQL queries, pandas transformations

Observability

Built-in OpenTelemetry support for production monitoring:

# Automatic metrics collection
config = ReminiscenceConfig(
    enable_metrics=True,
    otel_enabled=True
)
cache = Reminiscence(config)

# Get current stats
stats = cache.get_stats()
print(f"Cache entries: {stats['cache_entries']}")
print(f"Hit rate: {stats['hit_rate']}")
print(f"Schedulers: {stats.get('schedulers', {})}")

Available metrics:

  • Cache hits/misses and hit rate
  • Lookup and store latency
  • Total entries and evictions
  • Error counts by operation
  • Scheduler execution stats

Compatible with Prometheus, Grafana, Datadog, New Relic, and any OTLP-compatible backend.
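Hit rate is simply hits / (hits + misses). A tiny counter sketch of how such stats accumulate (illustrative only):

```python
class CacheStats:
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```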

Health Checks

Production-ready health monitoring:

health = cache.health_check()

# Returns comprehensive status
{
    "status": "healthy",  # or "unhealthy"
    "checks": {
        "embedding": {"ok": True, "error": None},
        "database": {"ok": True, "error": None},
        "error_rate": {"ok": True, "details": "..."},
        "schedulers": {"ok": True, "details": "2/2 schedulers running"},
        "opentelemetry": {"ok": True, "details": "Enabled (...)"}
    },
    "metrics": {...},
    "timestamp": 1696512000000
}
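The overall status can be derived from the individual checks: healthy only if every check passed. A sketch of that aggregation:

```python
def overall_status(checks):
    """Collapse per-component checks into a single status string."""
    return "healthy" if all(c["ok"] for c in checks.values()) else "unhealthy"

checks = {
    "embedding": {"ok": True, "error": None},
    "database": {"ok": True, "error": None},
}
print(overall_status(checks))  # healthy
```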

Requirements

  • Python 3.9+
  • Core: lancedb, fastembed, orjson, pyarrow, structlog
  • Optional: pandas, polars, numpy (for DataFrame/array caching)

Performance

Typical latencies on consumer hardware (M1/M2, AMD Ryzen):

  • Lookup: 5-15ms (with index), 10-50ms (without)
  • Store: 5-10ms
  • Embedding: 20-50ms (cached in-memory after first use)

Scales to 100K+ entries with automatic vector indexing (IVF-PQ).

License

AGPL v3 - See LICENSE




Download files


Source Distribution

reminiscence-0.4.0.tar.gz (78.6 kB)


Built Distribution


reminiscence-0.4.0-py3-none-any.whl (60.9 kB)


File details

Details for the file reminiscence-0.4.0.tar.gz.

File metadata

  • Download URL: reminiscence-0.4.0.tar.gz
  • Upload date:
  • Size: 78.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for reminiscence-0.4.0.tar.gz:

  • SHA256: 0957505c79889f2817bb268b92605b22ce083cd6be17a38665bac35a2fd8e3d1
  • MD5: 7938d368398976c20c9a8fc74d2431ec
  • BLAKE2b-256: 861faf93de2a8a76fb46d16b242788d862cc284643bf012fc8ece1016dc83bdb
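A published hash can be checked locally with the standard library's hashlib before installing from a downloaded archive (the file path is illustrative):

```python
import hashlib

# SHA256 published for reminiscence-0.4.0.tar.gz
EXPECTED_SHA256 = "0957505c79889f2817bb268b92605b22ce083cd6be17a38665bac35a2fd8e3d1"

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large files don't need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# assert sha256_of("reminiscence-0.4.0.tar.gz") == EXPECTED_SHA256
```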


Provenance

The following attestation bundles were made for reminiscence-0.4.0.tar.gz:

Publisher: publish.yml on demiotic/reminiscence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file reminiscence-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: reminiscence-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 60.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for reminiscence-0.4.0-py3-none-any.whl:

  • SHA256: ab0fbb6c7dfb1a636b07caff8730f5ffeb2a5f90dc2e3b59f2fe29b1f343d984
  • MD5: cdf7b74c73381581e2693706d279f507
  • BLAKE2b-256: 297109369eb75fe5c0b2583ac717bd703f819b3272170a1cdce64983020e1d83


Provenance

The following attestation bundles were made for reminiscence-0.4.0-py3-none-any.whl:

Publisher: publish.yml on demiotic/reminiscence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
