
Reminiscence


Semantic cache for LLMs and multi-agent systems

Reminiscence eliminates redundant computations by matching queries semantically instead of exact strings. Perfect for LLM applications, RAG pipelines, and agent workflows.

# These queries hit the same cache entry:
"Analyze Q3 sales data"
"Show me third quarter sales analysis"
"What were Q3 revenues?"

Why semantic caching?

Traditional caches fail for AI systems because users express the same intent differently. Reminiscence uses FastEmbed with multilingual sentence transformers to recognize equivalent queries, reducing API costs and latency.
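
Under the hood, "equivalent" means the queries' embedding vectors are close by cosine similarity. A toy sketch with hand-made 3-dimensional vectors (real embedding models emit hundreds of dimensions; the numbers here are illustrative only):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|); near 1.0 means "same meaning"
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three queries
q1 = [0.9, 0.1, 0.3]     # "Analyze Q3 sales data"
q2 = [0.85, 0.15, 0.35]  # "Show me third quarter sales analysis"
q3 = [0.1, 0.9, 0.2]     # an unrelated query

print(cosine_similarity(q1, q2))  # high -> treated as the same entry
print(cosine_similarity(q1, q3))  # low -> cache miss
```

A semantic cache compares the incoming query's vector against stored vectors and returns a hit when the similarity clears a threshold.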

Quick Start

pip install reminiscence

from reminiscence import Reminiscence

cache = Reminiscence()

result = cache.lookup(
    query="Analyze Q3 2024 sales",
    context={"agent": "analyst", "db": "prod"}
)

if result.is_hit:
    print(f"Cache hit! Similarity: {result.similarity:.2f}")
    data = result.result
else:
    # Execute and cache: repeat the same query and context
    data = "expensive operation"
    cache.store(
        query="Analyze Q3 2024 sales",
        context={"agent": "analyst", "db": "prod"},
        result=data
    )
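
The lookup-then-store pattern above can be wrapped in a small helper. The sketch below is illustrative only: `cached_call` and `ExactMatchCache` are hypothetical names, and the stand-in cache matches queries exactly rather than semantically, just to make the flow runnable:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class LookupResult:
    is_hit: bool
    result: Any = None
    similarity: float = 0.0

class ExactMatchCache:
    """Stand-in for a semantic cache, using exact keys for brevity."""
    def __init__(self):
        self._store = {}

    def lookup(self, query, context):
        key = (query, tuple(sorted(context.items())))
        if key in self._store:
            return LookupResult(True, self._store[key], 1.0)
        return LookupResult(False)

    def store(self, query, context, result):
        self._store[(query, tuple(sorted(context.items())))] = result

def cached_call(cache, query, context, compute: Callable[[], Any]):
    """Look up first; on a miss, compute the result, store it, return it."""
    hit = cache.lookup(query=query, context=context)
    if hit.is_hit:
        return hit.result
    result = compute()
    cache.store(query=query, context=context, result=result)
    return result
```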

Decorator API

Automatic caching with hybrid matching (semantic + exact params):

from reminiscence import Reminiscence

cache = Reminiscence()

@cache.cached(query="prompt", context_params=["model"])
def call_llm(prompt: str, model: str):
    return expensive_llm_call(prompt, model)

# Similar prompts with same model hit cache
call_llm("Explain quantum physics", "gpt-4")
call_llm("Can you explain quantum mechanics?", "gpt-4")  # Cache hit ✓

# Different model = cache miss
call_llm("Explain quantum physics", "claude-3")  # Executes
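
One way to picture hybrid matching: derive a fuzzy key from the query argument and exact keys from the listed context parameters. The decorator below is a self-contained illustration, not Reminiscence's implementation; its bag-of-words key is a crude stand-in for embedding similarity:

```python
import functools

def crude_semantic_key(text: str) -> frozenset:
    # Crude stand-in for an embedding: a bag of lowercase words.
    # Real semantic matching compares embedding vectors by cosine similarity.
    return frozenset(text.lower().split())

def cached(query: str, context_params: list):
    """Hybrid-matching sketch: fuzzy key for the query argument,
    exact keys for the listed context parameters."""
    def decorator(fn):
        store = {}
        @functools.wraps(fn)
        def wrapper(**kwargs):
            key = (
                crude_semantic_key(kwargs[query]),
                tuple(kwargs[p] for p in context_params),
            )
            if key not in store:
                store[key] = fn(**kwargs)
            return store[key]
        return wrapper
    return decorator

@cached(query="prompt", context_params=["model"])
def call_llm(prompt: str, model: str):
    return f"answer from {model}"
```

Two prompts that normalize to the same key share an entry, while changing any context parameter forces a fresh execution.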

Key Features

  • 🎯 Semantic matching - FastEmbed + cosine similarity (multilingual support)
  • 🔀 Hybrid caching - Semantic similarity + exact context matching
  • 🏗️ Production ready - LRU/LFU/FIFO eviction, TTL, health checks
  • 📊 OpenTelemetry native - Metrics, tracing, and spans out of the box
  • 🔒 Type safe - Handles DataFrames, numpy arrays, nested dicts (10MB+)
  • ⚡ Zero config - Works instantly, scales to 100K+ entries with auto-indexing
  • 🔄 Background tasks - Automatic cleanup scheduler and metrics export
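
For intuition, the LRU policy named in the list above can be sketched in a few lines with `collections.OrderedDict` (a generic illustration, not Reminiscence's internals):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU eviction: when full, drop the least recently used entry."""
    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used
```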

Configuration

from reminiscence import Reminiscence, ReminiscenceConfig

# Development (in-memory, defaults)
cache = Reminiscence()

# Production (persistent, optimized)
config = ReminiscenceConfig(
    db_uri="./cache.db",
    ttl_seconds=3600,
    eviction_policy="lru",
    max_entries=50_000,
    auto_create_index=True
)
cache = Reminiscence(config)

# With OpenTelemetry
config = ReminiscenceConfig(
    otel_enabled=True,
    otel_service_name="my-service",
    otel_endpoint="http://localhost:4317"
)
cache = Reminiscence(config)

# Docker/Kubernetes (environment variables)
cache = Reminiscence(ReminiscenceConfig.load())

Background Tasks

Automatic cleanup and metrics export:

cache = Reminiscence(ReminiscenceConfig(
    ttl_seconds=3600,
    otel_enabled=True
))

# Start background tasks
cache.start_scheduler(
    interval_seconds=1800,              # Cleanup every 30 min
    metrics_export_interval_seconds=60  # Export metrics every minute
)

# ... use cache ...

# Stop when done (or use context manager)
cache.stop_scheduler()
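
Conceptually, a background scheduler like this amounts to a daemon thread that wakes at a fixed interval until stopped. A minimal, library-agnostic sketch (not Reminiscence's actual scheduler; `PeriodicTask` is a hypothetical name):

```python
import threading

class PeriodicTask:
    """Run fn every interval_seconds on a daemon thread until stop()."""
    def __init__(self, interval_seconds: float, fn):
        self.interval = interval_seconds
        self.fn = fn
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout (run the task again)
        # and True once stop() sets the event (exit the loop).
        while not self._stop.wait(self.interval):
            self.fn()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```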

Context Manager

with Reminiscence() as cache:
    cache.start_scheduler()
    # ... use cache ...
    # Automatically stops scheduler on exit
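
The context-manager behavior can be sketched as a class whose `__exit__` stops the scheduler; `ManagedCache` below is a hypothetical stand-in for illustration:

```python
class ManagedCache:
    """Sketch of the context-manager contract: cleanup happens in __exit__."""
    def __init__(self):
        self.scheduler_running = False

    def start_scheduler(self):
        self.scheduler_running = True

    def stop_scheduler(self):
        self.scheduler_running = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stop_scheduler()
        return False  # don't swallow exceptions
```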

Use Cases

  • LLM applications - Cache similar prompts to reduce API costs (OpenAI, Anthropic, etc.)
  • Multi-agent systems - Share cache across agents with context isolation
  • RAG pipelines - Cache retrieved documents, embeddings, and search results
  • Data analysis - Cache expensive SQL queries, pandas transformations

Observability

Built-in OpenTelemetry support for production monitoring:

# Automatic metrics collection
config = ReminiscenceConfig(
    enable_metrics=True,
    otel_enabled=True
)
cache = Reminiscence(config)

# Get current stats
stats = cache.get_stats()
print(f"Cache entries: {stats['cache_entries']}")
print(f"Hit rate: {stats['hit_rate']}")
print(f"Schedulers: {stats.get('schedulers', {})}")
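
Assuming the conventional definition (the source doesn't spell out the formula), the reported hit rate would be hits divided by total lookups:

```python
def hit_rate(hits: int, misses: int) -> float:
    """Hit rate as commonly defined: hits / (hits + misses)."""
    total = hits + misses
    return hits / total if total else 0.0
```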

Available metrics:

  • Cache hits/misses and hit rate
  • Lookup and store latency
  • Total entries and evictions
  • Error counts by operation
  • Scheduler execution stats

Compatible with Prometheus, Grafana, Datadog, New Relic, and any OTLP-compatible backend.

Health Checks

Production-ready health monitoring:

health = cache.health_check()

# Returns comprehensive status
{
    "status": "healthy",  # or "unhealthy"
    "checks": {
        "embedding": {"ok": True, "error": None},
        "database": {"ok": True, "error": None},
        "error_rate": {"ok": True, "details": "..."},
        "schedulers": {"ok": True, "details": "2/2 schedulers running"},
        "opentelemetry": {"ok": True, "details": "Enabled (...)"}
    },
    "metrics": {...},
    "timestamp": 1696512000000
}
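
The top-level status is presumably derived from the individual checks; a plausible aggregation rule, shown as a sketch rather than the library's actual logic:

```python
def overall_status(checks: dict) -> str:
    """Healthy only if every component check reports ok."""
    return "healthy" if all(c.get("ok") for c in checks.values()) else "unhealthy"
```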

Requirements

  • Python 3.9+
  • Core: lancedb, fastembed, orjson, pyarrow, structlog
  • Optional: pandas, polars, numpy (for DataFrame/array caching)

Performance

Typical latencies on consumer hardware (M1/M2, AMD Ryzen):

  • Lookup: 5-15ms (with index), 10-50ms (without)
  • Store: 5-10ms
  • Embedding: 20-50ms (cached in-memory after first use)

Scales to 100K+ entries with automatic vector indexing (IVF-PQ).

License

AGPL v3 - See LICENSE

