
Reminiscence


Semantic cache for LLMs and multi-agent systems

Reminiscence eliminates redundant computations by matching queries semantically instead of exact strings. Perfect for LLM applications, RAG pipelines, and agent workflows.

# These queries hit the same cache entry:
"Analyze Q3 sales data"
"Show me third quarter sales analysis"
"What were Q3 revenues?"

Why semantic caching?

Traditional caches fail for AI systems because users express the same intent differently. Reminiscence uses FastEmbed with multilingual sentence transformers to recognize equivalent queries, reducing API costs and latency.
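At its core, the matching reduces to cosine similarity between embedding vectors. Independent of Reminiscence's internals, a minimal sketch (the 3-d vectors are toy stand-ins for real sentence embeddings, which have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

q1 = [0.9, 0.1, 0.2]     # "Analyze Q3 sales data"
q2 = [0.85, 0.15, 0.25]  # "Show me third quarter sales analysis"
q3 = [0.1, 0.9, 0.1]     # unrelated query

print(cosine_similarity(q1, q2))  # close to 1.0 -> treated as the same query
print(cosine_similarity(q1, q3))  # much lower -> cache miss
```

Queries whose similarity clears a threshold resolve to the same cache entry, regardless of how they are phrased.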

Quick Start

pip install reminiscence

from reminiscence import Reminiscence

cache = Reminiscence()

result = cache.lookup(
    query="Analyze Q3 2024 sales",
    context={"agent": "analyst", "db": "prod"}
)

if result.is_hit:
    print(f"Cache hit! Similarity: {result.similarity:.2f}")
    data = result.result
else:
    # Cache miss: execute, then store with the same query and context
    data = "expensive operation"
    cache.store(
        query="Analyze Q3 2024 sales",
        context={"agent": "analyst", "db": "prod"},
        result=data
    )
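The lookup-then-store flow above can be wrapped in a small get-or-compute helper. A sketch against the API shown (the helper name is ours, not part of the library):

```python
def lookup_or_compute(cache, query, context, compute):
    """Return a cached result for a semantically similar query,
    or run compute() and store its result for next time."""
    result = cache.lookup(query=query, context=context)
    if result.is_hit:
        return result.result
    data = compute()
    cache.store(query=query, context=context, result=data)
    return data
```

Usage: `data = lookup_or_compute(cache, "Analyze Q3 2024 sales", {"agent": "analyst", "db": "prod"}, run_analysis)`.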

Decorator API

Automatic caching with hybrid matching (semantic + exact params):

from reminiscence import Reminiscence

cache = Reminiscence()

@cache.cached(query="prompt", context_params=["model"])
def call_llm(prompt: str, model: str):
    return expensive_llm_call(prompt, model)

# Similar prompts with same model hit cache
call_llm("Explain quantum physics", "gpt-4")
call_llm("Can you explain quantum mechanics?", "gpt-4")  # Cache hit ✓

# Different model = cache miss
call_llm("Explain quantum physics", "claude-3")  # Executes
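Hybrid matching can be pictured as exact bucketing on the context parameters plus a semantic search within each bucket. A toy sketch, independent of the library's implementation (the word-overlap similarity stands in for real embedding similarity; all names are ours):

```python
def similarity(a, b):
    # Toy stand-in for embedding cosine similarity: word-set overlap (Jaccard).
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

class HybridCache:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.buckets = {}  # exact context key -> list of (query, result)

    def _key(self, context):
        return tuple(sorted(context.items()))  # context must match exactly

    def lookup(self, query, context):
        for stored_query, result in self.buckets.get(self._key(context), []):
            if similarity(query, stored_query) >= self.threshold:
                return result  # semantic hit within the exact-context bucket
        return None

    def store(self, query, context, result):
        self.buckets.setdefault(self._key(context), []).append((query, result))

cache = HybridCache()
cache.store("explain quantum physics", {"model": "gpt-4"}, "answer")
print(cache.lookup("explain quantum physics please", {"model": "gpt-4"}))  # hit
print(cache.lookup("explain quantum physics", {"model": "claude-3"}))      # miss: different model
```

The exact part guarantees that results never leak across models or agents; the semantic part absorbs rephrasings.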

Key Features

  • 🎯 Semantic matching - FastEmbed + cosine similarity (multilingual support)
  • 🔀 Hybrid caching - Semantic similarity + exact context matching
  • 🏗️ Production ready - LRU/LFU/FIFO eviction, TTL, health checks
  • 📊 OpenTelemetry native - Metrics, tracing, and spans out of the box
  • 🔒 Type safe - Handles DataFrames, numpy arrays, nested dicts (10MB+)
  • Zero config - Works instantly, scales to 100K+ entries with auto-indexing
  • 🔄 Background tasks - Automatic cleanup scheduler and metrics export
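The eviction policies named above are the standard ones. For intuition, a minimal LRU-with-TTL sketch, unrelated to Reminiscence's actual implementation:

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    def __init__(self, max_entries=3, ttl_seconds=3600):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.data = OrderedDict()  # key -> (value, stored_at)

    def get(self, key):
        if key not in self.data:
            return None
        value, stored_at = self.data[key]
        if time.time() - stored_at > self.ttl:
            del self.data[key]         # expired: evict on access
            return None
        self.data.move_to_end(key)     # mark as most recently used
        return value

    def put(self, key, value):
        self.data[key] = (value, time.time())
        self.data.move_to_end(key)
        if len(self.data) > self.max_entries:
            self.data.popitem(last=False)  # evict least recently used
```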

Configuration

from reminiscence import Reminiscence, ReminiscenceConfig

# Development (in-memory, defaults)
cache = Reminiscence()

# Production (persistent, optimized)
config = ReminiscenceConfig(
    db_uri="./cache.db",
    ttl_seconds=3600,
    eviction_policy="lru",
    max_entries=50_000,
    auto_create_index=True
)
cache = Reminiscence(config)

# With OpenTelemetry
config = ReminiscenceConfig(
    otel_enabled=True,
    otel_service_name="my-service",
    otel_endpoint="http://localhost:4317"
)
cache = Reminiscence(config)

# Docker/Kubernetes (environment variables)
cache = Reminiscence(ReminiscenceConfig.load())

Background Tasks

Automatic cleanup and metrics export:

cache = Reminiscence(ReminiscenceConfig(
    ttl_seconds=3600,
    otel_enabled=True
))

# Start background tasks
cache.start_scheduler(
    interval_seconds=1800,              # Cleanup every 30 min
    metrics_export_interval_seconds=60  # Export metrics every minute
)

# ... use cache ...

# Stop when done (or use context manager)
cache.stop_scheduler()
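A periodic scheduler like this is commonly built on a daemon thread with an Event for clean shutdown. A generic sketch of that pattern (not the library's code):

```python
import threading

class PeriodicTask:
    """Run fn every interval seconds on a daemon thread until stopped."""
    def __init__(self, interval, fn):
        self.interval = interval
        self.fn = fn
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait doubles as an interruptible sleep: it returns True
        # (and ends the loop) as soon as stop() sets the event.
        while not self._stop.wait(self.interval):
            self.fn()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```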

Context Manager

with Reminiscence() as cache:
    cache.start_scheduler()
    # ... use cache ...
    # Automatically stops scheduler on exit
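Tying scheduler lifetime to a `with` block is plain context-manager protocol. A sketch of the pattern (names are ours, not the library's):

```python
class ManagedCache:
    def __init__(self):
        self.scheduler_running = False

    def start_scheduler(self):
        self.scheduler_running = True

    def stop_scheduler(self):
        self.scheduler_running = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always stop background work, even if the block raised.
        self.stop_scheduler()
        return False  # don't suppress exceptions
```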

Use Cases

  • LLM applications - Cache similar prompts to reduce API costs (OpenAI, Anthropic, etc.)
  • Multi-agent systems - Share cache across agents with context isolation
  • RAG pipelines - Cache retrieved documents, embeddings, and search results
  • Data analysis - Cache expensive SQL queries, pandas transformations

Observability

Built-in OpenTelemetry support for production monitoring:

# Automatic metrics collection
config = ReminiscenceConfig(
    enable_metrics=True,
    otel_enabled=True
)
cache = Reminiscence(config)

# Get current stats
stats = cache.get_stats()
print(f"Cache entries: {stats['cache_entries']}")
print(f"Hit rate: {stats['hit_rate']}")
print(f"Schedulers: {stats.get('schedulers', {})}")

Available metrics:

  • Cache hits/misses and hit rate
  • Lookup and store latency
  • Total entries and evictions
  • Error counts by operation
  • Scheduler execution stats

Compatible with Prometheus, Grafana, Datadog, New Relic, and any OTLP-compatible backend.
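Hit rate is simply hits / (hits + misses). A tiny counter sketch of how such stats accumulate (illustrative only):

```python
class CacheStats:
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```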

Health Checks

Production-ready health monitoring:

health = cache.health_check()

# Returns comprehensive status
{
    "status": "healthy",  # or "unhealthy"
    "checks": {
        "embedding": {"ok": True, "error": None},
        "database": {"ok": True, "error": None},
        "error_rate": {"ok": True, "details": "..."},
        "schedulers": {"ok": True, "details": "2/2 schedulers running"},
        "opentelemetry": {"ok": True, "details": "Enabled (...)"}
    },
    "metrics": {...},
    "timestamp": 1696512000000
}
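The overall status can be derived from the individual checks: healthy only if every check passed. A sketch of that aggregation:

```python
def overall_status(checks):
    """Collapse per-component checks into a single status string."""
    return "healthy" if all(c["ok"] for c in checks.values()) else "unhealthy"

checks = {
    "embedding": {"ok": True, "error": None},
    "database": {"ok": True, "error": None},
}
print(overall_status(checks))  # healthy
```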

Requirements

  • Python 3.9+
  • Core: lancedb, fastembed, orjson, pyarrow, structlog
  • Optional: pandas, polars, numpy (for DataFrame/array caching)

Performance

Typical latencies on consumer hardware (M1/M2, AMD Ryzen):

  • Lookup: 5-15ms (with index), 10-50ms (without)
  • Store: 5-10ms
  • Embedding: 20-50ms (cached in-memory after first use)

Scales to 100K+ entries with automatic vector indexing (IVF-PQ).

License

AGPL v3 - See LICENSE




Download files


Source Distribution

reminiscence-0.4.0.tar.gz (78.6 kB)


Built Distribution


reminiscence-0.4.0-py3-none-any.whl (60.9 kB)


File details

Details for the file reminiscence-0.4.0.tar.gz.

File metadata

  • Download URL: reminiscence-0.4.0.tar.gz
  • Upload date:
  • Size: 78.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for reminiscence-0.4.0.tar.gz:

  • SHA256: 0957505c79889f2817bb268b92605b22ce083cd6be17a38665bac35a2fd8e3d1
  • MD5: 7938d368398976c20c9a8fc74d2431ec
  • BLAKE2b-256: 861faf93de2a8a76fb46d16b242788d862cc284643bf012fc8ece1016dc83bdb
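A published hash can be checked locally with the standard library's hashlib before installing from a downloaded archive (the file path is illustrative):

```python
import hashlib

# SHA256 published for reminiscence-0.4.0.tar.gz
EXPECTED_SHA256 = "0957505c79889f2817bb268b92605b22ce083cd6be17a38665bac35a2fd8e3d1"

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large files don't need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# assert sha256_of("reminiscence-0.4.0.tar.gz") == EXPECTED_SHA256
```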


Provenance

The following attestation bundles were made for reminiscence-0.4.0.tar.gz:

Publisher: publish.yml on demiotic/reminiscence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file reminiscence-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: reminiscence-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 60.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for reminiscence-0.4.0-py3-none-any.whl:

  • SHA256: ab0fbb6c7dfb1a636b07caff8730f5ffeb2a5f90dc2e3b59f2fe29b1f343d984
  • MD5: cdf7b74c73381581e2693706d279f507
  • BLAKE2b-256: 297109369eb75fe5c0b2583ac717bd703f819b3272170a1cdce64983020e1d83


Provenance

The following attestation bundles were made for reminiscence-0.4.0-py3-none-any.whl:

Publisher: publish.yml on demiotic/reminiscence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
