
Semantic cache for AI workloads backed by Valkey vector search. Embeddings-based similarity matching with OpenTelemetry and Prometheus instrumentation.


betterdb-semantic-cache


Installation

pip install betterdb-semantic-cache
# With OpenAI embeddings (quote the extra so shells such as zsh don't expand the brackets):
pip install "betterdb-semantic-cache[openai]"
# All extras:
pip install "betterdb-semantic-cache[all]"

Quick start

import asyncio
import valkey.asyncio as valkey
from betterdb_semantic_cache import SemanticCache, SemanticCacheOptions
from betterdb_semantic_cache.embed.openai import create_openai_embed

async def main():
    # Point this at your Valkey instance (Valkey's default port is 6379).
    client = valkey.Valkey(host="localhost", port=6399)
    cache = SemanticCache(SemanticCacheOptions(
        client=client,
        embed_fn=create_openai_embed(),  # needs the [openai] extra
        default_threshold=0.12,          # max cosine distance that still counts as a hit
    ))
    await cache.initialize()

    prompt = "What is the capital of France?"
    result = await cache.check(prompt)
    if not result.hit:
        # Miss: call your LLM here, then store its answer for future lookups.
        await cache.store(prompt, "Paris")

asyncio.run(main())
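The default_threshold above is a cosine distance: the smaller it is, the more closely a new prompt's embedding must point in the same direction as a stored one. The sketch below shows that comparison in plain Python. It assumes distance means 1 minus cosine similarity (the usual definition for cosine vector search) and that a candidate is a hit when its distance is at or below the threshold, matching the score <= threshold bound in the LLM-as-judge section. It is an illustration, not the library's internal code; cosine_distance and is_hit are hypothetical helper names.

```python
import math
from typing import Sequence

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0.0 for identical direction, 1.0 for orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_a * norm_b)

def is_hit(distance: float, threshold: float = 0.12) -> bool:
    # Assumed rule: at or below the threshold counts as a hit.
    return distance <= threshold
```

Scaling a vector does not change its direction, so [3.0, 4.0] and [6.0, 8.0] have distance 0.0; this is why cosine distance compares meaning-bearing direction rather than embedding magnitude.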

LLM-as-judge

When a hit lands in the uncertainty band (threshold - uncertainty_band < score <= threshold, where score is the candidate's cosine distance), you can supply a judge_fn that adjudicates it automatically instead of handling confidence == 'uncertain' yourself.

from betterdb_semantic_cache import JudgeOptions
from betterdb_semantic_cache.types import CacheCheckOptions

result = await cache.check(user_prompt, CacheCheckOptions(
    judge=JudgeOptions(
        judge_fn=my_judge,
        on_error="accept",   # fail-open on judge errors (default)
        timeout_ms=2000,     # per-call timeout (default)
    )
))

A minimal OpenAI judge:

from openai import AsyncOpenAI

openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def my_judge(inp: dict) -> bool:
    # Return True to accept (confidence → 'high')
    # Return False to reject (treated as a miss, with nearest_miss populated)
    verdict = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Reply YES or NO only."},
            {"role": "user", "content": (
                "Does this cached response correctly answer the prompt?\n"
                f"Prompt: {inp['prompt']}\nResponse: {inp['response']}"
            )},
        ],
    )
    # Normalise whitespace and case so replies like "yes" or " YES." still count
    answer = (verdict.choices[0].message.content or "").strip().upper()
    return answer.startswith("YES")
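For tests you may not want to call a real model. A minimal sketch, assuming only what the judge example shows (judge_fn is an async function that takes a dict with 'prompt' and 'response' keys and returns a bool): make_fake_judge is a hypothetical helper that returns a judge with a fixed verdict plus a list recording every input it received, so both the accept and reject paths can be exercised deterministically.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

def make_fake_judge(verdict: bool) -> Tuple[Callable[[dict], Awaitable[bool]], List[dict]]:
    """Return (judge, calls): an async judge with a fixed verdict and a log of its inputs."""
    calls: List[dict] = []

    async def judge(inp: dict) -> bool:
        calls.append(inp)  # record exactly what the cache passed in
        return verdict

    return judge, calls

if __name__ == "__main__":
    judge, calls = make_fake_judge(verdict=False)
    print(asyncio.run(judge({"prompt": "p", "response": "r"})), calls)
```

Pass judge_fn=judge in JudgeOptions and inspect calls afterwards; an empty list means no result landed in the uncertainty band.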

When the judge is invoked: only for confidence == 'uncertain' hits. High-confidence hits, misses, and the zero-candidates case bypass the judge entirely.
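The routing rule can be written as a small pure function. This sketch buckets a cosine-distance score with the band inequality given earlier; the function name and the 'miss' label are this example's own, and the band width used in the checks (0.02) is hypothetical, since the default uncertainty_band is not stated here. Only 'uncertain' results would be sent to judge_fn.

```python
def classify(score: float, threshold: float, uncertainty_band: float) -> str:
    """Bucket a cosine-distance score as 'high', 'uncertain' (judge-eligible) or 'miss'."""
    if score > threshold:
        return "miss"  # beyond the threshold: never reaches the judge
    if score > threshold - uncertainty_band:
        return "uncertain"  # threshold - uncertainty_band < score <= threshold
    return "high"  # comfortably inside the threshold: accepted without the judge
```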

Accept path: result.hit == True, result.confidence == 'high'.

Reject path: result.hit == False, result.nearest_miss populated with delta_to_threshold <= 0 (use this to distinguish judge rejections from regular misses where delta_to_threshold > 0).
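To tell the two kinds of miss apart in application code, branch on nearest_miss as described above. A minimal sketch, assuming nearest_miss is either None (taken here to mean no candidates) or an object exposing delta_to_threshold as an attribute; both of those details are assumptions. describe_outcome is a hypothetical helper, and the SimpleNamespace values are stand-ins for real check() results.

```python
from types import SimpleNamespace

def describe_outcome(result) -> str:
    """Label a check() result as 'hit', 'judge_rejected' or 'miss'."""
    if result.hit:
        return "hit"
    nearest = getattr(result, "nearest_miss", None)
    # delta_to_threshold <= 0: within threshold, so a miss here means the judge said no
    if nearest is not None and nearest.delta_to_threshold <= 0:
        return "judge_rejected"
    return "miss"  # no candidate (assumed None) or nearest candidate beyond threshold

# Stand-ins for real results (illustration only).
accepted = SimpleNamespace(hit=True, nearest_miss=None)
rejected = SimpleNamespace(hit=False, nearest_miss=SimpleNamespace(delta_to_threshold=-0.01))
plain_miss = SimpleNamespace(hit=False, nearest_miss=SimpleNamespace(delta_to_threshold=0.05))
```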

Composing with rerank: when both rerank and judge are set, the judge receives the reranked pick's response and similarity score.

check_batch() does not support judge; passing judge options to it raises SemanticCacheUsageError. Call check() individually for prompts that need adjudication.
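A sketch of that per-prompt fallback: check_each is a hypothetical helper that calls cache.check() for every prompt with the same options and uses an asyncio.Semaphore to cap how many checks (and therefore judge calls) run at once. The cap of 8 is an arbitrary example value, not a library setting.

```python
import asyncio
from typing import Any, List, Sequence

async def check_each(cache: Any, prompts: Sequence[str], options: Any,
                     max_concurrency: int = 8) -> List[Any]:
    """Call cache.check() once per prompt, running at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def check_one(prompt: str) -> Any:
        async with semaphore:  # bounds concurrent Valkey lookups and judge calls
            return await cache.check(prompt, options)

    # gather returns results in the same order as the input prompts
    return list(await asyncio.gather(*(check_one(p) for p in prompts)))
```

Because asyncio.gather preserves order, results[i] always belongs to prompts[i], even though the checks finish in arbitrary order.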

CacheCheckOptions reference

| Option | Type | Default | Description |
|---|---|---|---|
| threshold | float | default_threshold | Per-request cosine distance threshold override |
| category | str | "" | Category tag for per-category thresholds and metric labels |
| filter | str | None | FT.SEARCH pre-filter expression (trusted input only) |
| k | int | 1 | KNN neighbours to fetch (ignored when rerank is set) |
| stale_after_model_change | bool | False | Evict and miss when the stored model differs from current_model |
| current_model | str | None | Model to compare against stored entries |
| rerank | RerankOptions | None | Rerank hook; see RerankOptions |
| judge | JudgeOptions | None | LLM-as-judge for borderline hits. Not supported by check_batch(); raises SemanticCacheUsageError |
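Because filter is passed through as an FT.SEARCH expression, interpolating user input into it can change the query. One conservative approach, sketched here, is to refuse anything outside a small character allowlist before building a tag clause in the @field:{value} style. The helper tag_filter and the field name tenant are hypothetical: the field would have to exist as a TAG field in the cache's index, and that schema is not documented in this section.

```python
import re

# Conservative allowlist: ASCII letters, digits and underscore only.
_SAFE = re.compile(r"[A-Za-z0-9_]+")

def tag_filter(field: str, value: str) -> str:
    """Build an '@field:{value}' tag clause, rejecting input that could carry query syntax."""
    for part in (field, value):
        if not _SAFE.fullmatch(part):
            raise ValueError(f"unsafe filter input: {part!r}")
    return f"@{field}:{{{value}}}"
```

For example, CacheCheckOptions(filter=tag_filter("tenant", tenant_id)) raises ValueError for an input such as "x} | @other:{y" instead of sending it to Valkey. Rejecting is simpler to get right than escaping, at the cost of refusing some legitimate values.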

Telemetry

The published wheel includes anonymous product analytics powered by PostHog. When a baked API key is present in the package (injected at publish time), aggregate usage statistics (hit rate, cost saved) are collected on a per-instance basis — no prompt text, responses, or personally-identifiable information is ever sent.

To opt out, set the environment variable before starting your process:

export BETTERDB_TELEMETRY=false   # also accepts: 0, no, off

You can also disable it programmatically:

from betterdb_semantic_cache.types import AnalyticsOptions
cache = SemanticCache(SemanticCacheOptions(
    ...,
    analytics=AnalyticsOptions(disabled=True),
))
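If you want to log at startup whether the environment opt-out is in effect, the documented values can be checked directly. A sketch: telemetry_opted_out is a hypothetical helper, and matching case-insensitively after trimming whitespace is an assumption; the library's own parsing may differ. The helper also cannot see a programmatic AnalyticsOptions(disabled=True) setting.

```python
import os
from typing import Mapping, Optional

# Opt-out values listed above; case-insensitive matching is an assumption.
_OPT_OUT_VALUES = {"false", "0", "no", "off"}

def telemetry_opted_out(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if BETTERDB_TELEMETRY holds one of the documented opt-out values."""
    source = os.environ if env is None else env
    value = source.get("BETTERDB_TELEMETRY", "")
    return value.strip().lower() in _OPT_OUT_VALUES
```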



Download files


Source Distribution

betterdb_semantic_cache-0.4.1.tar.gz (93.9 kB)

Uploaded Source

Built Distribution


betterdb_semantic_cache-0.4.1-py3-none-any.whl (71.6 kB)

Uploaded Python 3

File details

Details for the file betterdb_semantic_cache-0.4.1.tar.gz.

File metadata

  • Download URL: betterdb_semantic_cache-0.4.1.tar.gz
  • Size: 93.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for betterdb_semantic_cache-0.4.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf44aa781ae62add5eaf424d2fd4f0e4179d8e8b633b7619082f87c7dc488207 |
| MD5 | d81e7cd1fdab30bee4dd46d0bc13cafe |
| BLAKE2b-256 | 75ca26fe46afde97af2581f88097648f2e2e9da520e1ecfe4a0aaf8208cf43e0 |
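To check a downloaded sdist against the SHA256 digest listed here, hash the file in chunks with the standard hashlib module. A minimal sketch; sha256_of and verify_sdist are hypothetical helper names, and the same approach works for the wheel with its own digest. pip can also enforce hashes itself in hash-checking mode (--require-hashes).

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digest published for the 0.4.1 source distribution.
EXPECTED_SDIST_SHA256 = "bf44aa781ae62add5eaf424d2fd4f0e4179d8e8b633b7619082f87c7dc488207"

def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_sdist(path: Union[str, Path]) -> None:
    """Raise ValueError if the file does not match the published digest."""
    actual = sha256_of(path)
    if actual != EXPECTED_SDIST_SHA256:
        raise ValueError(f"SHA256 mismatch for {path}: {actual}")
```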


Provenance

The following attestation bundles were made for betterdb_semantic_cache-0.4.1.tar.gz:

Publisher: semantic-cache-py-release.yml on BetterDB-inc/monitor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file betterdb_semantic_cache-0.4.1-py3-none-any.whl.


File hashes

Hashes for betterdb_semantic_cache-0.4.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 89a503cf66bb232dc9f2b5440bf0a9ee0c674a724279bd4fb5e797b2a408c6c1 |
| MD5 | cb7af186dbbcea62ce7cf8681455b3f7 |
| BLAKE2b-256 | cc352e9f26151fd8781db9aa3513931043b8c52ce8b96325098f51d17912e303 |


Provenance

The following attestation bundles were made for betterdb_semantic_cache-0.4.1-py3-none-any.whl:

Publisher: semantic-cache-py-release.yml on BetterDB-inc/monitor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
