betterdb-semantic-cache

Semantic cache for AI workloads backed by Valkey vector search. Embeddings-based similarity matching with OpenTelemetry and Prometheus instrumentation.

Installation

pip install betterdb-semantic-cache
# With OpenAI embeddings (quoted so shells such as zsh do not expand the brackets):
pip install "betterdb-semantic-cache[openai]"
# All extras:
pip install "betterdb-semantic-cache[all]"

Quick start

import asyncio
import valkey.asyncio as valkey
from betterdb_semantic_cache import SemanticCache, SemanticCacheOptions
from betterdb_semantic_cache.embed.openai import create_openai_embed

async def main():
    client = valkey.Valkey(host="localhost", port=6399)
    cache = SemanticCache(SemanticCacheOptions(
        client=client,
        embed_fn=create_openai_embed(),
        default_threshold=0.12,
    ))
    await cache.initialize()

    result = await cache.check("What is the capital of France?")
    if not result.hit:
        await cache.store("What is the capital of France?", "Paris")

asyncio.run(main())
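
To give a feel for what a value like default_threshold=0.12 is compared against, here is a minimal sketch of cosine distance in plain Python. It assumes the score is a cosine distance (1 minus cosine similarity, lower is closer), as the threshold description in the options reference suggests. The helper name and the toy vectors are made up for illustration; the actual search runs inside Valkey, not in code like this.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real embedding vectors are much longer.
stored = [0.9, 0.1, 0.0]
query = [0.8, 0.2, 0.0]

distance = cosine_distance(stored, query)
# Assumption: a candidate counts as a hit when distance <= threshold.
is_candidate_hit = distance <= 0.12
```

A lower threshold makes the cache stricter (fewer, safer hits); a higher one trades accuracy for hit rate.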

LLM-as-judge

When a hit lands in the uncertainty band (threshold - uncertainty_band < score <= threshold), you can supply a judge_fn to adjudicate automatically instead of handling confidence == 'uncertain' yourself.

from betterdb_semantic_cache import JudgeOptions
from betterdb_semantic_cache.types import CacheCheckOptions

result = await cache.check(user_prompt, CacheCheckOptions(
    judge=JudgeOptions(
        judge_fn=my_judge,
        on_error="accept",   # fail-open on judge errors (default)
        timeout_ms=2000,     # per-call timeout (default)
    )
))
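
The on_error and timeout_ms options above describe a fail-open policy. The sketch below shows one way such a policy could behave, using only asyncio. run_judge is a hypothetical helper, not part of the library, and it assumes that a timeout is treated like any other judge error and that any on_error value other than "accept" fails closed.

```python
import asyncio
from typing import Awaitable, Callable

async def run_judge(
    judge_fn: Callable[[dict], Awaitable[bool]],
    inp: dict,
    on_error: str = "accept",
    timeout_ms: int = 2000,
) -> bool:
    """Hypothetical helper: call judge_fn with a timeout and an error policy."""
    try:
        # wait_for cancels the judge call if it exceeds the timeout.
        return await asyncio.wait_for(judge_fn(inp), timeout=timeout_ms / 1000)
    except Exception:
        # Assumption: timeouts count as judge errors. "accept" fails open
        # (serve the cached response); any other value fails closed.
        return on_error == "accept"
```

Failing open keeps latency and availability predictable when the judge model is slow or down, at the cost of occasionally serving a borderline answer.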

A minimal OpenAI judge:

from openai import AsyncOpenAI

openai = AsyncOpenAI()

async def my_judge(inp: dict) -> bool:
    # Return True to accept (confidence → 'high')
    # Return False to reject (treated as miss with nearest_miss)
    verdict = await openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Reply YES or NO only."},
            {"role": "user", "content": (
                f"Does this cached response correctly answer the prompt?\n"
                f"Prompt: {inp['prompt']}\nResponse: {inp['response']}"
            )},
        ],
    )
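    # Normalize whitespace and case so replies like " yes" or "Yes." still match.
    answer = (verdict.choices[0].message.content or "").strip().upper()
    return answer.startswith("YES")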

When the judge is invoked: only for confidence == 'uncertain' hits. High-confidence hits, misses, and the zero-candidates case bypass the judge entirely.
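
The three outcomes mentioned here can be pictured as a simple classification of the distance score. The sketch below is illustrative only: classify is a hypothetical helper, the uncertain branch follows the band definition given earlier, and the "high" and "miss" branches are inferred from it rather than taken from the library's source.

```python
def classify(score: float, threshold: float, uncertainty_band: float) -> str:
    """Hypothetical helper mapping a distance score to a confidence label."""
    if score > threshold:
        return "miss"
    if score > threshold - uncertainty_band:
        # threshold - uncertainty_band < score <= threshold
        return "uncertain"
    return "high"
```

With threshold=0.12 and an assumed uncertainty_band of 0.05, a score of 0.10 would be labelled uncertain and routed to the judge, while 0.05 would be served directly.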

Accept path: result.hit == True, result.confidence == 'high'.

Reject path: result.hit == False, result.nearest_miss populated with delta_to_threshold <= 0 (use this to distinguish judge rejections from regular misses where delta_to_threshold > 0).
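
The accept and reject paths can be told apart in calling code roughly as follows. The dataclasses are stand-ins that mirror only the fields named above; they are not the library's own types. Treating a missing nearest_miss as the zero-candidates case is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NearestMiss:  # stand-in, not the library's class
    delta_to_threshold: float

@dataclass
class CheckResult:  # stand-in, not the library's class
    hit: bool
    confidence: Optional[str] = None
    nearest_miss: Optional[NearestMiss] = None

def describe(result) -> str:
    """Summarize a check result for logging or metrics."""
    if result.hit:
        return f"hit ({result.confidence})"
    miss = result.nearest_miss
    if miss is None:
        # Assumption: no nearest_miss means no candidates were found.
        return "miss (no candidates)"
    if miss.delta_to_threshold <= 0:
        # Within the threshold but still a miss: the judge rejected it.
        return "miss (rejected by judge)"
    return "miss (outside threshold)"
```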

Composing with rerank: when both rerank and judge are set, the judge receives the reranked pick's response and similarity score.

check_batch() does not support judge. Call check() individually for prompts that need adjudication.
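
One way to get batch-like behaviour with a judge is to fan out individual check() calls with asyncio.gather. check_each is a hypothetical helper, not a library function; it only assumes the check(prompt, options) call shape shown earlier.

```python
import asyncio

async def check_each(cache, prompts, options=None):
    """Hypothetical helper: run cache.check() once per prompt, concurrently.

    Results come back in the same order as `prompts`.
    """
    if options is None:
        return await asyncio.gather(*(cache.check(p) for p in prompts))
    return await asyncio.gather(*(cache.check(p, options) for p in prompts))
```

Because every uncertain hit may trigger its own judge call, large prompt lists may need a concurrency limit (for example an asyncio.Semaphore) to stay within the judge model's rate limits.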

CacheCheckOptions reference

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| threshold | float | default_threshold | Per-request cosine distance threshold override |
| category | str | "" | Category tag for per-category thresholds and metric labels |
| filter | str | None | FT.SEARCH pre-filter expression (trusted input only) |
| k | int | 1 | KNN neighbours to fetch (ignored when rerank is set) |
| stale_after_model_change | bool | False | Evict and miss when stored model differs from current_model |
| current_model | str | None | Model to compare against stored entries |
| rerank | RerankOptions | None | Rerank hook; see RerankOptions |
| judge | JudgeOptions | None | LLM-as-judge for borderline hits. Not supported by check_batch(); raises SemanticCacheUsageError |

Telemetry

The published wheel includes anonymous product analytics powered by PostHog. When a baked API key is present in the package (injected at publish time), aggregate usage statistics (hit rate, cost saved) are collected on a per-instance basis — no prompt text, responses, or personally-identifiable information is ever sent.

To opt out, set the environment variable before starting your process:

export BETTERDB_TELEMETRY=false   # also accepts: 0, no, off
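
If you want to confirm at startup that the opt-out is in effect in a given environment, a small check along these lines can help. It is a sketch of how such a flag is commonly parsed, not the library's own logic; in particular, ignoring case and surrounding whitespace is an assumption.

```python
import os

_OPT_OUT_VALUES = {"false", "0", "no", "off"}

def telemetry_opted_out(environ=os.environ) -> bool:
    """Return True when BETTERDB_TELEMETRY holds one of the opt-out values."""
    value = environ.get("BETTERDB_TELEMETRY", "")
    # Assumption: matching is case-insensitive and ignores surrounding spaces.
    return value.strip().lower() in _OPT_OUT_VALUES
```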

You can also disable it programmatically:

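from betterdb_semantic_cache import SemanticCache, SemanticCacheOptions
from betterdb_semantic_cache.types import AnalyticsOptions

cache = SemanticCache(SemanticCacheOptions(
    ...,  # your existing options (client, embed_fn, and so on)
    analytics=AnalyticsOptions(disabled=True),
))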
