betterdb-semantic-cache

Semantic cache for AI workloads backed by Valkey vector search. Embeddings-based similarity matching with OpenTelemetry and Prometheus instrumentation.

Installation

pip install betterdb-semantic-cache
# With OpenAI embeddings (quote the extras so shells such as zsh don't expand the brackets):
pip install "betterdb-semantic-cache[openai]"
# All extras:
pip install "betterdb-semantic-cache[all]"

Quick start

import asyncio
import valkey.asyncio as valkey
from betterdb_semantic_cache import SemanticCache, SemanticCacheOptions
from betterdb_semantic_cache.embed.openai import create_openai_embed

async def main():
    # Point this at a Valkey instance with vector search available
    # (note: Valkey's default port is 6379).
    client = valkey.Valkey(host="localhost", port=6399)
    cache = SemanticCache(SemanticCacheOptions(
        client=client,
        embed_fn=create_openai_embed(),  # requires the [openai] extra
        default_threshold=0.12,          # max cosine distance for a hit; lower is stricter
    ))
    await cache.initialize()

    prompt = "What is the capital of France?"
    result = await cache.check(prompt)
    if not result.hit:
        # Miss: produce the answer (e.g. call your LLM), then store it for next time.
        await cache.store(prompt, "Paris")

asyncio.run(main())
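default_threshold is a cosine distance (see threshold in the CacheCheckOptions reference), so lower values are stricter and a candidate counts as a hit when its distance is at or below the threshold. The plain-Python sketch below builds intuition for what 0.12 means; it assumes the usual definition, distance = 1 - cosine similarity, and the helper names are hypothetical. It is not how Valkey computes scores internally.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_a * norm_b)

def is_hit(distance: float, threshold: float = 0.12) -> bool:
    # Hit when the distance is at or below the threshold.
    return distance <= threshold
```

For example, cosine_distance([1.0, 0.0], [0.0, 1.0]) is 1.0, far outside 0.12, while two nearly parallel vectors land close to 0 and count as a hit.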

LLM-as-judge

When a hit lands in the uncertainty band (threshold - uncertainty_band < score <= threshold, where score is the cosine distance), you can supply a judge_fn that adjudicates it automatically instead of handling confidence == 'uncertain' yourself.
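The band rule splits a distance score three ways. The function below is a hypothetical illustration of that rule only: the name classify and the band value 0.04 in the usage line are made up, and how uncertainty_band is configured is not covered here.

```python
def classify(score: float, threshold: float, uncertainty_band: float) -> str:
    """Bucket a cosine-distance score according to the uncertainty-band rule."""
    if score > threshold:
        return "miss"        # outside the threshold entirely
    if score > threshold - uncertainty_band:
        return "uncertain"   # threshold - band < score <= threshold: judge_fn runs here
    return "high"            # comfortably inside the threshold
```

With threshold=0.12 and a band of 0.04, a score of 0.05 is "high", 0.10 is "uncertain", and 0.20 is a "miss".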

from betterdb_semantic_cache import JudgeOptions
from betterdb_semantic_cache.types import CacheCheckOptions

result = await cache.check(user_prompt, CacheCheckOptions(
    judge=JudgeOptions(
        judge_fn=my_judge,
        on_error="accept",   # fail-open on judge errors (default)
        timeout_ms=2000,     # per-call timeout (default)
    )
))
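The on_error and timeout_ms options describe a fail-open call with a per-call timeout. A minimal sketch of that behaviour, assuming it works as the comments above state; it is not the library's implementation, run_judge and fail_open are hypothetical names, and treating the non-default setting as a rejection is an assumption.

```python
import asyncio
from typing import Awaitable, Callable

JudgeFn = Callable[[dict], Awaitable[bool]]

async def run_judge(judge_fn: JudgeFn, inp: dict, timeout_ms: int = 2000,
                    fail_open: bool = True) -> bool:
    """Call judge_fn with a per-call timeout; on any error return fail_open."""
    try:
        return await asyncio.wait_for(judge_fn(inp), timeout=timeout_ms / 1000)
    except Exception:
        # Catches judge exceptions and the timeout error raised by wait_for.
        # fail_open=True mirrors on_error="accept": the borderline hit is kept.
        return fail_open
```

Failing open keeps latency bounded and favours serving the cached answer when the judge is unavailable; failing closed trades more misses for fewer wrong answers.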

A minimal OpenAI judge:

from openai import AsyncOpenAI

openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def my_judge(inp: dict) -> bool:
    # Return True to accept (confidence → 'high').
    # Return False to reject (treated as a miss, with nearest_miss populated).
    verdict = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # keep yes/no verdicts as deterministic as possible
        messages=[
            {"role": "system", "content": "Reply YES or NO only."},
            {"role": "user", "content": (
                "Does this cached response correctly answer the prompt?\n"
                f"Prompt: {inp['prompt']}\nResponse: {inp['response']}"
            )},
        ],
    )
    # Normalise whitespace and case so "Yes" or " YES." still count as acceptance.
    answer = (verdict.choices[0].message.content or "").strip().upper()
    return answer.startswith("YES")
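To exercise the accept and reject paths in tests without calling a model, a stub with the same shape as my_judge (an async callable that takes the input dict and returns a bool) can stand in. make_stub_judge is a hypothetical helper; beyond prompt and response, the keys the cache places in the dict are not documented here.

```python
def make_stub_judge(accept: bool):
    """Build a judge_fn-shaped coroutine that returns a fixed verdict and logs inputs."""
    calls: list[dict] = []

    async def judge(inp: dict) -> bool:
        calls.append(inp)  # record what the cache passed in, for later assertions
        return accept

    judge.calls = calls  # expose the log on the function object
    return judge
```

Pass make_stub_judge(False) as judge_fn to force the reject path, then inspect judge.calls to confirm the judge was invoked only for uncertain hits.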

When the judge is invoked: only for confidence == 'uncertain' hits. High-confidence hits, misses, and the zero-candidates case bypass the judge entirely.

Accept path: result.hit == True, result.confidence == 'high'.

Reject path: result.hit == False, result.nearest_miss populated with delta_to_threshold <= 0 (use this to distinguish judge rejections from regular misses where delta_to_threshold > 0).
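The accept and reject rules above reduce to a small helper for logging or metrics. It reads only the attributes the text names (hit, nearest_miss, delta_to_threshold); the helper name and label strings are hypothetical, and treating an absent nearest_miss as a plain miss (the zero-candidates case) is an assumption.

```python
def outcome(result) -> str:
    """Label a check() result as 'hit', 'judge_rejected', or 'miss'."""
    if result.hit:
        return "hit"
    nearest = getattr(result, "nearest_miss", None)
    # A judge rejection leaves a nearest_miss inside the threshold (delta <= 0);
    # a regular miss lies outside it (delta > 0) or has no candidate at all.
    if nearest is not None and nearest.delta_to_threshold <= 0:
        return "judge_rejected"
    return "miss"
```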

Composing with rerank: when both rerank and judge are set, the judge receives the reranked pick's response and similarity score.

check_batch() does not support judge; passing it raises SemanticCacheUsageError. Call check() individually for prompts that need adjudication.
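One way to cover a batch that needs adjudication is to fan out individual check() calls with asyncio.gather, bounding concurrency so the judge model is not flooded. check_each and max_concurrency are hypothetical names, and the sketch assumes the cache (and its Valkey client) tolerates concurrent check() calls, which is not stated above.

```python
import asyncio
from typing import Any, Sequence

async def check_each(cache: Any, prompts: Sequence[str], options: Any = None,
                     max_concurrency: int = 8) -> list:
    """Run cache.check() once per prompt, at most max_concurrency at a time, in input order."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str):
        async with sem:
            if options is None:
                return await cache.check(prompt)
            return await cache.check(prompt, options)

    # gather preserves the order of its arguments in the returned list.
    return list(await asyncio.gather(*(one(p) for p in prompts)))
```

Usage, inside an async function: results = await check_each(cache, prompts, CacheCheckOptions(judge=JudgeOptions(judge_fn=my_judge))).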

CacheCheckOptions reference

| Option | Type | Default | Description |
|---|---|---|---|
| threshold | float | default_threshold | Per-request cosine distance threshold override |
| category | str | "" | Category tag for per-category thresholds and metric labels |
| filter | str | None | FT.SEARCH pre-filter expression (trusted input only) |
| k | int | 1 | KNN neighbours to fetch (ignored when rerank is set) |
| stale_after_model_change | bool | False | Evict and miss when the stored model differs from current_model |
| current_model | str | None | Model to compare against stored entries |
| rerank | RerankOptions | None | Rerank hook; see RerankOptions |
| judge | JudgeOptions | None | LLM-as-judge for borderline hits. Not supported by check_batch(), which raises SemanticCacheUsageError |
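filter is passed through as an FT.SEARCH pre-filter expression and is documented as trusted input only, so any value that comes from users should be escaped before it is spliced in. A conservative sketch, assuming your index has a TAG field (the @tenant field in the usage line is hypothetical) and that Valkey's search syntax accepts RediSearch-style backslash escaping inside tag queries:

```python
def escape_tag_value(value: str) -> str:
    """Backslash-escape every character that is not a letter, digit, or underscore."""
    return "".join(ch if ch.isalnum() or ch == "_" else "\\" + ch for ch in value)

def tag_filter(field: str, value: str) -> str:
    # `field` must be a fixed, trusted name from your own code, never user input;
    # only `value` is escaped.
    return f"@{field}:{{{escape_tag_value(value)}}}"
```

For example, tag_filter("tenant", "acme-corp") yields @tenant:{acme\-corp}, which could then be passed as CacheCheckOptions(filter=...).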

Telemetry

The published wheel includes anonymous product analytics powered by PostHog. When a baked-in API key is present in the package (injected at publish time), aggregate usage statistics (hit rate, cost saved) are collected per instance. No prompt text, responses, or personally identifiable information is ever sent.

To opt out, set the environment variable before starting your process:

export BETTERDB_TELEMETRY=false   # also accepts: 0, no, off

You can also disable it programmatically:

from betterdb_semantic_cache.types import AnalyticsOptions
cache = SemanticCache(SemanticCacheOptions(
    ...,
    analytics=AnalyticsOptions(disabled=True),
))
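For deployment checks, such as a startup log line confirming telemetry is off, the documented opt-out values can be mirrored in a small helper. This is a hypothetical sketch, not the package's own parser; case-insensitive, whitespace-tolerant matching is an assumption.

```python
import os

_OFF_VALUES = {"false", "0", "no", "off"}

def telemetry_opted_out(env=None) -> bool:
    """True if BETTERDB_TELEMETRY holds one of the documented opt-out values."""
    env = os.environ if env is None else env
    value = env.get("BETTERDB_TELEMETRY", "")
    # Normalising case and whitespace is an assumption of this sketch.
    return value.strip().lower() in _OFF_VALUES
```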

Download files

Source Distribution

betterdb_semantic_cache-0.5.0.tar.gz (96.9 kB)

Uploaded Source

Built Distribution

betterdb_semantic_cache-0.5.0-py3-none-any.whl (73.3 kB)

Uploaded Python 3

File details

Details for the file betterdb_semantic_cache-0.5.0.tar.gz.

File metadata

  • Download URL: betterdb_semantic_cache-0.5.0.tar.gz
  • Upload date:
  • Size: 96.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for betterdb_semantic_cache-0.5.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6fca77cb89dbf2d174e38a25f162348d57b606ba7ad46f10ac3da396094daa5 |
| MD5 | b7cd0ac26b8f947ae704f2049259b3d2 |
| BLAKE2b-256 | 8bcfb8f2ba7045e3ade182351747f3eb4f12ec569d0d333edf38e9f79aa7892b |
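To check a downloaded artifact against the published SHA256 digest, stream it through hashlib. sha256_matches is a hypothetical helper name, and the usage line assumes the sdist was saved to the current directory.

```python
import hashlib

def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 16) -> bool:
    """Stream a file through SHA-256 and compare it with a published hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large files never need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

Usage: sha256_matches("betterdb_semantic_cache-0.5.0.tar.gz", "f6fca77cb89dbf2d174e38a25f162348d57b606ba7ad46f10ac3da396094daa5"). pip can enforce the same check at install time with --require-hashes and a requirement line of the form betterdb-semantic-cache==0.5.0 --hash=sha256:<digest>.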

Provenance

The following attestation bundles were made for betterdb_semantic_cache-0.5.0.tar.gz:

Publisher: semantic-cache-py-release.yml on BetterDB-inc/monitor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file betterdb_semantic_cache-0.5.0-py3-none-any.whl.

File metadata

File hashes

Hashes for betterdb_semantic_cache-0.5.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 84cd07d418eb20cd646dbc51a89b5af71ae87f69402d4a635ceb672fd4135b21 |
| MD5 | b2c354e40a507129293ae478cc06551b |
| BLAKE2b-256 | 5214f1e0707924bdfa62649a09e65ae356fcfa578d429c3c8fcb0544fb37673d |

Provenance

The following attestation bundles were made for betterdb_semantic_cache-0.5.0-py3-none-any.whl:

Publisher: semantic-cache-py-release.yml on BetterDB-inc/monitor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.