
Content-addressed local embedding cache. Skip duplicate embedding API calls.


embedcache

Content-addressed local embedding cache. Skip duplicate embedding API calls. Rust core, Python frontend.

The problem

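You re-embed the same documents over and over. Some are byte-for-byte identical, some differ only by a trailing newline, and all of them cost real money (and time) to embed at the provider. The fix is a content-addressed cache keyed on the exact bytes you would have sent: same input + same model → cached vector; otherwise, compute. Because the key is the exact bytes, a text that differs only by a trailing newline is a separate entry, so normalize whitespace before the lookup if you want those variants to share one.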

embedcache is that cache, fast enough that the lookup overhead is below the network round-trip you would have paid otherwise.
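The keying idea fits in a few lines of Python. This is an illustration under stated assumptions, not embedcache's actual key format: it uses the standard library's `hashlib.blake2b` as a stand-in for blake3 (which is not in the stdlib), and the helper name `cache_key` and the length-prefix layout are hypothetical choices.

```python
import hashlib


def cache_key(text: str, model: str) -> str:
    """Hypothetical content-addressed key over (model, exact UTF-8 bytes of text).

    hashlib.blake2b stands in for blake3; embedcache's real key layout may differ.
    """
    model_bytes = model.encode("utf-8")
    h = hashlib.blake2b(digest_size=32)
    # Length-prefix the model name so model="ab", text="c" and
    # model="a", text="bc" do not hash the same concatenated bytes.
    h.update(len(model_bytes).to_bytes(4, "big"))
    h.update(model_bytes)
    h.update(text.encode("utf-8"))
    return h.hexdigest()


model = "text-embedding-3-small"
# Same bytes + same model -> same key (a hit).
assert cache_key("hello world", model) == cache_key("hello world", model)
# One extra trailing newline -> different bytes -> different key (a miss).
assert cache_key("hello world\n", model) != cache_key("hello world", model)
```

Including the model name in the key matters because vectors from different models are not interchangeable, even for identical text.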

Install

pip install embedcache

30-second quickstart

import numpy as np
from embedcache import EmbedCache

cache = EmbedCache("./.embedcache.redb", ttl_seconds=86400 * 30)  # entries expire after 30 days

def embed(text: str) -> np.ndarray:
    # your real call: openai.embeddings.create(...), bedrock, cohere, etc.
    return np.zeros(384, dtype=np.float32)

vec = cache.get_or_compute("hello world", "text-embedding-3-small", embed)

For bulk ingestion, get_or_compute_many calls your batch function only on the misses:

texts = ["a", "b", "c", "d"]

def embed_batch(missing: list[str]) -> list[np.ndarray]:
    return [np.zeros(384, dtype=np.float32) for _ in missing]

vectors = cache.get_or_compute_many(texts, "text-embedding-3-small", embed_batch)
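The miss-only batching described above can be modeled against a plain dict. This is a hypothetical reference model of the behavior, not embedcache's implementation: `DictCache` and `get_or_compute_many_sketch` are illustrative names. It looks up every input, collects the misses, calls the batch function once with only those, stores the results, and returns vectors in input order. The README does not say whether repeated texts within one batch are deduplicated; this sketch deduplicates them as an assumption.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence

import numpy as np
from numpy.typing import NDArray

Vector = NDArray[np.float32]


class DictCache:
    """Hypothetical in-memory stand-in for the cache, keyed on (model, text)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], Vector] = {}

    def get(self, text: str, model: str) -> Vector | None:
        return self._store.get((model, text))

    def put(self, text: str, model: str, vector: Vector) -> None:
        self._store[(model, text)] = vector


def get_or_compute_many_sketch(
    cache: DictCache,
    texts: Sequence[str],
    model: str,
    compute_batch: Callable[[list[str]], list[Vector]],
) -> list[Vector]:
    # 1. Look up every input; None marks a miss.
    found = [cache.get(t, model) for t in texts]
    # 2. Unique missing texts in first-seen order (dict keeps insertion order).
    missing = list(dict.fromkeys(t for t, v in zip(texts, found) if v is None))
    if missing:
        # 3. One batch call covering only the misses.
        computed = compute_batch(missing)
        if len(computed) != len(missing):
            raise ValueError("compute_batch must return one vector per input text")
        for text, vec in zip(missing, computed):
            cache.put(text, model, np.asarray(vec, dtype=np.float32))
    # 4. Return vectors in the caller's order, filling misses from the cache.
    return [v if v is not None else cache.get(t, model) for t, v in zip(texts, found)]


calls: list[list[str]] = []


def embed_batch(missing: list[str]) -> list[Vector]:
    calls.append(missing)  # record what the "provider" was asked for
    return [np.full(4, float(len(t)), dtype=np.float32) for t in missing]


demo = DictCache()
get_or_compute_many_sketch(demo, ["a", "b"], "m", embed_batch)
get_or_compute_many_sketch(demo, ["a", "bb", "b", "bb"], "m", embed_batch)
assert calls == [["a", "b"], ["bb"]]  # the second call embedded only "bb", once
```

The batch function's contract here (one vector per input, same order) mirrors the `compute_batch` signature in the API listing.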

Why it is fast

  • Hashing. Keys are blake3 hashes (~5x faster than SHA-256 on the 1–10 KB inputs typical of prompts).
  • Storage. redb is an embedded, single-file ACID key-value store written in Rust and built on copy-on-write B-trees, so there is no Python in the hot path.
  • GIL. The Rust extension releases the GIL (via PyO3) on every get/put, so a Python thread pool calling the cache from a batch loop does not serialize on it.
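Because the cache does not hold the GIL during lookups and writes, a standard-library thread pool is a natural way to overlap provider calls with cache traffic. The sketch below relies only on the documented `get_or_compute` method; the helper name `embed_concurrently`, the `SupportsGetOrCompute` protocol, and the default worker count are illustrative assumptions. Any object with a matching `get_or_compute` method can be passed in, such as an `EmbedCache`.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol

import numpy as np
from numpy.typing import NDArray


class SupportsGetOrCompute(Protocol):
    def get_or_compute(
        self, text: str, model: str, compute: Callable[[str], NDArray[np.float32]]
    ) -> NDArray[np.float32]: ...


def embed_concurrently(
    cache: SupportsGetOrCompute,
    texts: Sequence[str],
    model: str,
    embed: Callable[[str], NDArray[np.float32]],
    max_workers: int = 8,
) -> list[NDArray[np.float32]]:
    """Hypothetical helper: fan texts out over a thread pool.

    Each worker calls cache.get_or_compute: hits return without calling embed,
    misses call embed (typically a network request) and store the result.
    ThreadPoolExecutor.map yields results in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: cache.get_or_compute(t, model, embed), texts))
```

Threads help here mainly because the misses spend their time waiting on the provider's network round-trip; the cache itself is not the bottleneck.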

API

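from collections.abc import Callable, Sequence
from pathlib import Path

import numpy as np
from numpy.typing import NDArray

class EmbedCache: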
    def __init__(
        self,
        path: str | Path,
        *,
        ttl_seconds: int | None = None,
    ) -> None: ...

    def get(self, text: str, model: str) -> NDArray[np.float32] | None: ...
    def put(self, text: str, model: str, vector: NDArray[np.float32]) -> None: ...

    def get_or_compute(
        self,
        text: str,
        model: str,
        compute: Callable[[str], NDArray[np.float32]],
    ) -> NDArray[np.float32]: ...

    def get_or_compute_many(
        self,
        texts: Sequence[str],
        model: str,
        compute_batch: Callable[[list[str]], list[NDArray[np.float32]]],
    ) -> list[NDArray[np.float32]]: ...

    def purge_expired(self) -> int: ...
    def purge_to_size(self, max_bytes: int) -> int: ...
    def clear(self) -> None: ...
    def stats(self) -> dict[str, int]: ...
    def __len__(self) -> int: ...
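One plausible reading of `get_or_compute` is the read-through pattern built from `get` and `put`. A caller who wants custom handling between the lookup and the store (retries or logging, for example) can write that pattern by hand, as sketched below; this is an illustration, not a description of how embedcache implements `get_or_compute` internally. The sketch also shows a hypothetical maintenance step calling `purge_expired` and `purge_to_size`. The README does not document what their integer return values count, so the helper passes them through unchanged. The names `CacheAPI`, `read_through`, and `maintain`, and the 1 GiB budget, are illustrative.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Protocol

import numpy as np
from numpy.typing import NDArray


class CacheAPI(Protocol):
    """The subset of the EmbedCache methods listed above that this sketch uses."""

    def get(self, text: str, model: str) -> NDArray[np.float32] | None: ...
    def put(self, text: str, model: str, vector: NDArray[np.float32]) -> None: ...
    def purge_expired(self) -> int: ...
    def purge_to_size(self, max_bytes: int) -> int: ...


def read_through(
    cache: CacheAPI,
    text: str,
    model: str,
    compute: Callable[[str], NDArray[np.float32]],
) -> NDArray[np.float32]:
    """Hypothetical helper: the get-then-put pattern, written out by hand."""
    vec = cache.get(text, model)
    if vec is not None:
        return vec  # hit: no provider call
    # Miss: call the provider, coerce to float32 as put() expects, then store.
    vec = np.asarray(compute(text), dtype=np.float32)
    cache.put(text, model, vec)
    return vec


def maintain(cache: CacheAPI, max_bytes: int = 1 << 30) -> tuple[int, int]:
    """Hypothetical maintenance step: drop TTL-expired entries, then trim to a size budget.

    The returned ints are passed through as-is; their meaning is not documented.
    """
    return cache.purge_expired(), cache.purge_to_size(max_bytes)
```

Purging expired entries before trimming to size avoids evicting live entries to make room that stale ones were already occupying.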

License

Dual-licensed under MIT or Apache-2.0 at your option.


Download files


Source Distributions

No source distribution files available for this release.

Built Distribution


embedcache-0.1.0-cp310-abi3-macosx_11_0_arm64.whl (545.7 kB)

For CPython 3.10+ on macOS 11.0+ ARM64

File hashes

Hashes for embedcache-0.1.0-cp310-abi3-macosx_11_0_arm64.whl:

Algorithm     Hash digest
SHA256        7fe531efe58eed5d7fa783aa164bab1fb9bd4ad78a14dc33bbfd9543160fc120
MD5           84083bdfd27ba98c1e68d5c24ba065f0
BLAKE2b-256   32c8d175261d47377dc6bf379be290c2ebf95cd6636a97e1ff93c61830e0a9cf
