embedcache
Content-addressed local embedding cache. Skip duplicate embedding API calls. Rust core, Python frontend.
The problem
You re-embed the same documents over and over. Some are identical, some differ by a trailing newline, all of them cost real money (and time) to embed at the provider. The fix is a content-addressed cache keyed on the exact bytes you would have sent: same input + same model → cached vector, otherwise compute.
embedcache is that cache, fast enough that the lookup overhead is below the
network round-trip you would have paid otherwise.
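To make the keying idea concrete, here is a minimal sketch of a content-addressed key. It is not embedcache's actual key format, which this README does not specify. The function name `cache_key` and the length-prefixed layout are assumptions for illustration, and `hashlib.blake2b` from the standard library stands in for blake3.

```python
import hashlib


def cache_key(text: str, model: str) -> bytes:
    """Hypothetical key: digest of the model name plus the exact UTF-8 bytes of text."""
    h = hashlib.blake2b(digest_size=32)  # stand-in for blake3 (assumption)
    model_bytes = model.encode("utf-8")
    # Length-prefix the model name so ("ab", "c") and ("a", "bc") cannot collide.
    h.update(len(model_bytes).to_bytes(4, "little"))
    h.update(model_bytes)
    h.update(text.encode("utf-8"))
    return h.digest()


key = cache_key("hello world", "text-embedding-3-small")
# Same text and same model give the same key, so the lookup would hit.
assert key == cache_key("hello world", "text-embedding-3-small")
# A trailing newline or a different model changes the bytes, so the key differs.
assert key != cache_key("hello world\n", "text-embedding-3-small")
assert key != cache_key("hello world", "text-embedding-3-large")
```

Under this scheme, inputs that differ by a single byte are distinct entries, so any whitespace normalization would have to happen before the text reaches the cache.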
Install
pip install embedcache
30-second quickstart
import numpy as np
from embedcache import EmbedCache

cache = EmbedCache("./.embedcache.redb", ttl_seconds=86400 * 30)

def embed(text: str) -> np.ndarray:
    # your real call: openai.embeddings.create(...), bedrock, cohere, etc.
    return np.zeros(384, dtype=np.float32)

vec = cache.get_or_compute("hello world", "text-embedding-3-small", embed)
For bulk ingestion, get_or_compute_many calls your batch function only on
the misses:
texts = ["a", "b", "c", "d"]

def embed_batch(missing: list[str]) -> list[np.ndarray]:
    return [np.zeros(384, dtype=np.float32) for _ in missing]

vectors = cache.get_or_compute_many(texts, "text-embedding-3-small", embed_batch)
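The miss-only behaviour can be sketched with a plain dict standing in for the on-disk store. This is an illustration, not embedcache's implementation: `get_or_compute_many_sketch` is a hypothetical name, the store is keyed on `(text, model)` tuples rather than hashes, and de-duplicating repeated misses within one call is an assumption about how a cache might reasonably behave.

```python
from typing import Callable, Sequence

Vector = list[float]


def get_or_compute_many_sketch(
    store: dict[tuple[str, str], Vector],
    texts: Sequence[str],
    model: str,
    compute_batch: Callable[[list[str]], list[Vector]],
) -> list[Vector]:
    # Collect each distinct missing text once, preserving first-seen order.
    missing = list(dict.fromkeys(t for t in texts if (t, model) not in store))
    if missing:
        computed = compute_batch(missing)
        if len(computed) != len(missing):
            raise ValueError("compute_batch must return one vector per input")
        for text, vec in zip(missing, computed):
            store[(text, model)] = vec
    # Results come back in the caller's original order, hits and misses alike.
    return [store[(t, model)] for t in texts]


batch_calls: list[list[str]] = []


def fake_batch(missing: list[str]) -> list[Vector]:
    batch_calls.append(list(missing))
    return [[float(len(t))] for t in missing]


store = {("a", "m"): [9.0]}  # "a" is already cached
out = get_or_compute_many_sketch(store, ["a", "bb", "bb", "ccc"], "m", fake_batch)
assert out == [[9.0], [2.0], [2.0], [3.0]]
assert batch_calls == [["bb", "ccc"]]  # one batch call, misses only
```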
Why it is fast
- Hashing. blake3 keys (~5x faster than SHA-256 on the 1–10 KB strings typical of prompts).
- Storage. redb is an embedded ACID KV store with a single-file format built on copy-on-write B-trees, with no Python in the hot path.
- GIL. PyO3 releases the GIL on every get/put, so a Python thread pool calling the cache from a batch loop does not serialize on the cache.
API
class EmbedCache:
    def __init__(
        self,
        path: str | Path,
        *,
        ttl_seconds: int | None = None,
    ) -> None: ...

    def get(self, text: str, model: str) -> NDArray[np.float32] | None: ...
    def put(self, text: str, model: str, vector: NDArray[np.float32]) -> None: ...

    def get_or_compute(
        self,
        text: str,
        model: str,
        compute: Callable[[str], NDArray[np.float32]],
    ) -> NDArray[np.float32]: ...

    def get_or_compute_many(
        self,
        texts: Sequence[str],
        model: str,
        compute_batch: Callable[[list[str]], list[NDArray[np.float32]]],
    ) -> list[NDArray[np.float32]]: ...

    def purge_expired(self) -> int: ...
    def purge_to_size(self, max_bytes: int) -> int: ...
    def clear(self) -> None: ...
    def stats(self) -> dict[str, int]: ...
    def __len__(self) -> int: ...
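The README does not spell out how `ttl_seconds` and `purge_expired` interact, so the following is only one plausible reading, sketched in memory: entries older than the TTL are hidden from `get`, and `purge_expired` physically removes them and returns the count. The class `TtlStoreSketch` and its injectable `clock` parameter are hypothetical and exist only for illustration.

```python
import time
from typing import Callable, Hashable, Optional


class TtlStoreSketch:
    """In-memory illustration of TTL expiry; not embedcache's storage layer."""

    def __init__(
        self,
        ttl_seconds: Optional[int] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, object]] = {}

    def _is_expired(self, stored_at: float) -> bool:
        # With no TTL configured, entries never expire.
        return self._ttl is not None and self._clock() - stored_at >= self._ttl

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None or self._is_expired(entry[0]):
            return None  # expired entries read as misses (assumed behaviour)
        return entry[1]

    def purge_expired(self) -> int:
        expired = [k for k, (ts, _) in self._entries.items() if self._is_expired(ts)]
        for k in expired:
            del self._entries[k]
        return len(expired)  # number of entries removed

    def __len__(self) -> int:
        return len(self._entries)
```

Injecting the clock keeps the sketch testable without sleeping; a real cache would more likely store a wall-clock timestamp alongside each vector.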
License
Dual-licensed under MIT or Apache-2.0 at your option.