Fast LMDB-backed vector cache with Python bindings and embedding helpers
vector_cache_lmdb
vector_cache_lmdb is a Rust library with Python bindings for storing and retrieving
Vec<f32> embeddings by text key.
It uses LMDB via heed for durable mmap-backed storage with many concurrent
readers and safely serialized writers across multiple processes.
Installation
pip install vector-cache-lmdb
Python usage
from vector_cache_lmdb import VectorCache
cache = VectorCache("/var/opt/cache.bin", max_items=1_000_000)
cache.put("this is a test string", [0.1, 0.5, 1.1])
print(cache.get("this is a test string"))
cache.put_multi(
    ["first string", "second string"],
    [[0.1, 0.5, 1.1], [2.0, 3.0, 4.0]],
)
print(cache.get_multi(["first string", "second string", "missing"]))
print(cache.delete("this is a test string"))
print(cache.delete_multi(["first string", "second string", "missing"]))
cache.reset()
Capacity can be configured in one of two mutually exclusive ways:
# Count-based capacity
cache = VectorCache("/var/opt/cache.bin", max_items=5_000_000)
# Byte-based capacity (vector payload bytes), fixed dimension required
cache = VectorCache("/var/opt/cache.bin", max_gb=2.0)
Passing both max_items and max_gb raises ValueError.
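When choosing a byte-based budget, a rough estimate of how many vectors it can hold may help. The sketch below assumes each element is a 4-byte f32 (consistent with the Vec<f32> storage described above) and that one GB means 1024**3 bytes. The library's exact definition of a GB is not stated here, so treat the result as an approximation. estimate_max_vectors is a hypothetical helper for illustration, not part of vector_cache_lmdb.

```python
def estimate_max_vectors(max_gb: float, dim: int, bytes_per_gb: int = 1024**3) -> int:
    """Estimate how many dim-dimensional f32 vectors fit in a max_gb payload budget.

    Assumption: only vector payload bytes count, at 4 bytes per f32 element.
    """
    if dim <= 0:
        raise ValueError("dim must be positive")
    bytes_per_vector = dim * 4  # one f32 is 4 bytes
    return int(max_gb * bytes_per_gb) // bytes_per_vector


# Example: 384-dimensional vectors (the all-MiniLM-L6-v2 output size) in a 2 GB budget.
print(estimate_max_vectors(2.0, 384))
```

Pass bytes_per_gb=10**9 instead if a decimal gigabyte turns out to be the right unit for your setup.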
OpenAI Embeddings Helper
Use get_embeddings_with_cache with make_openai_embed_fn to keep ordering exact while filling cache misses:
from openai import OpenAI
from vector_cache_lmdb import (
    VectorCache,
    get_embeddings_with_cache,
    make_openai_embed_fn,
)
client = OpenAI()
cache = VectorCache("/var/opt/cache.bin", max_items=1_000_000)
embed_fn = make_openai_embed_fn("text-embedding-3-small", client=client)
texts = ["first string", "second string", "first string"]
embeddings = get_embeddings_with_cache(
    texts=texts,
    cache=cache,
    embed_fn=embed_fn,
)
make_openai_embed_fn loads .env automatically when client is omitted.
Sentence-Transformers Embeddings Helper
from sentence_transformers import SentenceTransformer
from vector_cache_lmdb import (
    VectorCache,
    get_embeddings_with_cache,
    make_sentence_transformers_embed_fn,
)
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cache = VectorCache("/var/opt/cache.bin", max_items=1_000_000)
embed_fn = make_sentence_transformers_embed_fn(encoder)
texts = ["first string", "second string", "first string"]
embeddings = get_embeddings_with_cache(
    texts=texts,
    cache=cache,
    embed_fn=embed_fn,
)
Generic Custom Provider
from vector_cache_lmdb import get_embeddings_with_cache
def my_embed_fn(texts: list[str]) -> list[list[float]]:
    return my_provider.embed(texts)  # must return list[list[float]]
embeddings = get_embeddings_with_cache(
    texts=texts,
    cache=cache,
    embed_fn=my_embed_fn,
)
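To make the order-preserving, miss-filling behaviour of these helpers concrete, here is a minimal sketch of the general pattern. It is not the library's implementation: DictCache is an in-memory stand-in that only mirrors the get_multi and put_multi method names shown earlier, and fill_with_cache and fake_embed_fn are hypothetical names used for illustration.

```python
from typing import Callable, Optional, Sequence


class DictCache:
    """In-memory stand-in exposing get_multi/put_multi (not the real VectorCache)."""

    def __init__(self) -> None:
        self._data = {}

    def get_multi(self, keys: Sequence[str]) -> list[Optional[list[float]]]:
        # Preserve input order; None marks a missing entry.
        return [self._data.get(k) for k in keys]

    def put_multi(self, keys: Sequence[str], vectors: Sequence[list[float]]) -> None:
        for key, vec in zip(keys, vectors):
            self._data[key] = list(vec)


def fill_with_cache(
    texts: Sequence[str],
    cache: DictCache,
    embed_fn: Callable[[list[str]], list[list[float]]],
) -> list[list[float]]:
    """Return one embedding per input text, in input order, embedding only misses."""
    cached = cache.get_multi(texts)
    # Unique missing texts in first-seen order, so duplicates are embedded once.
    missing = list(dict.fromkeys(t for t, v in zip(texts, cached) if v is None))
    if not missing:
        return cached
    fresh = embed_fn(missing)
    if len(fresh) != len(missing):
        raise ValueError("embed_fn must return one vector per input text")
    cache.put_multi(missing, fresh)
    lookup = dict(zip(missing, fresh))
    return [v if v is not None else lookup[t] for t, v in zip(texts, cached)]


calls = []


def fake_embed_fn(batch: list[str]) -> list[list[float]]:
    # Toy provider: records each batch and returns a 1-dimensional "embedding".
    calls.append(list(batch))
    return [[float(len(t))] for t in batch]


demo_cache = DictCache()
demo_texts = ["first string", "second string", "first string"]
demo_embeddings = fill_with_cache(demo_texts, demo_cache, fake_embed_fn)
```

In this sketch the duplicated "first string" is sent to the provider only once, and a second call with the same texts does not invoke the provider at all.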
Notes
- The path argument is an LMDB environment directory (not a single data file).
- Hashing uses BLAKE3 for very fast fixed-size text keys.
- get_multi preserves input order and returns None for missing entries.
- Capacity is enforced with an on-disk LRU index shared across processes.
- Eviction runs on writes:
  - max_items: when len() > max_items
  - max_gb: when bytes_len() > max_gb budget (vector payload bytes)
- In max_gb mode, vector dimension is detected and must remain fixed.
- For correctness, use a local filesystem path. LMDB locking semantics are not guaranteed on remote/network filesystems (for example SSHFS/NFS mounts).
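The eviction rule in the notes above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the library's on-disk design: TinyLRU and hash_key are hypothetical names, the standard library's BLAKE2b stands in for BLAKE3 (which is not in the Python standard library), payload size is counted as 4 bytes per f32 element, and reads are assumed to refresh recency.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


def hash_key(text: str) -> bytes:
    # Stand-in hash: BLAKE2b with a 32-byte digest, NOT the BLAKE3 the library uses.
    return hashlib.blake2b(text.encode("utf-8"), digest_size=32).digest()


class TinyLRU:
    """Toy LRU cache that evicts on writes, by item count or by payload bytes."""

    def __init__(self, max_items: Optional[int] = None, max_bytes: Optional[int] = None):
        # This sketch requires exactly one limit; the two modes are mutually exclusive.
        if (max_items is None) == (max_bytes is None):
            raise ValueError("set exactly one of max_items or max_bytes")
        self.max_items = max_items
        self.max_bytes = max_bytes
        self._data = OrderedDict()  # oldest entry first
        self._bytes = 0

    def __len__(self) -> int:
        return len(self._data)

    def bytes_len(self) -> int:
        return self._bytes

    def get(self, text: str) -> Optional[list]:
        key = hash_key(text)
        vec = self._data.get(key)
        if vec is not None:
            self._data.move_to_end(key)  # assumption: a read marks the entry as recent
        return vec

    def put(self, text: str, vector: list) -> None:
        key = hash_key(text)
        old = self._data.pop(key, None)
        if old is not None:
            self._bytes -= 4 * len(old)
        self._data[key] = list(vector)
        self._bytes += 4 * len(vector)  # 4 bytes per f32 element
        self._evict()

    def _over_budget(self) -> bool:
        if self.max_items is not None:
            return len(self._data) > self.max_items
        return self._bytes > self.max_bytes

    def _evict(self) -> None:
        # Drop least recently used entries until the cache is back within budget.
        while self._data and self._over_budget():
            _, dropped = self._data.popitem(last=False)
            self._bytes -= 4 * len(dropped)


lru = TinyLRU(max_items=2)
lru.put("a", [1.0])
lru.put("b", [2.0])
lru.get("a")         # touch "a" so that "b" becomes the oldest entry
lru.put("c", [3.0])  # exceeds max_items, so "b" is evicted
```

The fixed-size hashed key keeps index entries the same length regardless of how long the input text is, which is the property the hashing note points at.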
File details
Details for the file vector_cache_lmdb-0.1.7.tar.gz.
File metadata
- Download URL: vector_cache_lmdb-0.1.7.tar.gz
- Upload date:
- Size: 15.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e8ec5a72821be192b505363ed71415f1a5552222e956041f4406b0708537196 |
| MD5 | 8b6a49781545e647136fc43b7ba8a823 |
| BLAKE2b-256 | 294c89841c84ce25c76a55a961deb9fd18f86acc0cc4e985aa74b9ac2f297eef |
File details
Details for the file vector_cache_lmdb-0.1.7-cp39-abi3-manylinux_2_38_x86_64.whl.
File metadata
- Download URL: vector_cache_lmdb-0.1.7-cp39-abi3-manylinux_2_38_x86_64.whl
- Upload date:
- Size: 389.8 kB
- Tags: CPython 3.9+, manylinux: glibc 2.38+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1057da839882f2c6b8706a5ec167d69d1313c35e2f1a5c98fe82dc7924fc9e39 |
| MD5 | d8b780aff28c7528a8390146c127c63c |
| BLAKE2b-256 | 172a4f193cc0b24bb0a3b3256242c6b2de83f2ecdae5982036f0366263e426a4 |
File details
Details for the file vector_cache_lmdb-0.1.7-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: vector_cache_lmdb-0.1.7-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 324.2 kB
- Tags: CPython 3.9+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d7dce612e6609e72adda7ba6e17b8c74ab511b990b08760bd7e308b9f3ec22f |
| MD5 | bc8390db399efc7198945e66ab2c54d1 |
| BLAKE2b-256 | d6d610f70c749c8c7bbca64bfbc737eccaa82e0e35610593a862bb59dc6d41e5 |