
Fast LMDB-backed vector cache with Python bindings and embedding helpers


vector_cache_lmdb

vector_cache_lmdb is a Rust library with Python bindings for storing and retrieving Vec<f32> embeddings by text key.

It stores data in LMDB, accessed from Rust through the heed crate. LMDB provides durable, memory-mapped storage that supports many concurrent readers and safely serializes writers across multiple processes.

Installation

pip install vector-cache-lmdb

Python usage

from vector_cache_lmdb import VectorCache

# The path names an LMDB environment directory, not a single data file.
cache = VectorCache("/var/opt/cache.bin", max_items=1_000_000)

cache.put("this is a test string", [0.1, 0.5, 1.1])
print(cache.get("this is a test string"))

# Batch write: keys and vectors are matched by position.
cache.put_multi(
    ["first string", "second string"],
    [[0.1, 0.5, 1.1], [2.0, 3.0, 4.0]],
)

# Results follow the input order; "missing" comes back as None.
print(cache.get_multi(["first string", "second string", "missing"]))
print(cache.delete("this is a test string"))
print(cache.delete_multi(["first string", "second string", "missing"]))

cache.reset()

Capacity can be configured in one of two mutually exclusive ways:

# Count-based capacity
cache = VectorCache("/var/opt/cache.bin", max_items=5_000_000)

# Byte-based capacity (vector payload bytes), fixed dimension required
cache = VectorCache("/var/opt/cache.bin", max_gb=2.0)

Passing both max_items and max_gb raises ValueError.
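To choose a max_gb value, it helps to estimate how many vectors a byte budget holds. The sketch below is illustrative only: check_capacity_args and approx_vectors_for_budget are hypothetical helpers, not part of vector_cache_lmdb. It assumes 1 GB = 10**9 bytes and counts only the f32 payload (4 bytes per dimension), ignoring keys and index overhead. With 1536-dimensional vectors (the default size of OpenAI's text-embedding-3-small), max_gb=2.0 works out to roughly 325,000 vectors.

```python
FLOAT32_BYTES = 4  # one element of a Vec<f32>


def check_capacity_args(max_items=None, max_gb=None):
    # Hypothetical check mirroring the documented rule: the limits are exclusive.
    if max_items is not None and max_gb is not None:
        raise ValueError("max_items and max_gb are mutually exclusive")


def approx_vectors_for_budget(max_gb: float, dim: int, bytes_per_gb: int = 10**9) -> int:
    # ASSUMPTION: 1 GB = 10**9 bytes; pass bytes_per_gb=2**30 to model GiB instead.
    # Counts vector payload only (dim * 4 bytes), not keys or LRU metadata.
    budget = int(max_gb * bytes_per_gb)
    return budget // (dim * FLOAT32_BYTES)


check_capacity_args(max_gb=2.0)
print(approx_vectors_for_budget(2.0, 1536))  # → 325520
```

If the library counts a gigabyte as 2**30 bytes, the same budget would hold about 349,525 such vectors.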

OpenAI Embeddings Helper

Use get_embeddings_with_cache with make_openai_embed_fn to fill cache misses from the OpenAI API while returning embeddings in exactly the same order as the input texts:

from openai import OpenAI

from vector_cache_lmdb import (
    VectorCache,
    get_embeddings_with_cache,
    make_openai_embed_fn,
)

client = OpenAI()
cache = VectorCache("/var/opt/cache.bin", max_items=1_000_000)
embed_fn = make_openai_embed_fn("text-embedding-3-small", client=client)

texts = ["first string", "second string", "first string"]
embeddings = get_embeddings_with_cache(
    texts=texts,
    cache=cache,
    embed_fn=embed_fn,
)

When client is omitted, make_openai_embed_fn automatically loads environment variables from a .env file.
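The ordering guarantee of get_embeddings_with_cache can be pictured with a pure-Python sketch. This is not the library's implementation: DictCache is a hypothetical in-memory stand-in that exposes get_multi and put_multi in the shape used above, embeddings_with_cache_sketch is a hypothetical name, and collapsing repeated texts into a single embed_fn call is an assumption about how duplicates are handled.

```python
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]


class DictCache:
    """Hypothetical in-memory stand-in with the get_multi/put_multi shape."""

    def __init__(self) -> None:
        self._data: Dict[str, Vector] = {}

    def get_multi(self, keys: Sequence[str]) -> List[Optional[Vector]]:
        return [self._data.get(k) for k in keys]

    def put_multi(self, keys: Sequence[str], vectors: Sequence[Vector]) -> None:
        for key, vec in zip(keys, vectors):
            self._data[key] = list(vec)


def embeddings_with_cache_sketch(
    texts: Sequence[str],
    cache: DictCache,
    embed_fn: Callable[[List[str]], List[Vector]],
) -> List[Vector]:
    cached = cache.get_multi(texts)
    # ASSUMPTION: unique misses in first-seen order, so a repeated text is embedded once.
    misses = list(dict.fromkeys(t for t, v in zip(texts, cached) if v is None))
    fresh: Dict[str, Vector] = {}
    if misses:
        vectors = embed_fn(misses)
        if len(vectors) != len(misses):
            raise ValueError("embed_fn must return one vector per input text")
        cache.put_multi(misses, vectors)
        fresh = dict(zip(misses, vectors))
    # Rebuild the output in the caller's order from cache hits plus fresh vectors.
    return [v if v is not None else fresh[t] for t, v in zip(texts, cached)]
```

With texts ["a", "bb", "a"], embed_fn is called once with ["a", "bb"], and the result still has three entries in the original order.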

Sentence-Transformers Embeddings Helper

from sentence_transformers import SentenceTransformer

from vector_cache_lmdb import (
    VectorCache,
    get_embeddings_with_cache,
    make_sentence_transformers_embed_fn,
)

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cache = VectorCache("/var/opt/cache.bin", max_items=1_000_000)
embed_fn = make_sentence_transformers_embed_fn(encoder)

texts = ["first string", "second string", "first string"]
embeddings = get_embeddings_with_cache(
    texts=texts,
    cache=cache,
    embed_fn=embed_fn,
)
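An embed_fn built from a local encoder mainly has to turn the encoder's output into plain list[list[float]]. Here is a minimal sketch of that adapter. make_encode_adapter is a hypothetical name, and the library's make_sentence_transformers_embed_fn may work differently internally; the only facts relied on are that SentenceTransformer.encode accepts a list of strings and returns a NumPy array by default.

```python
from typing import Any, Callable, List


def make_encode_adapter(encoder: Any) -> Callable[[List[str]], List[List[float]]]:
    """Hypothetical adapter for any object with an .encode(list[str]) method."""

    def embed_fn(texts: List[str]) -> List[List[float]]:
        rows = encoder.encode(texts)
        # A NumPy array (the SentenceTransformer default) becomes nested lists.
        if hasattr(rows, "tolist"):
            rows = rows.tolist()
        # Coerce every element to a Python float, one vector per input text.
        return [[float(x) for x in row] for row in rows]

    return embed_fn
```

The returned callable has the same shape as the embed_fn expected by get_embeddings_with_cache.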

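Generic Custom Provider

Any function that takes a list of texts and returns one vector per text can serve as embed_fn:

from vector_cache_lmdb import VectorCache, get_embeddings_with_cache

cache = VectorCache("/var/opt/cache.bin", max_items=1_000_000)


def my_embed_fn(texts: list[str]) -> list[list[float]]:
    # my_provider stands for your own embedding client.
    # It must return a list[list[float]] with one vector per input text, in input order.
    return my_provider.embed(texts)


texts = ["first string", "second string", "first string"]
embeddings = get_embeddings_with_cache(
    texts=texts,
    cache=cache,
    embed_fn=my_embed_fn,
)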
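A custom provider is the easiest place to break the list[list[float]] contract, so it can be worth wrapping it in a check before handing it to the cache. checked_embed_fn is a hypothetical helper, not part of vector_cache_lmdb. Its optional dim argument is useful in max_gb mode, where the vector dimension must stay fixed.

```python
from typing import Callable, List, Optional

EmbedFn = Callable[[List[str]], List[List[float]]]


def checked_embed_fn(inner: EmbedFn, dim: Optional[int] = None) -> EmbedFn:
    """Hypothetical wrapper enforcing the list[list[float]] contract."""

    def wrapped(texts: List[str]) -> List[List[float]]:
        vectors = inner(texts)
        if len(vectors) != len(texts):
            raise ValueError(f"expected {len(texts)} vectors, got {len(vectors)}")
        # Coerce to plain floats so ints or NumPy scalars do not leak through.
        out = [[float(x) for x in vec] for vec in vectors]
        if dim is not None and any(len(v) != dim for v in out):
            raise ValueError(f"every vector must have {dim} dimensions")
        return out

    return wrapped
```

Usage: pass embed_fn=checked_embed_fn(my_embed_fn) to get_embeddings_with_cache.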

Notes

  • The path argument is an LMDB environment directory (not a single data file).
  • Text keys are hashed with BLAKE3, a very fast hash function, into fixed-size keys.
  • get_multi preserves input order and returns None for missing entries.
  • Capacity is enforced with an on-disk LRU index shared across processes.
  • Eviction runs on writes:
    • max_items: when len() > max_items
    • max_gb: when bytes_len() > max_gb budget (vector payload bytes)
  • In max_gb mode, the vector dimension is detected automatically and must stay the same for every stored vector.
  • For correctness, use a local filesystem path. LMDB locking semantics are not guaranteed on remote/network filesystems (for example SSHFS/NFS mounts).
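The notes on hashing and eviction can be modeled in standard-library Python. This is a hypothetical in-memory model, not the on-disk LMDB index. hashlib has no BLAKE3, so BLAKE2b with a 32-byte digest stands in for it, and max_bytes plays the role of the max_gb budget. Two behaviors are assumptions: a get refreshes an entry's recency, and the first write in byte mode fixes the dimension.

```python
import hashlib
from collections import OrderedDict
from typing import List, Optional

FLOAT32_BYTES = 4


def text_key(text: str) -> bytes:
    # Stand-in for BLAKE3 (absent from hashlib): BLAKE2b with a 32-byte digest
    # maps text of any length to a fixed-size key.
    return hashlib.blake2b(text.encode("utf-8"), digest_size=32).digest()


class LruModel:
    """Hypothetical in-memory model of the documented capacity rules."""

    def __init__(self, max_items: Optional[int] = None,
                 max_bytes: Optional[int] = None) -> None:
        if max_items is not None and max_bytes is not None:
            raise ValueError("max_items and max_bytes are mutually exclusive")
        self.max_items = max_items
        self.max_bytes = max_bytes
        self.dim: Optional[int] = None
        self._bytes = 0  # running total of vector payload bytes
        self._data: "OrderedDict[bytes, List[float]]" = OrderedDict()  # oldest first

    def __len__(self) -> int:
        return len(self._data)

    def bytes_len(self) -> int:
        return self._bytes

    def get(self, text: str) -> Optional[List[float]]:
        key = text_key(text)
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # ASSUMPTION: reads refresh recency
        return self._data[key]

    def put(self, text: str, vec: List[float]) -> None:
        if self.max_bytes is not None:
            if self.dim is None:
                self.dim = len(vec)  # ASSUMPTION: the first write fixes the dimension
            elif len(vec) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(vec)}")
        key = text_key(text)
        old = self._data.pop(key, None)
        if old is not None:
            self._bytes -= len(old) * FLOAT32_BYTES
        self._data[key] = list(vec)  # a fresh write is the most recently used entry
        self._bytes += len(vec) * FLOAT32_BYTES
        self._evict()

    def _evict(self) -> None:
        # Eviction runs on writes and removes least recently used entries first.
        while self._data and self._over_capacity():
            _, dropped = self._data.popitem(last=False)
            self._bytes -= len(dropped) * FLOAT32_BYTES

    def _over_capacity(self) -> bool:
        if self.max_items is not None:
            return len(self._data) > self.max_items
        if self.max_bytes is not None:
            return self._bytes > self.max_bytes
        return False
```

An OrderedDict keeps recency order cheaply: move_to_end marks an entry as recently used, and popitem(last=False) removes the oldest one.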


Download files


Source Distribution

vector_cache_lmdb-0.1.7.tar.gz (15.5 kB)

Uploaded Source

Built Distributions


vector_cache_lmdb-0.1.7-cp39-abi3-manylinux_2_38_x86_64.whl (389.8 kB)

Uploaded CPython 3.9+, manylinux: glibc 2.38+, x86-64

vector_cache_lmdb-0.1.7-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (324.2 kB)

Uploaded CPython 3.9+, manylinux: glibc 2.17+, ARM64

File details

Details for the file vector_cache_lmdb-0.1.7.tar.gz.

File metadata

  • Download URL: vector_cache_lmdb-0.1.7.tar.gz
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for vector_cache_lmdb-0.1.7.tar.gz
Algorithm Hash digest
SHA256 6e8ec5a72821be192b505363ed71415f1a5552222e956041f4406b0708537196
MD5 8b6a49781545e647136fc43b7ba8a823
BLAKE2b-256 294c89841c84ce25c76a55a961deb9fd18f86acc0cc4e985aa74b9ac2f297eef


File details

Details for the file vector_cache_lmdb-0.1.7-cp39-abi3-manylinux_2_38_x86_64.whl.

File hashes

Hashes for vector_cache_lmdb-0.1.7-cp39-abi3-manylinux_2_38_x86_64.whl
Algorithm Hash digest
SHA256 1057da839882f2c6b8706a5ec167d69d1313c35e2f1a5c98fe82dc7924fc9e39
MD5 d8b780aff28c7528a8390146c127c63c
BLAKE2b-256 172a4f193cc0b24bb0a3b3256242c6b2de83f2ecdae5982036f0366263e426a4


File details

Details for the file vector_cache_lmdb-0.1.7-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for vector_cache_lmdb-0.1.7-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 8d7dce612e6609e72adda7ba6e17b8c74ab511b990b08760bd7e308b9f3ec22f
MD5 bc8390db399efc7198945e66ab2c54d1
BLAKE2b-256 d6d610f70c749c8c7bbca64bfbc737eccaa82e0e35610593a862bb59dc6d41e5

