Local-first, privacy-by-default embedded vector database. Your data never leaves your machine.
LodeDB
A fast, exact embedded vector database for local RAG: in-process, on-disk, no server.
Built by Egoist Machines, Inc. - efficient full-stack infrastructure for reliable AI systems.
Most embedded vector databases stop at the CPU. LodeDB runs the same on-disk index on the GPU when you have one: batched search hits 24k queries/sec on an A10 and 50k qps on an L40S, 2.8× to 4.8× the all-CPU ceiling, with recall unchanged. It also persists changed rows incrementally, so a commit stays sub-millisecond even at 1M vectors.
Fast on a laptop. Faster on a GPU. Exact every time. Never phones home.
- GPU-resident batch search: an fp16 copy of the index lives on the GPU, scored with a tiled GEMM plus a streaming top-k ([gpu], Linux/CUDA). How it works.
- O(changed) persistence: commits only the rows that changed, 173× to 1,308× faster than a full rewrite. How it works.
- Compact storage: the MIT TurboVec core packs vectors into 2/4-bit codes and scans them with SIMD CPU kernels.
- In-process, on-disk (.tvim/.tvd/.jsd): no daemon, no account, no API key.
- Private by default: text, ids, and vectors stay local; telemetry is metrics-only (counts, bytes, latency), never raw payloads.
- Local embeddings: sentence-transformers on CUDA, MPS, or CPU.
- Batteries included: a lodedb CLI, a loopback dev server, an MCP server, and a LangChain VectorStore adapter.
Enterprise. The LodeDB core is Apache-2.0 and free to use. Enterprise licensing is available for commercial support, managed and at-scale serving, and on-prem / BYOC deployment. Contact sales@egoistmachines.com.
Install
pip install lodedb
pip install "lodedb[gpu]" # + GPU-resident scan (Linux/CUDA)
pip install "lodedb[mcp,langchain]" # + MCP server, LangChain adapter
Wheels bundle the patched TurboVec (Rust) core, so there's nothing to compile and no extra dependency to resolve.
Release status. Working wheels are published by the release workflow on each version tag. The first functional release is pending that initial tag; until it lands, the PyPI entry is a name-reservation placeholder, so build from source (below) in the meantime.
Building from source (contributors, or a platform without a prebuilt wheel) needs a Rust
toolchain and a CBLAS provider (Accelerate on macOS, libopenblas-dev on Linux):
git clone https://github.com/Egoist-Machines/LodeDB && cd LodeDB
uv sync # builds + bundles the TurboVec core via maturin
uv sync --extra mcp --extra langchain # + MCP server, LangChain adapter
uv sync --extra gpu # + GPU-resident scan (Linux/CUDA)
Run with uv run (e.g. uv run lodedb doctor).
Quickstart
from lodedb import LodeDB
db = LodeDB(path="./data", model="minilm") # "minilm" (fast) | "bge" (quality)
fox = db.add("the quick brown fox jumps", metadata={"topic": "animals"})
db.add("a lazy dog sleeps all day", metadata={"topic": "animals"})
for score, doc_id, meta in db.search("fox", k=5):
    print(score, doc_id, meta)
for hits in db.search_many(["fox", "dog"], k=5): # batched; the GPU can serve this
    print([(h.score, h.id, h.metadata) for h in hits])
db.get(fox) # -> "the quick brown fox jumps" (text retained by default)
db.persist() # durable .tvim/.tvd/.jsd snapshot; replays on reopen
Reopen with LodeDB(path="./data"); no migration step. Original text is kept in a
.tvtext sidecar for db.get; pass store_text=False to keep none. Presets are minilm
(384-dim) and bge (768-dim), with weights pulled from Hugging Face on first use. More in
examples/.
GPU-resident index
With the [gpu] extra on a CUDA host, LodeDB reconstructs the compact index into an fp16
matrix resident on the GPU and scores batched search_many with a tiled GEMM plus a
streaming top-k. It is opt-in and lazy: single queries, non-CUDA hosts, and GPU-memory
rejection fall back to the CPU scan, which stays the source of truth.
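The tiled-scoring idea can be sketched in plain Python. This is an illustrative toy, not LodeDB's GPU kernel: the function name `tiled_topk` and the `tile_rows` parameter are hypothetical, and the inner dot products stand in for the GEMM a GPU would run per tile. What it shows is the streaming part: each tile of index rows is scored and folded into a per-query top-k heap, so the full queries × rows score matrix is never held at once.

```python
import heapq


def tiled_topk(queries, index, k, tile_rows=4):
    """Score each query against all index rows, one tile of rows at a time,
    keeping only a running top-k per query."""
    # One min-heap per query; entries are (score, row_id).
    heaps = [[] for _ in queries]
    for start in range(0, len(index), tile_rows):
        tile = index[start:start + tile_rows]
        # Stand-in for the per-tile GEMM: dot(query, row) for every pair.
        for q, query in enumerate(queries):
            for offset, row in enumerate(tile):
                score = sum(a * b for a, b in zip(query, row))
                entry = (score, start + offset)
                if len(heaps[q]) < k:
                    heapq.heappush(heaps[q], entry)
                elif entry > heaps[q][0]:
                    # Better than the current worst kept hit: swap it in.
                    heapq.heapreplace(heaps[q], entry)
    # Best hit first for each query.
    return [sorted(heap, reverse=True) for heap in heaps]


index = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8], [-1.0, 0.0]]
results = tiled_topk([[1.0, 0.0], [0.0, 1.0]], index, k=2, tile_rows=2)
print(results)
```

Memory per query stays at k entries regardless of corpus size, which is why larger batches mainly add arithmetic work, the kind of work a GPU handles well.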
GPU throughput climbs with batch size while the CPU scan is flat. Same 4-bit index (d=1536, 100K), same host, only the scoring step differs. Crossover is around batch 50:
| query batch | A10 GPU (q/s) | L40S GPU (q/s) |
|---|---|---|
| 1 | 261 | 432 |
| 16 | 3,531 | 5,562 |
| 64 | 11,463 | 18,175 |
| 256 | 19,998 | 39,449 |
| 1024 | 24,037 | 50,326 |
Vanilla TurboVec CPU (all threads) on the same boxes: 8,497 q/s (A10 host), 10,420 q/s (L40S host). At batch 1024 the GPU is 2.8× / 4.8× that, and it scales with GPU class.
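The speedup and crossover figures follow directly from the numbers quoted here. A quick arithmetic check, using only values from the table and the CPU baselines above (the variable names are just for illustration):

```python
# Queries/sec copied from the table and the CPU baselines above.
gpu_qps = {
    "A10": {1: 261, 16: 3_531, 64: 11_463, 256: 19_998, 1024: 24_037},
    "L40S": {1: 432, 16: 5_562, 64: 18_175, 256: 39_449, 1024: 50_326},
}
cpu_qps = {"A10": 8_497, "L40S": 10_420}

# GPU at batch 1024 relative to the all-threads CPU scan on the same host.
speedup = {host: round(gpu_qps[host][1024] / cpu_qps[host], 1) for host in gpu_qps}

# Smallest measured batch size at which the GPU beats the CPU scan.
first_win = {
    host: min(batch for batch, qps in rows.items() if qps > cpu_qps[host])
    for host, rows in gpu_qps.items()
}

print(speedup)    # {'A10': 2.8, 'L40S': 4.8}
print(first_win)  # {'A10': 64, 'L40S': 64}
```

The GPU trails the CPU at batch 16 and leads at batch 64 on both hosts, which brackets the stated crossover of around 50.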
Recall is unchanged: the GPU scores the exact 4-bit reconstruction, so R@1 tracks the CPU scan across datasets and bit-widths, and edges ahead on GloVe-200 where quantization error is largest.
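To make "scores the exact 4-bit reconstruction" concrete, here is a minimal sketch of 4-bit coding. It uses a plain uniform scalar quantizer over an assumed range of [-1, 1], which is a stand-in for illustration and not TurboVec's actual codec; all function names are hypothetical. The point is that the reconstruction is a deterministic function of the stored codes, so any backend that scores the reconstructed values works from the same numbers.

```python
def quantize_4bit(vector, lo=-1.0, hi=1.0):
    """Map each float to one of 16 levels (codes 0..15) on a uniform grid."""
    step = (hi - lo) / 15
    return [min(15, max(0, round((x - lo) / step))) for x in vector]


def pack_nibbles(codes):
    """Pack two 4-bit codes per byte, low nibble first."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed, count):
    """Inverse of pack_nibbles; count drops any padding nibble."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return codes[:count]


def reconstruct(codes, lo=-1.0, hi=1.0):
    """Turn codes back into the float values that would actually be scored."""
    step = (hi - lo) / 15
    return [lo + code * step for code in codes]


vector = [-1.0, 1.0, 0.5, -0.5]
codes = quantize_4bit(vector)
packed = pack_nibbles(codes)
restored = reconstruct(unpack_nibbles(packed, len(vector)))
print(codes, len(packed), restored)
```

Four floats fit in two bytes here, and the per-component error of this toy quantizer is bounded by half a grid step.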
Other in-process vector databases stay CPU-bound. Alibaba's zvec reports about 8.4k q/s (VectorDBBench, 16 vCPUs, Cohere 768-dim). That is the same class as the TurboVec CPU scan but measured in a different regime from ours, so read it as a CPU-class baseline rather than a head-to-head result. The GPU-resident path is what clears it.
Scope. GPU search is Linux/CUDA-only and opt-in ([gpu]). macOS scans on the CPU (the
MPS scan is experimental). See docs/benchmarks.md and
docs/architecture.md.
Delta persistence
Most embedded indexes rewrite the whole file on every change (O(N)). LodeDB writes only the rows that changed (O(changed)), so a 1,000-row commit stays sub-millisecond at any size:
| corpus | full rewrite | delta export | speedup |
|---|---|---|---|
| 100K | 42.4 ms | 0.25 ms | 173× |
| 500K | 190.4 ms | 0.24 ms | 782× |
| 1M | 404.9 ms | 0.31 ms | 1,308× |
The GPU path makes reads fast; the delta makes writes cheap. The on-disk format stays a plain snapshot that replays on reopen.
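The O(changed) idea can be illustrated with a toy fixed-width row file. This is a sketch under stated assumptions, not LodeDB's on-disk format: `DeltaStore`, its methods, and the float32 row layout are all hypothetical. It tracks which rows were touched since the last commit and seeks to and rewrites only those rows, so commit cost depends on the number of changes rather than on corpus size.

```python
import os
import struct


class DeltaStore:
    """Toy row store: commit() rewrites only rows changed since the last commit."""

    def __init__(self, path, dim):
        self.path = path
        self.row_fmt = f"<{dim}f"  # little-endian float32 row
        self.row_size = struct.calcsize(self.row_fmt)
        self.rows = []
        self.dirty = set()
        open(path, "ab").close()  # create the file if it is missing

    def upsert(self, row_id, vector):
        if row_id > len(self.rows):
            raise ValueError("row ids must be contiguous")
        if row_id == len(self.rows):
            self.rows.append(list(vector))
        else:
            self.rows[row_id] = list(vector)
        self.dirty.add(row_id)

    def commit(self):
        """Write dirty rows in place and return how many rows were written."""
        written = 0
        with open(self.path, "r+b") as f:
            for row_id in sorted(self.dirty):
                f.seek(row_id * self.row_size)
                f.write(struct.pack(self.row_fmt, *self.rows[row_id]))
                written += 1
            f.flush()
            os.fsync(f.fileno())
        self.dirty.clear()
        return written

    def read_row(self, row_id):
        with open(self.path, "rb") as f:
            f.seek(row_id * self.row_size)
            return list(struct.unpack(self.row_fmt, f.read(self.row_size)))
```

In this sketch, loading 1,000 rows and committing writes 1,000 rows; updating two of them afterwards and committing again writes two. A full-rewrite design would write all 1,000 both times.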
Benchmarks
All artifacts are metrics-only (counts, bytes, latency), never payloads. Full methodology and the complete figure set are in docs/benchmarks.md; each benchmarks/ folder has a README and a one-line reproduction command.
Local is the common case. On an Apple M1 (MiniLM, 20K docs) the CPU scan is ~0.25 ms p50, and end-to-end single-query latency is 5.7 ms p50.
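For readers who want to take a comparable p50 measurement on their own machine, a minimal metrics-only timing helper might look like the sketch below. `measure_latency` is a hypothetical helper, not part of the lodedb package or its benchmark command; it keeps only a count and latency samples, never the inputs or results.

```python
import statistics
import time


def measure_latency(fn, inputs, clock=time.perf_counter):
    """Call fn once per input and report metrics only: count and p50 latency."""
    samples_ms = []
    for item in inputs:
        start = clock()
        fn(item)  # the result is discarded; only timing is recorded
        samples_ms.append((clock() - start) * 1000.0)
    return {"count": len(samples_ms), "p50_ms": statistics.median(samples_ms)}


# Stand-in workload; replace the lambda with a real single-query search call.
report = measure_latency(lambda q: sorted(q), [list(range(1000))] * 50)
print(report["count"], report["p50_ms"] >= 0.0)
```

The injectable `clock` parameter exists so the helper can be tested deterministically; real runs can leave it at the default.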
CLI
lodedb doctor # capability report: embedding / GPU / TurboVec backend
lodedb index ... # build / add to an on-disk index
lodedb query ... # search
lodedb serve # loopback dev server (127.0.0.1, no auth)
lodedb mcp # stdio MCP server for agent memory
lodedb benchmark # local, metrics-only benchmark
Limitations
- Exact scan, no ANN. Built for small-to-mid corpora where exact recall matters, not billion-scale.
- GPU is Linux/CUDA-only and opt-in ([gpu]). macOS scans on the CPU; the MPS scan is experimental and was slower than NEON on the hardware tested.
- Single queries run on the CPU; the GPU serves batched search_many.
- First PyPI release is pending the initial version tag; wheels bundle the core and publish from the release workflow (see Install).
- Model weights download from Hugging Face on first use, then cache locally.
TurboVec
The compact core is the upstream MIT TurboVec
project (© Ryan Codrai), vendored under third_party/turbovec/
with its license preserved. LodeDB's lifecycle patches (encoded-row export/import,
upsert_with_ids, calibration) are Apache-2.0. See NOTICE.
License
Apache-2.0 (LICENSE). The bundled TurboVec core is MIT (NOTICE,
third_party/turbovec/LICENSE). "LodeDB" and
"Egoist Machines" are trademarks; Apache-2.0 grants no
trademark rights (§6).
Enterprise licensing and commercial support are available from Egoist Machines, Inc.: contact sales@egoistmachines.com.
Contributing & security
PRs welcome; see CONTRIBUTING.md. Report security issues privately
per SECURITY.md, not in public issues. Other bugs and requests go to the
issue tracker.