NumPy Vector Store
A fast, lightweight, zero-setup in-memory vector store powered by NumPy.
- Tiny local vector search for projects that do not need a vector database
- Fast exact vector search using vectorized NumPy operations
- Simple typed API returning VectorHit(index, value, metadata)
- Composable filtering by passing prefiltered row indexes with within_rows
- Portable persistence as trusted local .npz files with vectors + metadata
- No framework opinions: bring your own embeddings, chunking, async, and metadata model
Why?
This library is purpose-built for small to medium-scale vector search tasks and offers a simple alternative to heavyweight vector databases when you do not need network services, indexing infrastructure, ingestion pipelines, or domain-specific metadata filtering.
When/Where?
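Below are benchmark results for cosine similarity search to help you assess its suitability for your use case. Each column is an approximate search time, and each cell gives the collection size (vector count and in-memory footprint) searched in about that time.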
| Embedding Type | Dimensions | ~5ms | ~25ms | ~100ms | ~500ms |
|---|---|---|---|---|---|
| Sentence Transformers | 384 | 1K vectors, 1.5MB | 10K vectors, 15MB | 100K vectors, 147MB | 500K vectors, 732MB |
| OpenAI Small | 1536 | 500 vectors, 3MB | 5K vectors, 29MB | 25K vectors, 147MB | 100K vectors, 586MB |
| OpenAI Large | 3072 | 200 vectors, 2MB | 2.5K vectors, 29MB | 5K vectors, 59MB | 25K vectors, 293MB |
Benchmarks performed on Apple M2 hardware.
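The memory figures in the benchmark table match storing each value as a 32-bit float (4 bytes per dimension) and reporting sizes in binary megabytes (MiB). Assuming that float32 layout (the README does not state the dtype), a rough footprint estimate for your own collection can be sketched as follows; estimate_vector_mib is a hypothetical helper, not part of the library.

```python
def estimate_vector_mib(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> float:
    """Rough size of the vector matrix alone, in MiB.

    Assumes float32 storage (4 bytes per value). Metadata and any
    NumPy/Python object overhead are not included.
    """
    return num_vectors * dimensions * bytes_per_value / (1024 * 1024)


# Reproduce a few cells of the benchmark table.
print(f"{estimate_vector_mib(100_000, 384):.1f} MiB")   # 146.5 (table: 147MB)
print(f"{estimate_vector_mib(100_000, 1536):.1f} MiB")  # 585.9 (table: 586MB)
print(f"{estimate_vector_mib(25_000, 3072):.1f} MiB")   # 293.0 (table: 293MB)
```

The estimates land within about 1 MiB of the table, so vector count times dimensions is a reasonable first check of whether a collection fits in memory.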
Installation
```bash
uv add numpy-vector-store
```
Quick Start
```python
import numpy as np
from numpy_vector_store import VectorStore

# Each row carries a dict[str, str] metadata payload; vectors are 3-dimensional.
store = VectorStore[dict[str, str]](dimensions=3)
store.add(
    vectors=np.array([
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]),
    metadata=[
        {"title": "x-axis"},
        {"title": "y-axis"},
        {"title": "z-axis"},
    ],
)

# Return the two rows whose direction is closest to the query.
hits = store.cosine_search(
    query=np.array([0.9, 0.1, 0.0]),
    top_k=2,
)
for hit in hits:
    print(f"{hit.metadata['title']}: {hit.value:.3f}")
```
metadata is an opaque row payload returned with hits. It can be a dict,
dataclass, string, integer row ID, or any other Python object that fits your
application.
Normalization
VectorStore defaults to normalize=True, which scales each stored vector to
length 1. Normalization preserves vector direction while discarding magnitude:
[3.0, 4.0] -> [0.6, 0.8]
This is the default because it makes cosine similarity fast and direction-only,
which is the common case for semantic embeddings. Use normalize=False when
vector length matters, such as when magnitude encodes strength, confidence,
counts, scale, or raw geometry.
Zero vectors are rejected in both modes because cosine similarity is undefined for zero-norm vectors.
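In plain NumPy terms, normalizing divides each row by its Euclidean length, which is also why a zero row cannot be handled: there is no length to divide by. The sketch below illustrates that arithmetic and the resulting cosine score; it is not the library's internal code, and normalize_rows is a hypothetical helper name.

```python
import numpy as np


def normalize_rows(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (hypothetical helper, for illustration)."""
    vectors = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # A zero row has no direction, so it cannot be scaled to length 1.
    if np.any(norms == 0):
        raise ValueError("zero vectors cannot be normalized")
    return vectors / norms


unit = normalize_rows(np.array([[3.0, 4.0], [0.0, 2.0]]))
print(unit)  # first row becomes [0.6, 0.8], second becomes [0.0, 1.0]

# Once both sides are unit length, cosine similarity is just a dot product.
query = normalize_rows(np.array([[1.0, 1.0]]))[0]
print(unit @ query)
```

Doing the division once at insert time is what lets the default mode skip per-search norm computations.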
| Method | normalize=True (default) | normalize=False |
|---|---|---|
| cosine_search | True cosine similarity over stored unit vectors; fastest/default path for embeddings | True cosine similarity over raw vectors; computes vector norms during search |
| dot_search | Dot product of unit vectors, effectively equivalent to cosine similarity | True dot product over original vectors; use when magnitude should affect ranking |
| euclidean_search | Distance between normalized directions; useful only when direction-normalized distance is intended | True Euclidean distance over original vectors; use for geometric/feature-space nearest neighbors |
| get | Returns normalized vectors | Returns original vectors |
| save | Saves normalized vectors | Saves raw vectors |
| load | Loads and normalizes vectors | Loads vectors exactly as stored |
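The normalize=True entry for euclidean_search follows from a standard identity: for unit vectors a and b, ||a - b||^2 = 2 - 2 cos(a, b). Distance between normalized directions therefore ranks rows in the same order as cosine similarity (smallest distance = highest cosine). A quick NumPy check of that identity, independent of the library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stored vectors and a query, all scaled to unit length.
stored = rng.normal(size=(5, 4))
stored /= np.linalg.norm(stored, axis=1, keepdims=True)
query = rng.normal(size=4)
query /= np.linalg.norm(query)

cosine = stored @ query
distance = np.linalg.norm(stored - query, axis=1)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
assert np.allclose(distance**2, 2 - 2 * cosine)

# Highest cosine first and smallest distance first give the same ordering.
print(np.argsort(-cosine))
print(np.argsort(distance))
```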
Search Methods
Use cosine_search for semantic embeddings and direction-only similarity:
```python
hits = store.cosine_search(query, top_k=10, min_value=0.75)
```
Use dot_search with normalize=False when larger-magnitude vectors should
rank higher:
```python
store = VectorStore[dict[str, str]](dimensions=3, normalize=False)
store.add(vectors, metadata)
hits = store.dot_search(query, top_k=10, min_value=0.0)
```
Use euclidean_search with normalize=False for raw coordinate or feature-space
nearest-neighbor search:
```python
store = VectorStore[dict[str, str]](dimensions=3, normalize=False)
store.add(vectors, metadata)
hits = store.euclidean_search(query, top_k=10, max_value=1.5)
```
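All three methods are exact (brute-force) searches. As a rough mental model, and not the library's actual implementation, each one scores every stored row against the query, applies the threshold, and keeps the best top_k. This sketch assumes min_value keeps scores greater than or equal to the threshold and max_value keeps distances less than or equal to it; exact_search is a hypothetical name.

```python
import numpy as np


def exact_search(vectors, query, metric, top_k=10, min_value=None, max_value=None):
    """Brute-force search returning (row_index, value) pairs (illustrative sketch)."""
    vectors = np.asarray(vectors, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)

    if metric == "cosine":
        norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
        values = (vectors @ query) / norms
        higher_is_better = True
    elif metric == "dot":
        values = vectors @ query
        higher_is_better = True
    elif metric == "euclidean":
        values = np.linalg.norm(vectors - query, axis=1)
        higher_is_better = False
    else:
        raise ValueError(f"unknown metric: {metric}")

    # Apply thresholds while remembering each value's original row.
    rows = np.arange(len(values))
    if min_value is not None:
        keep = values >= min_value
        rows, values = rows[keep], values[keep]
    if max_value is not None:
        keep = values <= max_value
        rows, values = rows[keep], values[keep]

    # Best first: descending for similarities, ascending for distances.
    order = np.argsort(-values if higher_is_better else values, kind="stable")[:top_k]
    return [(int(rows[i]), float(values[i])) for i in order]


vectors = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
query = np.array([1.0, 0.0])
print(exact_search(vectors, query, "cosine", top_k=2))  # [(0, 1.0), (1, 1.0)]
print(exact_search(vectors, query, "dot", top_k=2))     # [(1, 2.0), (0, 1.0)]
print(exact_search(vectors, query, "euclidean", top_k=2, max_value=1.5))  # [(0, 0.0), (1, 1.0)]
```

Cosine scores [1, 0] and [2, 0] identically, while the raw dot product ranks the longer vector first; that is the magnitude distinction normalize=False preserves.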
Prefiltering
The store does not implement a metadata query language. To filter by metadata,
produce row indexes first, then pass them with within_rows.
```python
rows = [
    i
    for i, metadata in enumerate(store.metadata)
    if metadata["title"].startswith("x")
]
hits = store.cosine_search(query, top_k=10, within_rows=rows)
```
For structured NumPy metadata, use NumPy to produce the row indexes:
```python
# One metadata record per vector row, kept outside the store as a structured array.
metadata_table = np.array(
    [
        ("intro", "A", 2024),
        ("setup", "A", 2023),
        ("guide", "B", 2024),
    ],
    dtype=[("title", "U20"), ("product", "U10"), ("year", "i4")],
)

# Store each row's position as its metadata so hits point back into metadata_table.
store = VectorStore[int](dimensions=3)
store.add(vectors, metadata=np.arange(len(metadata_table)))

# Vectorized filter: product A published in 2024 or later.
mask = (metadata_table["product"] == "A") & (metadata_table["year"] >= 2024)
rows = np.flatnonzero(mask)
hits = store.cosine_search(query, within_rows=rows)
for hit in hits:
    row = metadata_table[hit.metadata]
    print(row["title"], hit.value)
```
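Conceptually, prefiltering means scoring only the selected rows and then reporting each result by its row number in the full store, so metadata lookups stay aligned. A plain NumPy sketch of that idea, assuming unit-length stored vectors as in the default normalize=True mode; search_within_rows is a hypothetical name, not part of the library.

```python
import numpy as np


def search_within_rows(unit_vectors, unit_query, rows, top_k=10):
    """Cosine search restricted to `rows`, returning original row indexes (sketch)."""
    rows = np.asarray(rows, dtype=np.intp)
    # Score only the candidate rows; with unit vectors, cosine is a dot product.
    scores = unit_vectors[rows] @ unit_query
    best = np.argsort(-scores, kind="stable")[:top_k]
    # Map positions within the subset back to row numbers in the full matrix.
    return [(int(rows[i]), float(scores[i])) for i in best]


unit_vectors = np.eye(3)                 # rows 0, 1, 2 point along x, y, z
unit_query = np.array([0.6, 0.8, 0.0])   # already unit length
print(search_within_rows(unit_vectors, unit_query, rows=[0, 2]))  # [(0, 0.6), (2, 0.0)]
```

Without the filter, row 1 would score highest (0.8); excluding it leaves row 0 on top, still reported as row 0 rather than as position 0 of the subset.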
Persistence
Pass a file_path and call save() / load() explicitly:
```python
store = VectorStore[dict[str, str]](dimensions=1536, file_path="vectors.npz")
store.add(embeddings, metadata)
store.save()

loaded = VectorStore[dict[str, str]](dimensions=1536, file_path="vectors.npz")
loaded.load()
```
If you save with normalize=False, load with normalize=False too:
```python
store = VectorStore[dict[str, str]](
    dimensions=1536,
    file_path="raw-vectors.npz",
    normalize=False,
)
store.add(raw_vectors, metadata)
store.save()

loaded = VectorStore[dict[str, str]](
    dimensions=1536,
    file_path="raw-vectors.npz",
    normalize=False,
)
loaded.load()
```
Context manager usage auto-saves on exit:
```python
with VectorStore[dict[str, str]](dimensions=1536, file_path="vectors.npz") as store:
    store.add(embeddings, metadata)
```
Persistence uses a minimal NumPy .npz contract with vectors and metadata
arrays. The .npz file does not encode the normalize setting; choose the same
setting when loading that you used when saving. Loading validates shape,
dimensions, row counts, and zero-norm vectors. It also uses allow_pickle=True
for flexible Python metadata payloads, so only load files generated by your own
application or another trusted local process. Loading untrusted .npz files is
not a supported security model.
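To make the .npz contract concrete, here is a plain NumPy round trip of a file holding a vectors array and a metadata array. This is an assumption-based sketch: the key names vectors and metadata come from the description above, but the exact dtypes and the validation the library performs may differ. It shows why allow_pickle=True is needed: arbitrary Python payloads are stored as an object array, which NumPy pickles.

```python
import os
import tempfile

import numpy as np

vectors = np.array([[0.6, 0.8], [0.0, 1.0]], dtype=np.float32)

# Arbitrary Python payloads go into a 1-D object array (pickled on save).
metadata = np.empty(2, dtype=object)
metadata[0] = {"title": "intro"}
metadata[1] = {"title": "setup"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.npz")
    np.savez(path, vectors=vectors, metadata=metadata)

    # allow_pickle=True is required for the object array: only use it on trusted files.
    with np.load(path, allow_pickle=True) as data:
        loaded_vectors = data["vectors"]
        loaded_metadata = list(data["metadata"])

    # Checks in the spirit of the README: shape, row counts, zero-norm rows.
    assert loaded_vectors.ndim == 2 and loaded_vectors.shape[1] == 2
    assert len(loaded_metadata) == loaded_vectors.shape[0]
    assert np.all(np.linalg.norm(loaded_vectors, axis=1) > 0)

print(loaded_metadata)
```

Nothing in the file records whether the vectors were normalized, which is why the same normalize setting has to be chosen again at load time.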
Compatibility
This project is still pre-1.0, so occasional breaking changes are expected while the API stabilizes. Breaking changes are documented in GitHub release notes. Deprecated APIs will keep warning for at least one point release before removal.
Contributing
```bash
git clone https://github.com/tvanreenen/numpy-vector-store.git
cd numpy-vector-store
uv sync --frozen --group dev
```
Before submitting a pull request:
- Run uv run ruff check
- Run uv run ruff format --check
- Run uv run mypy src/
- Run uv run pytest
License
MIT License - see LICENSE file for details.
Download files
File details
Details for the file numpy_vector_store-0.3.0.tar.gz.
File metadata
- Download URL: numpy_vector_store-0.3.0.tar.gz
- Size: 48.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b5238eecd05858ca3678ebf86c533459cb8db86515393228f70633366748a61 |
| MD5 | 2564128afc95ff032f04de2ecacaa6c4 |
| BLAKE2b-256 | db060b2088a6d1fb6461efb44a78b975988ec3940dd6e7e3f9f0c933f4caa733 |
Provenance
The following attestation bundles were made for numpy_vector_store-0.3.0.tar.gz:
Publisher: publish-pypi.yml on tvanreenen/numpy-vector-store
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: numpy_vector_store-0.3.0.tar.gz
- Subject digest: 4b5238eecd05858ca3678ebf86c533459cb8db86515393228f70633366748a61
- Sigstore transparency entry: 1541239681
- Permalink: tvanreenen/numpy-vector-store@dfc7f3ae8d8662aa59bee48d30f7eac6418d88d9
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/tvanreenen
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@dfc7f3ae8d8662aa59bee48d30f7eac6418d88d9
- Trigger Event: release
File details
Details for the file numpy_vector_store-0.3.0-py3-none-any.whl.
File metadata
- Download URL: numpy_vector_store-0.3.0-py3-none-any.whl
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd335cc3562785a9b7411001f4e261ef0d7456a3b006ae67b505cef5df33982f |
| MD5 | b30c34afdde9e30d50f1f94e64964942 |
| BLAKE2b-256 | 61d7f3a182e76de602e2637e906c06070b138373e7a0f57c0953f7d006952539 |
Provenance
The following attestation bundles were made for numpy_vector_store-0.3.0-py3-none-any.whl:
Publisher: publish-pypi.yml on tvanreenen/numpy-vector-store
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: numpy_vector_store-0.3.0-py3-none-any.whl
- Subject digest: fd335cc3562785a9b7411001f4e261ef0d7456a3b006ae67b505cef5df33982f
- Sigstore transparency entry: 1541239881
- Permalink: tvanreenen/numpy-vector-store@dfc7f3ae8d8662aa59bee48d30f7eac6418d88d9
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/tvanreenen
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@dfc7f3ae8d8662aa59bee48d30f7eac6418d88d9
- Trigger Event: release