Project description

tqvs

Composable vector store with pluggable persistence, dtype quantization, and similarity search.

tqvs gives you a keyed vector store that works entirely in-process — no server, no network calls. Vectors are quantized on the fly to the storage dtype you choose, persisted through a swappable backend, and queried with brute-force similarity search (with optional CUDA acceleration via PyTorch).

Installation

pip install tqvs                # core (NumPy only, includes the npy backend)
pip install "tqvs[hdf5]"        # + HDF5 backend  (h5py)
pip install "tqvs[lmdb]"        # + LMDB backend  (lmdb)
pip install "tqvs[lance]"       # + Lance backend (pylance)
pip install "tqvs[parquet]"     # + Parquet backend (pyarrow)
pip install "tqvs[torch]"       # + GPU-accelerated metrics
pip install "tqvs[all]"         # all optional backends

Requires Python 3.12+.

Quick start

import numpy as np
from tqvs import create_vector_store, StoreDtype

store = create_vector_store("./my_store", dim=768, dtype=StoreDtype.INT8_SYM)

store.add("doc/1", np.random.rand(768).astype(np.float32), metadata={"title": "Hello"})
store.add("doc/2", np.random.rand(768).astype(np.float32))

results = store.query(np.random.rand(768).astype(np.float32), k=5)
for r in results:
    print(r.key, r.score)

store.save()

Fluent builder

from tqvs import VectorStoreBuilder, StoreDtype, Hdf5Backend, LoadMode

store = (
    VectorStoreBuilder()
    .at("./my_store")
    .with_dim(768)
    .with_dtype(StoreDtype.FLOAT16)
    .with_backend(Hdf5Backend())
    .with_load_mode(LoadMode.MMAP)
    .build()
)

Batch insert

keys = [f"vec/{i}" for i in range(10_000)]
vectors = np.random.rand(10_000, 768).astype(np.float32)
store.add_many(keys, vectors)

Prefix-filtered queries

q = np.random.rand(768).astype(np.float32)  # query vector

# Search only keys starting with "doc/"
results = store.query(q, k=10, prefix="doc/")

# Iterate matching keys
for key in store.keys(prefix="img/"):
    print(key)
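Conceptually, a prefix-filtered query restricts the brute-force scan to matching keys before any scoring happens. Here is a minimal NumPy sketch of that idea, assuming keys live in a Python list aligned with the rows of a float32 matrix and that scoring is cosine similarity. `prefix_query` is a hypothetical helper, not part of the tqvs API, and tqvs may filter keys differently internally.

```python
import numpy as np


def prefix_query(keys, vectors, q, k, prefix=""):
    """Hypothetical helper: brute-force cosine top-k over rows whose key starts with `prefix`."""
    # Filter candidates by key first, so only matching rows are scored.
    idx = np.array([i for i, key in enumerate(keys) if key.startswith(prefix)], dtype=np.intp)
    if idx.size == 0:
        return []
    cand = vectors[idx]
    # Cosine similarity of each candidate row with the query.
    scores = (cand @ q) / (np.linalg.norm(cand, axis=1) * np.linalg.norm(q))
    k = min(k, idx.size)
    # argpartition selects the k best in linear time; only those k are then sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [(keys[idx[i]], float(scores[i])) for i in top]


keys = ["doc/1", "doc/2", "img/1"]
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], dtype=np.float32)
print(prefix_query(keys, vectors, np.array([1.0, 0.0], dtype=np.float32), k=1, prefix="doc/"))
# [('doc/1', 1.0)]
```

Note that `img/1` matches the query exactly but is never scored, because its key fails the prefix filter.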

Storage dtypes

Vectors are quantized at insert time and dequantized at query time. Smaller dtypes reduce memory and disk usage at the cost of some precision.
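The README does not spell out the exact INT8 schemes, so the following is a sketch of the textbook per-vector versions. It assumes one float scale per vector for the symmetric variant and a scale plus an offset for the asymmetric one. The function names are hypothetical, and tqvs's internal layout may differ.

```python
import numpy as np


def quantize_sym(x):
    # Hypothetical symmetric INT8: [-max|x|, max|x|] maps onto [-127, 127] with one scale.
    scale = float(np.max(np.abs(x))) / 127.0 or 1.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale


def dequantize_sym(q, scale):
    return q.astype(np.float32) * scale


def quantize_asym(x):
    # Hypothetical asymmetric INT8: [min(x), max(x)] maps onto [0, 255] with a scale and offset.
    lo, hi = float(np.min(x)), float(np.max(x))
    scale = (hi - lo) / 255.0 or 1.0
    return np.clip(np.round((x - lo) / scale), 0, 255).astype(np.uint8), scale, lo


def dequantize_asym(q, scale, lo):
    return q.astype(np.float32) * scale + lo


x = np.random.default_rng(0).random(768).astype(np.float32)  # all >= 0, like np.random.rand
q_sym, s_sym = quantize_sym(x)
q_asym, s_asym, lo = quantize_asym(x)
err_sym = float(np.abs(dequantize_sym(q_sym, s_sym) - x).max())
err_asym = float(np.abs(dequantize_asym(q_asym, s_asym, lo) - x).max())
# For non-negative data the symmetric scheme wastes its negative half,
# so the asymmetric scheme typically reconstructs with roughly half the error.
print(err_sym, err_asym)
```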

Dtype	Bits/value	Description
`FLOAT64`	64	Full double precision
`FLOAT32`	32	Single precision (default)
`FLOAT16`	16	IEEE half precision
`BFLOAT16`	16	Brain floating point
`INT8_SYM`	8	Symmetric linear quantization
`INT8_ASYM`	8	Asymmetric linear quantization
`INT4`	4	4-bit packed (2 per byte)
`INT3`	3	3-bit packed (8 per 3 bytes)
`TURBO_2BIT`	2	TurboQuant with random rotation
`TURBO_3BIT`	3	TurboQuant with random rotation
`TURBO_4BIT`	4	TurboQuant with random rotation
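The "2 per byte" and "8 per 3 bytes" entries refer to bit packing. Here is a sketch of both layouts in NumPy. It assumes unsigned codes and least-significant-bit-first order. The bit order tqvs actually uses is not documented here, and the helper names are hypothetical.

```python
import numpy as np


def pack_int4(codes):
    # Hypothetical layout: two 4-bit codes (0..15) per byte, even index in the low nibble.
    pairs = codes.astype(np.uint8).reshape(-1, 2)
    return (pairs[:, 0] | (pairs[:, 1] << 4)).astype(np.uint8)


def unpack_int4(packed):
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F
    out[1::2] = packed >> 4
    return out


def pack_int3(codes):
    # Keep the low 3 bits of each code (0..7): 8 codes -> 24 bits -> 3 bytes.
    bits = np.unpackbits(codes.astype(np.uint8)[:, None], axis=1, bitorder="little")[:, :3]
    return np.packbits(bits.reshape(-1), bitorder="little")


def unpack_int3(packed, n):
    # Regroup the bit stream into 3-bit chunks and pad each back out to a byte.
    bits = np.unpackbits(packed, bitorder="little")[: n * 3].reshape(n, 3)
    return np.packbits(bits, axis=1, bitorder="little")[:, 0]


rng = np.random.default_rng(0)
c4 = rng.integers(0, 16, size=128, dtype=np.uint8)
c3 = rng.integers(0, 8, size=128, dtype=np.uint8)
print(pack_int4(c4).nbytes, pack_int3(c3).nbytes)  # 64 48 for dim=128
```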

TurboQuant dtypes apply a random rotation before quantization to spread quantization error more evenly across dimensions.
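The effect of the rotation can be illustrated in a few lines. This is a sketch, not TurboQuant itself. It assumes the random rotation is drawn as a random orthogonal matrix from a QR decomposition. It also substitutes plain symmetric 4-bit rounding for TurboQuant's codebook quantizer. A single large outlier coordinate forces a coarse scale on the unrotated vector. After rotation the energy is spread across all coordinates, so the same bit budget reconstructs the vector more accurately.

```python
import numpy as np


def random_rotation(dim, seed=0):
    # QR of a Gaussian matrix yields an orthogonal Q; multiplying each column by the
    # sign of R's diagonal makes the draw uniform over orthogonal matrices (Haar measure).
    q, r = np.linalg.qr(np.random.default_rng(seed).standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))


def quant_dequant(x, bits=4):
    # Plain symmetric per-vector scalar quantization (a stand-in, not TurboQuant's codebook).
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / levels
    return np.round(x / scale) * scale


dim = 128
x = np.random.default_rng(1).standard_normal(dim)
x[0] = 40.0  # one outlier coordinate dominates the quantization scale
R = random_rotation(dim)

err_plain = np.linalg.norm(quant_dequant(x) - x)
# Quantize in the rotated space, then undo the rotation with R.T (R is orthogonal).
err_rotated = np.linalg.norm(R.T @ quant_dequant(R @ x) - x)
print(f"plain: {err_plain:.2f}  rotated: {err_rotated:.2f}")
```

Because the rotation is orthogonal, it preserves norms and dot products, so nothing is lost by working in the rotated space.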

Persistence backends

Five backends are available. They all share the same in-memory store — the backend only determines how data is serialized to and from disk.

Backend	Install extra	Dependency	Format on disk
`NpyBackend`	(none)	NumPy (core)	`.npy` files + JSON manifest
`Hdf5Backend`	`hdf5`	h5py	Single `.h5` file
`LmdbBackend`	`lmdb`	lmdb	LMDB database directory
`LanceBackend`	`lance`	pylance	Lance columnar dataset
`ParquetBackend`	`parquet`	pyarrow	`.parquet` files
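Because every backend serializes the same in-memory state, a backend reduces to a save/load pair. Here is a minimal sketch of that contract. It assumes the state is a key list, a vector matrix, and per-key metadata. `PersistenceBackend` and `NpyLikeBackend` are hypothetical names, not the tqvs classes, and the real NpyBackend file layout may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol

import numpy as np


class PersistenceBackend(Protocol):
    # Hypothetical contract: the store hands over its in-memory state and gets it back.
    def save(self, path: Path, keys: list[str], vectors: np.ndarray, metadata: dict) -> None: ...

    def load(self, path: Path) -> tuple[list[str], np.ndarray, dict]: ...


class NpyLikeBackend:
    """Hypothetical backend: vectors in a .npy file, keys and metadata in a JSON manifest."""

    def save(self, path: Path, keys: list[str], vectors: np.ndarray, metadata: dict) -> None:
        path.mkdir(parents=True, exist_ok=True)
        np.save(path / "vectors.npy", vectors)
        (path / "manifest.json").write_text(json.dumps({"keys": keys, "metadata": metadata}))

    def load(self, path: Path) -> tuple[list[str], np.ndarray, dict]:
        manifest = json.loads((path / "manifest.json").read_text())
        return manifest["keys"], np.load(path / "vectors.npy"), manifest["metadata"]


def roundtrip(backend: PersistenceBackend, path: Path) -> bool:
    # Any object satisfying the protocol can be swapped in here unchanged.
    keys = ["doc/1", "doc/2"]
    vectors = np.arange(8, dtype=np.float32).reshape(2, 4)
    backend.save(path, keys, vectors, {"doc/1": {"title": "Hello"}})
    keys2, vectors2, meta2 = backend.load(path)
    return keys2 == keys and np.array_equal(vectors2, vectors) and meta2 == {"doc/1": {"title": "Hello"}}


with tempfile.TemporaryDirectory() as tmp:
    ok = roundtrip(NpyLikeBackend(), Path(tmp) / "store")
print(ok)  # True
```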

Performance comparison

Benchmarked with 1,000,000 128-dimensional vectors, averaged across all 11 storage dtypes. In-memory operations (add, query, key lookup) are backend-independent since the backend is only involved during save/reload.

Save throughput (vectors/sec, 1M vectors)

Rank	Backend	Avg vec/s	Relative
1	HDF5	7,840,000	1.0×
2	LMDB	5,980,000	0.76×
3	Lance	5,530,000	0.71×
4	NumPy	4,530,000	0.58×
5	Parquet	715,000	0.09×

Reload throughput (vectors/sec, 1M vectors)

Rank	Backend	Avg vec/s	Relative
1	NumPy	1,860,000	1.0×
2	HDF5	1,700,000	0.91×
3	LMDB	1,630,000	0.88×
4	Lance	1,490,000	0.80×
5	Parquet	523,000	0.28×
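The Relative column is each backend's average throughput divided by the fastest backend's average in the same table. Here is a quick recomputation of both columns from the Avg vec/s figures above:

```python
save_tp = {"HDF5": 7_840_000, "LMDB": 5_980_000, "Lance": 5_530_000, "NumPy": 4_530_000, "Parquet": 715_000}
reload_tp = {"NumPy": 1_860_000, "HDF5": 1_700_000, "LMDB": 1_630_000, "Lance": 1_490_000, "Parquet": 523_000}


def relative(throughput: dict[str, int]) -> dict[str, float]:
    # Normalise every backend against the fastest one, rounded to two decimals.
    best = max(throughput.values())
    return {name: round(value / best, 2) for name, value in throughput.items()}


print(relative(save_tp))
print(relative(reload_tp))
# HDF5 vs Parquet: ~11x faster saves, ~3.3x faster reloads.
print(save_tp["HDF5"] / save_tp["Parquet"], reload_tp["HDF5"] / reload_tp["Parquet"])
```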

In-memory operations (all backends equivalent)

Operation	Avg vec/s
Key prefix scan	~70,000,000
Prefix query (brute-force)	~4,000,000

Choosing a backend

HDF5 — Best all-round choice. Fastest saves, near-fastest reloads. Single .h5 file is easy to manage. Good default for most workloads.
NumPy (npy) — Zero optional dependencies. Fastest reloads, solid saves. Choose this if you want to avoid extra packages.
LMDB — Comparable I/O to HDF5/NumPy, and supports transactional semantics. Useful when concurrent read access matters.
Lance — Slightly slower reloads than the top tier but still competitive. Best suited when you already use the Lance ecosystem and want a unified format.
Parquet — Roughly 10× slower saves and 3–4× slower reloads than HDF5. Choose only when interop with Spark, DuckDB, Pandas, or other columnar tools is a priority.

Dtype impact on I/O

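Smaller quantized dtypes save much faster because there is less data to write: HDF5 save time drops about 9× from FLOAT64 (0.73 s) to the 2- and 3-bit dtypes (0.08 s). Reload times are far less sensitive to dtype, staying between 0.52 s and 0.70 s: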

Dtype	HDF5 save (s)	HDF5 reload (s)
`TURBO_2BIT`	0.08	0.59
`INT3`	0.08	0.55
`TURBO_3BIT`	0.08	0.54
`INT4`	0.10	0.67
`TURBO_4BIT`	0.09	0.52
`INT8_SYM`	0.14	0.60
`INT8_ASYM`	0.14	0.65
`BFLOAT16`	0.21	0.54
`FLOAT16`	0.22	0.55
`FLOAT32`	0.39	0.62
`FLOAT64`	0.73	0.70

(1M vectors, dim=128)
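Converting these times to throughput (1M vectors divided by seconds) and averaging over the 11 dtypes gives results within 1% of the HDF5 rows in the save and reload throughput tables. That suggests Avg vec/s is the arithmetic mean of per-dtype throughput. This reading is an inference from the numbers, not something the benchmark description states. The same data also show how much wider the spread is for saves than for reloads:

```python
N = 1_000_000  # vectors per benchmark run

# HDF5 save / reload seconds per dtype, copied from the table above.
save_s = {"TURBO_2BIT": 0.08, "INT3": 0.08, "TURBO_3BIT": 0.08, "INT4": 0.10, "TURBO_4BIT": 0.09,
          "INT8_SYM": 0.14, "INT8_ASYM": 0.14, "BFLOAT16": 0.21, "FLOAT16": 0.22, "FLOAT32": 0.39, "FLOAT64": 0.73}
reload_s = {"TURBO_2BIT": 0.59, "INT3": 0.55, "TURBO_3BIT": 0.54, "INT4": 0.67, "TURBO_4BIT": 0.52,
            "INT8_SYM": 0.60, "INT8_ASYM": 0.65, "BFLOAT16": 0.54, "FLOAT16": 0.55, "FLOAT32": 0.62, "FLOAT64": 0.70}


def mean_throughput(seconds: dict[str, float]) -> float:
    # Arithmetic mean of per-dtype throughput (vectors / second).
    return sum(N / s for s in seconds.values()) / len(seconds)


for label, times in (("save", save_s), ("reload", reload_s)):
    spread = max(times.values()) / min(times.values())  # slowest time / fastest time
    print(f"{label}: mean {mean_throughput(times):,.0f} vec/s, spread {spread:.1f}x")
```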

Disk usage

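On-disk size depends primarily on the storage dtype and secondarily on the backend format. NpyBackend and Hdf5Backend store close to the raw vector data with minimal overhead and are the most compact for most dtypes. The exceptions are FLOAT64 and BFLOAT16, where Parquet comes out smaller (817 vs 1,036 B/vec and 232 vs 268 B/vec), presumably because Parquet compresses column data. LMDB adds the most overhead, 2–4× depending on dtype, due to its B-tree page structure. Lance adds a constant 12 B/vec of columnar metadata, while Parquet's overhead varies by dtype.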

(100K vectors, dim=128)

Dtype	Npy / HDF5	Lance	Parquet	LMDB
`FLOAT64`	1,036 B/vec (98.8 MiB)	1,048 B/vec	817 B/vec	2,177 B/vec
`FLOAT32`	524 B/vec (49.9 MiB)	536 B/vec	547 B/vec	1,153 B/vec
`FLOAT16`	268 B/vec (25.6 MiB)	280 B/vec	269 B/vec	641 B/vec
`BFLOAT16`	268 B/vec (25.6 MiB)	280 B/vec	232 B/vec	641 B/vec
`INT8_SYM`	144 B/vec (13.7 MiB)	156 B/vec	163 B/vec	393 B/vec
`INT8_ASYM`	148 B/vec (14.1 MiB)	160 B/vec	165 B/vec	401 B/vec
`INT4`	80 B/vec (7.6 MiB)	92 B/vec	98 B/vec	265 B/vec
`INT3`	64 B/vec (6.1 MiB)	76 B/vec	82 B/vec	233 B/vec
`TURBO_2BIT`	48 B/vec (4.6 MiB)	60 B/vec	66 B/vec	201 B/vec
`TURBO_3BIT`	64 B/vec (6.1 MiB)	76 B/vec	82 B/vec	233 B/vec
`TURBO_4BIT`	80 B/vec (7.6 MiB)	92 B/vec	98 B/vec	265 B/vec

Using TURBO_2BIT instead of FLOAT32 reduces on-disk size by ~11× (about 50 MiB → 4.6 MiB per 100K vectors). LMDB is the largest backend for every dtype. It is 2.1× the Npy / HDF5 size for FLOAT64, and the ratio rises to 4.2× for TURBO_2BIT as the vectors shrink. That B-tree page overhead is the cost of its transactional and concurrent-read capabilities.
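The Npy / HDF5 figures decompose into packed payload (dim × bits / 8) plus a small per-vector remainder, and the LMDB ratio can be read off the same table. Here is a quick check for dim=128 using a subset of the rows. The table does not say what the remainder contains, so the overhead column below is simply the difference.

```python
DIM = 128
bits = {"FLOAT64": 64, "FLOAT32": 32, "FLOAT16": 16, "INT8_SYM": 8, "INT4": 4, "INT3": 3, "TURBO_2BIT": 2}
# Measured bytes per vector, copied from the disk-usage table.
npy_hdf5 = {"FLOAT64": 1036, "FLOAT32": 524, "FLOAT16": 268, "INT8_SYM": 144, "INT4": 80, "INT3": 64, "TURBO_2BIT": 48}
lmdb = {"FLOAT64": 2177, "FLOAT32": 1153, "FLOAT16": 641, "INT8_SYM": 393, "INT4": 265, "INT3": 233, "TURBO_2BIT": 201}

rows = {}
for name, b in bits.items():
    payload = DIM * b // 8                # packed vector bytes
    overhead = npy_hdf5[name] - payload   # remainder per vector in Npy / HDF5
    ratio = lmdb[name] / npy_hdf5[name]   # LMDB size relative to Npy / HDF5
    rows[name] = (payload, overhead, ratio)
    print(f"{name:<10} payload {payload:4d} B  overhead {overhead:2d} B  LMDB {ratio:.1f}x")

print(f"FLOAT32 -> TURBO_2BIT: {npy_hdf5['FLOAT32'] / npy_hdf5['TURBO_2BIT']:.1f}x smaller")  # 10.9x
```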

TurboQuant performance overhead

TurboQuant dtypes apply a random orthogonal rotation before scalar quantization, which spreads quantization error evenly but adds computational cost. The overhead is most visible during queries, where dequantization sits in the hot loop.

Insert overhead (add_many, 1M vectors)

TurboQuant inserts are only 3–11% slower than the next-wider standard integer dtype (TURBO_2BIT vs INT3, TURBO_3BIT vs INT4, TURBO_4BIT vs INT8_SYM), because the rotation matrix multiply is amortized across the batch:

Dtype	Avg vec/s	vs standard equivalent
`TURBO_2BIT`	~66K	0.97× vs `INT3`
`TURBO_3BIT`	~64K	0.90× vs `INT4`
`TURBO_4BIT`	~63K	0.89× vs `INT8_SYM`

Query overhead (query_prefix, 1M vectors)

Queries are significantly slower because each scored vector requires unpacking, codebook lookup, and inverse rotation:

Dtype	Avg vec/s	vs `FLOAT32`	vs standard equivalent
`FLOAT32`	7.74M	1.0×	—
`INT8_SYM`	5.35M	0.69×	—
`INT4`	4.10M	0.53×	—
`INT3`	2.86M	0.37×	—
`TURBO_2BIT`	2.32M	0.30×	0.81× vs `INT3`
`TURBO_3BIT`	1.29M	0.17×	0.31× vs `INT4`
`TURBO_4BIT`	933K	0.12×	0.17× vs `INT8_SYM`

TURBO_2BIT queries at ~0.30× FLOAT32 speed, which is reasonable given its ~11× storage reduction. TURBO_3BIT and TURBO_4BIT are notably slower (0.17× and 0.12× FLOAT32 speed) because the sub-byte unpacking for 3- and 4-bit codes is more complex. If query speed matters more than maximum compression, prefer INT4 or INT8_SYM over the higher-bit TurboQuant variants. For example, INT4 occupies the same 80 B/vec on disk as TURBO_4BIT while querying roughly 4.4× faster.
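The per-vector query work described in this section (unpacking, codebook lookup, inverse rotation) can be sketched for 2-bit codes. Everything specific here is a hypothetical choice for illustration: the tiny dimension, the four-entry codebook, the stored per-vector norm, and the bit layout. The encoder is included only so the decoder has something to decode.

```python
import numpy as np

DIM = 8  # tiny for readability; must be a multiple of 4 (four 2-bit codes per byte)
_rng = np.random.default_rng(0)
_q, _r = np.linalg.qr(_rng.standard_normal((DIM, DIM)))
ROTATION = _q * np.sign(np.diag(_r))                         # random orthogonal matrix
CODEBOOK = np.array([-1.5, -0.5, 0.5, 1.5]) / np.sqrt(DIM)  # hypothetical 2-bit codebook
SHIFTS = np.array([0, 2, 4, 6], dtype=np.uint8)


def encode(x):
    # Insert path (hypothetical): rotate the unit vector, snap to nearest codeword, pack.
    norm = float(np.linalg.norm(x))
    y = ROTATION @ (x / norm)
    codes = np.abs(y[:, None] - CODEBOOK[None, :]).argmin(axis=1).astype(np.uint8)
    packed = np.bitwise_or.reduce(codes.reshape(-1, 4) << SHIFTS, axis=1)
    return packed.astype(np.uint8), norm


def decode(packed, norm):
    codes = (packed[:, None] >> SHIFTS) & 0b11  # 1) unpack four codes per byte
    y = CODEBOOK[codes.reshape(-1)]             # 2) codebook lookup
    return norm * (ROTATION.T @ y)              # 3) inverse rotation


x = _rng.standard_normal(DIM)
packed, norm = encode(x)
x_hat = decode(packed, norm)
cos = float(x @ x_hat / (np.linalg.norm(x) * np.linalg.norm(x_hat)))
print(packed.nbytes, round(cos, 3))  # 2 bytes of codes for DIM=8
```

In a query loop, `decode` runs once per scored vector, which is why this path sits in the hot loop described above.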

Similarity metrics

Three built-in metrics are provided:

cosine_similarity — default, higher = more similar
dot_product — raw dot product
euclidean_distance — negative L2 distance (higher = closer)

from tqvs import create_vector_store, euclidean_distance

store = create_vector_store("./store", dim=128, metric=euclidean_distance)
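For reference, the three metrics can be written in a few lines of NumPy. This is a sketch scoring one query against a matrix of stored vectors. The function names and the (query, matrix) to scores shape are assumptions for illustration, not the signature tqvs expects for its `metric` argument. The toy data also show how rankings differ. Dot product favours the longest vector. Cosine and negative Euclidean distance favour the row that points the same way as the query (which, here, is also the nearest).

```python
import numpy as np


def cosine_scores(q, m):
    # Hypothetical helper: dot product divided by both L2 norms; 1.0 means same direction.
    return (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))


def dot_scores(q, m):
    # Hypothetical helper: raw dot product, which grows with vector length as well as alignment.
    return m @ q


def neg_euclidean_scores(q, m):
    # Hypothetical helper: negated L2 distance so that, as with the others, higher means closer.
    return -np.linalg.norm(m - q, axis=1)


m = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
q = np.array([1.0, 0.0])
for score in (cosine_scores, dot_scores, neg_euclidean_scores):
    s = score(q, m)
    print(f"{score.__name__:<22} {np.round(s, 3)}  best row: {int(np.argmax(s))}")
```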

GPU acceleration with PyTorch

Install the torch extra to unlock CUDA-accelerated similarity scoring:

pip install "tqvs[torch]"

When device is set, query() and score() automatically dispatch to PyTorch equivalents of the three built-in metrics. Vectors are transferred to the target device, scored using torch tensor ops, and results are returned as NumPy arrays.

store = create_vector_store("./store", dim=768, device="cuda")

# query() and score() now run on GPU
results = store.query(q, k=10)

This is most beneficial when:

You use cosine_similarity (the default) — the normalization step adds enough compute to offset the CPU→GPU→CPU data transfer overhead. For dot_product and euclidean_distance, NumPy is already fast enough that the transfer cost makes CUDA slower in practice.
Vector count is large (10k+) — the cost of host-to-device transfer is amortized over a larger scoring pass.
Dimensionality is high (768, 1024, 1536) — GPU parallelism shines on wide matrix operations.

CUDA speedup (cosine_similarity, RTX 5090)

Vectors	dim=128	dim=768	dim=1536
1,000	0.2× (slower)	1.9×	2.5×
10,000	1.7×	4.0×	4.4×
100,000	2.7×	4.5×	4.5×
1,000,000	3.2×	4.9×	OOM

For dot_product and euclidean_distance, CUDA ran at only 0.1–0.9× NumPy's speed (that is, slower) across all tested configurations. These operations are dominated by a single matrix-vector product, which NumPy (MKL/OpenBLAS) handles efficiently without any data-transfer overhead.

Recommendation: Use device="cuda" with the default cosine_similarity metric when you have 10k+ vectors at dim ≥ 768. For other metrics or smaller stores, leave device unset.

GPU dispatch gains less on small or low-dimensional stores. In the table above it was slower only for the smallest case (1,000 vectors at dim=128, 0.2×), and at dim=128 the speedup stays at or below 3.2× even for 1M vectors, so NumPy on CPU is often fast enough there. The device parameter also accepts "cpu" to use PyTorch's CPU kernels (useful for MKL-backed PyTorch builds), though in most cases plain NumPy is comparable.

Note: PyTorch acceleration applies only to the scoring step of query() and score(). Insert, delete, save, reload, and key operations are always NumPy-based and unaffected by the device setting.

Load modes

Mode	Behavior
`EAGER`	Load all data into memory on open (default)
`LAZY`	Defer loading until first access
`MMAP`	Memory-map the vector array from disk
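With the .npy format, NumPy itself supports both eager and memory-mapped loading through `np.load(..., mmap_mode="r")`. The sketch below contrasts the two. Whether tqvs's EAGER and MMAP modes use these exact calls is an assumption.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "vectors.npy"
    np.save(path, np.random.default_rng(0).random((1_000, 128), dtype=np.float32))

    eager = np.load(path)                  # EAGER-style: whole array read into RAM now
    mapped = np.load(path, mmap_mode="r")  # MMAP-style: file mapped, data paged in on access

    kinds = (type(eager).__name__, type(mapped).__name__)
    # Reading rows from the memmap pulls only the backing pages from disk.
    same = bool(np.array_equal(eager[10:20], mapped[10:20]))
    print(kinds, same)  # ('ndarray', 'memmap') True
    del mapped  # drop the mapping before the temporary directory is deleted
```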

License

See pyproject.toml for package metadata.

Project details

Release history

0.1.2	Apr 3, 2026

0.1.1	Apr 3, 2026 (this version)

Download files

Source Distribution

tqvs-0.1.1.tar.gz (41.3 kB), uploaded Apr 3, 2026 (Source)

Built Distribution

tqvs-0.1.1-py3-none-any.whl (33.9 kB), uploaded Apr 3, 2026 (Python 3)

File details

Details for the file tqvs-0.1.1.tar.gz.

File metadata

Download URL: tqvs-0.1.1.tar.gz
Upload date: Apr 3, 2026
Size: 41.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tqvs-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`6621065ae652a2d8e4f77636f3d3befe41a19f3cb98c58c3c4b0f22af21074b4`
MD5	`7db921656786acc2ddc7fc74734e03fe`
BLAKE2b-256	`c08415d4a424fe03c0aee970d2fa9548e5829c11a4eeac277671e615f46f26e0`

Provenance

The following attestation bundles were made for tqvs-0.1.1.tar.gz:

Publisher: publish.yml on ruckc/tqvs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tqvs-0.1.1.tar.gz
- Subject digest: 6621065ae652a2d8e4f77636f3d3befe41a19f3cb98c58c3c4b0f22af21074b4
- Sigstore transparency entry: 1227964159
- Sigstore integration time: Apr 3, 2026
Source repository:
- Permalink: ruckc/tqvs@cce2968b25350cec2bf9fa881c52b269a49e1823
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ruckc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cce2968b25350cec2bf9fa881c52b269a49e1823
- Trigger Event: push

File details

Details for the file tqvs-0.1.1-py3-none-any.whl.

File metadata

Download URL: tqvs-0.1.1-py3-none-any.whl
Upload date: Apr 3, 2026
Size: 33.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tqvs-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ba81099047c8da3127b2e2d024438d0d84d42724ad7efd7bfac03f630d628293`
MD5	`749bdf5da98eb02fb31a4b8f25146c5f`
BLAKE2b-256	`835190eb14cee4636977629ebc3746d4b06d2733bcd66e16e613da871de07982`

Provenance

The following attestation bundles were made for tqvs-0.1.1-py3-none-any.whl:

Publisher: publish.yml on ruckc/tqvs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tqvs-0.1.1-py3-none-any.whl
- Subject digest: ba81099047c8da3127b2e2d024438d0d84d42724ad7efd7bfac03f630d628293
- Sigstore transparency entry: 1227964181
- Sigstore integration time: Apr 3, 2026
Source repository:
- Permalink: ruckc/tqvs@cce2968b25350cec2bf9fa881c52b269a49e1823
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ruckc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cce2968b25350cec2bf9fa881c52b269a49e1823
- Trigger Event: push
