Python SDK for Lumina vector search engine
lumina-data
Python SDK for the Lumina vector search engine. Provides low-overhead ctypes bindings (under 1% added latency for the raw API in the benchmark below) to the Lumina C++ library for building and searching vector indexes (DiskANN, Bruteforce, IVF).
Requirements
- Linux x86_64
- Python >= 3.6
Install
pip install .
Pre-built native libraries are bundled in the package. No compilation needed.
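Because the bundled native libraries target a single platform, it can help to check the environment before installing. The sketch below tests only the two requirements listed above; the helper name `is_supported` is hypothetical and not part of lumina-data.

```python
import platform
import sys


def is_supported(system=None, machine=None, version=None):
    """Return True if the environment matches the stated requirements.

    Arguments default to the running interpreter's values; they can be
    passed explicitly to check another target.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    version = sys.version_info[:2] if version is None else tuple(version)
    # Requirements from the README: Linux x86_64 and Python >= 3.6.
    return system == "Linux" and machine == "x86_64" and version >= (3, 6)


if __name__ == "__main__":
    print("supported:", is_supported())
```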
Usage
High-level API (list in, list out)
from lumina_data import LuminaBuilder, LuminaSearcher
options = {
    "index.type": "diskann",
    "index.dimension": "128",
    "distance.metric": "l2",
    "encoding.type": "rawf32",
}
# Build
n, dim = 10000, 128
vectors = [...] # list of n*dim floats
ids = list(range(n))
builder = LuminaBuilder(options)
builder.pretrain_from_list(vectors, n, dim)
builder.insert_from_list(vectors, ids, n, dim)
builder.dump("/path/to/index.lmi")
builder.close()
# Search
searcher = LuminaSearcher(options)
searcher.open("/path/to/index.lmi")
query = [0.1, 0.2, ...] # list of dim floats
distances, labels = searcher.search_list(query, n=1, k=10)
for i in range(len(labels)):
    print("id=%d distance=%.4f" % (labels[i], distances[i]))
searcher.close()
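The high-level calls take one flat list of n*dim floats, not a list of per-vector rows. A minimal sketch of producing that flat list follows. It assumes row-major order (all values of vector 0, then vector 1, and so on), which the README implies but does not state. `flatten_vectors` is a hypothetical helper, not part of lumina-data.

```python
def flatten_vectors(rows, dim):
    """Flatten a list of per-vector rows into one row-major list of floats."""
    flat = []
    for i, row in enumerate(rows):
        if len(row) != dim:
            raise ValueError(
                "row %d has %d values, expected %d" % (i, len(row), dim))
        flat.extend(float(x) for x in row)
    return flat


# Two vectors of dimension 3 become one list of 6 floats.
rows = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
flat = flatten_vectors(rows, dim=3)
n = len(rows)
```

The resulting `flat` and `n` would then be passed where the example above uses `vectors` and `n`.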
Raw ctypes API (zero-copy, for performance-critical code)
import ctypes
from lumina_data import LuminaBuilder, LuminaSearcher
options = {
    "index.type": "diskann",
    "index.dimension": "128",
    "distance.metric": "l2",
    "encoding.type": "rawf32",
}
n, dim, k = 10000, 128, 10
# Build (data is any iterable of n*dim floats)
vectors = (ctypes.c_float * (n * dim))(*data)
ids = (ctypes.c_uint64 * n)(*range(n))
with LuminaBuilder(options) as builder:
    builder.pretrain(vectors, n, dim)
    builder.insert(vectors, ids, n, dim)
    builder.dump("/path/to/index.lmi")
# Search (query_data is any iterable of dim floats)
with LuminaSearcher(options) as searcher:
    searcher.open("/path/to/index.lmi")
    query = (ctypes.c_float * dim)(*query_data)
    # Output buffers are allocated by the caller and filled by search()
    distances = (ctypes.c_float * k)()
    labels = (ctypes.c_uint64 * k)()
    searcher.search(query, 1, k, distances, labels,
                    {"diskann.search.list_size": "32"})
    for i in range(k):
        print("id=%d distance=%.4f" % (labels[i], distances[i]))
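Building a ctypes array with `(ctypes.c_float * size)(*data)` copies every element. If the data already lives in a writable buffer such as `array.array("f")`, the standard `from_buffer` class method wraps that memory without copying. The sketch below uses only the standard library and assumes the raw API accepts any `ctypes.c_float` array, as the example above suggests.

```python
import array
import ctypes

n, dim = 4, 3

# Source data held in a compact float32 buffer.
buf = array.array("f", [float(i) for i in range(n * dim)])

# Wrap the same memory as a ctypes array; no elements are copied.
vectors = (ctypes.c_float * (n * dim)).from_buffer(buf)

# Writes through the ctypes view are visible in the original buffer,
# which shows that both names refer to the same memory.
vectors[0] = 42.0
```

While the ctypes view exists, `buf` cannot be resized, so keep both objects alive and unchanged in length for the duration of the native call.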
Filtered Search
# High-level
distances, labels = searcher.search_with_filter_list(
    query, n=1, k=10, filter_ids=[0, 2, 4, 6, 8])
# Raw ctypes (query_arr is a ctypes.c_float array of dim floats;
# distances and labels are preallocated ctypes arrays of length k)
filter_arr = (ctypes.c_uint64 * 5)(0, 2, 4, 6, 8)
searcher.search_with_filter(
    query_arr, 1, k, filter_arr, 5, distances, labels)
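The raw filtered call takes both the filter array and its length, so it is easy for the two to drift apart. A small sketch that builds both from one Python iterable is shown here. `make_filter` is a hypothetical helper, not part of lumina-data. It also rejects negative IDs, since `ctypes.c_uint64` would otherwise wrap them silently.

```python
import ctypes


def make_filter(filter_ids):
    """Return (ctypes uint64 array, count) for a collection of IDs."""
    ids = [int(i) for i in filter_ids]
    for value in ids:
        if value < 0:
            # c_uint64 does no overflow checking and would wrap this value.
            raise ValueError("filter IDs must be non-negative: %d" % value)
    arr = (ctypes.c_uint64 * len(ids))(*ids)
    return arr, len(ids)


filter_arr, filter_count = make_filter([0, 2, 4, 6, 8])
```

The pair would then stand in for the literal `filter_arr, 5` arguments in the raw example above.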
Batch Queries
# High-level
all_queries = [...] # list of n_queries * dim floats
distances, labels = searcher.search_list(all_queries, n=5, k=10)
# Raw ctypes (here data holds 5 * dim floats, one query after another)
queries = (ctypes.c_float * (5 * dim))(*data)
distances = (ctypes.c_float * (5 * k))()
labels = (ctypes.c_uint64 * (5 * k))()
searcher.search(queries, 5, k, distances, labels)
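A batch search returns flat buffers of n*k distances and labels. The sketch below splits them into one list of (label, distance) pairs per query. It assumes results are laid out query by query, so that query i occupies positions i*k through (i+1)*k - 1. The buffer sizes above are consistent with this, but the README does not state it. `split_results` is a hypothetical helper.

```python
def split_results(distances, labels, n, k):
    """Group flat batch results into per-query lists of (label, distance)."""
    if len(distances) != n * k or len(labels) != n * k:
        raise ValueError("expected %d entries per buffer" % (n * k))
    return [
        list(zip(labels[i * k:(i + 1) * k], distances[i * k:(i + 1) * k]))
        for i in range(n)
    ]


# Stand-in data for 2 queries with k=3; real values would come from search().
demo_distances = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
demo_labels = [7, 3, 9, 1, 4, 8]
per_query = split_results(demo_distances, demo_labels, n=2, k=3)
```

Slicing works the same way on Python lists and on ctypes arrays, so the helper applies to both the high-level and the raw API outputs.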
Metadata
from lumina_data import LuminaIndexMeta
# Serialize (compatible with paimon-lumina Java and paimon-cpp)
meta = LuminaIndexMeta({
    "index.dimension": "128",
    "distance.metric": "l2",
    "index.type": "diskann",
    "encoding.type": "rawf32",
})
data = meta.serialize() # -> bytes (JSON)
# Deserialize
meta = LuminaIndexMeta.deserialize(data)
print(meta.dim, meta.metric) # 128, MetricType.L2
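Since `serialize()` returns JSON bytes, the payload can be inspected with the standard `json` module. The exact schema is not documented here, so the sketch below uses a hand-written stand-in payload and assumes a flat JSON object keyed by option name. A real payload may be structured differently.

```python
import json

# Stand-in for the bytes returned by meta.serialize(); the flat layout
# is an assumption made for illustration only.
payload = json.dumps({
    "index.dimension": "128",
    "distance.metric": "l2",
    "index.type": "diskann",
    "encoding.type": "rawf32",
}).encode("utf-8")

# Decode the bytes and read individual fields.
decoded = json.loads(payload.decode("utf-8"))
dimension = int(decoded["index.dimension"])
metric = decoded["distance.metric"]
```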
API Reference
LuminaBuilder
| Method | Input | Description |
|---|---|---|
| `__init__(options)` | dict | Create builder with native Lumina options. |
| `pretrain(vectors, n, dim)` | ctypes arrays | Pretrain with n vectors. |
| `insert(vectors, ids, n, dim)` | ctypes arrays | Insert vectors with IDs. |
| `pretrain_from_list(vectors, n, dim)` | Python lists | High-level pretrain. |
| `insert_from_list(vectors, ids, n, dim)` | Python lists | High-level insert. |
| `dump(path)` | str | Write index to file. |
| `close()` | | Release native resources. Supports `with`. |
LuminaSearcher
| Method | Input/Output | Description |
|---|---|---|
| `__init__(options)` | dict | Create searcher. |
| `open(path)` | str | Load index from file. |
| `search(q, n, k, dist, labels, opts)` | ctypes in/out | Raw search. |
| `search_with_filter(q, n, k, fids, fc, dist, labels, opts)` | ctypes in/out | Raw filtered search. |
| `search_list(q, n, k, opts)` | list in, list out | High-level search. |
| `search_with_filter_list(q, n, k, fids, opts)` | list in, list out | High-level filtered search. |
| `get_count()` | | Number of vectors in index. |
| `get_dimension()` | | Vector dimension. |
| `close()` | | Release native resources. Supports `with`. |
Index Options
| Key | Values | Default |
|---|---|---|
| `index.type` | bruteforce, diskann, ivf | diskann |
| `index.dimension` | integer | 128 |
| `distance.metric` | l2, cosine, inner_product | inner_product |
| `encoding.type` | rawf32, sq8, pq | pq |
| `diskann.build.ef_construction` | integer | 1024 |
| `diskann.build.neighbor_count` | integer | 64 |
| `diskann.build.thread_count` | integer | 32 |
| `diskann.search.list_size` | integer | auto (1.5x top_k) |
| `diskann.search.beam_width` | integer | 4 |
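The usage examples pass every option value as a string, including integers such as "128". A minimal sketch of normalizing and checking an options dict before handing it to the SDK follows. It assumes string values are what the library expects, and validates only the three enumerated keys from the table. `normalize_options` is a hypothetical helper.

```python
# Allowed values for the enumerated keys, taken from the table above.
ALLOWED_VALUES = {
    "index.type": {"bruteforce", "diskann", "ivf"},
    "distance.metric": {"l2", "cosine", "inner_product"},
    "encoding.type": {"rawf32", "sq8", "pq"},
}


def normalize_options(options):
    """Return a copy with all values as strings, checking enumerated keys."""
    normalized = {}
    for key, value in options.items():
        text = str(value)
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and text not in allowed:
            raise ValueError(
                "%s=%r is not one of %s" % (key, text, sorted(allowed)))
        normalized[key] = text
    return normalized


options = normalize_options({
    "index.type": "diskann",
    "index.dimension": 128,            # converted to "128"
    "diskann.search.beam_width": 4,    # converted to "4"
})
```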
Performance
Query latency compared to native C++ (DiskANN, 100K vectors, dim=128, top-10):
| API | Avg Latency | Throughput | vs C++ |
|---|---|---|---|
| C++ native | 0.367 ms | 2724 qps | baseline |
| Raw ctypes | 0.370 ms | 2705 qps | +0.8% |
| High-level API | 0.494 ms | 2024 qps | +34% |
Raw ctypes adds less than 1% latency over native C++. The high-level API's overhead (about 34%) comes from converting Python lists to ctypes arrays on every call.
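The per-call conversion cost can be estimated without the SDK. The sketch below times only the standard-library step of turning a 128-float Python list into a ctypes array. It does not measure Lumina itself, and the result depends on the machine.

```python
import ctypes
import timeit

dim = 128
query = [float(i) for i in range(dim)]

# Creating the array type once avoids rebuilding it on every call.
QueryArray = ctypes.c_float * dim


def convert(values):
    """Copy a Python list into a new ctypes float array."""
    return QueryArray(*values)


calls = 10000
per_call = timeit.timeit(lambda: convert(query), number=calls) / calls
print("list -> ctypes: %.2f us per call" % (per_call * 1e6))
```

When the same query buffer is reused across calls, the raw API avoids this cost by writing into a preallocated ctypes array.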
Packaging & Publishing
Build wheel
pip install wheel setuptools
python setup.py bdist_wheel
Upload to PyPI
pip install twine
# Rename for PyPI (requires a manylinux platform tag)
cd dist
mv lumina_data-0.1.0-*.whl lumina_data-0.1.0-cp36-cp36m-manylinux1_x86_64.whl
# Upload (the shell is already inside dist/)
twine upload *.whl
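The rename step works because a wheel's compatibility tags are encoded in its file name as name-version-python tag-ABI tag-platform tag. A small sketch for reading those tags back out before uploading is shown here. `wheel_tags` is a hypothetical helper that relies only on the standard wheel file name convention.

```python
def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    stem = filename[:-len(".whl")]
    # The last three dash-separated fields are always the tags, even when
    # an optional build tag follows the version.
    parts = stem.rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError("unexpected wheel file name: %s" % filename)
    _, python_tag, abi_tag, platform_tag = parts
    return python_tag, abi_tag, platform_tag


tags = wheel_tags("lumina_data-0.1.0-cp36-cp36m-manylinux1_x86_64.whl")
```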
Install from PyPI
pip install lumina-data
Tests
python3 tests/test_lumina_index.py
License
Apache License 2.0
File details
Details for the file lumina_data-0.1.0.dev3-py3-none-manylinux2014_x86_64.whl.
File metadata
- Download URL: lumina_data-0.1.0.dev3-py3-none-manylinux2014_x86_64.whl
- Upload date:
- Size: 14.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 269a259fbadd20a22676aedf498d61c5d0ef01d45a3f9b47f8b86d86174a1b0d |
| MD5 | dce1563443b66c17e612816c5d1ec94e |
| BLAKE2b-256 | 9ef19068836a2936999930e0bdd52840fbc7d266d9b74a654b52eb54fde8da8d |