Python bindings for Dynamic Learned Index - efficient approximate nearest neighbor search using learned models

py-dynamic-learned-index

Python bindings for Dynamic Learned Index - a high-performance approximate nearest neighbor search library that uses learned neural network models to efficiently index and query unstructured data.

Overview

The Dynamic Learned Index uses a multi-level hierarchical structure in which each level contains a neural network model trained to predict the bucket most likely to contain the requested data. Routing each query to a few predicted buckets is intended to make search significantly faster than with traditional indexing methods.
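
To make the routing idea concrete, here is a minimal toy sketch. It is not the library's implementation: it assumes one bucket per cluster and stands in a nearest-centroid score for the trained neural network. The names bucket_scores, search, and n_probe are hypothetical and exist only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: three well-separated clusters in 2-D, one bucket per cluster.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]], dtype=np.float32)
points = np.concatenate(
    [c + rng.normal(scale=0.5, size=(50, 2)) for c in centers]
).astype(np.float32)
bucket_of = np.repeat(np.arange(3), 50)  # bucket label of every point


def bucket_scores(query):
    """Stand-in for the learned model: one score per bucket (higher is better).

    A learned index would use a trained network here; this toy uses the
    negative distance to each bucket centroid to stay small.
    """
    return -np.linalg.norm(centers - query, axis=1)


def search(query, k, n_probe=1):
    """Scan only the n_probe best-scoring buckets, then rank by L2 distance."""
    probe = np.argsort(bucket_scores(query))[::-1][:n_probe]
    candidates = np.flatnonzero(np.isin(bucket_of, probe))
    dists = np.linalg.norm(points[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]


query = np.array([9.5, 10.2], dtype=np.float32)
neighbors = search(query, k=5)  # row indices into `points`
```

Only one of the three buckets is scanned for this query. Raising n_probe trades speed for a lower chance of missing a neighbor that sits in a bucket the model ranked lower.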

Installation

Install from PyPI:

pip install py-dynamic-learned-index

Quick Start

import numpy as np
from py_dynamic_learned_index import Index

# Create an index
index = Index(
    input_shape=768,           # Dimension of your vectors
    buffer_size=5000,          # Write buffer size
    bucket_size=5000,          # Max records per bucket
    arity=3,                   # Fanout for compaction
    device="cpu"               # Use "cpu" or "cuda"
)

# Insert vectors with IDs
vector = np.random.randn(768).astype(np.float32)  # 768-dimensional vector
index.insert(vector, id=1)

# Search for k nearest neighbors
query = np.random.randn(768).astype(np.float32)   # Must also be 768-dimensional
results = index.search(query, k=10)  # Request the 10 nearest neighbors

# Delete an entry
index.delete(id=1)

Bulk Insertion

For efficient insertion of large datasets, use insert_bulk(), which creates all necessary index levels at once and trains the final level with your data. This is significantly faster than inserting vectors one by one.

Important: insert_bulk() can only be called on an empty index.

import numpy as np
from py_dynamic_learned_index import DynamicLearnedIndexBuilder

# Create an empty index
index = DynamicLearnedIndexBuilder(
    input_shape=768,
    buffer_size=5000,
    bucket_size=5000
).build()

# Prepare bulk data
n_vectors = 100000
vectors = np.random.randn(n_vectors, 768).astype(np.float32)
ids = np.arange(n_vectors, dtype=np.uint32)

# Insert all vectors at once
index.insert_bulk(vectors, ids)

# Now you can search
query = vectors[0]
results = index.search(query, k=10)
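
Because insert_bulk() can only be called once on an empty index, it can help to validate the data before the call. The helper below is a hypothetical sketch using only NumPy. The float32 and uint32 dtypes mirror the example above; whether the library requires them, or requires unique IDs, is an assumption.

```python
import numpy as np


def prepare_bulk(vectors, ids, dim):
    """Validate inputs for a bulk load and return (float32 vectors, uint32 ids).

    Hypothetical helper: the checks reflect the example's conventions,
    not documented requirements of the library.
    """
    vectors = np.ascontiguousarray(vectors, dtype=np.float32)
    ids = np.asarray(ids)
    if vectors.ndim != 2 or vectors.shape[1] != dim:
        raise ValueError(f"expected shape (n, {dim}), got {vectors.shape}")
    if ids.shape != (vectors.shape[0],):
        raise ValueError("need exactly one id per vector")
    if not np.issubdtype(ids.dtype, np.integer):
        raise ValueError("ids must be integers")
    if np.unique(ids).size != ids.size:
        raise ValueError("ids must be unique")
    if ids.size and (ids.min() < 0 or ids.max() > np.iinfo(np.uint32).max):
        raise ValueError("ids must fit in uint32")
    if not np.isfinite(vectors).all():
        raise ValueError("vectors contain NaN or infinity")
    return vectors, ids.astype(np.uint32)


# Small demonstration with 4 vectors of dimension 8.
clean_vectors, clean_ids = prepare_bulk(np.ones((4, 8)), [3, 1, 2, 0], dim=8)
```

With the 768-dimensional data above, the call would be prepare_bulk(vectors, ids, dim=768), with the returned arrays passed to index.insert_bulk().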

Configuration

The index supports various configuration options to optimize for your use case:

  • input_shape: Vector dimension (must match your data)
  • buffer_size: Size of write buffer before flushing to index
  • bucket_size: Maximum records per bucket before splitting
  • arity: Fanout for tree compaction
  • device: Compute device ("cpu" or "cuda")
  • distance_fn: Distance metric ("dot" for dot product, which matches cosine similarity when vectors are unit-normalized; "l2" for Euclidean)

For more details on configuration, see the main project README.
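
A note on the "dot" metric: a dot product equals cosine similarity only when both vectors have unit length. The sketch below shows the normalization step with plain NumPy. It assumes the library does not normalize vectors internally, which is not stated here; l2_normalize is a hypothetical helper name.

```python
import numpy as np


def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit length; eps guards against all-zero rows."""
    vectors = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.maximum(norms, eps)


rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 768)).astype(np.float32)

# Cosine similarity computed directly from the raw vectors...
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
# ...matches the plain dot product of the normalized vectors.
dot_after_norm = float(l2_normalize(a) @ l2_normalize(b))
```

Under that assumption, both the inserted vectors and the queries would be normalized the same way before being passed to the index.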

Features

  • Learned Indexing: Uses neural networks to guide search, not just data distribution
  • Dynamic: Supports online insertions and deletions
  • Multi-level: Hierarchical structure for better scalability
  • Fast: Optimized Rust implementation with Python bindings
  • Flexible: Configurable models, distance functions, and compaction strategies

Examples

Full example with data loading and querying:

import numpy as np
from py_dynamic_learned_index import Index

# Create index for 768-dimensional vectors (e.g., embeddings)
index = Index(input_shape=768, buffer_size=5000, bucket_size=5000)

# Generate sample data
n_vectors = 100000
vectors = np.random.randn(n_vectors, 768).astype(np.float32)

# Insert vectors one at a time (insert_bulk() is faster for loading an empty index)
for i, vector in enumerate(vectors):
    index.insert(vector, id=i)

# Search
query = vectors[0]  # Use first vector as query
neighbors = index.search(query, k=10)
print(f"Top 10 neighbors: {neighbors}")

For more examples, check the example.py file in the repository.

Performance

The Dynamic Learned Index is optimized for:

  • High-dimensional vector search (e.g., embeddings from language/vision models)
  • Large-scale datasets (millions of vectors)
  • Both batch and online query scenarios

Performance depends on your data distribution, vector dimensionality, and hardware configuration.
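
Since results are approximate, one common way to judge quality on your own data is to compare against exact brute-force neighbors and report recall. The sketch below uses only NumPy; exact_knn and recall_at_k are hypothetical helper names, and the return format of index.search() is not documented here, so extracting IDs from its results is left to the reader.

```python
import numpy as np


def exact_knn(data, query, k):
    """Brute-force L2 search: row indices of the k closest vectors, nearest first."""
    dists = np.linalg.norm(data - query, axis=1)
    nearest = np.argpartition(dists, k - 1)[:k]   # k smallest, in arbitrary order
    return nearest[np.argsort(dists[nearest])]    # order those k by distance


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors that the approximate result contains."""
    exact = {int(i) for i in exact_ids}
    found = {int(i) for i in approx_ids}
    return len(exact & found) / len(exact)


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 32)).astype(np.float32)
query = data[0]
truth = exact_knn(data, query, k=10)
```

If the vectors were inserted with IDs equal to their row numbers, recall_at_k(ids_from_search, truth) gives the share of true neighbors recovered for that query; averaging over many queries gives a more stable figure.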

Heavy Users: Building from Source

Simple Development Build

For a quick development build:

git clone https://github.com/plhis/DynamicLearnedIndex.git
cd DynamicLearnedIndex/py_dynamic_learned_index
pip install maturin
maturin develop --release

Sdist Installation with Custom Features

For advanced users building from the source distribution with specific features:

# Set up environment (SCRATCHDIR must already point to a writable directory)
export CARGO_TARGET_DIR="$SCRATCHDIR/cargo-target"
mkdir -p "$CARGO_TARGET_DIR"

# Install Rust (if not already installed)
if ! command -v rustup &> /dev/null; then
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
    source "$HOME/.cargo/env"
fi

# Install and set Rust nightly
rustup install nightly
rustup default nightly

# Install build dependencies
pip install maturin meson-python meson ninja cython

# Select compilers and uv link mode (optional)
export CC=gcc
export CXX=g++
export UV_LINK_MODE=copy

# Build with custom features (e.g., measure_time, mix)
export MATURIN_BUILD_ARGS="--features measure_time,mix"

# Install from source distribution without binary wheels
pip install --force-reinstall --no-binary :all: py-dynamic-learned-index --no-build-isolation

# Verify installation
python -c "import py_dynamic_learned_index; print(py_dynamic_learned_index.__version__)"

Note: Set MATURIN_BUILD_ARGS with the features you need (e.g., measure_time, mix, tch, candle, mkl).
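
The verification one-liner above relies on the module exposing __version__. As an alternative that does not import the extension module, the installed version can be read from package metadata with the standard library. A minimal sketch; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("py-dynamic-learned-index")
print(found if found is not None else "py-dynamic-learned-index is not installed")
```

This only confirms that the distribution is installed; it does not show which build features were compiled in.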

Requirements

  • Python >= 3.9
  • For GPU support (cuda feature): NVIDIA GPU and CUDA toolkit

Documentation

For detailed configuration and usage documentation, see the main project repository.

License

Licensed under the GNU Lesser General Public License v3.0 or later (LGPL-3.0-or-later).

See the LICENSE file in the main repository for full license text.

Contributing

Contributions are welcome! Please see the main repository for contribution guidelines.

Download files

Source Distribution

  • py_dynamic_learned_index-0.1.13.tar.gz (123.6 kB)

Built Distributions

  • py_dynamic_learned_index-0.1.13-cp312-cp312-manylinux_2_39_x86_64.whl (1.4 MB): CPython 3.12, manylinux (glibc 2.39+), x86-64
  • py_dynamic_learned_index-0.1.13-cp312-cp312-manylinux_2_39_aarch64.whl (1.4 MB): CPython 3.12, manylinux (glibc 2.39+), ARM64
  • py_dynamic_learned_index-0.1.13-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB): CPython 3.12, macOS 11.0+, ARM64
  • py_dynamic_learned_index-0.1.13-cp312-cp312-macosx_10_12_x86_64.whl (1.3 MB): CPython 3.12, macOS 10.12+, x86-64

File hashes

py_dynamic_learned_index-0.1.13.tar.gz

  • SHA256: 8e5cae2a489f572fd15e696802a681d8e71b1af62e7e3fc7d5946d909d644d04
  • MD5: eef860f3a97f6630b767e053a1602c9b
  • BLAKE2b-256: ed068fab8cd0764cd78aadbe95058ee389f26ac990efdcb8c64f3df754bcf6f9

py_dynamic_learned_index-0.1.13-cp312-cp312-manylinux_2_39_x86_64.whl

  • SHA256: 6c399dd1a5288d604b1449768cb2eb9afe1ef322e11ecc807999acb065a9bc84
  • MD5: 00d79ca7a0533cd32578ccf2cd2f2c9f
  • BLAKE2b-256: 70c853f51d89940c7824353e9b5915d192af448bc832f958393ed393afbd85c5

py_dynamic_learned_index-0.1.13-cp312-cp312-manylinux_2_39_aarch64.whl

  • SHA256: 22acc79a04e46bdf0643a34996df930733ff9d6b8f282e8556d80de4f82d35fe
  • MD5: 63318c1011259837f1fd9c21b168f3b0
  • BLAKE2b-256: 39aab9ba49e3e318fcaab61671bb4156ccb24fc0b70e8b133b76bcd3c1186b16

py_dynamic_learned_index-0.1.13-cp312-cp312-macosx_11_0_arm64.whl

  • SHA256: 66b969487eaad57a8c0c88413f5f3e0498958a90ab9ebe2890a8ed3cb98e3196
  • MD5: b244730e9e92914e538ddc6cc858d22c
  • BLAKE2b-256: 7db1461376360e2f59c9fc139e68dc41e2de83bb95127cddeed7918d10052113

py_dynamic_learned_index-0.1.13-cp312-cp312-macosx_10_12_x86_64.whl

  • SHA256: 316c926de48590c3de6ba46d8cde43603dfbd29dc9529c4b36458673824ed1f5
  • MD5: 43ea69cbd7231c8b75e27396865b6d36
  • BLAKE2b-256: fb7b4562b7751c50acd1c5d63e403f2dce9c716bc90a0ce27044bce3867107b9

Provenance

Attestation bundles were made for each of the five files listed under File hashes. Publisher: publish-pypi.yml on simonplhak/DynamicLearnedIndex. Values shown here reflect the state when the release was signed and may no longer be current.
