
HNSW Approximate Nearest Neighbors in Rust, based on LMDB and optimized for memory usage

hannoy 🗼

hannoy is a key-value backed HNSW implementation based on arroy.

Motivation

Many popular HNSW libraries are built in memory, meaning you need enough RAM to store all the vectors you're indexing. Instead, hannoy uses LMDB, a memory-mapped KV store, as its storage backend. This is better suited to machines running multiple programs, or to cases where the dataset you're indexing won't fit in memory. LMDB also supports non-blocking concurrent reads by design, meaning it's safe to query the index in multi-threaded environments.

Features

  • Supported metrics: Euclidean, cosine, Manhattan, Hamming, as well as their quantized counterparts
  • Python bindings with maturin and pyo3
  • Multithreaded builds using rayon
  • Disk-backed storage using LMDB, which enables indexing datasets that won't fit in RAM
  • Compressed bitmaps to store graph edges with minimal overhead, adding ~200 bytes per vector
  • Dynamic document insertions and deletions without full re-indexing
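
The ~200 bytes of graph overhead per vector quoted above makes it possible to get a rough idea of index size before building. The following is a back-of-the-envelope sketch only: it assumes vectors are stored as 32-bit floats (quantized metrics would be smaller) and ignores LMDB page and metadata overhead. `estimate_index_bytes` is a hypothetical helper, not part of hannoy.

```python
def estimate_index_bytes(
    n_vectors: int,
    dim: int,
    bytes_per_component: int = 4,  # assumption: f32 components
    graph_overhead: int = 200,     # approximate per-vector figure from the feature list
) -> int:
    """Rough size estimate: raw vector payload plus per-vector graph overhead."""
    return n_vectors * (dim * bytes_per_component + graph_overhead)


total = estimate_index_bytes(1_000_000, 768)
print(f"{total / 1024**3:.2f} GiB")  # 3.05 GiB
```

Under these assumptions, one million 768-dimensional vectors come to roughly 3 GiB, of which only about 200 MB is graph edges. An estimate like this can help when choosing a map size for the LMDB environment.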

Missing Features

  • GPU-accelerated indexing

Usage

Rust 🦀

use hannoy::{distances::Cosine, Database, Reader, Result, Writer};
use heed::EnvOpenOptions;
use rand::{rngs::StdRng, SeedableRng};

fn main() -> Result<()> {
    let env = unsafe {
        EnvOpenOptions::new()
            .map_size(1024 * 1024 * 1024) // 1GiB
            .open("./")
    }
    .unwrap();

    let mut wtxn = env.write_txn()?;
    let db: Database<Cosine> = env.create_database(&mut wtxn, None)?;
    let writer: Writer<Cosine> = Writer::new(db, 0, 3);

    // build
    // each item needs its own id; reusing an id would overwrite the earlier vector
    writer.add_item(&mut wtxn, 0, &[1.0, 0.0, 0.0])?;
    writer.add_item(&mut wtxn, 1, &[0.0, 1.0, 0.0])?;

    let mut rng = StdRng::seed_from_u64(42);
    let mut builder = writer.builder(&mut rng);
    builder.ef_construction(100).build::<16, 32>(&mut wtxn)?;
    wtxn.commit()?;

    // search
    let rtxn = env.read_txn()?;
    let reader = Reader::<Cosine>::open(&rtxn, 0, db)?;

    let query = vec![0.0, 1.0, 0.0];
    let nns = reader.nns(1).ef_search(10).by_vector(&rtxn, &query)?.into_nns();

    dbg!(&nns);
    Ok(())
}

Python 🐍

import hannoy
from hannoy import Metric
import tempfile

tmp_dir = tempfile.gettempdir()
db = hannoy.Database(tmp_dir, Metric.COSINE)

with db.writer(3, m=4, ef=10) as writer:
    writer.add_item(0, [1.0, 0.0, 0.0])
    writer.add_item(1, [0.0, 1.0, 0.0])

reader = db.reader()
nns = reader.by_vec([0.0, 1.0, 0.0], n=2)

(closest, dist) = nns[0]
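
On a dataset this small, an approximate result can be sanity-checked against an exact brute-force scan. The sketch below uses only the standard library and does not call hannoy. `cosine_distance` and `brute_force_nns` are illustrative helpers, and the distance is assumed to be 1 minus cosine similarity. hannoy may scale its cosine distance differently, so compare the ordering of the results rather than the raw values.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_nns(items, query, n):
    """Exact n nearest neighbours as (item_id, distance), closest first."""
    scored = [(item_id, cosine_distance(vec, query)) for item_id, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1])[:n]


# Same two vectors and query as in the example above.
items = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]}
exact = brute_force_nns(items, [0.0, 1.0, 0.0], n=2)
print(exact)  # [(1, 0.0), (0, 1.0)]
```

Item 1 is identical to the query, so it should come first in both the exact and the approximate results.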

Tips and tricks

Reducing cold start latencies

Search in an HNSW graph always traverses from the top layer down to the bottom layer, so we know a priori that some vectors will be needed. We can use madvise to hint to the kernel that these vectors (and their neighbours) should be loaded into RAM, which speeds up search.

Doing so can reduce cold-start latencies by several milliseconds, and is configured through the HANNOY_READER_PREFETCH_MEMORY environment variable.

For example, to prefetch 10 MiB of vectors into RAM:

export HANNOY_READER_PREFETCH_MEMORY=10485760
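
When the index is used from a Python process, the same variable can be set programmatically instead of in the shell. This is a minimal sketch: the `mib` helper is hypothetical, and it is an assumption that hannoy reads the variable when a reader is opened, so it is set before any reader is created.

```python
import os


def mib(n: int) -> int:
    """Convert mebibytes to bytes."""
    return n * 1024 * 1024


# Assumption: the variable must be set before the reader is opened in this process.
os.environ["HANNOY_READER_PREFETCH_MEMORY"] = str(mib(10))
print(os.environ["HANNOY_READER_PREFETCH_MEMORY"])  # 10485760
```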
