A lightweight, pure Python implementation of HNSW algorithm

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

HNSW-Lite

A lightweight, pure Python implementation of Hierarchical Navigable Small World (HNSW) algorithm for approximate nearest neighbor search.

Overview

HNSW-Lite provides an efficient solution for approximate nearest neighbor search in high-dimensional spaces. It's designed to be lightweight, easy to use, and suitable for a wide range of applications including:

Similarity search in vector databases
Recommendation systems
Image similarity
Document retrieval
Feature matching

The implementation follows the algorithm described in "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs" by Malkov & Yashunin (2016).

Key Features

Pure Python Implementation: No C/C++ dependencies required
Multiple Distance Metrics: Supports cosine and euclidean distance
Simple API: Easy to integrate into your projects
Metadata Support: Attach arbitrary metadata to vectors
Flexible Configuration: Tune parameters for speed/accuracy tradeoffs

Installation

pip install hnsw-lite

Quick Start

import numpy as np
from hnsw.hnsw import HNSW

# Initialize HNSW index
hnsw = HNSW(space="cosine", M=16, ef_construction=200)

# Sample vectors to index
vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.3, 0.3, 0.3]
]

# Add vectors to the index with optional metadata
for i, vector in enumerate(vectors):
    hnsw.insert(vector, {"id": i, "description": f"Vector {i}"})

# Create a query vector
from hnsw.node import Node
query_vector = [0.3, 0.3, 0.3]
query_node = Node(query_vector, 0)

# Search for nearest neighbors
k = 3  # Number of nearest neighbors to retrieve
results = hnsw.knn_search(query_node, k)

# Process results
for i, (distance, node) in enumerate(results):
    print(f"Neighbor {i+1}:")
    print(f"  Distance: {-distance}")  # Distances are negated in results
    print(f"  Vector: {node.vector}")
    print(f"  Metadata: {node.metadata}")

API Reference

HNSW Class

The main class for creating and searching the HNSW index.

HNSW(space="cosine", M=16, ef_construction=200)

Parameters

space (str): Distance metric to use. Options: "cosine", "euclidean". Default: "cosine"
M (int): Maximum number of connections per node (except for layer 0). Default: 16
ef_construction (int): Size of the dynamic list for the nearest neighbors during construction. Higher values lead to more accurate graphs but slower construction. Default: 200

Methods

insert(vector, metadata={}): Insert a vector into the index with optional metadata
knn_search(query_node, top_n): Find k nearest neighbors for a query

Node Class

Used to represent vectors in the HNSW graph.

Node(vector, level, metadata=None)

Parameters

vector (List[float]): The vector to be stored in the node
level (int): The level of the node in the HNSW hierarchy
metadata (Dict): Optional metadata to associate with the node

Performance Benchmarks

Effect of M Parameter on Recall and Performance

The M parameter (maximum number of connections per node) is crucial for balancing search accuracy and performance. Our benchmarks show:

Recall vs M Value

Recall by M Value

As shown in the graph:

M=5: 86.91% recall
M=10: 96.34% recall
M=15: 98.29% recall
M=20: 98.94% recall
M=30: 99.26% recall
M=40: 99.36% recall

The biggest improvements occur when increasing M from 5 to 15. After M=20, the recall gains diminish significantly while computational costs continue to increase.

Query Processing Time vs M Value

Processing Time by M Value

Processing time increases with higher M values:

M=5: 0.457 ms
M=10: 0.644 ms
M=15: 0.787 ms
M=20: 0.939 ms
M=30: 1.130 ms
M=40: 1.334 ms

Comparison with Other Algorithms

SIFT Query Performance Comparison

When comparing HNSW-Lite with other algorithms (FAISS, MRPT) on the SIFT dataset:

HNSW consistently delivers sub-2ms query times regardless of dataset size
HNSW shows faster query times compared to traditional methods
Performance remains stable across different dataset sizes, from 50,000 to 500,000 records

Recommended M Values for Different Use Cases

Use Case	Recommended M	Rationale
Ultra-fast search with acceptable recall	8-12	Good balance for speed-critical applications
Balanced performance	15-20	Best trade-off between recall and speed for most applications
High recall priority	25-35	When search quality is more important than speed
Maximum recall	40+	When search accuracy is critical, regardless of speed

Parameter Tuning

Tuning the M Parameter

The HNSW algorithm's performance can be significantly impacted by the choice of M value:

Lower M values (5-12):
- ✅ Faster index construction
- ✅ Lower memory usage
- ✅ Faster queries
- ❌ Lower recall/accuracy
Higher M values (20-40):
- ✅ Better recall/accuracy (diminishing returns after M=20)
- ❌ Slower index construction
- ❌ Higher memory usage
- ❌ Slower queries

Suggested approach for tuning M:

Start with M=10 (good default value)
If search quality is insufficient, increase to M=20 or M=30
If search is too slow, decrease to M=8 or M=6
Run benchmarks with your specific dataset to find the optimal value

Tuning ef_construction

Lower ef_construction values (50-100):
- ✅ Faster index construction
- ❌ Potentially lower quality graph
Higher ef_construction values (200-500):
- ✅ Better graph quality
- ❌ Slower index construction

For most use cases, ef_construction=200 provides a good balance.

Performance Considerations

Memory Usage: Each vector requires storage for its coordinates, connections, and metadata
Construction Time: Scales roughly as O(n log n), where n is the number of vectors
Search Time: Scales approximately logarithmically with the number of vectors
Optimal M Value: The best M value depends on your specific use case, dataset dimensionality, and performance requirements

Limitations

This is a pure Python implementation optimized for ease of use and understanding rather than maximum performance
For extremely large datasets (millions of vectors), you might want to consider C++ implementations like hnswlib

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use HNSW-Lite in your research, please cite the original HNSW paper:

@article{malkov2016efficient,
  title={Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs},
  author={Malkov, Yury A and Yashunin, Dmitry A},
  journal={IEEE transactions on pattern analysis and machine intelligence},
  volume={42},
  number={4},
  pages={824--836},
  year={2018},
  publisher={IEEE}
}

Project details

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

0.1.3

Apr 21, 2025

This version

0.1.2

Apr 21, 2025

0.1.1

Apr 21, 2025

0.1.0

Apr 21, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hnsw_lite-0.1.2.tar.gz (10.9 kB view details)

Uploaded Apr 21, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

hnsw_lite-0.1.2-py3-none-any.whl (13.1 kB view details)

Uploaded Apr 21, 2025 Python 3

File details

Details for the file hnsw_lite-0.1.2.tar.gz.

File metadata

Download URL: hnsw_lite-0.1.2.tar.gz
Upload date: Apr 21, 2025
Size: 10.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for hnsw_lite-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`c63dd65d50a743a1dec26005a1555b658b650984f116dcfc9707e95ca4d91463`
MD5	`d06cee43d5b58033f96989c349575b8f`
BLAKE2b-256	`d309b6606a159aa5401a0c4992dfec3e18dafbc8cf9f159908ebf818963b07fe`

See more details on using hashes here.

File details

Details for the file hnsw_lite-0.1.2-py3-none-any.whl.

File metadata

Download URL: hnsw_lite-0.1.2-py3-none-any.whl
Upload date: Apr 21, 2025
Size: 13.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for hnsw_lite-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f63bdb5f50377d3e6301a0003cd6b18fd3c85d7e652fa2d4c625d7e0bc2832ca`
MD5	`43f55b746f1fc71a97d662ba3bcf53f7`
BLAKE2b-256	`61b7fc88e77f0c7e51c3dba68f5b755790f07135a612015e46a29583d045d88b`

See more details on using hashes here.

hnsw-lite 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

HNSW-Lite

Overview

Key Features

Installation

Quick Start

API Reference

HNSW Class

Parameters

Methods

Node Class

Parameters

Performance Benchmarks

Effect of M Parameter on Recall and Performance

Recall vs M Value

Query Processing Time vs M Value

Comparison with Other Algorithms

Recommended M Values for Different Use Cases

Parameter Tuning

Tuning the M Parameter

Tuning ef_construction

Performance Considerations

Limitations

Contributing

License

Citation

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes