ZenANN: High-performance ANN search with a prebuilt C++ .so module
ZenANN: A High-Performance Vector Similarity Search Library for Python Users
Basic Information
ZenANN is a high-performance approximate nearest neighbor (ANN) similarity search library designed to be user-friendly for Python developers. It provides multiple indexing methods, such as IVF (Inverted File Index), HNSW (Hierarchical Navigable Small World), and hybrid index structures, to balance accuracy and speed. ZenANN's computation kernels will be optimized for cache efficiency and SIMD acceleration, with algorithmic enhancements beyond those in existing in-memory libraries.
Problem to Solve
Similarity search is a fundamental problem in many domains, including information retrieval and natural language processing. The key challenge is to efficiently find the nearest neighbors of a query vector in a high-dimensional space. However, as data size and dimensionality grow, exact search becomes expensive: brute-force search must compare the query against every vector, and tree-based methods such as KD-trees suffer from the curse of dimensionality, degrading toward brute-force performance.
To address this, approximate nearest neighbor (ANN) search aims to retrieve near-optimal results while significantly reducing computation time. It trades a small loss in accuracy for a large gain in speed, which makes it well suited to high-dimensional vector search applications.
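The accuracy/speed trade-off can be made concrete with a small plain-Python sketch (not part of ZenANN; all names are illustrative assumptions). It implements the exact brute-force k-NN search that ANN methods approximate, whose cost grows as O(N · dim) per query, together with recall@k, a common way to quantify the "small loss in accuracy" of an approximate result.

```python
import heapq
import random


def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(data, query, k):
    """Exact k-NN by brute force: every one of the N vectors is compared."""
    # heapq.nsmallest keeps only the k best (distance, id) pairs while scanning.
    best = heapq.nsmallest(k, ((l2_sq(v, query), i) for i, v in enumerate(data)))
    return [i for _, i in best]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that an approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    random.seed(0)
    dim, n, k = 16, 1000, 10
    data = [[random.random() for _ in range(dim)] for _ in range(n)]
    query = [random.random() for _ in range(dim)]
    truth = exact_knn(data, query, k)
    # Pretend an ANN index returned 8 of the 10 true neighbors plus 2 wrong ids.
    approx = truth[:8] + [n, n + 1]  # ids n and n+1 do not exist: always wrong
    print(recall_at_k(approx, truth))  # 0.8
```

An ANN index is typically judged by how much faster it is than `exact_knn` at a given recall@k.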
Although existing in-memory implementations (e.g., Faiss) are highly optimized, there is still room for improvement in:
- Index data layout, for better cache locality
- SIMD acceleration for specific algorithms
- Data structures and algorithms that better match hardware characteristics
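As a rough illustration of the data-layout point (not ZenANN's actual layout, which is not described here), the hypothetical `FlatVectorStore` below keeps all vectors in one contiguous row-major buffer instead of a list of separate vector objects. In C++ the same idea is a single contiguous float array, a layout that suits sequential scans and SIMD loads.

```python
from array import array


class FlatVectorStore:
    """Hypothetical row-major store: every vector lives in one contiguous buffer.

    Vector i occupies buffer[i * dim : (i + 1) * dim], so a scan over all
    vectors reads memory sequentially instead of following one pointer per vector.
    """

    def __init__(self, dim):
        self.dim = dim
        self.buffer = array("f")  # C float elements, 32-bit on common platforms

    def __len__(self):
        return len(self.buffer) // self.dim

    def add(self, vec):
        """Append one vector and return its id (its row index in the buffer)."""
        if len(vec) != self.dim:
            raise ValueError("dimension mismatch")
        self.buffer.extend(vec)
        return len(self) - 1

    def l2_sq(self, i, query):
        """Squared L2 distance between stored vector i and query, read by offset."""
        base = i * self.dim
        return sum((self.buffer[base + j] - query[j]) ** 2 for j in range(self.dim))
```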
Prospective Users
ZenANN is designed for developers and researchers working on large-scale similarity search applications, including:
- Machine learning engineers who use ANN search for embedding-based retrieval in NLP, computer vision, and recommendation systems.
- Software developers who build applications requiring fast vector search with a clear, user-friendly programming interface.
- Data scientists who perform large-scale similarity analysis on high-dimensional datasets.
System Architecture
ZenANN will be implemented in C++ for high performance and will expose an intuitive Python API through pybind11.
Index Hierarchy
There will be an abstract base index that provides a unified interface for the different index classes.
- Base Index Class
indexBase: Defines the common API for all indexing methods (e.g., add(), search(), train())
- KD-tree Index Class
KDTreeIndex: Performs exact search with a KD-tree, serving as a baseline for the approximate search algorithms.
- IVF Index Class
IVFIndex: A cluster-based structure for large datasets
- HNSW Index Class
HNSWIndex: A graph-based structure for accurate and efficient ANN
Note: Depending on development progress, the HNSW implementation may be built on Faiss's interface.
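The hierarchy above can be sketched in plain Python as follows. This is an illustrative mirror, not ZenANN's pybind11 API: the class names follow the document, but the method signatures, the (id, squared distance) return format, and rebuilding the KD-tree inside `add()` are assumptions.

```python
import heapq
from abc import ABC, abstractmethod


def l2_sq(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class IndexBase(ABC):
    """Hypothetical Python mirror of indexBase: the shared add/train/search API."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []

    def add(self, vectors):
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError("dimension mismatch")
            self.vectors.append(list(v))

    def train(self):
        """No-op by default; cluster-based indexes such as IVF override it."""

    @abstractmethod
    def search(self, query, k):
        """Return up to k (id, squared_distance) pairs, nearest first."""


class KDTreeIndex(IndexBase):
    """Exact k-NN via a KD-tree, usable as ground truth for ANN indexes."""

    root = None  # (id, split_axis, left_subtree, right_subtree) or None

    def add(self, vectors):
        super().add(vectors)
        # Rebuild over all vectors: simple, and acceptable for a baseline.
        self.root = self._build(list(range(len(self.vectors))), 0)

    def _build(self, ids, depth):
        if not ids:
            return None
        axis = depth % self.dim  # cycle through coordinates level by level
        ids.sort(key=lambda i: self.vectors[i][axis])
        mid = len(ids) // 2  # the median point becomes the splitting node
        return (ids[mid], axis,
                self._build(ids[:mid], depth + 1),
                self._build(ids[mid + 1:], depth + 1))

    def search(self, query, k):
        if k <= 0:
            return []
        heap = []  # max-heap of the k best so far, stored as (-distance, id)

        def visit(node):
            if node is None:
                return
            idx, axis, left, right = node
            d = l2_sq(self.vectors[idx], query)
            if len(heap) < k:
                heapq.heappush(heap, (-d, idx))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, idx))
            diff = query[axis] - self.vectors[idx][axis]
            near, far = (left, right) if diff < 0 else (right, left)
            visit(near)
            # Cross the splitting plane only if it may hide a closer point.
            if len(heap) < k or diff * diff < -heap[0][0]:
                visit(far)

        visit(self.root)
        return sorted(((i, -nd) for nd, i in heap), key=lambda p: p[1])
```

Because the pruning test only skips subtrees that provably cannot contain a closer point, the result is exact, which is what makes a KD-tree usable as a ground-truth baseline at low dimensionality.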
Processing Flow
- Initialize an index (e.g., IVFIndex or HNSWIndex).
- Build the index with add(), which adds the given vector data to a specific index instance.
- Train the index with train() if needed (for IVF-based indexes).
- Optimize the index data layout with reorder_layout in the Faiss submodule to improve cache locality.
- Perform a query on the specified index instance using search().
- Return a result set with the top-k ids and estimated distances for each query.
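The IVF part of this flow (add, then train, then search) can be sketched as below. `IVFIndexSketch`, its `nlist`/`nprobe` parameters, and the use of plain Lloyd's k-means are assumptions for illustration; the reorder_layout step is omitted, and the returned scores are squared L2 distances.

```python
import heapq
import random


def l2_sq(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class IVFIndexSketch:
    """Hypothetical IVF index following the flow above: add() -> train() -> search().

    Vectors added after train() are not searchable until train() runs again.
    """

    def __init__(self, dim, nlist, seed=0):
        self.dim, self.nlist = dim, nlist
        self.vectors = []
        self.centroids = []
        self.lists = []  # lists[c] holds the ids assigned to centroid c
        self.rng = random.Random(seed)

    def add(self, vectors):
        self.vectors.extend(list(v) for v in vectors)

    def _nearest_centroid(self, v):
        return min(range(self.nlist), key=lambda c: l2_sq(v, self.centroids[c]))

    def train(self, iters=10):
        # Lloyd's k-means, seeded with nlist randomly chosen stored vectors.
        self.centroids = [list(v) for v in self.rng.sample(self.vectors, self.nlist)]
        for _ in range(iters):
            groups = [[] for _ in range(self.nlist)]
            for v in self.vectors:
                groups[self._nearest_centroid(v)].append(v)
            for c, members in enumerate(groups):
                if members:  # an empty cluster keeps its previous centroid
                    self.centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Fill the inverted lists with the final assignment.
        self.lists = [[] for _ in range(self.nlist)]
        for i, v in enumerate(self.vectors):
            self.lists[self._nearest_centroid(v)].append(i)

    def search(self, query, k, nprobe=1):
        # Scan only the nprobe clusters whose centroids are closest to the query.
        probe = heapq.nsmallest(nprobe, range(self.nlist),
                                key=lambda c: l2_sq(query, self.centroids[c]))
        cand = ((l2_sq(self.vectors[i], query), i) for c in probe for i in self.lists[c])
        return [(i, d) for d, i in heapq.nsmallest(k, cand)]
```

Raising `nprobe` scans more clusters and typically improves recall at the cost of speed; with `nprobe = nlist` the search degenerates to an exact scan.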
API Description
The following simple Python example illustrates the API design:
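```python
import numpy as np
import zenann

# Example data (assumption: the index accepts float32 NumPy arrays of shape (n, dim))
data_vectors = np.random.rand(10000, 128).astype(np.float32)
query_vector = np.random.rand(1, 128).astype(np.float32)

# Initialize an index for ANN search
index = zenann.HNSWIndex(dim=128, M=16, efConstruction=200)
# Add vectors to the index and conduct training / reordering
index.add(data_vectors)
# Perform a search
results = index.search(query_vector, k=5, efSearch=100)
```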
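In the HNSW literature, `efSearch` is the size of the dynamic candidate list kept during graph search; the sketch below assumes ZenANN's `efSearch` plays the same role. It implements HNSW's single-layer best-first search routine over a simplified graph: `build_knn_graph` links each node to its `m` exact nearest neighbors, standing in for HNSW's incremental, multi-layer construction governed by `M` and `efConstruction`. All function names here are hypothetical.

```python
import heapq


def l2_sq(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(vectors, m):
    """Simplification: link each node to its m exact nearest neighbors (undirected).

    Real HNSW inserts nodes incrementally across several layers, bounded by M and
    efConstruction; this brute-force graph only stands in for one layer.
    """
    graph = {i: set() for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        others = (j for j in range(len(vectors)) if j != i)
        for j in heapq.nsmallest(m, others, key=lambda j: l2_sq(v, vectors[j])):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def search_layer(vectors, graph, query, entry, ef):
    """Best-first graph search that keeps the ef closest nodes found so far."""
    d0 = l2_sq(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap (negated): worst of the ef best on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # the closest candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = l2_sq(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    return sorted((-nd, i) for nd, i in results)


def knn_search(vectors, graph, query, k, ef_search, entry=0):
    """Return the k best (id, squared_distance) pairs found with a beam of ef_search."""
    ef = max(ef_search, k)  # the beam must hold at least k results
    return [(i, d) for d, i in search_layer(vectors, graph, query, entry, ef)[:k]]
```

Raising `ef_search` keeps more candidates alive, which typically improves recall at the cost of more distance computations.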
Engineering Infrastructure
Automatic Build System
- GNU make
Version Control
- Git
- GitHub
Testing Framework
- Python: pytest
Documentation
- Markdown
- Mermaid
Continuous Integration
- GitHub Actions
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file zenann-0.1.0.tar.gz.
File metadata
- Download URL: zenann-0.1.0.tar.gz
- Upload date:
- Size: 176.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea141fb8430fe7343fed2155b4fc764bbadcbc9bb550d0083303b6cbb2b0098d |
| MD5 | 9f94569c162d8c976b3def268e8cf59a |
| BLAKE2b-256 | f3a110051be46a08d20b38211ec91442e3aad80bf11ef7c20a7c6fa6dd0f3d2d |
File details
Details for the file zenann-0.1.0-py3-none-any.whl.
File metadata
- Download URL: zenann-0.1.0-py3-none-any.whl
- Upload date:
- Size: 173.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 391d012c2161a4ad00483f44b4a105de8b73b8a0f500e4d3c5b33bae97d7a992 |
| MD5 | 4f2e29e5e4168ebb432026dd0ea3b5b5 |
| BLAKE2b-256 | 371bd83634b0ad08f8a945e0e3b3dd9c723f7cd40a96e83b4c5625b258e13f43 |