Faster version of Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk.
Project description
GitHub repository https://github.com/QunBB/fastannoy
FastAnnoy
This library is a pybind11 port of spotify/annoy.
Installation
To install, just run `pip install fastannoy` to pull it down from PyPI.
Install from source code
- clone this repository: `git clone https://github.com/QunBB/fastannoy`
- install it with `pip install ./fastannoy`
Background
First of all, thanks to spotify/annoy for its awesome work: it provides an efficient implementation of Approximate Nearest Neighbors search. However, I found that batch search was missing, so this project's initial purpose was to add batch search.
Because the Python interface is written with pybind11, it also turned out to perform better.
Usage
All basic interfaces are the same as in spotify/annoy.
```python
from fastannoy import AnnoyIndex
import random

f = 40  # Length of item vector that will be indexed

t = AnnoyIndex(f, 'angular')
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)

t.build(10)  # 10 trees
t.save('test.ann')

# ...

u = AnnoyIndex(f, 'angular')
u.load('test.ann')  # super fast, will just mmap the file
print(u.get_nns_by_item(0, 100))  # will find the 100 nearest neighbors
"""
[0, 17, 389, 90, 363, 482, ...]
"""
print(u.get_nns_by_vector([random.gauss(0, 1) for _ in range(f)], 100))  # will find the 100 nearest neighbors by vector
"""
[378, 664, 296, 409, 14, 618, ...]
"""
```
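Because the index is approximate, it can be useful to compare what `get_nns_by_vector` returns against an exact brute-force search. The sketch below is not part of fastannoy; `angular_distance`, `exact_nns_by_vector` and `recall` are hypothetical helpers in plain Python. It assumes the `'angular'` metric is defined as in spotify/annoy, i.e. `sqrt(2 * (1 - cos(u, v)))`, and treats a zero vector as orthogonal to everything (an assumption of this sketch).

```python
import math
from typing import List, Sequence


def angular_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Angular distance sqrt(2 * (1 - cos(u, v))), as documented by spotify/annoy."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_product = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    if norm_product == 0.0:
        # Assumption of this sketch: a zero vector counts as orthogonal (distance sqrt(2)).
        return math.sqrt(2.0)
    # Clamp to [-1, 1] to absorb tiny floating-point overshoots.
    cos = max(-1.0, min(1.0, dot / norm_product))
    return math.sqrt(2.0 * (1.0 - cos))


def exact_nns_by_vector(items: List[Sequence[float]], query: Sequence[float], n: int) -> List[int]:
    """Hypothetical helper: indices of the n items closest to `query`, by exhaustive scan."""
    order = sorted(range(len(items)), key=lambda i: angular_distance(items[i], query))
    return order[:n]


def recall(approx: List[int], exact: List[int]) -> float:
    """Fraction of the exact neighbors that the approximate result also found."""
    return len(set(approx) & set(exact)) / len(exact)
```

For example, if the generated vectors are kept in a list `vectors` (a hypothetical name; the snippet above discards them), `recall(u.get_nns_by_vector(q, 100), exact_nns_by_vector(vectors, q, 100))` estimates how many true neighbors the index found for a query `q`. More trees generally raise recall at the cost of a larger index.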
Batch Search
The batch counterpart of `get_nns_by_item` is `get_batch_nns_by_items`; its first argument is a list of item ids (`list[int]`).
Likewise, the batch counterpart of `get_nns_by_vector` is `get_batch_nns_by_vectors`; its first argument is a list of vectors (`list[list[float]]`).
Both return one list of neighbors per query, in the same order as the input.
The batch search implementation supports multiple threads: set the `n_threads` argument to choose how many are used (the default is 1).
```python
# will find the 100 nearest neighbors for each item id
print(u.get_batch_nns_by_items([0, 1, 2], 100))
"""
[[0, 146, 858, 64, 833, 350, 70, ...],
 [1, 205, 48, 396, 382, 149, 305, 125, ...],
 [2, 898, 503, 618, 23, 959, 244, 10, 445, ...]]
"""
# will find the 100 nearest neighbors for each query vector
print(u.get_batch_nns_by_vectors([
    [random.gauss(0, 1) for _ in range(f)]
    for _ in range(3)
], 100))
"""
[[862, 604, 495, 638, 3, 246, 778, 486, ...],
 [260, 722, 215, 709, 49, 248, 539, 126, 8, ...],
 [288, 764, 965, 320, 631, 505, 350, 821, 540, ...]]
"""
```
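Conceptually, a batch search applies the single-query search to every query in the list, spreads the queries across worker threads, and keeps the results in input order. The sketch below illustrates only that idea in pure Python with `concurrent.futures`; it is not fastannoy's implementation (which lives in the C++/pybind11 extension), and `batch_search` and `search_one` are hypothetical names. In CPython, threads only give a real speedup when `search_one` releases the GIL, as native extension code can; a pure-Python `search_one` gains little.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Q = TypeVar("Q")  # a query: an item id or a vector


def batch_search(
    search_one: Callable[[Q, int], List[int]],
    queries: Sequence[Q],
    n: int,
    n_threads: int = 1,
) -> List[List[int]]:
    """Run search_one(query, n) for every query; results stay in input order."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    if n_threads == 1:
        # Single-threaded path: a plain loop with no pool overhead.
        return [search_one(q, n) for q in queries]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Executor.map yields results in the order of `queries`,
        # regardless of which worker finishes first.
        return list(pool.map(lambda q: search_one(q, n), queries))
```

With an index `u` loaded as in the usage example, `batch_search(u.get_nns_by_item, [0, 1, 2], 100, n_threads=5)` would be expected to produce the same neighbor lists as `u.get_batch_nns_by_items([0, 1, 2], 100)`, since both query the same index; the built-in batch method simply does the looping and threading natively.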
Benchmark
These results were measured on my MacBook with the test script, so focus on the relative time consumption of fastannoy versus annoy rather than on the absolute numbers.

Benchmark (500,000 items, 128 dimensions) | fastannoy | annoy
---|---|---
build + add_item | 13.810 seconds | 19.633 seconds
50,000 searches | 20.613 seconds | 39.760 seconds
5,000 batch searches (batch size 10, 5 threads) | 6.542 seconds | n/a (no batch search)
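The relative speedups implied by the benchmark can be computed directly from its timings. The short script below only restates those numbers (500,000 items, 50,000 single searches, 5,000 batches of 10); `timings` and `batch_fastannoy` are hypothetical names. The batch run is compared against the 50,000 single searches because 5,000 × 10 covers the same number of queries; note that it uses 5 threads, so this is a wall-clock speedup, not a per-thread one.

```python
# Timings in seconds, copied from the benchmark table.
timings = {
    "build + add_item": {"fastannoy": 13.810, "annoy": 19.633},
    "50,000 searches": {"fastannoy": 20.613, "annoy": 39.760},
}
batch_fastannoy = 6.542  # 5,000 batches of 10 queries, 5 threads

for step, t in timings.items():
    speedup = t["annoy"] / t["fastannoy"]
    print(f"{step}: fastannoy is {speedup:.2f}x faster than annoy")

# Same total number of queries (5,000 x 10 = 50,000), so wall-clock times compare directly.
single = timings["50,000 searches"]["fastannoy"]
print(f"batch (5 threads) vs. single-query fastannoy: {single / batch_fastannoy:.2f}x")
```

This prints speedups of 1.42x for building, 1.93x for single searches, and 3.15x for the threaded batch run over single-query fastannoy.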
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file fastannoy-1.1.1.tar.gz.
File metadata
- Download URL: fastannoy-1.1.1.tar.gz
- Upload date:
- Size: 26.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0848b21d697748cbac167103ac0d57985f370b4dbd6f2875c650f92502063f76
MD5 | a90f368a1dd24f3dca398cf250d5e318
BLAKE2b-256 | 75dc6d538f5e9c7ef89a5ef43214971e8df97fba77f11c5980696f4092ebda8f
File details
Details for the file fastannoy-1.1.1-cp39-cp39-macosx_10_12_x86_64.whl.
File metadata
- Download URL: fastannoy-1.1.1-cp39-cp39-macosx_10_12_x86_64.whl
- Upload date:
- Size: 147.2 kB
- Tags: CPython 3.9, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 26f4dd3775a024bde8f4a3f49a45696871cd197e4615c54ec6d6f0a3ddc96f1c
MD5 | cfb7528a3b33d313ef006f04f47bf23b
BLAKE2b-256 | e74bf946fecd0cfc0137df171c0456a14f9cd33b57c0a17dcd030a969f06aa4b