Faster version of Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk.

GitHub repository https://github.com/QunBB/fastannoy

FastAnnoy

This library is a pybind11 port of spotify/annoy.

Installation

To install, run pip install fastannoy to pull the package from PyPI.

Install from source code

  • clone this repository
  • pip install ./fastannoy

Background

First of all, thanks to spotify/annoy for its awesome work; it provides an efficient implementation of Approximate Nearest Neighbors search. However, I found that batch search was missing, so the initial purpose of this project was to add batch search.

The Python interface is written with pybind11, and this also turned out to give better performance.

Usage

All basic interfaces are the same as in spotify/annoy.

from fastannoy import AnnoyIndex
import random

f = 40  # Length of item vector that will be indexed

t = AnnoyIndex(f, 'angular')
for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)

t.build(10) # 10 trees
t.save('test.ann')

# ...

u = AnnoyIndex(f, 'angular')
u.load('test.ann') # super fast, will just mmap the file
print(u.get_nns_by_item(0, 100)) # will find the 100 nearest neighbors
"""
[0, 17, 389, 90, 363, 482, ...]
"""

print(u.get_nns_by_vector([random.gauss(0, 1) for _ in range(f)], 100)) # will find the 100 nearest neighbors by vector
"""
[378, 664, 296, 409, 14, 618, ...]
"""

Batch Search

Corresponding to get_nns_by_item, the batch search version is get_batch_nns_by_items. The first argument should be a list of int.

In the same way, corresponding to get_nns_by_vector, the batch search version is get_batch_nns_by_vectors. The first argument should be a list of vectors, i.e. a list of list[float].

The batch search implementation supports multiple threads. You can set the number of threads with the n_threads argument; the default is 1.

# will find the 100 nearest neighbors

print(u.get_batch_nns_by_items([0, 1, 2], 100))
"""
[[0, 146, 858, 64, 833, 350, 70, ...], 
[1, 205, 48, 396, 382, 149, 305, 125, ...], 
[2, 898, 503, 618, 23, 959, 244, 10, 445, ...]]
"""

print(u.get_batch_nns_by_vectors([
    [random.gauss(0, 1) for _ in range(f)]
    for _ in range(3)
], 100))
"""
[[862, 604, 495, 638, 3, 246, 778, 486, ...], 
[260, 722, 215, 709, 49, 248, 539, 126, 8, ...], 
[288, 764, 965, 320, 631, 505, 350, 821, 540, ...]]
"""
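
To make the batch semantics concrete, here is a minimal sketch that does not use fastannoy at all. BruteForceIndex is a hypothetical, exact (not approximate) stand-in written only for illustration, and its thread-pool implementation is an assumption about how a batch call could be organised, not a description of fastannoy's internals. It shows the intended contract: a batch call returns one result list per query, in the same order as the queries, matching what the single-query method would return for each of them.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


class BruteForceIndex:
    """Hypothetical stand-in for an Annoy-style index (exact search, illustration only)."""

    def __init__(self, f):
        self.f = f          # vector length
        self.items = {}     # item id -> vector

    def add_item(self, i, v):
        self.items[i] = list(v)

    @staticmethod
    def _angular(a, b):
        # Cosine-based distance: sqrt(2 * (1 - cos(a, b)))
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return math.sqrt(max(0.0, 2.0 - 2.0 * dot / (na * nb)))

    def get_nns_by_vector(self, v, n):
        ranked = sorted(self.items, key=lambda i: self._angular(self.items[i], v))
        return ranked[:n]

    def get_nns_by_item(self, i, n):
        return self.get_nns_by_vector(self.items[i], n)

    def get_batch_nns_by_items(self, ids, n, n_threads=1):
        # One single-item query per id; pool.map keeps results in input order.
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            return list(pool.map(lambda i: self.get_nns_by_item(i, n), ids))


random.seed(0)
index = BruteForceIndex(8)
for i in range(50):
    index.add_item(i, [random.gauss(0, 1) for _ in range(8)])

batch = index.get_batch_nns_by_items([0, 1, 2], 5, n_threads=2)
one_by_one = [index.get_nns_by_item(i, 5) for i in [0, 1, 2]]
print(batch == one_by_one)
```

In this sketch the thread count only changes how the work is scheduled, not the results, which is the behaviour one would expect from n_threads in the real library as well.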

Benchmark

These results were measured on my MacBook with the test script, so focus on the relative time difference between fastannoy and annoy rather than on the absolute numbers.

Dataset: 500,000 items with 128 dimensions

Task                                               fastannoy     annoy
build + add_item                                   13.810 s      19.633 s
50,000 single searches                             20.613 s      39.760 s
5,000 batch searches (batch size 10, 5 threads)     6.542 s      n/a
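
As a rough reading aid, the relative speedups implied by the timings above can be computed as follows. The numbers are copied from the benchmark; the comparison between batch and single search rests on the assumption that 5,000 batches of 10 queries cover the same 50,000 queries as the single-search row, which the source does not state explicitly.

```python
# Timings in seconds, copied from the benchmark figures above.
timings = {
    "build+add_item": {"fastannoy": 13.810, "annoy": 19.633},
    "50,000 single searches": {"fastannoy": 20.613, "annoy": 39.760},
}
# fastannoy only: 5,000 batch searches, batch size 10, 5 threads.
batch_seconds = 6.542

# How many times longer annoy takes than fastannoy for each task.
speedups = {name: t["annoy"] / t["fastannoy"] for name, t in timings.items()}

# Assumption: the batch run answers the same 50,000 queries as the single-search run.
batch_speedup = timings["50,000 single searches"]["fastannoy"] / batch_seconds

for name, ratio in speedups.items():
    print(f"{name}: annoy / fastannoy = {ratio:.2f}x")
print(f"fastannoy single / batch = {batch_speedup:.2f}x")
```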


