Scalable Nearest Neighbor search library
ScaNN
ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code implements the methods of [1, 2], which include search space pruning and quantization for Maximum Inner Product Search (MIPS); the library also supports other distance functions, such as Euclidean distance. The implementation is optimized for x86 processors with AVX support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com, for example on the glove-100-angular dataset.
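To make the search problem concrete, the sketch below is a brute-force, exact Maximum Inner Product Search in plain Python. It is not ScaNN's algorithm: ScaNN avoids scoring every database vector by pruning the search space and quantizing the data. It only shows what ScaNN approximates, and the kind of exact ground truth one might compare ScaNN's results against. The function names are hypothetical. For angular (cosine) benchmarks such as glove-100-angular, normalizing the vectors first makes the inner product equal to cosine similarity.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit L2 norm so that dot products equal cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def exact_mips(
    database: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return the k (index, score) pairs with the largest inner product.

    Every database vector is scored, i.e. O(n * d) work per query; this is the
    exhaustive baseline that approximate methods try to avoid.
    """
    scores = ((i, dot(row, query)) for i, row in enumerate(database))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    db = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
    q = [1.0, 1.0]
    print(exact_mips(db, q, k=2))  # [(2, 4.0), (1, 2.0)]
```

Note that the unnormalized vector [3.0, 1.0] wins partly because of its large norm; MIPS and cosine search only agree once the data is normalized.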
ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1, 2]. The code is released for research purposes. For an academic description of the algorithms, see the references below.
References:
[1] @inproceedings{avq_2020, title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization}, author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv}, booktitle={International Conference on Machine Learning}, year={2020}, URL={https://arxiv.org/abs/1908.10396} }
[2] @inproceedings{soar_2023, title={SOAR: Improved Indexing for Approximate Nearest Neighbor Search}, author={Sun, Philip and Simcha, David and Dopson, Dave and Guo, Ruiqi and Kumar, Sanjiv}, booktitle={Neural Information Processing Systems}, year={2023}, URL={https://arxiv.org/abs/2404.00774} }
Installation
manylinux_2_27-compatible wheels are available on PyPI:
pip install scann
ScaNN supports Linux environments running Python versions 3.9-3.12. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI. The x86 wheels require AVX and FMA instruction set support, while the ARM wheels require NEON.
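On Linux, one way to confirm these instruction sets before installing is to read /proc/cpuinfo. The sketch below assumes the Linux kernel's naming (AVX and FMA appear as avx and fma on x86 "flags" lines, and NEON appears as asimd on arm64 "Features" lines). The function names are hypothetical and not part of ScaNN, and a passing check says nothing about the other requirements, such as libstdc++.

```python
import platform
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect CPU feature flags from /proc/cpuinfo-style text.

    x86 kernels list them on "flags" lines; arm64 kernels use "Features".
    """
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def meets_wheel_requirements(cpuinfo_text: str, machine: str) -> bool:
    """Check the instruction sets the prebuilt wheels need on this architecture.

    x86_64 wheels need AVX and FMA; aarch64 wheels need NEON, which Linux
    reports as "asimd" on 64-bit ARM.
    """
    flags = cpu_flags(cpuinfo_text)
    if machine == "x86_64":
        return {"avx", "fma"} <= flags
    if machine == "aarch64":
        return "asimd" in flags
    return False  # no prebuilt Linux wheel for other architectures


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print(meets_wheel_requirements(text, platform.machine()))
```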
In accordance with the manylinux_2_27 specification, ScaNN requires libstdc++ version 3.4.23 or above (the GLIBCXX_3.4.23 symbol version) from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
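A common way to check this is to look for the GLIBCXX_x.y.z version strings embedded in libstdc++.so.6 (the same idea as running strings on the library and grepping for GLIBCXX). The following is a minimal Python sketch of that approach; the function names are hypothetical, and the library path in the __main__ block is an assumption that matches Debian/Ubuntu on x86-64 but differs on other distributions.

```python
import os
import re
from typing import Optional, Tuple

# Symbol-version strings such as b"GLIBCXX_3.4.23" are embedded in libstdc++.so.6.
_GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+)\.(\d+)(?:\.(\d+))?")

REQUIRED = (3, 4, 23)  # minimum stated above for the manylinux_2_27 wheels


def max_glibcxx_version(library_bytes: bytes) -> Optional[Tuple[int, int, int]]:
    """Return the highest GLIBCXX_x.y[.z] version found, or None if there is none."""
    versions = [
        (int(m.group(1)), int(m.group(2)), int(m.group(3) or 0))
        for m in _GLIBCXX_RE.finditer(library_bytes)
    ]
    return max(versions) if versions else None


def libstdcxx_is_new_enough(library_bytes: bytes) -> bool:
    """True if the library advertises at least GLIBCXX_3.4.23."""
    found = max_glibcxx_version(library_bytes)
    return found is not None and found >= REQUIRED


if __name__ == "__main__":
    # Hypothetical path: typical for Debian/Ubuntu x86-64 only.
    path = "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"
    if os.path.exists(path):
        with open(path, "rb") as f:
            print(max_glibcxx_version(f.read()))
```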
Using ScaNN with TensorFlow
ScaNN has optional TensorFlow op bindings that allow ScaNN's nearest neighbor search functionality to be embedded into a TensorFlow SavedModel. As of ScaNN 1.4.0, this functionality is no longer enabled by default, and pip install scann[tf] is now needed to enable the TensorFlow integration (in shells such as zsh, which treat square brackets as glob patterns, quote the argument: pip install "scann[tf]").
If scann[tf] is installed, the TensorFlow ops can be accessed via scann.scann_ops, whose API is almost identical to that of scann.scann_ops_pybind.
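Because the two bindings are interchangeable for most purposes, a project may want to choose between them in one place. The sketch below is one hypothetical way to do that: it defaults to the native Python bindings and returns the TensorFlow ops module path only when the caller asks for it. Treating an importable TensorFlow as a proxy for the scann[tf] extra being installed is an assumption of this sketch, not something ScaNN documents.

```python
import importlib.util
from typing import Optional


def scann_binding_name(
    want_tensorflow: bool, tensorflow_available: Optional[bool] = None
) -> str:
    """Return the module path of the ScaNN binding to import.

    The native Python bindings are the default. The TensorFlow ops are chosen
    only on request, and only if TensorFlow can be found (assumed here to mean
    the scann[tf] extra is installed). tensorflow_available overrides the probe.
    """
    if not want_tensorflow:
        return "scann.scann_ops_pybind"
    if tensorflow_available is None:
        tensorflow_available = importlib.util.find_spec("tensorflow") is not None
    if not tensorflow_available:
        raise ImportError('TensorFlow ops requested; install them with pip install "scann[tf]"')
    return "scann.scann_ops"
```

The returned name can then be passed to importlib.import_module. Since the two APIs are only almost identical, code that switches between them should still be checked against both.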
For users not already in the TensorFlow ecosystem, the native Python bindings in scann.scann_ops_pybind are a better fit, which is why the TensorFlow ops are an optional extra.
Integration with TensorFlow Serving
We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.
Usage
See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
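A common way to judge an approximate searcher is recall@k: the fraction of the exact top-k neighbors (for example, from a brute-force search) that the approximate search also returns. Below is a minimal sketch with a hypothetical function name. Each row can be a Python list or a row of a 2-D integer array, such as the neighbor indices returned alongside distances by a ScaNN searcher's search_batched call.

```python
from typing import Sequence


def recall_at_k(
    approx_neighbors: Sequence[Sequence[int]],
    true_neighbors: Sequence[Sequence[int]],
) -> float:
    """Fraction of true nearest neighbors that the approximate search returned.

    Row i of each argument holds neighbor indices for query i. With k equal to
    len(approx_neighbors[i]), only the first k true neighbors are counted.
    """
    if len(approx_neighbors) != len(true_neighbors):
        raise ValueError("need one row of neighbors per query in both inputs")
    hits = 0
    total = 0
    for approx_row, true_row in zip(approx_neighbors, true_neighbors):
        k = len(approx_row)
        hits += len(set(approx_row) & set(true_row[:k]))
        total += k
    return hits / total if total else 0.0
```

Order within a row is ignored on purpose: recall@k only asks whether the right neighbors were found, not whether they were ranked identically.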
Building from source
To build ScaNN from source, first install the build tool Bazel (version 7.x), Clang 18, and the libstdc++ headers for C++17 (provided with GCC 9). ScaNN also requires Python 3.9 or later, with TensorFlow 2.19 installed for that Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:
python configure.py
CC=clang-18 bazel build -c opt --features=thin_lto --copt=-mavx --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg
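Before running these commands, it can help to check the prerequisites programmatically. The sketch below is hypothetical tooling, not part of ScaNN: it probes only what is easy to detect (the Python version, TensorFlow 2.19 for this interpreter, a Bazel 7.x binary, and a clang-18 executable on PATH) and does not check the C++17 libstdc++ headers. It assumes `bazel --version` prints a line such as "bazel 7.4.1".

```python
import importlib.metadata
import re
import shutil
import subprocess
import sys
from typing import List, Optional


def bazel_major_version(version_output: str) -> Optional[int]:
    """Parse `bazel --version` output such as "bazel 7.4.1" into its major version."""
    match = re.search(r"bazel\s+(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def missing_prerequisites() -> List[str]:
    """Return human-readable problems with the documented build prerequisites."""
    problems: List[str] = []
    if sys.version_info < (3, 9):
        problems.append("Python 3.9 or later is required")

    try:
        tf_version = importlib.metadata.version("tensorflow")
    except importlib.metadata.PackageNotFoundError:
        problems.append("TensorFlow 2.19 is not installed for this Python")
    else:
        if tf_version.split(".")[:2] != ["2", "19"]:
            problems.append(f"TensorFlow 2.19 expected, found {tf_version}")

    bazel = shutil.which("bazel")
    if bazel is None:
        problems.append("bazel not found on PATH")
    else:
        result = subprocess.run([bazel, "--version"], capture_output=True, text=True)
        if bazel_major_version(result.stdout) != 7:
            problems.append(f"Bazel 7.x expected, got: {result.stdout.strip() or 'unknown'}")

    if shutil.which("clang-18") is None:
        problems.append("clang-18 not found on PATH")
    return problems
```

Calling missing_prerequisites() before python configure.py and printing each entry gives a quick checklist; an empty list means only that these particular probes passed.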
To build an ARM binary from an ARM machine, the prerequisites are the same, but the compile flags are slightly modified:
python configure.py
CC=clang-18 bazel build -c opt --features=thin_lto --copt=-march=armv8-a+simd --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg
A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.
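The x86 and ARM commands differ only in their architecture-specific copts, so a build script could assemble the argument list from the machine type. The sketch below copies the flags from the two commands above into hypothetical helper functions; it assumes the platform.machine() values x86_64 and aarch64 used on Linux, and it does not run anything itself.

```python
import platform
from pathlib import Path
from typing import List, Optional

# Flags shared by both documented bazel commands, split around the
# architecture-specific part so the argument order matches those commands.
FLAGS_BEFORE_ARCH = ["-c", "opt", "--features=thin_lto"]
FLAGS_AFTER_ARCH = ["--cxxopt=-std=c++17", "--copt=-fsized-deallocation", "--copt=-w"]

# Architecture-specific copts, keyed by platform.machine() values on Linux.
ARCH_COPTS = {
    "x86_64": ["--copt=-mavx", "--copt=-mfma"],
    "aarch64": ["--copt=-march=armv8-a+simd"],
}


def bazel_build_argv(machine: Optional[str] = None) -> List[str]:
    """Return the `bazel build` argv for the given (or current) architecture.

    Each flag is its own list element, so no shell quoting is needed; run it
    with CC=clang-18 in the environment, as in the commands above.
    """
    machine = machine or platform.machine()
    if machine not in ARCH_COPTS:
        raise ValueError(f"no documented build flags for architecture {machine!r}")
    return ["bazel", "build", *FLAGS_BEFORE_ARCH, *ARCH_COPTS[machine],
            *FLAGS_AFTER_ARCH, ":build_pip_pkg"]


def find_built_wheels(repo_root: str = ".") -> List[Path]:
    """Return the .whl files in the repository root, newest first."""
    wheels = Path(repo_root).glob("*.whl")
    return sorted(wheels, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, subprocess.run(bazel_build_argv(), env=dict(os.environ, CC="clang-18"), check=True) followed by ./bazel-bin/build_pip_pkg produces the wheel, and the newest entry of find_built_wheels() can then be installed with sys.executable -m pip install.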
Download files
Built Distributions
File details
Details for the file scann-1.4.0-cp312-cp312-manylinux_2_27_x86_64.whl.
File metadata
- Download URL: scann-1.4.0-cp312-cp312-manylinux_2_27_x86_64.whl
- Size: 11.8 MB
- Tags: CPython 3.12, manylinux: glibc 2.27+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 674d4946c1fd657998cb6c95a37b337f0b48e5d72f28fdff3d0f01a2de028080
MD5 | dedfa74c462497ef2a925a0bb318fcc3
BLAKE2b-256 | 96b524b82f84cd772aab41283278690c9ea15c2c96cead1f50ce683d676ad850
File details
Details for the file scann-1.4.0-cp312-cp312-manylinux_2_27_aarch64.whl.
File metadata
- Download URL: scann-1.4.0-cp312-cp312-manylinux_2_27_aarch64.whl
- Size: 9.5 MB
- Tags: CPython 3.12, manylinux: glibc 2.27+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | a1e7c845d96ed11095bc4fdbb54449f44057de6cb5438b7df7ed2ff36c9ef957
MD5 | 93d50fc6a7595b5555200f73227385d7
BLAKE2b-256 | eb893a66a11750d5ad6382dfe5ac15d5b2f0a5e3bb4bd0976974277083f3a2d4
File details
Details for the file scann-1.4.0-cp311-cp311-manylinux_2_27_x86_64.whl.
File metadata
- Download URL: scann-1.4.0-cp311-cp311-manylinux_2_27_x86_64.whl
- Size: 11.8 MB
- Tags: CPython 3.11, manylinux: glibc 2.27+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 24b4d7ce4f056a4ac2cabe026662ce02795263ab3457d98703a40d9d08e63807
MD5 | dce237ee4566638a8fa6fe740fa50603
BLAKE2b-256 | 0d77cdaefa00942fd24c1fd7915617d9117cece6ee0f18cdad44506f215f5aab
File details
Details for the file scann-1.4.0-cp311-cp311-manylinux_2_27_aarch64.whl.
File metadata
- Download URL: scann-1.4.0-cp311-cp311-manylinux_2_27_aarch64.whl
- Size: 9.5 MB
- Tags: CPython 3.11, manylinux: glibc 2.27+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2324bf37daef28bf611e1d309c76c7de224611db0b29abc2f50be79ed7b601b2
MD5 | 6966155e7bf65fc7448b933c72a66b05
BLAKE2b-256 | 1aeb50a8c25a72c4a583686b865c47e49b4780ae047d35cd767d8e1984822eb2
File details
Details for the file scann-1.4.0-cp310-cp310-manylinux_2_27_x86_64.whl.
File metadata
- Download URL: scann-1.4.0-cp310-cp310-manylinux_2_27_x86_64.whl
- Size: 11.8 MB
- Tags: CPython 3.10, manylinux: glibc 2.27+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | c80630ba8628d6e151571de779f60a3089df02576acac7d20c6f1d381433b7f4
MD5 | e26210042614d58121c9a09e4b59c626
BLAKE2b-256 | 8133e31bd4a7523c483ca4768e8a3e62ba68d4e4d0e14c74eafaeae70a1e517e
File details
Details for the file scann-1.4.0-cp310-cp310-manylinux_2_27_aarch64.whl.
File metadata
- Download URL: scann-1.4.0-cp310-cp310-manylinux_2_27_aarch64.whl
- Size: 9.5 MB
- Tags: CPython 3.10, manylinux: glibc 2.27+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0b5a4a47abcf861f9bd7354ece8ab317bf63afe107f9c79bcac5df07fed8f7f0
MD5 | 0bc9aeafc0676d2477fa36c354063f8c
BLAKE2b-256 | 92bb55fac6a9eb29cd6ced9861130cfcaf9174409c781ede7da975ce03755abe
File details
Details for the file scann-1.4.0-cp39-cp39-manylinux_2_27_x86_64.whl.
File metadata
- Download URL: scann-1.4.0-cp39-cp39-manylinux_2_27_x86_64.whl
- Size: 11.8 MB
- Tags: CPython 3.9, manylinux: glibc 2.27+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 86b36eecfe9d109ee09066ba6969e51589585056e6f5066e63af78e255367e98
MD5 | 8f2f9efb765fae7ed18afaf68fc0a492
BLAKE2b-256 | ca0e11665f1e047302b081d80714202b41b62cb13913da18b823788c8f487c03
File details
Details for the file scann-1.4.0-cp39-cp39-manylinux_2_27_aarch64.whl.
File metadata
- Download URL: scann-1.4.0-cp39-cp39-manylinux_2_27_aarch64.whl
- Size: 9.5 MB
- Tags: CPython 3.9, manylinux: glibc 2.27+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5caef92a28e34d8104fda0d291e922b3bdc272c225345d5b617f3215775a7ce2
MD5 | d61029e58e64ee2ad6db329afa008a32
BLAKE2b-256 | 8afefaf3a6ae6f37f74dbde7d4e102c8388cdc713874ca18703cd606db97acce
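The published SHA256 digests make it possible to verify a downloaded wheel before installing it. A minimal sketch with hypothetical helper names follows. It reads the file in chunks so large wheels do not need to fit in memory, and it uses a manual loop rather than hashlib.file_digest because the latter only exists from Python 3.11, while the wheels support 3.9 and later.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def wheel_matches(path: str, expected_sha256: str) -> bool:
    """Compare a downloaded file against the SHA256 digest published for it."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For instance, wheel_matches("scann-1.4.0-cp312-cp312-manylinux_2_27_x86_64.whl", "674d4946c1fd657998cb6c95a37b337f0b48e5d72f28fdff3d0f01a2de028080") should return True for an intact download. pip can also enforce this itself in hash-checking mode: a requirements line such as scann==1.4.0 --hash=sha256:<digest> (one --hash per acceptable wheel) combined with pip install --require-hashes -r requirements.txt.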