Scalable Approximate Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code release implements [1], which includes search space pruning and quantization for Maximum Inner Product Search and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1]. The code is released for research purposes. For more details on the academic description of algorithms, please see [1].

Reference [1]:

@inproceedings{avq_2020,
  title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
  author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
  booktitle={International Conference on Machine Learning},
  year={2020},
  URL={https://arxiv.org/abs/1908.10396}
}
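To make the search problem concrete, here is a minimal sketch of the exact (brute-force) search that ScaNN approximates. It is not ScaNN code: the function names `exact_mips` and `exact_l2` are hypothetical, and the sketch uses plain Python lists rather than any ScaNN data structure. It shows that Maximum Inner Product Search and Euclidean nearest-neighbor search can return different results on the same data, which is why the distance function has to be chosen explicitly.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_mips(query: Vector, dataset: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Return the k (index, score) pairs with the LARGEST inner product."""
    scored = [(i, dot(query, v)) for i, v in enumerate(dataset)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def exact_l2(query: Vector, dataset: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Return the k (index, distance) pairs with the SMALLEST squared distance."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(dataset)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    data = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
    q = [1.0, 0.0]
    # The long vector wins under inner product; the identical vector wins under L2.
    print(exact_mips(q, data, 1))
    print(exact_l2(q, data, 1))
```

Brute force costs one pass over the whole dataset per query. The pruning and quantization described in [1] exist to avoid that cost at scale.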

Installation

manylinux_2_27-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.8-3.11. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.

In accordance with the manylinux_2_27 specification, ScaNN requires libstdc++ version 3.4.23 or above from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
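The requirements above can be checked before running pip. The following is a rough sketch under stated assumptions, not an official ScaNN tool. It assumes the "libstdc++ version 3.4.23" requirement corresponds to the `GLIBCXX_3.4.23` symbol version embedded in `libstdc++.so.6`, and the candidate library paths are guesses that vary between distributions. All function names are hypothetical.

```python
import platform
import re
import sys
from pathlib import Path
from typing import Optional, Tuple

MIN_PYTHON, MAX_PYTHON = (3, 8), (3, 11)  # supported range stated above
REQUIRED_GLIBCXX = (3, 4, 23)             # assumed to mean GLIBCXX_3.4.23
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+)\.(\d+)\.(\d+)")

# Assumption: common install locations; adjust for your distribution.
CANDIDATE_LIBS = (
    "/usr/lib/x86_64-linux-gnu/libstdc++.so.6",
    "/usr/lib64/libstdc++.so.6",
)


def platform_supported(version: Tuple[int, int], system: str, machine: str) -> bool:
    """True if the interpreter and OS match the published wheels."""
    return (
        system == "Linux"
        and machine == "x86_64"
        and MIN_PYTHON <= version <= MAX_PYTHON
    )


def max_glibcxx_version(blob: bytes) -> Optional[Tuple[int, ...]]:
    """Highest three-part GLIBCXX_x.y.z version string found in the bytes."""
    versions = [tuple(int(p) for p in m.groups()) for m in _GLIBCXX.finditer(blob)]
    return max(versions) if versions else None


def system_glibcxx_version() -> Optional[Tuple[int, ...]]:
    """Scan the first candidate library that exists; None if none is found."""
    for candidate in CANDIDATE_LIBS:
        path = Path(candidate)
        if path.exists():
            return max_glibcxx_version(path.read_bytes())
    return None


if __name__ == "__main__":
    ok = platform_supported(sys.version_info[:2], platform.system(), platform.machine())
    print("Python/platform supported:", ok)
    found = system_glibcxx_version()
    if found is None:
        print("libstdc++ not found in the assumed locations")
    else:
        print("libstdc++ new enough:", found >= REQUIRED_GLIBCXX, found)
```

Scanning the library bytes avoids depending on external tools, at the cost of reading the whole file into memory.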

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Building from source

To build ScaNN from source, first install the build tool Bazel, Clang 8, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.8.x or later) and TensorFlow 2.13 installed on that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-8 bazel build -c opt --features=thin_lto --copt=-mavx --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
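As a quick orientation before opening the notebook, the sketch below shows the builder pattern of the Python API as I understand it (`scann.scann_ops_pybind.builder` followed by `tree`, `score_ah`, `reorder`, and `build`). The parameter values are illustrative assumptions, not tuned recommendations, and the helper names `build_searcher` and `recall_at_k` are hypothetical. Treat docs/example.ipynb as the authoritative version. The `scann` import is deferred so the recall helper can be used on its own.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[Sequence[int]],
                exact_ids: Sequence[Sequence[int]]) -> float:
    """Fraction of true neighbors that the approximate search recovered."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    total = sum(len(e) for e in exact_ids)
    return hits / total


def build_searcher(dataset, num_neighbors: int = 10):
    """Build a ScaNN searcher for inner-product search.

    `dataset` is assumed to be a 2-D float32 NumPy array of shape (n, dim).
    All numeric settings below are illustrative assumptions.
    """
    import scann  # deferred: requires `pip install scann` on a supported platform

    n = len(dataset)
    num_leaves = max(1, int(n ** 0.5))            # rough heuristic, an assumption
    num_leaves_to_search = max(1, num_leaves // 10)
    return (
        scann.scann_ops_pybind.builder(dataset, num_neighbors, "dot_product")
        # Partition the dataset so that only some leaves are searched per query.
        .tree(num_leaves=num_leaves,
              num_leaves_to_search=num_leaves_to_search,
              training_sample_size=n)
        # Score candidates with anisotropic quantization, as described in [1].
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        # Re-score the top candidates exactly to improve accuracy.
        .reorder(10 * num_neighbors)
        .build()
    )
```

With a searcher in hand, `searcher.search_batched(queries)` should return a `(neighbors, distances)` pair. Comparing `neighbors` against brute-force results with `recall_at_k` gives a simple accuracy measure when trading speed against recall.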

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distributions

scann-1.2.10-cp311-cp311-manylinux_2_27_x86_64.whl (9.6 MB): CPython 3.11, manylinux (glibc 2.27+), x86-64

scann-1.2.10-cp310-cp310-manylinux_2_27_x86_64.whl (9.6 MB): CPython 3.10, manylinux (glibc 2.27+), x86-64

scann-1.2.10-cp39-cp39-manylinux_2_27_x86_64.whl (9.6 MB): CPython 3.9, manylinux (glibc 2.27+), x86-64

scann-1.2.10-cp38-cp38-manylinux_2_27_x86_64.whl (9.6 MB): CPython 3.8, manylinux (glibc 2.27+), x86-64
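pip normally selects the matching wheel automatically. For manual downloads, the sketch below shows how the file names encode the target interpreter. It relies on the standard wheel naming layout (name, version, Python tag, ABI tag, platform tag); the helper name `pick_wheel` is hypothetical.

```python
import sys
from typing import Optional, Sequence

WHEELS = (
    "scann-1.2.10-cp311-cp311-manylinux_2_27_x86_64.whl",
    "scann-1.2.10-cp310-cp310-manylinux_2_27_x86_64.whl",
    "scann-1.2.10-cp39-cp39-manylinux_2_27_x86_64.whl",
    "scann-1.2.10-cp38-cp38-manylinux_2_27_x86_64.whl",
)


def pick_wheel(wheels: Sequence[str], major: int, minor: int) -> Optional[str]:
    """Return the wheel whose Python tag matches CPython major.minor, if any."""
    wanted = f"cp{major}{minor}"
    for name in wheels:
        # Layout: name-version-pythontag-abitag-platformtag.whl
        parts = name[: -len(".whl")].split("-")
        if len(parts) == 5 and parts[2] == wanted:
            return name
    return None


if __name__ == "__main__":
    print(pick_wheel(WHEELS, sys.version_info[0], sys.version_info[1]))
```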

File hashes

Hashes for scann-1.2.10-cp311-cp311-manylinux_2_27_x86_64.whl
Algorithm    Hash digest
SHA256       1cfe020dcd750018bf2e255b1b41caa838e96e6b2fafd1ea75f59f817f8fcdac
MD5          bcd2e3d0e00f382ae95705d2c182d408
BLAKE2b-256  5f9bf66f8a10fb09ba7d07918583dec9b65bd52557bb11737c83272c6e7acc4c

Hashes for scann-1.2.10-cp310-cp310-manylinux_2_27_x86_64.whl
Algorithm    Hash digest
SHA256       fc68bc7891d4d1ea70958706b79981dd6f07fcfd6b24294c9016e0fa09823310
MD5          62468615f0b111ccad2f5bb75f803902
BLAKE2b-256  99c897b369f46bcca719ed883be85e1026a6d971df57644e59ed151c1456ef25

Hashes for scann-1.2.10-cp39-cp39-manylinux_2_27_x86_64.whl
Algorithm    Hash digest
SHA256       e7c163b9bed65a203f69b9c7e458b26be045b9137a608cc470839544b1d523f2
MD5          67a9365747c3ab25374691f3b72a50bf
BLAKE2b-256  1973d54a010c7d2c4bb334449d33a6ae2619249ebf7741cbf2997cb1947e92f6

Hashes for scann-1.2.10-cp38-cp38-manylinux_2_27_x86_64.whl
Algorithm    Hash digest
SHA256       bcda67f03f32024681d7779c318d08a5cc1bcec9e6ae187f99d4dbcdf99db4d2
MD5          8114687eb53591706002c21f6fb421d2
BLAKE2b-256  c7cf5a5e6bf549288ab01c31e25c6eaf0cccc6d8be5b0316e79a82b40a95a6d4
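A downloaded wheel can be checked against the published SHA256 digests with the Python standard library. This is a minimal sketch; the helper names `sha256_of_file` and `matches_digest` are hypothetical, and the local path in the example is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Union[str, Path], expected_hex: str) -> bool:
    """True if the file's SHA256 equals the expected digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local path; the digest is the published SHA256 for the cp311 wheel.
    wheel = Path("scann-1.2.10-cp311-cp311-manylinux_2_27_x86_64.whl")
    expected = "1cfe020dcd750018bf2e255b1b41caa838e96e6b2fafd1ea75f59f817f8fcdac"
    if wheel.exists():
        print("digest matches:", matches_digest(wheel, expected))
    else:
        print("wheel not found at", wheel)
```

SHA256 is the digest to prefer here; MD5 is not collision resistant and should not be relied on for integrity checks.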
