Scalable Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code implements [1, 2], which includes search space pruning and quantization for Maximum Inner Product Search and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1, 2]. The code is released for research purposes. For more details on the academic description of algorithms, please see below.
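For orientation, the exact problem that ScaNN approximates, Maximum Inner Product Search, can be written as a brute-force scan. The sketch below is not ScaNN's API and uses none of its pruning or quantization; the function names `inner_product` and `exact_mips` are made up for illustration. It only shows what "the k nearest neighbors under inner product" means.

```python
def inner_product(a, b):
    """Dot product of two equal-length sequences of numbers."""
    return sum(x * y for x, y in zip(a, b))


def exact_mips(dataset, query, k):
    """Return the indices of the k dataset vectors with the largest inner product.

    This scores every vector, so its cost grows linearly with the dataset size.
    """
    order = sorted(
        range(len(dataset)),
        key=lambda i: inner_product(dataset[i], query),
        reverse=True,
    )
    return order[:k]


if __name__ == "__main__":
    dataset = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, -1.0]]
    print(exact_mips(dataset, [1.0, 0.5], k=2))
```

An approximate method trades a small amount of recall against this exhaustive scan in exchange for scoring far fewer vectors per query.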

References:

  1. @inproceedings{avq_2020,
      title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
      author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
      booktitle={International Conference on Machine Learning},
      year={2020},
      URL={https://arxiv.org/abs/1908.10396}
    }
    
  2. @inproceedings{soar_2023,
      title={SOAR: Improved Indexing for Approximate Nearest Neighbor Search},
      author={Sun, Philip and Simcha, David and Dopson, Dave and Guo, Ruiqi and Kumar, Sanjiv},
      booktitle={Neural Information Processing Systems},
      year={2023},
      URL={https://arxiv.org/abs/2404.00774}
    }
    

Installation

manylinux_2_27-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.9-3.12. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.
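As a quick pre-install check, the constraints above (Linux, Python 3.9 through 3.12) plus the x86-64 architecture of the published wheels can be tested from Python's standard library. This is a minimal sketch; the helper name `wheel_supported` is hypothetical and is not part of ScaNN or pip.

```python
import platform
import sys

# Supported range taken from the text above.
MIN_PY = (3, 9)
MAX_PY = (3, 12)


def wheel_supported(system, machine, py_version):
    """Return True if a prebuilt ScaNN wheel is expected to match this environment."""
    return (
        system == "Linux"
        and machine == "x86_64"
        and MIN_PY <= tuple(py_version[:2]) <= MAX_PY
    )


if __name__ == "__main__":
    ok = wheel_supported(platform.system(), platform.machine(), sys.version_info)
    print("prebuilt wheel expected" if ok else "no matching prebuilt wheel")
```

The check does not cover the glibc or libstdc++ requirements, so a passing result is necessary but not sufficient for `pip install scann` to succeed.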

In accordance with the manylinux_2_27 specification, ScaNN requires libstdc++ version 3.4.23 or above from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
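One way to inspect the installed libstdc++ is to scan the shared library for its `GLIBCXX_x.y.z` version tags and take the highest one. The sketch below assumes that the "3.4.23" requirement refers to those tags and that the library lives in one of two common locations; both paths are assumptions and may differ on your distribution.

```python
import os
import re

# Minimum version stated in the text above.
REQUIRED = (3, 4, 23)

# Assumed typical install locations; adjust for your distribution.
CANDIDATE_PATHS = [
    "/usr/lib/x86_64-linux-gnu/libstdc++.so.6",
    "/usr/lib64/libstdc++.so.6",
]


def max_glibcxx(data):
    """Return the highest GLIBCXX_x.y.z tag found in raw bytes, or None."""
    tags = re.findall(rb"GLIBCXX_(\d+)\.(\d+)\.(\d+)", data)
    versions = [tuple(int(part) for part in tag) for tag in tags]
    return max(versions) if versions else None


def check_system():
    """Return (path, version) for the first candidate library that exists."""
    for path in CANDIDATE_PATHS:
        if os.path.exists(path):
            with open(path, "rb") as f:
                return path, max_glibcxx(f.read())
    return None, None


if __name__ == "__main__":
    path, found = check_system()
    if found is None:
        print("no GLIBCXX version tags found in the assumed locations")
    else:
        status = "OK" if found >= REQUIRED else "too old"
        print(f"{path}: GLIBCXX {'.'.join(map(str, found))} ({status})")
```

Versions are compared as integer tuples rather than strings, so 3.4.9 correctly sorts below 3.4.23.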

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Building from source

To build ScaNN from source, first install the build tool Bazel, Clang 17, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.9.x or later) and TensorFlow 2.17 installed on that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-17 bazel build -c opt --features=thin_lto --copt=-mavx --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.

Source Distributions

No source distribution files are available for this release.

Built Distributions

scann-1.3.3-cp312-cp312-manylinux_2_27_x86_64.whl (10.7 MB): CPython 3.12, manylinux (glibc 2.27+), x86-64

scann-1.3.3-cp311-cp311-manylinux_2_27_x86_64.whl (10.7 MB): CPython 3.11, manylinux (glibc 2.27+), x86-64

scann-1.3.3-cp310-cp310-manylinux_2_27_x86_64.whl (10.7 MB): CPython 3.10, manylinux (glibc 2.27+), x86-64

scann-1.3.3-cp39-cp39-manylinux_2_27_x86_64.whl (10.7 MB): CPython 3.9, manylinux (glibc 2.27+), x86-64

File details

Details for the file scann-1.3.3-cp312-cp312-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.3-cp312-cp312-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 fac7102636bc10211fc99b84ebd3640276427becd046ab8e85db6d9fec8c6563
MD5 fad14c604bbdecf589460fb316c243eb
BLAKE2b-256 fb117577ec3f177ae6fdffdb500524d557a06884fa86abfd1efc23802c78a4ff

File details

Details for the file scann-1.3.3-cp311-cp311-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.3-cp311-cp311-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 5c618ed0e11d7d635a9d2c1749b89da08b5bcdfe19f7dbd640af200a231b8aea
MD5 73f15be78253bfabcf2bbe98cb45f4f9
BLAKE2b-256 c6e5e2621f5d038483e52aa32f9713ac2e78685d519feecb64f82319b3a15d9e

File details

Details for the file scann-1.3.3-cp310-cp310-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.3-cp310-cp310-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 4600ad48586e3ee1028f2f904df3eee2d47a62659587c902fa2f3e8ad4aae477
MD5 fe8e98d31199db02bcd0c3b3c95bdbb5
BLAKE2b-256 178d6c8c638db0f050905e0e8a69adfb2c830bb7174e01e025f0aeab22e56c9e

File details

Details for the file scann-1.3.3-cp39-cp39-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.3-cp39-cp39-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 f243b37ebbb6ab6cdbda4eb9f4649690e9c4e34f9db7e046bf8e0c82ab8fbf51
MD5 1f55f6c8bf8f1dece8c1c11fe01e2de2
BLAKE2b-256 2b3c3903c8b8d1a212ec008133ff5d57973d937241f21159f123fd7a64882ae4
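
Any of the SHA256 digests listed above can be compared against a downloaded wheel before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_published` are hypothetical, and the wheel is assumed to be present as a local file.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published("scann-1.3.3-cp312-cp312-manylinux_2_27_x86_64.whl", "fac7102636bc10211fc99b84ebd3640276427becd046ab8e85db6d9fec8c6563")` should return True for an intact copy of the CPython 3.12 wheel.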
