Scalable Nearest Neighbor search library

Project description

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code implements the techniques described in [1, 2], which include search space pruning and quantization for Maximum Inner Product Search, and it also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com, as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1, 2]. The code is released for research purposes. For the academic description of the algorithms, see the references below.
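For orientation, the problem named in the description, Maximum Inner Product Search, can be stated as a brute-force scan. The sketch below is not ScaNN's algorithm: it is the exact, linear-time baseline that pruning and quantization are meant to approximate at much lower cost. The function names `inner_product` and `exact_mips` are illustrative and do not come from the library.

```python
import heapq


def inner_product(a, b):
    """Dot product of two equal-length sequences of numbers."""
    return sum(x * y for x, y in zip(a, b))


def exact_mips(dataset, query, k):
    """Return the k (index, score) pairs with the largest inner product.

    This scans every vector, so the cost grows linearly with the dataset
    size. Approximate methods trade a little accuracy to avoid this scan.
    """
    scored = ((inner_product(vector, query), index)
              for index, vector in enumerate(dataset))
    top = heapq.nlargest(k, scored)
    return [(index, score) for score, index in top]


if __name__ == "__main__":
    # Tiny hand-made dataset, purely for illustration.
    dataset = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]]
    for index, score in exact_mips(dataset, [1.0, 0.0], k=2):
        print(index, score)
```

Comparing an approximate searcher's results against such an exact scan on a sample of queries is a common way to estimate recall.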

References:

  1. @inproceedings{avq_2020,
      title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
      author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
      booktitle={International Conference on Machine Learning},
      year={2020},
      URL={https://arxiv.org/abs/1908.10396}
    }
    
  2. @inproceedings{soar_2023,
      title={SOAR: Improved Indexing for Approximate Nearest Neighbor Search},
      author={Sun, Philip and Simcha, David and Dopson, Dave and Guo, Ruiqi and Kumar, Sanjiv},
      booktitle={Neural Information Processing Systems},
      year={2023},
      URL={https://arxiv.org/abs/2404.00774}
    }
    

Installation

manylinux_2_27-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.9-3.12. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.

In accordance with the manylinux_2_27 specification, ScaNN requires libstdc++ version 3.4.23 or above from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
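One way to check the libstdc++ requirement is to scan the shared library for the `GLIBCXX_x.y.z` symbol-version strings it embeds and compare the highest one with 3.4.23. The following is a minimal sketch under stated assumptions: the paths in `CANDIDATE_PATHS` are guesses at common locations and may not match your distribution, and the helper names are illustrative.

```python
import re
from pathlib import Path

REQUIRED = (3, 4, 23)  # minimum libstdc++ (GLIBCXX) version stated above

# Assumed locations only; adjust for your distribution.
CANDIDATE_PATHS = (
    "/usr/lib/x86_64-linux-gnu/libstdc++.so.6",
    "/usr/lib64/libstdc++.so.6",
)


def glibcxx_versions(data: bytes):
    """Extract the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted(tuple(int(part) for part in v.decode().split("."))
                  for v in found)


def meets_requirement(versions, required=REQUIRED):
    """True if the highest version found is at least the required one."""
    return bool(versions) and max(versions) >= required


if __name__ == "__main__":
    for candidate in CANDIDATE_PATHS:
        path = Path(candidate)
        if path.is_file():
            versions = glibcxx_versions(path.read_bytes())
            newest = ".".join(map(str, max(versions))) if versions else "none"
            print(f"{path}: newest GLIBCXX {newest}, "
                  f"sufficient={meets_requirement(versions)}")
            break
    else:
        print("libstdc++.so.6 not found in the assumed locations")
```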

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Building from source

To build ScaNN from source, first install the build tool Bazel (use version 7.x), Clang 17, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.9.x or later) and TensorFlow 2.17 installed on that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-17 bazel build -c opt --features=thin_lto --copt=-mavx --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
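As a rough orientation before opening the notebook, the sketch below builds a searcher on random data and compares it with an exact scan. It is written from memory of the builder-style Python API used in the project's example notebook (`scann.scann_ops_pybind.builder(...).tree(...).score_ah(...).reorder(...).build()`), so treat the call names and arguments as assumptions that may differ between versions. All parameter values are illustrative and untuned, and `recall_at_k` is a helper defined here, not part of ScaNN. NumPy and ScaNN are imported lazily so the file still loads where they are not installed.

```python
def recall_at_k(approx, exact):
    """Fraction of the exact neighbors that the approximate search returned."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    total = sum(len(e) for e in exact)
    return hits / total if total else 0.0


def main():
    try:
        import numpy as np
        import scann
    except ImportError:
        print("This demo needs NumPy and ScaNN (pip install scann).")
        return

    # Random unit vectors stand in for a real embedding dataset.
    rng = np.random.default_rng(0)
    dataset = rng.standard_normal((10_000, 32)).astype(np.float32)
    dataset /= np.linalg.norm(dataset, axis=1, keepdims=True)
    queries = dataset[:5]

    # Assumed API: partition the data, score with asymmetric hashing,
    # then re-rank the best candidates exactly. Values are illustrative.
    searcher = (
        scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
        .tree(num_leaves=100, num_leaves_to_search=10,
              training_sample_size=len(dataset))
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        .reorder(100)
        .build()
    )
    neighbors, distances = searcher.search_batched(queries)

    # Exact top-10 by inner product, used as the ground truth.
    exact = np.argsort(-(queries @ dataset.T), axis=1)[:, :10]
    print("recall@10:", recall_at_k(neighbors.tolist(), exact.tolist()))


if __name__ == "__main__":
    main()
```

Searching more leaves or re-ranking more candidates would generally be expected to raise recall at the cost of speed; docs/algorithms.md is the place to check how the project itself describes these trade-offs.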

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

scann-1.3.4-cp312-cp312-manylinux_2_27_x86_64.whl (11.7 MB)

Uploaded CPython 3.12 manylinux: glibc 2.27+ x86-64

scann-1.3.4-cp311-cp311-manylinux_2_27_x86_64.whl (11.7 MB)

Uploaded CPython 3.11 manylinux: glibc 2.27+ x86-64

scann-1.3.4-cp310-cp310-manylinux_2_27_x86_64.whl (11.7 MB)

Uploaded CPython 3.10 manylinux: glibc 2.27+ x86-64

scann-1.3.4-cp39-cp39-manylinux_2_27_x86_64.whl (11.7 MB)

Uploaded CPython 3.9 manylinux: glibc 2.27+ x86-64
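The wheel names listed here follow one pattern, with the CPython version encoded in the `cpXY` tags. If you are unsure which file matches your interpreter, a small helper can derive the name. This is a sketch only: `wheel_filename` is a hypothetical helper, and the release number and supported versions are copied from the list in this section. Note that `pip install scann` normally selects the right wheel without any of this.

```python
import sys

RELEASE = "1.3.4"                # release shown in this file list
SUPPORTED_MINORS = range(9, 13)  # CPython 3.9 through 3.12


def wheel_filename(python_version=None, release=RELEASE):
    """Return the wheel name for a CPython (major, minor), or None.

    None means no wheel in this list targets that interpreter version.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    major, minor = python_version[0], python_version[1]
    if major != 3 or minor not in SUPPORTED_MINORS:
        return None
    tag = f"cp{major}{minor}"
    return f"scann-{release}-{tag}-{tag}-manylinux_2_27_x86_64.whl"


if __name__ == "__main__":
    name = wheel_filename()
    print(name if name else "No matching wheel for this Python version")
```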

File details

Details for the file scann-1.3.4-cp312-cp312-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.4-cp312-cp312-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 6004c5d02f044ff40946e8c10878df5b5d981dcd4f0239084440ca1da680b4d3
MD5 f10fbec69904a87a68220101aed84f41
BLAKE2b-256 7c99c78203733141e916df1802c47e62617913cff5f57cb691c90ce8e9520031

See more details on using hashes here.
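A manually downloaded wheel can be checked against the SHA256 digest listed above using only the Python standard library. In this minimal sketch the helper names `sha256_of` and `matches` are illustrative, and the script assumes the wheel sits in the current directory; the expected digest is the one listed above for the cp312 wheel.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 equals the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # Assumed download location: the current working directory.
    wheel = Path("scann-1.3.4-cp312-cp312-manylinux_2_27_x86_64.whl")
    expected = "6004c5d02f044ff40946e8c10878df5b5d981dcd4f0239084440ca1da680b4d3"
    if wheel.is_file():
        print("digest OK" if matches(wheel, expected) else "digest MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```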

File details

Details for the file scann-1.3.4-cp311-cp311-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.4-cp311-cp311-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 279906134989ff86dc431eb9f57c5a473ffd4ac01d49dab5876e8f8edde0c4b0
MD5 f692f54111af8db5bee8dff6919c0872
BLAKE2b-256 31a00dd27277b10b9537ec149626252549d789622c75d9051a65d46aac11bde3

File details

Details for the file scann-1.3.4-cp310-cp310-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.4-cp310-cp310-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 3b794a92820286e9ec2615e98ca83c5f43773cc1528c8df843ae350c015e7fbc
MD5 8f1e9d5ff091bb1c73fc66bc223956bd
BLAKE2b-256 d9e5d9bb7d34f824ded668ded87c9c95dedfa922eab90451e2f86b6f3b121bd4

File details

Details for the file scann-1.3.4-cp39-cp39-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.4-cp39-cp39-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 197963037dfbeecf6dba5c40e724ef333db7ab33672d6e134c20142f2b8a2a0a
MD5 1c1e396712f2e8a1c886d27aefbaa61e
BLAKE2b-256 464719cbee9ad6d538d4071d905a2aaf335775e0754ed4c4acd4f0fe02c12c71
