Scalable Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code implements the techniques described in [1, 2], which include search space pruning and quantization for Maximum Inner Product Search; it also supports other distance functions such as Euclidean distance. The implementation is optimized for x86 processors with AVX support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com, as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1, 2]. The code is released for research purposes. For more details on the academic description of algorithms, please see below.
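To make the two distance functions mentioned above concrete, here is a minimal pure-Python sketch of the exact (brute-force) search that ScaNN approximates. It is not ScaNN code; the helper names `mips` and `nearest_l2` and the toy data are made up for illustration. It shows that the maximum-inner-product result and the Euclidean nearest neighbor can be different points.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mips(query, dataset):
    """Index of the dataset vector with the largest inner product (exact scan)."""
    return max(range(len(dataset)), key=lambda i: dot(query, dataset[i]))


def nearest_l2(query, dataset):
    """Index of the dataset vector closest in Euclidean distance (exact scan)."""
    return min(range(len(dataset)), key=lambda i: squared_l2(query, dataset[i]))


# Hypothetical toy data: the long vector wins under inner product,
# while the identical vector wins under Euclidean distance.
dataset = [[1.0, 0.0], [3.0, 3.0], [0.9, 0.1]]
query = [1.0, 0.0]
print(mips(query, dataset), nearest_l2(query, dataset))
```

Both scans cost time linear in the dataset size, which is the cost that pruning and quantization are meant to reduce.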

References:

  1. @inproceedings{avq_2020,
      title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
      author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
      booktitle={International Conference on Machine Learning},
      year={2020},
      URL={https://arxiv.org/abs/1908.10396}
    }
    
  2. @inproceedings{soar_2023,
      title={SOAR: Improved Indexing for Approximate Nearest Neighbor Search},
      author={Sun, Philip and Simcha, David and Dopson, Dave and Guo, Ruiqi and Kumar, Sanjiv},
      booktitle={Neural Information Processing Systems},
      year={2023},
      URL={https://arxiv.org/abs/2404.00774}
    }
    

Installation

manylinux_2_27-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.9-3.12. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI. The x86 wheels require AVX and FMA instruction set support, while the ARM wheels require NEON.
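The requirements above can be checked before installing. The following is a rough sketch, assuming a Linux system where `/proc/cpuinfo` lists CPU features on a `flags` line (x86) or `Features` line (ARM), and assuming NEON is reported as `asimd` on aarch64 kernels. The function names are hypothetical and not part of ScaNN.

```python
import platform
import sys

# Feature names as they usually appear in /proc/cpuinfo on Linux (an assumption).
# On aarch64 kernels, NEON (Advanced SIMD) is typically reported as "asimd".
REQUIRED_FLAGS = {
    "x86_64": {"avx", "fma"},
    "aarch64": {"asimd"},
}


def parse_cpu_flags(cpuinfo_text):
    """Collect the feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def missing_requirements(cpuinfo_text, machine, python_version):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    major_minor = tuple(python_version[:2])
    if not ((3, 9) <= major_minor <= (3, 12)):
        problems.append("Python %d.%d is outside 3.9-3.12" % major_minor)
    required = REQUIRED_FLAGS.get(machine)
    if required is None:
        problems.append("no wheel for architecture %r" % machine)
    else:
        for flag in sorted(required - parse_cpu_flags(cpuinfo_text)):
            problems.append("CPU flag %r not reported" % flag)
    return problems


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    for problem in missing_requirements(text, platform.machine(), sys.version_info):
        print(problem)
```

An empty output only means this heuristic found nothing wrong; it does not guarantee that the wheel will install.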

In accordance with the manylinux_2_27 specification, ScaNN requires libstdc++ version 3.4.23 or above from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
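One common way to find the libstdc++ version is to look for the `GLIBCXX_x.y.z` version tags embedded in the shared library. The sketch below does this in Python. It assumes that the "3.4.23" above corresponds to the `GLIBCXX_3.4.23` tag, and the library path in the usage comment is only a typical Debian/Ubuntu location, not something this document specifies.

```python
import re


def max_glibcxx_version(library_bytes):
    """Return the highest GLIBCXX_x.y[.z] tag found as a tuple of ints, or None."""
    versions = {
        tuple(int(part) for part in match.decode("ascii").split("."))
        for match in re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", library_bytes)
    }
    return max(versions) if versions else None


def satisfies_scann(library_bytes, minimum=(3, 4, 23)):
    """True if the library advertises at least the assumed minimum GLIBCXX tag."""
    found = max_glibcxx_version(library_bytes)
    return found is not None and found >= minimum


# Usage (the path is an assumption and varies by distribution):
#   with open("/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "rb") as f:
#       print(max_glibcxx_version(f.read()))
```

Versions are compared as integer tuples so that 3.4.9 sorts below 3.4.23, which a plain string comparison would get wrong.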

Using ScaNN with TensorFlow

ScaNN has optional TensorFlow op bindings that allow ScaNN's nearest neighbor search functionality to be embedded into a TensorFlow SavedModel. As of ScaNN 1.4.0, this functionality is no longer enabled by default, and pip install scann[tf] is now needed to enable the TensorFlow integration.

If scann[tf] is installed, the TensorFlow ops can be accessed via scann.scann_ops, and the API is almost identical to scann.scann_ops_pybind. For users not already in the TensorFlow ecosystem, the native Python bindings in scann.scann_ops_pybind are a better fit, which is why the TensorFlow ops are an optional extra.

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
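For orientation, here is a condensed sketch of what a search with the native Python bindings can look like. It assumes the builder interface of `scann.scann_ops_pybind` (`builder`, `tree`, `score_ah`, `reorder`, `build`, `search_batched`) as used in the example notebook; the random data and all numeric settings are illustrative guesses, not recommended values. `recall_at_k` and `run_demo` are hypothetical helper names.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of exact neighbors that also appear in the approximate result."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    total = sum(len(e) for e in exact_ids)
    return hits / total


def run_demo(num_points=10000, dim=64, num_queries=100, k=10):
    """Build a searcher on random data and return its recall against exact search."""
    # Imported lazily so recall_at_k stays usable without these packages.
    import numpy as np
    import scann

    rng = np.random.default_rng(0)
    dataset = rng.standard_normal((num_points, dim)).astype(np.float32)
    queries = rng.standard_normal((num_queries, dim)).astype(np.float32)

    # Partition the dataset into leaves, score candidates with asymmetric
    # hashing, then rescore the top 100 candidates exactly.
    searcher = (
        scann.scann_ops_pybind.builder(dataset, k, "dot_product")
        .tree(num_leaves=100, num_leaves_to_search=10,
              training_sample_size=num_points)
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        .reorder(100)
        .build()
    )
    neighbors, _ = searcher.search_batched(queries)

    # Exact maximum inner product search, for comparison only.
    exact = np.argsort(-(queries @ dataset.T), axis=1)[:, :k]
    return recall_at_k(neighbors.tolist(), exact.tolist())
```

Calling `run_demo()` requires `numpy` and `scann` to be installed. Raising `num_leaves_to_search` or the reorder count generally trades speed for recall.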

Building from source

To build ScaNN from source, first install the build tool bazel (use version 7.x), Clang 18, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.9.x or later) and TensorFlow 2.19 installed on that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-18 bazel build -c opt --features=thin_lto --copt=-mavx --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

To build an ARM binary on an ARM machine, the prerequisites are the same, but the compile flags are slightly different:

python configure.py
CC=clang-18 bazel build -c opt --features=thin_lto --copt=-march=armv8-a+simd --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

scann-1.4.0-cp312-cp312-manylinux_2_27_x86_64.whl (11.8 MB): CPython 3.12, manylinux (glibc 2.27+), x86-64
scann-1.4.0-cp312-cp312-manylinux_2_27_aarch64.whl (9.5 MB): CPython 3.12, manylinux (glibc 2.27+), ARM64
scann-1.4.0-cp311-cp311-manylinux_2_27_x86_64.whl (11.8 MB): CPython 3.11, manylinux (glibc 2.27+), x86-64
scann-1.4.0-cp311-cp311-manylinux_2_27_aarch64.whl (9.5 MB): CPython 3.11, manylinux (glibc 2.27+), ARM64
scann-1.4.0-cp310-cp310-manylinux_2_27_x86_64.whl (11.8 MB): CPython 3.10, manylinux (glibc 2.27+), x86-64
scann-1.4.0-cp310-cp310-manylinux_2_27_aarch64.whl (9.5 MB): CPython 3.10, manylinux (glibc 2.27+), ARM64
scann-1.4.0-cp39-cp39-manylinux_2_27_x86_64.whl (11.8 MB): CPython 3.9, manylinux (glibc 2.27+), x86-64
scann-1.4.0-cp39-cp39-manylinux_2_27_aarch64.whl (9.5 MB): CPython 3.9, manylinux (glibc 2.27+), ARM64

File hashes

Hashes for scann-1.4.0-cp312-cp312-manylinux_2_27_x86_64.whl
SHA256 674d4946c1fd657998cb6c95a37b337f0b48e5d72f28fdff3d0f01a2de028080
MD5 dedfa74c462497ef2a925a0bb318fcc3
BLAKE2b-256 96b524b82f84cd772aab41283278690c9ea15c2c96cead1f50ce683d676ad850

Hashes for scann-1.4.0-cp312-cp312-manylinux_2_27_aarch64.whl
SHA256 a1e7c845d96ed11095bc4fdbb54449f44057de6cb5438b7df7ed2ff36c9ef957
MD5 93d50fc6a7595b5555200f73227385d7
BLAKE2b-256 eb893a66a11750d5ad6382dfe5ac15d5b2f0a5e3bb4bd0976974277083f3a2d4

Hashes for scann-1.4.0-cp311-cp311-manylinux_2_27_x86_64.whl
SHA256 24b4d7ce4f056a4ac2cabe026662ce02795263ab3457d98703a40d9d08e63807
MD5 dce237ee4566638a8fa6fe740fa50603
BLAKE2b-256 0d77cdaefa00942fd24c1fd7915617d9117cece6ee0f18cdad44506f215f5aab

Hashes for scann-1.4.0-cp311-cp311-manylinux_2_27_aarch64.whl
SHA256 2324bf37daef28bf611e1d309c76c7de224611db0b29abc2f50be79ed7b601b2
MD5 6966155e7bf65fc7448b933c72a66b05
BLAKE2b-256 1aeb50a8c25a72c4a583686b865c47e49b4780ae047d35cd767d8e1984822eb2

Hashes for scann-1.4.0-cp310-cp310-manylinux_2_27_x86_64.whl
SHA256 c80630ba8628d6e151571de779f60a3089df02576acac7d20c6f1d381433b7f4
MD5 e26210042614d58121c9a09e4b59c626
BLAKE2b-256 8133e31bd4a7523c483ca4768e8a3e62ba68d4e4d0e14c74eafaeae70a1e517e

Hashes for scann-1.4.0-cp310-cp310-manylinux_2_27_aarch64.whl
SHA256 0b5a4a47abcf861f9bd7354ece8ab317bf63afe107f9c79bcac5df07fed8f7f0
MD5 0bc9aeafc0676d2477fa36c354063f8c
BLAKE2b-256 92bb55fac6a9eb29cd6ced9861130cfcaf9174409c781ede7da975ce03755abe

Hashes for scann-1.4.0-cp39-cp39-manylinux_2_27_x86_64.whl
SHA256 86b36eecfe9d109ee09066ba6969e51589585056e6f5066e63af78e255367e98
MD5 8f2f9efb765fae7ed18afaf68fc0a492
BLAKE2b-256 ca0e11665f1e047302b081d80714202b41b62cb13913da18b823788c8f487c03

Hashes for scann-1.4.0-cp39-cp39-manylinux_2_27_aarch64.whl
SHA256 5caef92a28e34d8104fda0d291e922b3bdc272c225345d5b617f3215775a7ce2
MD5 d61029e58e64ee2ad6db329afa008a32
BLAKE2b-256 8afefaf3a6ae6f37f74dbde7d4e102c8388cdc713874ca18703cd606db97acce

See more details on using hashes here.
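A downloaded wheel can be checked against a published digest with Python's standard library. This is a minimal sketch; the function names are hypothetical, and the file path in the usage comment assumes the wheel was saved to the current directory.

```python
import hashlib
import hmac


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(file_sha256(path), expected_hex.strip().lower())


# Usage (assumes the wheel sits in the current directory):
#   matches_published_hash(
#       "scann-1.4.0-cp312-cp312-manylinux_2_27_x86_64.whl",
#       "674d4946c1fd657998cb6c95a37b337f0b48e5d72f28fdff3d0f01a2de028080",
#   )
```

SHA256 is used here rather than MD5 because MD5 is not collision resistant.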
