Scalable Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code implements the techniques described in [1, 2], which include search space pruning and quantization for Maximum Inner Product Search, and it also supports other distance functions such as Euclidean distance. The implementation is optimized for x86 processors with AVX support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: glove-100-angular benchmark plot]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1, 2]. The code is released for research purposes. For more details on the academic description of algorithms, please see below.
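To make the search problem concrete, here is a minimal pure-Python sketch of exhaustive Maximum Inner Product Search. It is only a brute-force baseline showing what result an approximate method tries to recover; it is not how ScaNN works internally, and the names `inner_product` and `brute_force_mips` are made up for this illustration.

```python
import heapq


def inner_product(a, b):
    """Dot product of two equal-length sequences of numbers."""
    return sum(x * y for x, y in zip(a, b))


def brute_force_mips(dataset, query, k):
    """Return the k (index, score) pairs with the largest inner product.

    Scores every datapoint against the query, so the cost grows linearly
    with the dataset size. Pruning and quantization exist to avoid this.
    """
    scored = ((inner_product(vec, query), idx) for idx, vec in enumerate(dataset))
    top = heapq.nlargest(k, scored)
    return [(idx, score) for score, idx in top]


# Toy data, chosen so the scores are exact in floating point.
dataset = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
query = [2.0, 4.0]
top2 = brute_force_mips(dataset, query, k=2)
```

With the toy data above, the scores are 2.0, 4.0, 3.0 and -2.0, so the two best matches are datapoints 1 and 2.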

References:

  1. @inproceedings{avq_2020,
      title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
      author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
      booktitle={International Conference on Machine Learning},
      year={2020},
      URL={https://arxiv.org/abs/1908.10396}
    }
    
  2. @inproceedings{soar_2023,
      title={SOAR: Improved Indexing for Approximate Nearest Neighbor Search},
      author={Sun, Philip and Simcha, David and Dopson, Dave and Guo, Ruiqi and Kumar, Sanjiv},
      booktitle={Neural Information Processing Systems},
      year={2023},
      URL={https://arxiv.org/abs/2404.00774}
    }
    

Installation

manylinux_2_27-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.9-3.13. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI. The x86 wheels require AVX and FMA instruction set support, while the ARM wheels require NEON.
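One way to check the instruction set requirement before installing is to inspect `/proc/cpuinfo`. The sketch below is an assumption-laden helper, not part of ScaNN: it assumes a Linux system, that x86 kernels list `avx` and `fma` on the `flags` line, and that 64-bit ARM kernels report NEON as `asimd` on the `Features` line. The function names are hypothetical.

```python
# Required feature tokens per architecture, keyed by platform.machine().
# Assumption: NEON appears as "asimd" in /proc/cpuinfo on aarch64 Linux.
REQUIRED_FEATURES = {
    "x86_64": {"avx", "fma"},
    "aarch64": {"asimd"},
}


def cpu_features(cpuinfo_text):
    """Collect feature tokens from 'flags' (x86) or 'Features' (ARM) lines."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            features.update(value.split())
    return features


def missing_features(cpuinfo_text, machine):
    """Return the required features that the CPU does not advertise."""
    required = REQUIRED_FEATURES.get(machine, set())
    return required - cpu_features(cpuinfo_text)


def check_this_machine(cpuinfo_path="/proc/cpuinfo"):
    """Read the local cpuinfo file and report missing features (Linux only)."""
    import platform

    with open(cpuinfo_path) as f:
        return missing_features(f.read(), platform.machine())
```

Calling `check_this_machine()` on a Linux host returns an empty set when every feature in the table is present.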

In accordance with the manylinux_2_27 specification, ScaNN requires libstdc++ version 3.4.23 or above from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
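A rough way to inspect the installed libstdc++ is to scan the shared library for its `GLIBCXX_x.y.z` symbol-version tags and take the highest one. The sketch below assumes that the 3.4.23 requirement corresponds to the `GLIBCXX_3.4.23` tag; the helper names are illustrative, and the library path differs between distributions (for example `/usr/lib/x86_64-linux-gnu/libstdc++.so.6` on Debian-style x86 systems).

```python
import re

# Matches three-part version tags such as GLIBCXX_3.4.23 inside the binary.
_GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+)\.(\d+)\.(\d+)")


def max_glibcxx_version(binary_blob):
    """Return the highest GLIBCXX version found as a tuple, or None."""
    versions = [
        tuple(int(part) for part in match.groups())
        for match in _GLIBCXX_TAG.finditer(binary_blob)
    ]
    return max(versions) if versions else None


def meets_requirement(binary_blob, required=(3, 4, 23)):
    """True if the library exposes at least the required GLIBCXX version."""
    found = max_glibcxx_version(binary_blob)
    return found is not None and found >= required


def check_library(path):
    """Read a libstdc++ shared object from disk and test it."""
    with open(path, "rb") as f:
        return meets_requirement(f.read())
```

Versions are compared as integer tuples so that 3.4.25 correctly ranks above 3.4.9, which a plain string comparison would get wrong.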

Using ScaNN with TensorFlow

ScaNN has optional TensorFlow op bindings that allow ScaNN's nearest neighbor search functionality to be embedded into a TensorFlow SavedModel. As of ScaNN 1.4.0, this functionality is no longer enabled by default, and pip install scann[tf] is now needed to enable the TensorFlow integration.

If scann[tf] is installed, the TensorFlow ops can be accessed via scann.scann_ops, and the API is almost identical to scann.scann_ops_pybind. For users not already in the TensorFlow ecosystem, the native Python bindings in scann.scann_ops_pybind are a better fit, which is why the TensorFlow ops are an optional extra.

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.

Building from source

To build ScaNN from source, first install the build tool bazel (use version 7.x), Clang 19, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.9.x or later) with TensorFlow 2.20 installed for that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-19 bazel build -c opt --features=thin_lto --copt=-mavx --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

To build an ARM binary from an ARM machine, the prerequisites are the same, but the compile flags are slightly modified:

python configure.py
CC=clang-19 bazel build -c opt --features=thin_lto --copt=-march=armv8-a+simd --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.


Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

scann-1.4.2-cp313-cp313-manylinux_2_27_x86_64.whl (11.6 MB): CPython 3.13, manylinux (glibc 2.27+), x86-64

scann-1.4.2-cp313-cp313-manylinux_2_27_aarch64.whl (9.3 MB): CPython 3.13, manylinux (glibc 2.27+), ARM64

scann-1.4.2-cp312-cp312-manylinux_2_27_x86_64.whl (11.6 MB): CPython 3.12, manylinux (glibc 2.27+), x86-64

scann-1.4.2-cp312-cp312-manylinux_2_27_aarch64.whl (9.3 MB): CPython 3.12, manylinux (glibc 2.27+), ARM64

scann-1.4.2-cp311-cp311-manylinux_2_27_x86_64.whl (11.6 MB): CPython 3.11, manylinux (glibc 2.27+), x86-64

scann-1.4.2-cp311-cp311-manylinux_2_27_aarch64.whl (9.3 MB): CPython 3.11, manylinux (glibc 2.27+), ARM64

scann-1.4.2-cp310-cp310-manylinux_2_27_x86_64.whl (11.6 MB): CPython 3.10, manylinux (glibc 2.27+), x86-64

scann-1.4.2-cp310-cp310-manylinux_2_27_aarch64.whl (9.3 MB): CPython 3.10, manylinux (glibc 2.27+), ARM64

scann-1.4.2-cp39-cp39-manylinux_2_27_x86_64.whl (11.6 MB): CPython 3.9, manylinux (glibc 2.27+), x86-64

scann-1.4.2-cp39-cp39-manylinux_2_27_aarch64.whl (9.3 MB): CPython 3.9, manylinux (glibc 2.27+), ARM64

File hashes

Hashes for scann-1.4.2-cp313-cp313-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 c87e97f91c98d7d1f0bf985b39634e07e1149ba79c20f7dbf9b7b465c94100f2
MD5 354ab66522e93fddb3d93aac65218c85
BLAKE2b-256 3bb43decfd7039399b6bd9c9fbf0ccda39301bac01c39a09a5a791c8237f5d26

Hashes for scann-1.4.2-cp313-cp313-manylinux_2_27_aarch64.whl
Algorithm Hash digest
SHA256 469a4a907ea09a15e4c25850330377bd9967c29177d836600051caf281e33612
MD5 416088954042e101321a57f4c9e3416f
BLAKE2b-256 89a1b868924221e7ec8a8c2e7b74ad9fc730fe5554962dc0d56efc2615d03ea3

Hashes for scann-1.4.2-cp312-cp312-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 c0f45d810e629fdcdf8da546d1e25bde4664d27dcf5c3399aa335b4934a0734c
MD5 e001ec978340c2ee0f1dba583a936bc2
BLAKE2b-256 c0f86ecec0c496a1f771c6874dbb00f619ab4ac1e434fa7661e4eedc4e959623

Hashes for scann-1.4.2-cp312-cp312-manylinux_2_27_aarch64.whl
Algorithm Hash digest
SHA256 3042b599d1ada6c48650663d208359b5ccfcbcbfa4fa75bafb81402e6c87bfa4
MD5 2dbc0fa31d3aee117a4c515ef27c2fe3
BLAKE2b-256 85498fd3392977d9aae2277b851190dfcdacb62528bfdd337ee7a24bd820bd0f

Hashes for scann-1.4.2-cp311-cp311-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 5157287c0e5156bda5850be4257b466120f426f969b8329a69cb5adf74d54e52
MD5 c618114b284b60b0a9c96b46867b56d3
BLAKE2b-256 9d6a481749080d7c97a4d76b9e7ce4b5bdca79b7e8faccdb1ce45f74e25c7c3f

Hashes for scann-1.4.2-cp311-cp311-manylinux_2_27_aarch64.whl
Algorithm Hash digest
SHA256 d83ea277712a509c63a709ef2aaab0861c17490c745ffd781fc18dcbd93b0978
MD5 e51da9c5f386a07f153b33fa5b926f6e
BLAKE2b-256 873acdda06cb9c5a4513839649e7d977bfb7fc33621534d702b8bfa62357c588

Hashes for scann-1.4.2-cp310-cp310-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 bfc8c661d1a99456b2c841335eaf1bf8e4d04c61c34eac41ab5805fbaf0e3da5
MD5 446227892b5f067c0ea2294f8e4387a9
BLAKE2b-256 374cdb206f9ec19057e3c837226bd8d7de3f2f5c3c630ccfd962f00c5a2c86ca

Hashes for scann-1.4.2-cp310-cp310-manylinux_2_27_aarch64.whl
Algorithm Hash digest
SHA256 1e781c280a2537cd7d252842d9c3e8a4360db36c356cf62ddec0e4d5f0a766de
MD5 acf85a56e080d44f1279c47e9f78683f
BLAKE2b-256 6b3501895f20ec61d40ec6ee842965da022e35766fd9acbff669a7723027e46e

Hashes for scann-1.4.2-cp39-cp39-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 b34f64f7ed9349cad2d5be9dd90f3b203278fc94d2920bc51e95895ae50b22b0
MD5 85fac8a46240b4c11e3792aae4c35bc4
BLAKE2b-256 4aa14ac32f25394da7e07b9a8dd419dd65a244d7bb3772e00d9d6d2f8a2a4726

Hashes for scann-1.4.2-cp39-cp39-manylinux_2_27_aarch64.whl
Algorithm Hash digest
SHA256 7de7e076be04d7c1df19dcbc73770ac25646988aa8a2bdf388c7168c5e13b02e
MD5 5c0128c4644dd5167f0326e557833e52
BLAKE2b-256 7c81f5a43f03c9c1dc67e2282faba98ecf243d3ec99bf691640a18de9c86f18a
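The digests listed above can be checked locally before installing a manually downloaded wheel. The following is a minimal sketch using Python's standard `hashlib` module; the function names are illustrative, and the wheel path is wherever you saved the file.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, pass the path of a downloaded wheel together with the SHA256 value listed for that exact file name; a `False` result means the file should not be installed.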
