Scalable Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code implements the techniques described in [1, 2], which include search space pruning and quantization for Maximum Inner Product Search. It also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1, 2]. The code is released for research purposes. For the academic description of the algorithms, see the references below.
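
To make the Maximum Inner Product Search objective mentioned above concrete, the following is a minimal sketch of exact top-k search by brute force in plain Python. It is not ScaNN's algorithm or API: it is the exhaustive baseline that approximate methods such as those in [1, 2] aim to match at much lower cost. The helper names dot and exact_mips are hypothetical and introduced only for this illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exact_mips(
    dataset: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return the k (index, score) pairs with the largest inner product.

    Hypothetical helper: scores every datapoint, so the cost grows
    linearly with the dataset size.
    """
    scored = ((dot(vector, query), index) for index, vector in enumerate(dataset))
    top = heapq.nlargest(k, scored)
    return [(index, score) for score, index in top]


dataset = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]]
query = [1.0, 0.5]
result = exact_mips(dataset, query, k=2)
print(result)
```

Because every datapoint is scored for every query, this approach becomes impractical for large datasets, which is the setting that search space pruning and quantization are meant to address.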

References:

  1. @inproceedings{avq_2020,
      title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
      author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
      booktitle={International Conference on Machine Learning},
      year={2020},
      URL={https://arxiv.org/abs/1908.10396}
    }
    
  2. @inproceedings{soar_2023,
      title={SOAR: Improved Indexing for Approximate Nearest Neighbor Search},
      author={Sun, Philip and Simcha, David and Dopson, Dave and Guo, Ruiqi and Kumar, Sanjiv},
      booktitle={Neural Information Processing Systems},
      year={2023},
      URL={https://arxiv.org/abs/2404.00774}
    }
    

Installation

manylinux_2_27-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.9-3.12. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.
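
As a quick pre-install check, the sketch below tests whether the current interpreter falls inside the supported range stated above (Linux, Python 3.9 through 3.12). The function name is_supported is a hypothetical helper, and the check only mirrors the sentence above; it does not inspect CPU architecture or system libraries.

```python
import platform
import sys
from typing import Sequence


def is_supported(system: str, version: Sequence[int]) -> bool:
    """Hypothetical check: Linux with Python 3.9 through 3.12 inclusive."""
    major_minor = tuple(version[:2])
    return system == "Linux" and (3, 9) <= major_minor <= (3, 12)


# Evaluate the current environment.
print(is_supported(platform.system(), sys.version_info))
```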

In accordance with the manylinux_2_27 specification, ScaNN requires libstdc++ version 3.4.23 or above from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
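
One possible way to check this requirement is sketched below. It assumes that the version quoted above corresponds to the GLIBCXX_ symbol version tags embedded in the libstdc++ shared object, and it scans the raw bytes of the library for those tags. The helper names are hypothetical, and the library path shown in the comment is only a common location on Debian-style systems, not something this document specifies.

```python
import re
from typing import List, Tuple

Version = Tuple[int, ...]


def glibcxx_versions(blob: bytes) -> List[Version]:
    """Collect the distinct GLIBCXX_x.y[.z] tags found in raw library bytes."""
    found = {
        tuple(int(part) for part in match.group(1).decode().split("."))
        for match in re.finditer(rb"GLIBCXX_(\d+(?:\.\d+)+)", blob)
    }
    return sorted(found)


def meets_minimum(blob: bytes, minimum: Version = (3, 4, 23)) -> bool:
    """True if the newest GLIBCXX tag in the bytes is at least `minimum`."""
    versions = glibcxx_versions(blob)
    return bool(versions) and versions[-1] >= minimum


# Synthetic bytes standing in for a real library file. To inspect a real
# system, read the file instead, for example (path is an assumption):
#   blob = open("/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "rb").read()
sample = b"\x00GLIBCXX_3.4\x00GLIBCXX_3.4.21\x00GLIBCXX_3.4.25\x00GLIBCXX_DEBUG_MESSAGE_LENGTH"
print(glibcxx_versions(sample)[-1], meets_minimum(sample))
```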

TensorFlow dependency

ScaNN's dependency on TensorFlow is optional, but for historical reasons the package is registered as having a hard dependency on it. If you are not interested in ScaNN's TensorFlow bindings (scann.scann_ops), use pip install --no-deps scann to avoid installing the large TensorFlow dependency. (For users not already in the TensorFlow ecosystem, the native Python bindings in scann.scann_ops_pybind are a better fit.)
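
Code that must run both with and without TensorFlow installed could select the bindings at runtime. The sketch below is one hedged way to do so: it only decides which of the two module names mentioned above to use, based on whether a tensorflow package can be found. The helper choose_bindings is hypothetical, and the sketch does not import ScaNN itself.

```python
import importlib.util


def choose_bindings(tensorflow_available: bool) -> str:
    """Hypothetical helper: pick the ScaNN bindings module name to import."""
    if tensorflow_available:
        return "scann.scann_ops"  # TensorFlow bindings
    return "scann.scann_ops_pybind"  # native Python bindings


# find_spec returns None when the package is not installed.
tensorflow_available = importlib.util.find_spec("tensorflow") is not None
module_name = choose_bindings(tensorflow_available)
print(module_name)
```

The chosen name could then be passed to importlib.import_module once ScaNN is installed. Users who never need the TensorFlow ops can skip the check and import scann.scann_ops_pybind directly.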

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Building from source

To build ScaNN from source, first install the build tool bazel (use version 7.x), Clang 17, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.9.x or later) and TensorFlow 2.17 installed on that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-17 bazel build -c opt --features=thin_lto --copt=-mavx --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.
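
Before starting a build, it may help to confirm that the tools named in the commands above are on the PATH. The following sketch does only that: it looks up the executables bazel and clang-17 (the names used in the build commands) and reports any that are missing. It does not verify tool versions, the libstdc++ headers, or the TensorFlow installation, and missing_tools is a hypothetical helper.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tool names that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(["bazel", "clang-17"])
if missing:
    print("Missing build tools:", ", ".join(missing))
else:
    print("bazel and clang-17 were found on the PATH.")
```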

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
