
Scalable Nearest Neighbor search library


ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code release implements [1], including search space pruning and quantization for Maximum Inner Product Search (MIPS), and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com, as shown for the glove-100-angular dataset below:

[Figure: ScaNN performance on ann-benchmarks.com, glove-100-angular dataset]
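
The search space pruning mentioned above partitions the database ahead of time and, at query time, scores only the points in the partitions whose centers look most promising for the query. The NumPy sketch below illustrates that idea under simplifying assumptions: plain k-means for partitioning, exact inner-product scoring inside the selected partitions (no quantization), and hypothetical helper names (`build_partitions`, `pruned_mips`) that are not part of ScaNN's API.

```python
import numpy as np


def build_partitions(dataset, num_leaves, iters=10, seed=0):
    """Cluster datapoints into num_leaves partitions with plain k-means."""
    rng = np.random.default_rng(seed)
    init = rng.choice(len(dataset), size=num_leaves, replace=False)
    centroids = dataset[init].copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        d2 = ((dataset[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for c in range(num_leaves):
            members = dataset[assign == c]
            if len(members):  # leave an empty partition's centroid unchanged
                centroids[c] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    d2 = ((dataset[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, d2.argmin(axis=1)


def pruned_mips(query, dataset, centroids, assign, num_neighbors, leaves_to_search):
    """Exact inner-product search restricted to the best-scoring partitions."""
    # Rank partitions by the inner product between the query and each centroid.
    leaves = np.argsort(-(centroids @ query))[:leaves_to_search]
    # Only points in the selected partitions are scored; the rest are pruned.
    candidates = np.flatnonzero(np.isin(assign, leaves))
    scores = dataset[candidates] @ query
    order = np.argsort(-scores)[:num_neighbors]
    return candidates[order], scores[order]


rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 16))
data /= np.linalg.norm(data, axis=1, keepdims=True)
query = rng.standard_normal(16)
centroids, assign = build_partitions(data, num_leaves=20)
ids, scores = pruned_mips(query, data, centroids, assign,
                          num_neighbors=5, leaves_to_search=4)
```

Raising `leaves_to_search` toward `num_leaves` trades speed for recall; with every partition searched, the result matches brute force. In ScaNN's builder, the corresponding knobs are the `num_leaves` and `num_leaves_to_search` arguments of `.tree(...)`.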

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1]. The code is released for research purposes. For an academic description of the algorithms, see [1].
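
A central idea of [1] is that, for MIPS, not all quantization error is equally harmful: the component of the residual parallel to a datapoint shifts the scores of the queries most likely to retrieve that point more than the orthogonal component does, so [1] penalizes it more heavily. The sketch below computes such an anisotropic loss for a single datapoint and uses it to pick a codeword. It is a simplified illustration that assumes a hypothetical free weight `eta` on the parallel term; [1] derives the weights from a threshold on the inner-product score, which this sketch does not do.

```python
import numpy as np


def anisotropic_loss(x, x_quantized, eta):
    """eta * ||r_parallel||^2 + ||r_orthogonal||^2 for residual r = x - x_quantized."""
    r = x - x_quantized
    r_par = (r @ x) / (x @ x) * x  # projection of the residual onto x
    r_orth = r - r_par             # remaining error, orthogonal to x
    return eta * (r_par @ r_par) + r_orth @ r_orth


def best_codeword(x, codebook, eta):
    """Index of the codeword that minimizes the anisotropic loss for x."""
    return int(np.argmin([anisotropic_loss(x, c, eta) for c in codebook]))


x = np.array([1.0, 0.0])
codebook = np.array([
    [0.8, 0.0],   # error purely parallel to x (squared norm 0.04)
    [1.0, 0.25],  # error purely orthogonal to x (squared norm 0.0625)
])
plain = best_codeword(x, codebook, eta=1.0)        # ordinary squared error → 0
anisotropic = best_codeword(x, codebook, eta=4.0)  # parallel error weighted 4x → 1
```

With `eta = 1` the loss reduces to ordinary squared Euclidean error and favors the closer codeword; weighting the parallel term more heavily flips the choice toward the codeword whose error is orthogonal to `x`.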

Reference [1]:

@inproceedings{avq_2020,
  title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
  author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
  booktitle={International Conference on Machine Learning},
  year={2020},
  URL={https://arxiv.org/abs/1908.10396}
}

Installation

manylinux_2_27-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.9-3.12. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.

In accordance with the manylinux_2_27 specification, ScaNN requires libstdc++ version 3.4.23 or above from the operating system. You can find your system's libstdc++ version by checking which GLIBCXX version tags (for example, GLIBCXX_3.4.23) its libstdc++.so.6 exports; libstdc++ can generally be upgraded by installing a newer version of g++.
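
The requirements stated in this README can be checked before installing. The following sketch is a hypothetical helper, not part of ScaNN: it checks the Python version range and platform listed for this release, looks for the `avx2` CPU flag in `/proc/cpuinfo`, and scans the system libstdc++ for `GLIBCXX_*` version tags. The default library path is an assumption (the usual location on Debian/Ubuntu x86-64) and may differ on other distributions.

```python
import platform
import re
import sys


def glibcxx_versions(lib_bytes):
    """Sorted GLIBCXX symbol versions found in a libstdc++ binary, as int tuples."""
    found = {
        tuple(int(g) for g in m.groups() if g is not None)
        for m in re.finditer(rb"GLIBCXX_(\d+)\.(\d+)(?:\.(\d+))?", lib_bytes)
    }
    return sorted(found)


def meets_libstdcxx_requirement(lib_bytes, required=(3, 4, 23)):
    """True if the newest GLIBCXX tag is at least `required`."""
    versions = glibcxx_versions(lib_bytes)
    return bool(versions) and versions[-1] >= required


def has_avx2(cpuinfo_path="/proc/cpuinfo"):
    """True/False from the CPU flags, or None if /proc/cpuinfo is unavailable."""
    try:
        with open(cpuinfo_path) as f:
            return re.search(r"\bavx2\b", f.read()) is not None
    except OSError:
        return None


def preflight(libstdcxx_path="/usr/lib/x86_64-linux-gnu/libstdc++.so.6"):
    """Map each check to True/False, or None if it could not be performed."""
    try:
        with open(libstdcxx_path, "rb") as f:
            libstdcxx_ok = meets_libstdcxx_requirement(f.read())
    except OSError:
        libstdcxx_ok = None  # library not found at the assumed path
    return {
        "python 3.9-3.12": (3, 9) <= sys.version_info[:2] <= (3, 12),
        "linux x86_64": sys.platform.startswith("linux") and platform.machine() == "x86_64",
        "avx2": has_avx2(),
        "libstdc++ >= 3.4.23": libstdcxx_ok,
    }


print(preflight())
```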

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Building from source

To build ScaNN from source, first install the build tool Bazel, Clang 16, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.9.x or later) with TensorFlow 2.16 installed for that Python version. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-16 bazel build -c opt --features=thin_lto --copt=-mavx --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
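
As a starting point, the following sketch builds a searcher with the builder pattern used in docs/example.ipynb (a partitioning tree, asymmetric hashing with anisotropic quantization, and exact reordering) and measures recall against brute-force inner-product search. The parameter values are illustrative guesses for a tiny synthetic dataset, not tuned recommendations, and `exact_mips_neighbors` is a hypothetical helper. The ScaNN part runs only if the `scann` package is installed.

```python
import numpy as np


def exact_mips_neighbors(dataset, queries, k):
    """Brute-force top-k indices by inner product, best first (ground truth)."""
    scores = queries @ dataset.T
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]  # unordered top-k
    order = np.argsort(-np.take_along_axis(scores, top, axis=1), axis=1)
    return np.take_along_axis(top, order, axis=1)


def compute_recall(neighbors, true_neighbors):
    """Fraction of true neighbors that also appear in the returned neighbors."""
    total = 0
    for gt_row, row in zip(true_neighbors, neighbors):
        total += np.intersect1d(gt_row, row).shape[0]
    return total / true_neighbors.size


rng = np.random.default_rng(0)
dataset = rng.standard_normal((2000, 32)).astype(np.float32)
dataset /= np.linalg.norm(dataset, axis=1, keepdims=True)  # unit norm: dot product = cosine
queries = rng.standard_normal((10, 32)).astype(np.float32)
true_neighbors = exact_mips_neighbors(dataset, queries, k=10)

try:
    import scann
except ImportError:
    scann = None

if scann is not None:
    searcher = (
        scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
        .tree(num_leaves=40, num_leaves_to_search=10, training_sample_size=2000)
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        .reorder(100)
        .build()
    )
    neighbors, distances = searcher.search_batched(queries)
    print("Recall:", compute_recall(neighbors, true_neighbors))
```

Raising `num_leaves_to_search` or the `reorder` count generally improves recall at the cost of query time; docs/algorithms.md discusses how these settings interact.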



Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions

scann-1.3.2-cp312-cp312-manylinux_2_27_x86_64.whl (10.6 MB): CPython 3.12, manylinux glibc 2.27+, x86-64
scann-1.3.2-cp311-cp311-manylinux_2_27_x86_64.whl (10.6 MB): CPython 3.11, manylinux glibc 2.27+, x86-64
scann-1.3.2-cp310-cp310-manylinux_2_27_x86_64.whl (10.6 MB): CPython 3.10, manylinux glibc 2.27+, x86-64
scann-1.3.2-cp39-cp39-manylinux_2_27_x86_64.whl (10.6 MB): CPython 3.9, manylinux glibc 2.27+, x86-64

File hashes

scann-1.3.2-cp312-cp312-manylinux_2_27_x86_64.whl
SHA256: fa12bcb392b8066b83b00526a1a247e28011921e34484f9bc854a8ee93c87672
MD5: a978a7535ee1351c8a8c01557545ba39
BLAKE2b-256: 300b53fb162ef5137be343e52aa80d3095a46d072f629ce510be224872236a0a

scann-1.3.2-cp311-cp311-manylinux_2_27_x86_64.whl
SHA256: ccbb3618d12308d4d0feb42b593425ab97d790d18798f84f0a5383a05a57c5b6
MD5: ae406db7612312b7d6707154e7f2e625
BLAKE2b-256: fcd036d364f77b2d2e0642875b491408ad72c0b9809e623153aff48ffebab816

scann-1.3.2-cp310-cp310-manylinux_2_27_x86_64.whl
SHA256: 36756afc0a8782a40dba23a59968e145abe6c0d770daba36367cf712a02fda9b
MD5: 452f34d10a70ce5a3cdc764e570c2754
BLAKE2b-256: 772090cad3b63da8052cbebd0b61fb30ea94bf0f14edbd811d580daa1c0250c8

scann-1.3.2-cp39-cp39-manylinux_2_27_x86_64.whl
SHA256: aa225587e262aa3c1779d2778d6327192330920ba244c9cc5e93eb766c85f281
MD5: bc791bf7142f8927a06e42aaba5d0294
BLAKE2b-256: 7d2b524709d25dc16eae3f9feb480822361f5bacb53f22c486888fc9360460ef
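
A downloaded wheel can be checked against the digests listed above by hashing it locally. A minimal sketch using Python's standard `hashlib`; the helper names are hypothetical. It reads the file in chunks rather than using `hashlib.file_digest`, which only exists on Python 3.11 and later.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """SHA256, MD5 and BLAKE2b-256 hex digests of a file, computed in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b
    }
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path, expected_hex):
    """True if the file's SHA256 digest matches the expected hex string."""
    return file_digests(path)["SHA256"] == expected_hex.strip().lower()


# Example (assumes the wheel was downloaded into the current directory):
# verify_sha256(
#     "scann-1.3.2-cp312-cp312-manylinux_2_27_x86_64.whl",
#     "fa12bcb392b8066b83b00526a1a247e28011921e34484f9bc854a8ee93c87672",
# )
```

pip can also enforce digests itself: a requirements line such as `scann==1.3.2 --hash=sha256:<digest>` used with `pip install --require-hashes -r requirements.txt` rejects any download whose hash does not match (hash-checking mode also requires pinned hashes for every dependency).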
