Scalable Approximate Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code release implements [1], which includes search space pruning and quantization for Maximum Inner Product Search and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1]. The code is released for research purposes. For more details on the academic description of algorithms, please see [1].
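To make the search problem concrete, here is a minimal brute-force Maximum Inner Product Search baseline in NumPy. It is not ScaNN's algorithm and uses no pruning or quantization; it only shows the exact result that an approximate searcher tries to reproduce faster. The function name `brute_force_mips` is hypothetical and chosen for illustration.

```python
import numpy as np


def brute_force_mips(dataset, queries, k):
    """Exact top-k inner-product search (illustrative baseline, not ScaNN).

    dataset: (num_points, dim) array; queries: (num_queries, dim) array.
    Returns (indices, scores), each of shape (num_queries, k), sorted by
    decreasing inner product. Requires k <= num_points.
    """
    scores = queries @ dataset.T  # (num_queries, num_points)
    # argpartition selects the k best per query in linear time, unordered.
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top, axis=1)
    # Sort only the k selected candidates by decreasing score.
    order = np.argsort(-top_scores, axis=1)
    indices = np.take_along_axis(top, order, axis=1)
    return indices, np.take_along_axis(top_scores, order, axis=1)
```

The cost of this baseline grows linearly with the number of datapoints per query, which is the cost that search space pruning and quantization aim to reduce.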

Reference [1]:

@inproceedings{avq_2020,
  title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
  author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
  booktitle={International Conference on Machine Learning},
  year={2020},
  URL={https://arxiv.org/abs/1908.10396}
}

Installation

manylinux2014-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.7-3.9. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.

In accordance with the manylinux2014 specification, ScaNN requires libstdc++ version 3.4.19 or above from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
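As a rough sketch of such a check, the snippet below scans the raw bytes of a libstdc++ shared library for `GLIBCXX_x.y.z` symbol-version tags and compares the highest one against 3.4.19. It assumes that the required "libstdc++ version" corresponds to these GLIBCXX tags; the helper names and the library path in the usage comment are hypothetical and vary by distribution.

```python
import re

REQUIRED = (3, 4, 19)  # minimum libstdc++ version stated above


def max_glibcxx_version(blob: bytes):
    """Return the highest GLIBCXX_x.y[.z] version found in `blob`, or None."""
    versions = [
        tuple(int(part) for part in match.group(1).split(b"."))
        for match in re.finditer(rb"GLIBCXX_(\d+(?:\.\d+)+)", blob)
    ]
    return max(versions) if versions else None


def meets_requirement(blob: bytes, required=REQUIRED) -> bool:
    """True if the library advertises at least the required version."""
    found = max_glibcxx_version(blob)
    return found is not None and found >= required


# Usage (the path is an assumption; locate libstdc++.so.6 on your system):
# with open("/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "rb") as f:
#     print(max_glibcxx_version(f.read()))
```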

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Building from source

To build ScaNN from source, first install the build tool bazel, Clang 8, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.7.x or later) with TensorFlow 2.7 installed for that Python version. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-8 bazel build -c opt --features=thin_lto --copt=-mavx2 --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.
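Because the build command above passes `-mavx2` and `-mfma`, the resulting binary needs a CPU that supports both instruction sets. The following is a small sketch for checking this on Linux by parsing the `flags` line of `/proc/cpuinfo`; the helper names are hypothetical, and the check assumes the standard Linux cpuinfo layout.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags from the first 'flags' line, if any."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supports_build_flags(cpuinfo_text: str, required=("avx2", "fma")) -> bool:
    """True if every required feature flag is reported by the CPU."""
    flags = cpu_flags(cpuinfo_text)
    return all(flag in flags for flag in required)


# Usage on Linux:
# with open("/proc/cpuinfo") as f:
#     print(supports_build_flags(f.read()))
```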

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
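For orientation, the sketch below shows roughly what a search with the Python API looks like, together with a recall helper for comparing approximate results against exact ones. The builder calls reflect the pybind interface as used in the example notebook for the 1.2.x releases, to the best of our understanding; treat docs/example.ipynb as authoritative. All parameter values are illustrative assumptions rather than tuned recommendations, and the names `build_searcher` and `recall_at_k` are hypothetical.

```python
import numpy as np


def recall_at_k(approx_ids, exact_ids):
    """Fraction of exact neighbors that the approximate search recovered."""
    approx_ids = np.asarray(approx_ids)
    exact_ids = np.asarray(exact_ids)
    hits = sum(len(np.intersect1d(a, e)) for a, e in zip(approx_ids, exact_ids))
    return hits / exact_ids.size


def build_searcher(dataset, k=10, num_leaves=100, num_leaves_to_search=10):
    """Build a ScaNN searcher (requires `pip install scann`).

    Parameter values are placeholders; tune them for your dataset.
    """
    import scann  # imported lazily so recall_at_k works without ScaNN

    return (
        scann.scann_ops_pybind.builder(dataset, k, "dot_product")
        .tree(
            num_leaves=num_leaves,
            num_leaves_to_search=num_leaves_to_search,
            training_sample_size=len(dataset),
        )
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        .reorder(100)
        .build()
    )


# Usage (assumes float32 arrays `dataset` and `queries`):
# searcher = build_searcher(dataset)
# neighbors, distances = searcher.search_batched(queries)
```

The tree step partitions the dataset for pruning, the `score_ah` step applies quantized scoring, and the reorder step rescores the best candidates exactly; lowering `num_leaves_to_search` or the reorder count trades recall for speed.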

Built Distributions

  • scann-1.2.7-cp310-cp310-manylinux_2_27_x86_64.whl (11.2 MB): CPython 3.10, manylinux (glibc 2.27+), x86-64
  • scann-1.2.7-cp39-cp39-manylinux_2_27_x86_64.whl (11.2 MB): CPython 3.9, manylinux (glibc 2.27+), x86-64
  • scann-1.2.7-cp38-cp38-manylinux_2_27_x86_64.whl (11.2 MB): CPython 3.8, manylinux (glibc 2.27+), x86-64
  • scann-1.2.7-cp37-cp37m-manylinux_2_27_x86_64.whl (11.2 MB): CPython 3.7m, manylinux (glibc 2.27+), x86-64
