Scalable Approximate Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code release implements [1], which includes search space pruning and quantization for Maximum Inner Product Search and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1]. The code is released for research purposes. For more details on the academic description of algorithms, please see [1].
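To make the problem concrete, the following is a minimal sketch of exact (brute-force) Maximum Inner Product Search, the result that an approximate method such as ScaNN tries to reproduce at much lower cost. It is not ScaNN's implementation and uses none of the pruning or quantization from [1]; the function names `inner_product` and `exact_mips` are hypothetical and chosen only for illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return sum(x * y for x, y in zip(a, b))


def exact_mips(
    dataset: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return the k (index, score) pairs with the largest inner product.

    Every datapoint is scored, so the cost grows linearly with the dataset
    size; approximate methods exist to avoid exactly this full scan.
    """
    scored = ((inner_product(vec, query), idx) for idx, vec in enumerate(dataset))
    top = heapq.nlargest(k, scored)  # sorted by score, highest first
    return [(idx, score) for score, idx in top]
```

A brute-force search like this is typically used only to produce ground-truth neighbors for measuring the recall of an approximate searcher.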

Reference [1]:

@inproceedings{avq_2020,
  title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
  author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
  booktitle={International Conference on Machine Learning},
  year={2020},
  URL={https://arxiv.org/abs/1908.10396}
}

Installation

manylinux2014-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.6-3.9. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.
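As a quick pre-install check, the constraints above can be sketched in a few lines. The helper names are hypothetical, the version range is the one stated for this release, and the `x86_64` requirement is an assumption drawn from the manylinux2014 x86_64 wheels; the check does not verify AVX2 support.

```python
import platform
import sys
from typing import Tuple


def wheel_supported(system: str, machine: str, version: Tuple[int, int]) -> bool:
    """True if the platform matches the documented wheel constraints.

    Assumes Linux on x86_64 with Python 3.6 through 3.9 (inclusive).
    """
    return system == "Linux" and machine == "x86_64" and (3, 6) <= version <= (3, 9)


def current_environment_supported() -> bool:
    """Apply the check to the interpreter running this code."""
    return wheel_supported(
        platform.system(), platform.machine(), tuple(sys.version_info[:2])
    )
```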

In accordance with the manylinux2014 specification, ScaNN requires libstdc++ version 3.4.19 or above from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
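One way to inspect this is sketched below, assuming that the version in question corresponds to the `GLIBCXX_x.y.z` symbol-version strings embedded in the `libstdc++.so.6` shared library. The helper names are hypothetical, and the library path differs between distributions (for example, `/usr/lib/x86_64-linux-gnu/libstdc++.so.6` on many Debian-based systems), so it is left as a parameter.

```python
import re
from typing import List, Tuple

# Matches symbol-version tags such as GLIBCXX_3.4 or GLIBCXX_3.4.19.
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(data: bytes) -> List[Tuple[int, ...]]:
    """Extract the distinct GLIBCXX versions found in raw library bytes."""
    found = {
        tuple(int(part) for part in match.split(b"."))
        for match in _GLIBCXX.findall(data)
    }
    return sorted(found)


def satisfies_minimum(data: bytes, minimum: Tuple[int, ...] = (3, 4, 19)) -> bool:
    """True if the highest GLIBCXX version present is at least `minimum`."""
    versions = glibcxx_versions(data)
    return bool(versions) and versions[-1] >= minimum


def check_library(path: str) -> bool:
    """Read a libstdc++ shared object from `path` (an assumed location)."""
    with open(path, "rb") as handle:
        return satisfies_minimum(handle.read())
```

Calling `check_library` with the path to your system's `libstdc++.so.6` returns `False` when the library is too old for the wheels.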

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Building from source

To build ScaNN from source, first install the build tool Bazel, Clang 8, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.6.x or later) and TensorFlow 2.5 installed on that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-8 bazel build -c opt --features=thin_lto --copt=-mavx2 --copt=-mfma --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
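For orientation, here is a rough sketch of building a searcher through the Python API, assuming the `scann.scann_ops_pybind.builder` interface used in the example notebook. The helper names `build_searcher` and `compute_recall` are hypothetical, and the parameter values are illustrative settings for a dataset of roughly a million points, not recommendations; check them against docs/example.ipynb before relying on them.

```python
from typing import Iterable, Sequence


def build_searcher(
    dataset,
    num_neighbors: int = 10,
    num_leaves: int = 2000,
    num_leaves_to_search: int = 100,
    training_sample_size: int = 250000,
):
    """Build a ScaNN searcher over a 2-D float array of datapoints.

    The pipeline is: partition the dataset into a tree, score candidates
    with asymmetric hashing, then rescore the best candidates exactly.
    """
    import scann  # imported lazily so compute_recall works without ScaNN

    return (
        scann.scann_ops_pybind.builder(dataset, num_neighbors, "dot_product")
        .tree(
            num_leaves=num_leaves,
            num_leaves_to_search=num_leaves_to_search,
            training_sample_size=training_sample_size,
        )
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        .reorder(100)
        .build()
    )


def compute_recall(
    neighbors: Iterable[Sequence[int]], true_neighbors: Iterable[Sequence[int]]
) -> float:
    """Fraction of ground-truth neighbors that the searcher returned."""
    hits = 0
    total = 0
    for found, truth in zip(neighbors, true_neighbors):
        truth_set = set(truth)
        hits += len(truth_set.intersection(found))
        total += len(truth_set)
    if total == 0:
        raise ValueError("true_neighbors must not be empty")
    return hits / total
```

With a searcher in hand, `neighbors, distances = searcher.search_batched(queries)` returns results for a batch of queries, and `compute_recall(neighbors, true_neighbors)` compares them against ground truth obtained by exact search.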

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

scann-1.2.2-cp39-cp39-manylinux2014_x86_64.whl (10.9 MB), CPython 3.9

scann-1.2.2-cp38-cp38-manylinux2014_x86_64.whl (10.9 MB), CPython 3.8

scann-1.2.2-cp37-cp37m-manylinux2014_x86_64.whl (10.9 MB), CPython 3.7m

scann-1.2.2-cp36-cp36m-manylinux2014_x86_64.whl (10.9 MB), CPython 3.6m

File details

Details for the file scann-1.2.2-cp39-cp39-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.2.2-cp39-cp39-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 10.9 MB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.2.2-cp39-cp39-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 bc94fc286036701e2b1973a09917b8c5357e49d654c57158f8f8017aea02a0d4
MD5 6c82f4f01b78bde3d049b90e4c8032ec
BLAKE2b-256 1d4d78d46b4fcd4aa907566c1382ca1701dbf4673df70d8b5dc442c58de9ad54

See more details on using hashes here.

File details

Details for the file scann-1.2.2-cp38-cp38-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.2.2-cp38-cp38-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 10.9 MB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.2.2-cp38-cp38-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e5c2afcbc02944002dbd2c5bd27e8c64c6a47612f6796622500dddd507ddcfb6
MD5 d1408e849d2aee1834d0c9279678202c
BLAKE2b-256 95412da1fa825d88728a63cb3ef692c5fdabb01adcf9cb635d9fec82adc76f27

File details

Details for the file scann-1.2.2-cp37-cp37m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.2.2-cp37-cp37m-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 10.9 MB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.2.2-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4e95f46bc8dd09d3557e166ebf9e1b031387d01b4718a7b8636fd3ca1d7c1429
MD5 6aa0e485be688443084c7653154f5ed6
BLAKE2b-256 98bf0c8c270f1ac34719f9604e90cd7db999b86334b130bafadf895aaaf8bf55

File details

Details for the file scann-1.2.2-cp36-cp36m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.2.2-cp36-cp36m-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 10.9 MB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.2.2-cp36-cp36m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 fd2ddae35decd98987c08e22e335bf69165c4a4352ced0b4977cc862b9c4643f
MD5 a84bebff6c833bc420d2ecac6ee56d02
BLAKE2b-256 c34da6997b150aa6fca37e0877c78eea1f476083bf201c6ec5c57596e78111b7
