Scalable Approximate Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code release implements [1], which includes search space pruning and quantization for Maximum Inner Product Search and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1]. The code is released for research purposes. For more details on the academic description of algorithms, please see [1].
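To make the search problem concrete, the sketch below computes exact nearest neighbors by brute force for the two kinds of distance mentioned above: maximum inner product and (squared) Euclidean distance. It is not ScaNN's algorithm. It is only the exhaustive baseline that pruning and quantization are meant to approximate at lower cost. The function names and the measure strings are illustrative assumptions.

```python
import heapq


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def brute_force_search(database, query, k, measure="dot_product"):
    """Return indices of the k best database vectors for one query.

    Every database vector is scored, so the cost grows linearly with
    the dataset size.
    """
    if measure == "dot_product":
        # Maximum Inner Product Search: larger scores are better.
        scored = ((dot(x, query), i) for i, x in enumerate(database))
        best = heapq.nlargest(k, scored)
    elif measure == "squared_l2":
        # Euclidean search: smaller distances are better.
        scored = ((squared_l2(x, query), i) for i, x in enumerate(database))
        best = heapq.nsmallest(k, scored)
    else:
        raise ValueError("unknown measure: " + measure)
    return [i for _, i in best]


database = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [0.5, 0.5]]
print(brute_force_search(database, [1.0, 1.0], k=1))                        # [2]
print(brute_force_search(database, [1.0, 1.0], k=1, measure="squared_l2"))  # [3]
```

The two measures pick different neighbors for the same query: the inner product favors the long vector at index 2, while Euclidean distance favors the nearby vector at index 3.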

Reference [1]:

@inproceedings{avq_2020,
  title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
  author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
  booktitle={International Conference on Machine Learning},
  year={2020},
  URL={https://arxiv.org/abs/1908.10396}
}

Installation

manylinux2014-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.6-3.8. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.

In accordance with the manylinux2014 specification, ScaNN requires libstdc++ version 3.4.19 or above from the operating system. If your system's libstdc++ is older, it can generally be upgraded by installing a newer version of g++.
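One way to check the libstdc++ requirement is sketched below. It assumes that the version in question is the highest GLIBCXX_x.y.z symbol version embedded in the shared library, and it scans the raw bytes of the file for those strings. The helper names are hypothetical, and the library path in the final comment is only an example that varies between distributions.

```python
import re

# Minimum version stated in the installation notes above.
REQUIRED = (3, 4, 19)

# GLIBCXX version tags appear as plain ASCII strings inside libstdc++.so.6.
_PATTERN = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(blob):
    """Return the sorted, de-duplicated version tuples found in raw bytes."""
    found = set()
    for match in _PATTERN.finditer(blob):
        found.add(tuple(int(part) for part in match.group(1).split(b".")))
    return sorted(found)


def meets_requirement(blob, required=REQUIRED):
    """True if the highest GLIBCXX version in blob is at least `required`.

    Versions are compared as integer tuples, so 3.4.9 sorts below 3.4.19.
    """
    versions = glibcxx_versions(blob)
    return bool(versions) and versions[-1] >= required


def check_library(path):
    """Read a libstdc++ shared object from disk and test it."""
    with open(path, "rb") as handle:
        return meets_requirement(handle.read())


# Example call (path is an assumption; adjust for your system):
# check_library("/usr/lib/x86_64-linux-gnu/libstdc++.so.6")
```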

Building from source

To build ScaNN from source, first install the build tool Bazel, Clang 8, and libstdc++ headers for C++17 (which are provided with GCC 9). ScaNN also requires Python 3.6.x or later, with TensorFlow 2.3+ installed for that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-8 bazel build -c opt --features=thin_lto --copt=-mavx2 --copt=-mfma --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
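As a rough orientation before opening the notebook, the sketch below shows how a searcher might be assembled with the Python API. The builder calls follow the pattern used in the project's example as best recalled, so check them against docs/example.ipynb for your installed version. The helper suggest_tree_params is hypothetical and not part of ScaNN, and its square-root rule and all numeric values are placeholder assumptions rather than recommended settings.

```python
import math


def suggest_tree_params(num_points, search_fraction=0.05):
    """Hypothetical helper (not part of ScaNN): derive partitioning settings.

    Assumes a square-root rule of thumb for the number of partitions and
    searches a fixed fraction of them per query.
    """
    num_leaves = max(1, int(round(math.sqrt(num_points))))
    num_leaves_to_search = max(1, int(round(num_leaves * search_fraction)))
    return {
        "num_leaves": num_leaves,
        "num_leaves_to_search": num_leaves_to_search,
        "training_sample_size": num_points,
    }


def build_searcher(dataset, num_neighbors=10):
    """Build a searcher over a 2-D float32 NumPy array of row vectors."""
    import scann  # deferred so this module loads even without ScaNN installed

    params = suggest_tree_params(len(dataset))
    return (
        scann.scann_ops_pybind.builder(dataset, num_neighbors, "dot_product")
        .tree(**params)  # search space pruning via partitioning
        .score_ah(2, anisotropic_quantization_threshold=0.2)  # quantized scoring
        .reorder(100)  # exact rescoring of the top candidates
        .build()
    )
```

With a searcher built this way, searcher.search_batched(queries) should return a pair of arrays holding neighbor indices and their scores, one row per query.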

Built Distributions

scann-1.2.0-cp38-cp38-manylinux2014_x86_64.whl (11.1 MB, CPython 3.8)
scann-1.2.0-cp37-cp37m-manylinux2014_x86_64.whl (11.1 MB, CPython 3.7m)
scann-1.2.0-cp36-cp36m-manylinux2014_x86_64.whl (11.1 MB, CPython 3.6m)
