Scalable Approximate Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code release implements [1], which includes search space pruning and quantization for Maximum Inner Product Search and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: benchmark plot for the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1]. The code is released for research purposes. For more details on the academic description of algorithms, please see [1].
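To make the search problem concrete, here is a minimal pure-Python sketch of the exact (brute-force) search that an approximate library such as ScaNN is designed to speed up. It is not ScaNN's algorithm: there is no pruning or quantization, and the names `exact_top_k`, `inner_product` and `squared_euclidean` are hypothetical helpers introduced only for illustration.

```python
from typing import List, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query: Sequence[float],
                dataset: Sequence[Sequence[float]],
                k: int,
                measure: str = "dot_product") -> List[int]:
    """Return indices of the k best datapoints by scanning the whole dataset."""
    if measure == "dot_product":
        # Maximum Inner Product Search: larger scores are better.
        scored = [(-inner_product(query, v), i) for i, v in enumerate(dataset)]
    elif measure == "squared_l2":
        # Euclidean search: smaller distances are better.
        scored = [(squared_euclidean(query, v), i) for i, v in enumerate(dataset)]
    else:
        raise ValueError("unknown measure: " + measure)
    scored.sort()  # ties are broken by the lower index
    return [i for _, i in scored[:k]]


dataset = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [0.9, 0.1]]
query = [1.0, 0.0]
mips_result = exact_top_k(query, dataset, 2, "dot_product")
l2_result = exact_top_k(query, dataset, 2, "squared_l2")
```

In this toy dataset the two measures disagree: the long vector at index 2 wins under inner product, while the Euclidean search prefers the points closest to the query. This is one reason the choice of distance function matters when configuring a searcher.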

Reference [1]:

@inproceedings{avq_2020,
  title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
  author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
  booktitle={International Conference on Machine Learning},
  year={2020},
  URL={https://arxiv.org/abs/1908.10396}
}

Installation

manylinux2014-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.5-3.8. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.
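The platform constraints stated so far (Linux, Python 3.5-3.8, x86 with AVX2, x86_64 wheels) can be checked before installing. The following is a rough sketch, not part of ScaNN; the function names are hypothetical, and it assumes the CPU flags are available in the usual Linux `/proc/cpuinfo` format.

```python
import platform
import sys

SUPPORTED_PYTHON = ((3, 5), (3, 8))  # inclusive range stated in the text above


def python_supported(version_info):
    """True if the (major, minor) version falls inside the supported range."""
    low, high = SUPPORTED_PYTHON
    return low <= tuple(version_info[:2]) <= high


def cpu_has_flag(cpuinfo_text, flag="avx2"):
    """True if any 'flags' line in /proc/cpuinfo-style text lists the flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


def preflight(version_info, system, machine, cpuinfo_text):
    """Return a list of problems; an empty list means none were detected."""
    problems = []
    if system != "Linux":
        problems.append("wheels target Linux, found " + system)
    if machine != "x86_64":
        problems.append("wheels target x86_64, found " + machine)
    if not python_supported(version_info):
        problems.append("Python 3.5-3.8 expected")
    if not cpu_has_flag(cpuinfo_text, "avx2"):
        problems.append("CPU does not report AVX2")
    return problems
```

On a real machine this could be called as `preflight(sys.version_info, platform.system(), platform.machine(), open("/proc/cpuinfo").read())`.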

In accordance with the manylinux2014 specification, ScaNN requires libstdc++ version 3.4.19 or above from the operating system. The system's libstdc++ can generally be upgraded by installing a newer version of g++.
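One way to check the installed libstdc++ is to look at the `GLIBCXX_x.y.z` symbol-version strings embedded in the shared library. The sketch below assumes that the "version 3.4.19" requirement refers to those strings; the helper names are hypothetical, and the library path in the usage note is only a common Debian/Ubuntu location that may differ on your system.

```python
import re
from typing import Optional, Tuple

# Matches strings such as GLIBCXX_3.4 or GLIBCXX_3.4.19 inside the binary.
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+)\.(\d+)(?:\.(\d+))?")


def max_glibcxx_version(blob: bytes) -> Optional[Tuple[int, ...]]:
    """Return the highest GLIBCXX version found in the bytes, or None."""
    versions = [
        tuple(int(part or 0) for part in match.groups())
        for match in _GLIBCXX.finditer(blob)
    ]
    return max(versions) if versions else None


def meets_requirement(blob: bytes, required=(3, 4, 19)) -> bool:
    """True if the library exposes at least the required GLIBCXX version."""
    found = max_glibcxx_version(blob)
    return found is not None and found >= required


# Synthetic stand-in for the contents of libstdc++.so.6.
sample = b"\x00GLIBCXX_3.4\x00GLIBCXX_3.4.19\x00GLIBCXX_3.4.9\x00GLIBCXX_DEBUG_MESSAGE_LENGTH\x00"
```

For a real check, read the library as bytes, for example `open("/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "rb").read()`, and pass the result to `meets_requirement`.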

To build ScaNN from source, first install the build tool Bazel, Clang 8, and libstdc++ headers for C++17 (which are provided with GCC 9). ScaNN also requires Python 3.5.x or later, with TensorFlow 2.3.0 installed for that Python version. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-8 bazel build -c opt --copt=-mavx2 --copt=-mfma --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
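As a quick orientation before opening the notebook, the sketch below shows the general shape of the Python API: build a searcher with the builder pattern, then run batched queries. It follows the calls used in docs/example.ipynb as best recalled, so treat the notebook as authoritative. The wrapper functions are hypothetical, the parameter defaults are illustrative values suited to a dataset of roughly a million points, and `scann` and NumPy are imported lazily so they are only needed when the functions are called.

```python
def normalize_rows(dataset):
    """Scale each row to unit length so dot product ranks by angular similarity."""
    import numpy as np  # lazy import; NumPy is only needed when called
    return dataset / np.linalg.norm(dataset, axis=1)[:, np.newaxis]


def build_searcher(dataset, num_neighbors=10, num_leaves=2000,
                   num_leaves_to_search=100, training_sample_size=250000):
    """Build a tree + asymmetric-hashing + reordering searcher (illustrative settings)."""
    import scann  # lazy import; requires `pip install scann`
    return (
        scann.scann_ops_pybind.builder(dataset, num_neighbors, "dot_product")
        # Partition the dataset and search only some partitions per query.
        .tree(num_leaves=num_leaves,
              num_leaves_to_search=num_leaves_to_search,
              training_sample_size=training_sample_size)
        # Score candidates with quantized (anisotropic) asymmetric hashing.
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        # Re-rank the top candidates with exact distances.
        .reorder(100)
        .build()
    )


def search(searcher, queries):
    """Run a batched query; returns (neighbor indices, distances), one row per query."""
    return searcher.search_batched(queries)
```

A typical call sequence would be `searcher = build_searcher(normalize_rows(dataset))` followed by `neighbors, distances = search(searcher, queries)`. For small datasets the tree parameters need to be reduced accordingly.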

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

scann-1.1.1-cp38-cp38-manylinux2014_x86_64.whl (11.7 MB, CPython 3.8)

scann-1.1.1-cp37-cp37m-manylinux2014_x86_64.whl (11.7 MB, CPython 3.7m)

scann-1.1.1-cp36-cp36m-manylinux2014_x86_64.whl (11.7 MB, CPython 3.6m)

scann-1.1.1-cp35-cp35m-manylinux2014_x86_64.whl (11.7 MB, CPython 3.5m)

File details

Details for the file scann-1.1.1-cp38-cp38-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.1.1-cp38-cp38-manylinux2014_x86_64.whl
  • Size: 11.7 MB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.1.1-cp38-cp38-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9f4b135790d0d4a431120311521ce4ab1add5fc110c6ca6a429a30325de710c0
MD5 91e39995ab0c8a0814fcd2484de7b814
BLAKE2b-256 387597c90ea1f9c644ded41ad4aa6f2aa97a71db99cb7c8d2dd371eafa6a7a92
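A downloaded wheel can be checked against the published digest with Python's standard `hashlib` module. This is a minimal sketch; the helper names are hypothetical, and the file is assumed to be in the current directory under the name listed above.

```python
import hashlib
import hmac


def iter_file_chunks(path, chunk_size=1 << 20):
    """Yield the file's contents in chunks so large wheels are not loaded at once."""
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            yield chunk


def sha256_of_chunks(chunks):
    """Return the hex SHA256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(chunks, expected_hex):
    """Compare the computed digest with the published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of_chunks(chunks), expected_hex.strip().lower())
```

For the CPython 3.8 wheel, the call would be `matches_published_digest(iter_file_chunks("scann-1.1.1-cp38-cp38-manylinux2014_x86_64.whl"), "9f4b135790d0d4a431120311521ce4ab1add5fc110c6ca6a429a30325de710c0")`.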

File details

Details for the file scann-1.1.1-cp37-cp37m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.1.1-cp37-cp37m-manylinux2014_x86_64.whl
  • Size: 11.7 MB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.1.1-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 afebbc9d907438261e94435937142997571b82a927630ff42cbcc5ba82c85e31
MD5 8156ab0600ff3b97d8deed45c485262c
BLAKE2b-256 fb211d5a4ea46d0a1bce5241bddd16b5fa7c3bb04a0d09118b671617a70678db

File details

Details for the file scann-1.1.1-cp36-cp36m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.1.1-cp36-cp36m-manylinux2014_x86_64.whl
  • Size: 11.7 MB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.1.1-cp36-cp36m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 81c42b9bc6258dcb734e4d9d73c0b92b32f759f27d9b396be618e694fb772174
MD5 9ded2e5f0c296cec3e78f50fd110b410
BLAKE2b-256 ad14ddc441a359e9947bb25befb86ec9c6f47d2d45cce7776ae20237cf9fd08d

File details

Details for the file scann-1.1.1-cp35-cp35m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.1.1-cp35-cp35m-manylinux2014_x86_64.whl
  • Size: 11.7 MB
  • Tags: CPython 3.5m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.1.1-cp35-cp35m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 56b5923a3b6e99e3ce6638c16f4a38e742fe9a070eb91cbe2c2962633995b91e
MD5 53bbb0f14dc97f034ae5d9b51b2c3610
BLAKE2b-256 2456686452a3db3b8bcb8990b0b4da6b7bf22f833f0b72d263b541d868a7cb7f
