Scalable Approximate Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code release implements [1], which includes search space pruning and quantization for Maximum Inner Product Search and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1]. The code is released for research purposes. For the academic description of the algorithms, see [1].
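
To make the search problem concrete, here is a minimal brute-force sketch of Maximum Inner Product Search in plain Python. It is an exact baseline for illustration only, not ScaNN's algorithm: ScaNN approximates this result using pruning and quantization as described in [1]. The function names inner_product and exact_mips are hypothetical and are not part of the ScaNN API.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return sum(x * y for x, y in zip(a, b))


def exact_mips(
    dataset: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return the k (index, score) pairs with the largest inner product, best first.

    This scans every database vector, so the cost grows linearly with the
    dataset size. Approximate methods exist to avoid exactly this full scan.
    """
    scored = ((inner_product(v, query), i) for i, v in enumerate(dataset))
    return [(i, score) for score, i in heapq.nlargest(k, scored)]


# Illustrative data (hypothetical, not from any benchmark).
dataset = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
top2 = exact_mips(dataset, [1.0, 1.0], k=2)
```

Comparing an approximate searcher's output against such an exact scan on a small sample is a common way to estimate recall.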

Reference [1]:

@inproceedings{avq_2020,
  title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
  author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
  booktitle={International Conference on Machine Learning},
  year={2020},
  URL={https://arxiv.org/abs/1908.10396}
}

Installation

manylinux_2_27-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.7-3.10. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.
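
The stated requirements (Linux, Python 3.7 through 3.10, and the x86-64 wheels published for this release) can be checked before running pip. The following is a minimal sketch using only the standard library; the function name wheel_is_supported is hypothetical and is not provided by ScaNN.

```python
import platform
import sys


def wheel_is_supported(version_info=None, system=None, machine=None) -> bool:
    """Check the interpreter and platform against the requirements stated above.

    Arguments default to the running interpreter; they can be overridden
    to test other configurations.
    """
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine

    # Python 3.7 through 3.10 inclusive, per the supported-version range.
    python_ok = (3, 7) <= tuple(version_info[:2]) <= (3, 10)
    # The published wheels target Linux on x86-64 only.
    return python_ok and system == "Linux" and machine == "x86_64"
```

Calling wheel_is_supported() with no arguments checks the current environment. A False result suggests that pip will not find a matching wheel.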

In accordance with the manylinux_2_27 specification, ScaNN requires libstdc++ version 3.4.23 or above from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
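
One common way to check the libstdc++ version is to look for the GLIBCXX_x.y.z version strings embedded in the shared library. The sketch below assumes that the requirement above corresponds to the GLIBCXX_3.4.23 symbol version; the library path in the usage note is also an assumption and varies between distributions. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches version tags such as GLIBCXX_3.4 or GLIBCXX_3.4.23 in raw bytes.
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+)\.(\d+)(?:\.(\d+))?")


def max_glibcxx_version(blob: bytes) -> Optional[Tuple[int, int, int]]:
    """Return the highest GLIBCXX version tag found in the bytes, or None."""
    versions = [
        (int(m.group(1)), int(m.group(2)), int(m.group(3) or 0))
        for m in _GLIBCXX.finditer(blob)
    ]
    return max(versions) if versions else None


def libstdcxx_is_new_enough(path: str, required=(3, 4, 23)) -> bool:
    """Read a libstdc++ shared library and compare its newest tag to `required`."""
    with open(path, "rb") as f:
        found = max_glibcxx_version(f.read())
    return found is not None and found >= required
```

On many Debian-based systems the library is located at /usr/lib/x86_64-linux-gnu/libstdc++.so.6, so libstdcxx_is_new_enough("/usr/lib/x86_64-linux-gnu/libstdc++.so.6") would be a typical call. Adjust the path for your distribution.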

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Building from source

To build ScaNN from source, first install the build tool Bazel, Clang 8, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.7.x or later) and TensorFlow 2.11 installed on that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-8 bazel build -c opt --features=thin_lto --copt=-mavx --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
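
As a quick orientation before opening the notebook, here is a minimal sketch of building and querying a searcher through the Python API. It assumes that dataset and queries are 2-D float32 NumPy arrays. All parameter values (number of leaves, quantization threshold, reordering size) are placeholders chosen for illustration and should be tuned per dataset. The helper recall_at_k is hypothetical and is not part of ScaNN.

```python
from typing import Sequence


def recall_at_k(approx: Sequence[Sequence[int]], exact: Sequence[Sequence[int]]) -> float:
    """Fraction of true neighbors that the approximate search recovered."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    total = sum(len(e) for e in exact)
    return hits / total if total else 0.0


def build_and_search(dataset, queries, k=10):
    """Build a ScaNN searcher over `dataset` and return (neighbors, distances).

    Assumes `dataset` and `queries` are 2-D float32 NumPy arrays and that the
    dataset is large enough for the placeholder partitioning settings below.
    """
    # Imported lazily so that recall_at_k works without ScaNN installed.
    import scann

    searcher = (
        scann.scann_ops_pybind.builder(dataset, k, "dot_product")
        # Partition the dataset; search only a fraction of the partitions.
        .tree(num_leaves=100, num_leaves_to_search=10,
              training_sample_size=len(dataset))
        # Score candidates with asymmetric hashing (quantized vectors).
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        # Rescore the best candidates exactly before returning the top k.
        .reorder(10 * k)
        .build()
    )
    return searcher.search_batched(queries)
```

The neighbors returned by build_and_search can be passed to recall_at_k together with exact neighbors from a brute-force scan to gauge the accuracy of a given configuration.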

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • scann-1.2.9-cp310-cp310-manylinux_2_27_x86_64.whl (10.5 MB): CPython 3.10, manylinux: glibc 2.27+ x86-64
  • scann-1.2.9-cp39-cp39-manylinux_2_27_x86_64.whl (10.5 MB): CPython 3.9, manylinux: glibc 2.27+ x86-64
  • scann-1.2.9-cp38-cp38-manylinux_2_27_x86_64.whl (10.5 MB): CPython 3.8, manylinux: glibc 2.27+ x86-64
  • scann-1.2.9-cp37-cp37m-manylinux_2_27_x86_64.whl (10.5 MB): CPython 3.7m, manylinux: glibc 2.27+ x86-64

File details

Details for the file scann-1.2.9-cp310-cp310-manylinux_2_27_x86_64.whl.

File metadata

  • Download URL: scann-1.2.9-cp310-cp310-manylinux_2_27_x86_64.whl
  • Size: 10.5 MB
  • Tags: CPython 3.10, manylinux: glibc 2.27+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.7

File hashes

Hashes for scann-1.2.9-cp310-cp310-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 731e96b97fdb8557c22bc8e0c39e72004fb2562c95f0cc35cf3488afcaf54c6a
MD5 c37fe7db1071f8c46ac6858f8cdaae3b
BLAKE2b-256 20514282f8773cd5f949ed36b348354e68aafd44d8ed73f79e412044e8577319

File details

Details for the file scann-1.2.9-cp39-cp39-manylinux_2_27_x86_64.whl.

File metadata

  • Download URL: scann-1.2.9-cp39-cp39-manylinux_2_27_x86_64.whl
  • Size: 10.5 MB
  • Tags: CPython 3.9, manylinux: glibc 2.27+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.7

File hashes

Hashes for scann-1.2.9-cp39-cp39-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 b29e18bdb94251e56131cf8b6dba23f6a617567ebd1ca4077c31390d0ccddbc8
MD5 98b3600ebc3c89045b3bb98a38de0360
BLAKE2b-256 e9b89568da323d6bea6abd74fe8186e4b53807b01310c6ef9bbcddb3eb27528d

File details

Details for the file scann-1.2.9-cp38-cp38-manylinux_2_27_x86_64.whl.

File metadata

  • Download URL: scann-1.2.9-cp38-cp38-manylinux_2_27_x86_64.whl
  • Size: 10.5 MB
  • Tags: CPython 3.8, manylinux: glibc 2.27+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.7

File hashes

Hashes for scann-1.2.9-cp38-cp38-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 daa30124712eb34036bed2c345d40bc533b8ac6fe25613946467594e1ace49a0
MD5 b2457a54a62378fb867ba42ac0b9a352
BLAKE2b-256 4cc2897a2f4dd2b38b2f564bc9a0d8d86b823a693d61abe88556a2947e75e39c

File details

Details for the file scann-1.2.9-cp37-cp37m-manylinux_2_27_x86_64.whl.

File metadata

  • Download URL: scann-1.2.9-cp37-cp37m-manylinux_2_27_x86_64.whl
  • Size: 10.5 MB
  • Tags: CPython 3.7m, manylinux: glibc 2.27+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.7

File hashes

Hashes for scann-1.2.9-cp37-cp37m-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 bef0bfdb0741d7f75d319a79bb914beef3d717233b71898951f2bf1d81809a4f
MD5 f4e83b9b6edd52086db91cb766e22546
BLAKE2b-256 676e723ff9466a21925a6500d28e401033458ed3bb32762b7c3b87045443b268
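
A downloaded wheel can be checked against the SHA256 digests listed above. The following is a minimal sketch using the standard library's hashlib; the function names are hypothetical, and the wheel path is whatever location the file was saved to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read returns the empty bytes sentinel.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_digest("scann-1.2.9-cp310-cp310-manylinux_2_27_x86_64.whl", "<SHA256 from the table>") should return True for an intact download of that wheel.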
