Skip to main content

SPTAG: A library for fast approximate nearest neighbor search

Project description

SPTAG: A library for fast approximate nearest neighbor search

MIT licensed Build status

SPTAG

SPTAG (Space Partition Tree And Graph) is a library for large scale vector approximate nearest neighbor search scenario released by Microsoft Research (MSR) and Microsoft Bing.

architecture

Introduction

This library assumes that the samples are represented as vectors and that the vectors can be compared by L2 distances or cosine distances. Vectors returned for a query vector are the vectors that have smallest L2 distance or cosine distances with the query vector.

SPTAG provides two methods: kd-tree and relative neighborhood graph (SPTAG-KDT) and balanced k-means tree and relative neighborhood graph (SPTAG-BKT). SPTAG-KDT is advantageous in index building cost, and SPTAG-BKT is advantageous in search accuracy in very high-dimensional data.

How it works

SPTAG is inspired by the NGS approach [WangL12]. It contains two basic modules: index builder and searcher. The RNG is built on the k-nearest neighborhood graph [WangWZTG12, WangWJLZZH14] for boosting the connectivity. Balanced k-means trees are used to replace kd-trees to avoid the inaccurate distance bound estimation in kd-trees for very high-dimensional vectors. The search begins with the search in the space partition trees for finding several seeds to start the search in the RNG. The searches in the trees and the graph are iteratively conducted.

Highlights

  • Fresh update: Support online vector deletion and insertion
  • Distributed serving: Search over multiple machines

Build

Requirements

  • swig >= 4.0.2
  • cmake >= 3.12.0
  • boost >= 1.67.0

Fast clone

set GIT_LFS_SKIP_SMUDGE=1
git clone --recurse-submodules https://github.com/microsoft/SPTAG

OR

git config --global filter.lfs.smudge "git-lfs smudge --skip -- %f"
git config --global filter.lfs.process "git-lfs filter-process --skip"

Install

For Linux:

mkdir build
cd build && cmake .. && make

It will generate a Release folder in the code directory which contains all the build targets.

For Windows:

mkdir build
cd build && cmake -A x64 ..

It will generate a SPTAGLib.sln in the build directory. Compiling the ALL_BUILD project in the Visual Studio (at least 2019) will generate a Release directory which contains all the build targets.

For detailed instructions on installing Windows binaries, please see here

Using Docker:

docker build -t sptag .

Will build a docker container with binaries in /app/Release/.

Verify

Run the SPTAGTest (or Test.exe) in the Release folder to verify all the tests have passed.

Usage

The detailed usage can be found in Get started. There is also an end-to-end tutorial for building vector search online service using Python Wrapper in Python Tutorial. The detailed parameters tunning can be found in Parameters.

References

Please cite SPTAG in your publications if it helps your research:

@inproceedings{ChenW21,
  author = {Qi Chen and 
            Bing Zhao and 
            Haidong Wang and 
            Mingqin Li and 
            Chuanjie Liu and 
            Zengzhong Li and 
            Mao Yang and 
            Jingdong Wang},
  title = {SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search},
  booktitle = {35th Conference on Neural Information Processing Systems (NeurIPS 2021)},
  year = {2021}
}

@manual{ChenW18,
  author    = {Qi Chen and
               Haidong Wang and
               Mingqin Li and 
               Gang Ren and
               Scarlett Li and
               Jeffery Zhu and
               Jason Li and
               Chuanjie Liu and
               Lintao Zhang and
               Jingdong Wang},
  title     = {SPTAG: A library for fast approximate nearest neighbor search},
  url       = {https://github.com/Microsoft/SPTAG},
  year      = {2018}
}

@inproceedings{WangL12,
  author    = {Jingdong Wang and
               Shipeng Li},
  title     = {Query-driven iterated neighborhood graph search for large scale indexing},
  booktitle = {ACM Multimedia 2012},
  pages     = {179--188},
  year      = {2012}
}

@inproceedings{WangWZTGL12,
  author    = {Jing Wang and
               Jingdong Wang and
               Gang Zeng and
               Zhuowen Tu and
               Rui Gan and
               Shipeng Li},
  title     = {Scalable k-NN graph construction for visual descriptors},
  booktitle = {CVPR 2012},
  pages     = {1106--1113},
  year      = {2012}
}

@article{WangWJLZZH14,
  author    = {Jingdong Wang and
               Naiyan Wang and
               You Jia and
               Jian Li and
               Gang Zeng and
               Hongbin Zha and
               Xian{-}Sheng Hua},
  title     = {Trinary-Projection Trees for Approximate Nearest Neighbor Search},
  journal   = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume    = {36},
  number    = {2},
  pages     = {388--403},
  year      = {2014
}

Contribute

This project welcomes contributions and suggestions from all the users.

We use GitHub issues for tracking suggestions and bugs.

License

The entire codebase is under MIT license

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

sptag-1.6.8-py39-none-win_amd64.whl (11.4 MB view details)

Uploaded Python 3.9Windows x86-64

sptag-1.6.8-py39-none-manylinux1_x86_64.whl (3.2 MB view details)

Uploaded Python 3.9

sptag-1.6.8-py38-none-win_amd64.whl (11.4 MB view details)

Uploaded Python 3.8Windows x86-64

sptag-1.6.8-py38-none-manylinux1_x86_64.whl (3.2 MB view details)

Uploaded Python 3.8

sptag-1.6.8-py37-none-win_amd64.whl (11.4 MB view details)

Uploaded Python 3.7Windows x86-64

sptag-1.6.8-py37-none-manylinux1_x86_64.whl (3.2 MB view details)

Uploaded Python 3.7

File details

Details for the file sptag-1.6.8-py39-none-win_amd64.whl.

File metadata

  • Download URL: sptag-1.6.8-py39-none-win_amd64.whl
  • Upload date:
  • Size: 11.4 MB
  • Tags: Python 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for sptag-1.6.8-py39-none-win_amd64.whl
Algorithm Hash digest
SHA256 79e29541139a190e6d3296df500d4d4642abcf6d1fe0cb0b905fcf93148ccaf4
MD5 9301796a1ad8f061bd07531bd537d94a
BLAKE2b-256 260f38fc0e22f99d23498b455aed97dd049c6c2abe0e9e44113af3953f7b4e86

See more details on using hashes here.

File details

Details for the file sptag-1.6.8-py39-none-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for sptag-1.6.8-py39-none-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 97579dd7bf1e9e116f4a00ac5e5f8cc985280242c5df2f8e8142e66d85bacebf
MD5 96de698214d0b63dfffca081d92c68bb
BLAKE2b-256 4676b3529cc4c7ff30210cedd8d620da1fda15d7dd47010e89e896754bd9d546

See more details on using hashes here.

File details

Details for the file sptag-1.6.8-py38-none-win_amd64.whl.

File metadata

  • Download URL: sptag-1.6.8-py38-none-win_amd64.whl
  • Upload date:
  • Size: 11.4 MB
  • Tags: Python 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for sptag-1.6.8-py38-none-win_amd64.whl
Algorithm Hash digest
SHA256 701dbda1c9cf7efb2c7ed37c268a9357cc30b09134402b7baecc06fa2c2ecafb
MD5 4ecab2426cea560e7e390add3fc79f6e
BLAKE2b-256 e28ef6894b4b3b3a5442f5bbf1e77a7bdf0ff75bbc508059b8520a514ac23bd9

See more details on using hashes here.

File details

Details for the file sptag-1.6.8-py38-none-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for sptag-1.6.8-py38-none-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 07f9a34b42a9f393d6d74f13fc908abb2b4f81f9dcdbc4421c9eaa6c7b05b5a3
MD5 5514fdf2877b1fcbc0acd154b090de4a
BLAKE2b-256 9f55507d98d80c3dd7dd44b28c86a4b15fba0fe84af9f0012c864ca53b568efd

See more details on using hashes here.

File details

Details for the file sptag-1.6.8-py37-none-win_amd64.whl.

File metadata

  • Download URL: sptag-1.6.8-py37-none-win_amd64.whl
  • Upload date:
  • Size: 11.4 MB
  • Tags: Python 3.7, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for sptag-1.6.8-py37-none-win_amd64.whl
Algorithm Hash digest
SHA256 8713ccf78cb81d75ae95b36a21ba12044e0044bab18de8f1677472e0d46d2bec
MD5 2a0c50a5841f354e6bf10aaae72ae214
BLAKE2b-256 64c5722744f8669765c1fd5e724509b97f5858e14121bc02e41c0fe39f4fe39a

See more details on using hashes here.

File details

Details for the file sptag-1.6.8-py37-none-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for sptag-1.6.8-py37-none-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 48a6e3671ce6cf2a32fd6012c582809666737b24a585a81a241939d35d3f692e
MD5 0a856e60dcda962b524359494a722361
BLAKE2b-256 94ec6dbfe397818e3f1b31970d2ac6e1632b59a52066225bf32d547d6467e4cb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page