hdidx

ANN Search in High-Dimensional Spaces

Project description

HDIdx: Indexing High-Dimensional Data

What is HDIdx?

HDIdx is a python package for approximate nearest neighbor (ANN) search. Nearest neighbor (NN) search is very challenging in high-dimensional space because of the *Curse of Dimensionality* problem. The basic idea of HDIdx is to compress the original feature vectors into compact binary codes, and perform approximate NN search instead of extract NN search. This can largely reduce the storage requirements and can significantly speed up the search.

Architecture

https://raw.githubusercontent.com/wanji/hdidx/master/doc/framework.png

HDIdx has three main modules: 1) Encoder which can compress the original feature vectors into compact binary hash codes, 2) Indexer which can index the database items and search approximate nearest neighbor for a given query item, and 3) Storage module which encapsulates the underlying data storage, which can be memory or NoSQL database like LMDB, for the Indexer.

The current version implements following feature compressing algorithms:

Product Quantization[1].
Spectral Hashing[2].

To use HDIdx, first you should learn a Encoder from some learning vectors. Then you can map the base vectors into hash codes using the learned Encoder and building indexes over these hash codes by an Indexer, which will write the indexes to the specified storage medium. When a query vector comes, it will be mapped to hash codes by the same Encoder and the Indexer will find the similar items to this query vector.

Installation

HDIdx can be installed by pip:

[sudo] pip install cython
[sudo] pip install hdidx

By default, HDIdx use kmeans algorithm provided by *SciPy*. To be more efficient, you can install python extensions of *OpenCV*, which can be installed via apt-get on Ubuntu. For other Linux distributions, e.g. CentOS, you need to compile it from source.

[sudo] apt-get install python-opencv

HDIdx will use *OpenCV* automatically if it is available.

Windows Guide

General dependencies:

After install the above mentioned software, download `stdint.h <http://msinttypes.googlecode.com/svn/trunk/stdint.h>`__ and put it under the include folder of Visual C++, e.g. C:\Users\xxx\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\include. Then hdidx can be installed by pip from the Anaconda Command Prompt.

Example

Here is a simple example. See this notebook for more examples.

# import necessary packages

import hdidx
import numpy as np

# generating sample data
ndim = 16      # dimension of features
ndb = 10000    # number of dababase items
nqry = 10      # number of queries

X_db = np.random.random((ndb, ndim))
X_qry = np.random.random((nqry, ndim))

# create Product Quantization Indexer
idx = hdidx.indexer.IVFPQIndexer()
# build indexer
idx.build({'vals': X_db, 'nsubq': 8})
# add database items to the indexer
idx.add(X_db)
# searching in the database, and return top-10 items for each query
ids, dis = idx.search(X_qry, 10)
print ids
print dis

Reference

[1] Jegou, Herve, Matthijs Douze, and Cordelia Schmid.
    "Product quantization for nearest neighbor search."
    Pattern Analysis and Machine Intelligence, IEEE Transactions on 33.1 (2011): 117-128.
[2] Weiss, Yair, Antonio Torralba, and Rob Fergus.
    "Spectral hashing."
    In Advances in neural information processing systems, pp. 1753-1760. 2009.

Project details

Release history Release notifications | RSS feed

This version

0.2.8.2

Jun 25, 2017

0.2.8.1

Jan 10, 2017

0.2.8

Oct 5, 2015

0.2.2

Sep 19, 2015

0.2.1

Sep 19, 2015

0.2

Aug 11, 2015

0.1

Jul 31, 2015

0.0.14

Jul 25, 2015

0.0.12

Apr 4, 2015

0.0.11

Apr 2, 2015

0.0.9

Dec 12, 2014

0.0.8

Dec 11, 2014

0.0.6

Nov 5, 2014

0.0.5

Nov 5, 2014

0.0.4

Nov 5, 2014

0.0.3

Nov 4, 2014

0.0.2

Nov 4, 2014

0.0.1

Nov 4, 2014

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hdidx-0.2.8.2.tar.gz (24.4 kB view details)

Uploaded Jun 25, 2017 Source

File details

Details for the file hdidx-0.2.8.2.tar.gz.

File metadata

Download URL: hdidx-0.2.8.2.tar.gz
Upload date: Jun 25, 2017
Size: 24.4 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for hdidx-0.2.8.2.tar.gz
Algorithm	Hash digest
SHA256	`2243e2eb1b056a304578cca17cc88060e01dee4b031be462af730b659f3cf573`
MD5	`aaf785e5e0cfadd13b32d53d81b13932`
BLAKE2b-256	`695780d49610fa6656f229a37796db48247c8979154fa901884e0e2bb1fe8632`

See more details on using hashes here.

hdidx 0.2.8.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

HDIdx: Indexing High-Dimensional Data

What is HDIdx?

Architecture

Installation

Windows Guide

Example

Reference

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes