Skip to main content

Learning to Hash for Maximum Inner Product Search

Project description

SITQ is a fast algorithm for approximate Maximum Inner Product Search (MIPS). It can find items which are likely to maximize inner product against a query in sublinear time.

Benchemark

Recommendation is one of fields where SITQ can be used. Experiments were conducted with MovieLens 100K Dataset and MovieLens 20M Dataset.

ALS in benfred/implicit is used to learn vectors of items and users, where the score of (user, item) pair is computed through inner product of those vectors. Precision@10 is the ratio of correct recommendations against test dataset. Fetched items are items against which inner product are computed. Hashing algorithms are more preferable as average and standard deviation of fetched items are smaller.

ml-100k

Signature length: 4, Minimum fetched items: 20

Name

Precision\@10

Fetched items. Avg

Fetched items. Std

SITQ

0.202

105.2

76.6

Simple-LSH

0.182

496.2

441.2

ITQ

0.199

131.3

93.7

LSH

0.156

161.9

94.4

brute force

0.242

(1680)

ml-20m

Signature length: 8, Minimum fetched items: 20

Name

Precision\@10

Fetched items. Avg

Fetched items. Std

SITQ

0.112

96.1

151.1

Simple-LSH

0.122

2158.2

5246.6

ITQ

0.090

111.0

332.9

LSH

0.069

531.3

912.2

brute force

0.151

(26671)

Algorithm

SITQ is an algorithm which combines Simple-LSH [1] and ITQ [2].

Simple-LSH utilizes ordinary LSH which is for cosine similarity. In order to use LSH for MIPS, it converts a vector before computing its signature.

LSH computes signatures through transformation matrix which is fixed. ITQ learns transformation matrix from item vectors for better hashing.

SITQ converts vectors by means of Simple-LSH, and learns transformation matrix through ITQ.

Example

Install

pip install sitq

Get Signature

import numpy as np

from sitq import Sitq


# Create sample dataset
items = np.random.rand(10000, 50)
query = np.random.rand(50)

sitq = Sitq(signature_size=8)

# Learn transformation matrix
sitq.fit(items)

# Get signatures for items
item_sigs = sitq.get_item_signatures(items)

# Get signature for query
query_sig = sitq.get_query_signatures([query])[0]

Retrieve items

import numpy as np

from sitq import Mips


# Create sample dataset
items = np.random.rand(10000, 50)
query = np.random.rand(50)

mips = Mips(signature_size=8)

# Learn lookup table and parameters for search
mips.fit(items)

# Find items which are likely to maximize inner product against query
item_indexes, scores = mips.search(query, limit=10, distance=1)

References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

sitq-0.1.3-py3-none-any.whl (6.0 kB view details)

Uploaded Python 3

File details

Details for the file sitq-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: sitq-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 6.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for sitq-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 125e2a17a31376ab257ffa7ec68289b656201071dc9b2a363655bd62fca1ab80
MD5 0be0695189e4a2badbb63d433dc25fac
BLAKE2b-256 e3560a334690383aa1c841fa32990ca5d2bbc7fd61d17d2a74624cab89433228

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page