Learning to Hash for Maximum Inner Product Search
Project description
SITQ is a fast algorithm for approximate Maximum Inner Product Search (MIPS). It can find items which are likely to maximize inner product against a query in sublinear time.
Benchemark
Recommendation is one of fields where SITQ can be used. Experiments were conducted with MovieLens 100K Dataset and MovieLens 20M Dataset.
ALS in benfred/implicit is used to learn vectors of items and users, where the score of (user, item) pair is computed through inner product of those vectors. Precision@10 is the ratio of correct recommendations against test dataset. Fetched items are items against which inner product are computed. Hashing algorithms are more preferable as average and standard deviation of fetched items are smaller.
ml-100k
Name |
Precision\@10 |
Fetched items. Avg |
Fetched items. Std |
---|---|---|---|
SITQ |
0.202 |
105.2 |
76.6 |
Simple-LSH |
0.182 |
496.2 |
441.2 |
ITQ |
0.199 |
131.3 |
93.7 |
LSH |
0.156 |
161.9 |
94.4 |
brute force |
0.242 |
(1680) |
ml-20m
Name |
Precision\@10 |
Fetched items. Avg |
Fetched items. Std |
---|---|---|---|
SITQ |
0.112 |
96.1 |
151.1 |
Simple-LSH |
0.122 |
2158.2 |
5246.6 |
ITQ |
0.090 |
111.0 |
332.9 |
LSH |
0.069 |
531.3 |
912.2 |
brute force |
0.151 |
(26671) |
Algorithm
SITQ is an algorithm which combines Simple-LSH [1] and ITQ [2].
Simple-LSH utilizes ordinary LSH which is for cosine similarity. In order to use LSH for MIPS, it converts a vector before computing its signature.
LSH computes signatures through transformation matrix which is fixed. ITQ learns transformation matrix from item vectors for better hashing.
SITQ converts vectors by means of Simple-LSH, and learns transformation matrix through ITQ.
Example
Install
pip install sitq
Get Signature
import numpy as np
from sitq import Sitq
# Create sample dataset
items = np.random.rand(10000, 50)
query = np.random.rand(50)
sitq = Sitq(signature_size=8)
# Learn transformation matrix
sitq.fit(items)
# Get signatures for items
item_sigs = sitq.get_item_signatures(items)
# Get signature for query
query_sig = sitq.get_query_signatures([query])[0]
Retrieve items
import numpy as np
from sitq import Mips
# Create sample dataset
items = np.random.rand(10000, 50)
query = np.random.rand(50)
mips = Mips(signature_size=8)
# Learn lookup table and parameters for search
mips.fit(items)
# Find items which are likely to maximize inner product against query
item_indexes, scores = mips.search(query, limit=10, distance=1)
References
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
File details
Details for the file sitq-0.1.3-py3-none-any.whl
.
File metadata
- Download URL: sitq-0.1.3-py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 125e2a17a31376ab257ffa7ec68289b656201071dc9b2a363655bd62fca1ab80 |
|
MD5 | 0be0695189e4a2badbb63d433dc25fac |
|
BLAKE2b-256 | e3560a334690383aa1c841fa32990ca5d2bbc7fd61d17d2a74624cab89433228 |