TPU Index is a package for fast similarity search over large collections of high-dimensional vectors on Google Cloud TPUs.
Project description
TPU Index
TPU Index is a package for fast similarity search over large collections of high-dimensional vectors on TPUs. It was built to support a project we developed for https://tfworld.devpost.com/.
Uses:
- Working with collections of vectors that are too large to fit in host (CPU) memory. A TPU v2 has 8 cores x 8 GB = 64 GB of memory; a TPU v3 has 8 cores x 16 GB = 128 GB.
- Speeding up similarity searches. On a Colab TPU v2, a single cosine similarity search over 19.5 million vectors of dimension 512 takes ~1.017 seconds.
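As a rough check on the memory figures above, the raw size of a vector collection can be estimated before loading it. This is a minimal sketch: `index_size_bytes` is a hypothetical helper, not part of tpu-index, and it assumes float32 values (4 bytes each) split evenly across 8 cores, neither of which the package description confirms. It also ignores any working memory the search itself needs.

```python
def index_size_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense vector matrix; 4 bytes per value assumes float32."""
    return num_vectors * dim * bytes_per_value


GB = 10**9  # decimal gigabytes

# The benchmark collection quoted above: 19.5 million vectors of dimension 512.
size = index_size_bytes(19_500_000, 512)
per_core = size / 8  # assumes an even split across 8 TPU cores

print(f"total: {size / GB:.1f} GB, per core: {per_core / GB:.2f} GB")
# total: 39.9 GB, per core: 4.99 GB
```

Under these assumptions the benchmark collection fits within the 64 GB of a TPU v2.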
Link to our project: https://devpost.com/software/naturallanguagerecommendations
Installation
!pip install tpu-index
Basic usage
from tpu_index import TPUIndex
index = TPUIndex(num_tpu_cores=8)
index.create_index(vectors) # vectors: 2-D numpy array, shape [num_vectors, dim]
...
D, I = index.search(xq, distance_metric='cosine', top_k=5)
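To make the meaning of a cosine top-k search concrete, here is a small CPU-only reference in NumPy. It is an illustrative sketch, not the package's implementation: `cosine_top_k` is a hypothetical function, and it assumes that `D` holds similarity scores and `I` holds the row indices of the matching vectors, which the description above does not state explicitly.

```python
import numpy as np


def cosine_top_k(vectors, queries, top_k=5):
    """Brute-force cosine search: returns (scores, indices), best match first."""
    # Normalise rows to unit length so a dot product equals cosine similarity.
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = q @ v.T  # shape [num_queries, num_vectors]
    # Indices of the top_k highest similarities per query, in descending order.
    idx = np.argsort(-sims, axis=1)[:, :top_k]
    scores = np.take_along_axis(sims, idx, axis=1)
    return scores, idx


rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64)).astype(np.float32)
xq = vectors[:3]  # query with vectors that are already in the collection

D, I = cosine_top_k(vectors, xq, top_k=5)
print(D.shape, I.shape)  # (3, 5) (3, 5)
```

A reference like this is only practical for small collections, but it can be used to sanity-check results from a TPU-backed index on a sample of the data.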
For collections that are too large to fit in host (CPU) memory at once, add the vectors in chunks:
import numpy as np
index.create_index(vectorsChunk1) # first chunk: 2-D numpy array, shape [num_vectors, dim]
for file in files: # files: paths to the remaining .npy chunks
    vectorChunk = np.load(file)
    index.append_index(vectorChunk)
# Now perform search
D, I = index.search(xq, distance_metric='cosine', top_k=5)
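The chunked loop above leaves `files` undefined. One way to produce the chunks is a generator that loads one `.npy` file at a time in a fixed order, so that only a single chunk is in host memory at any point. This is a sketch under assumptions: `iter_chunks` and the `chunk_*.npy` naming are hypothetical, and the comments only indicate where the `create_index` and `append_index` calls would go.

```python
import glob
import os
import tempfile

import numpy as np


def iter_chunks(pattern):
    """Yield one array per .npy file matching pattern, in sorted path order."""
    for path in sorted(glob.glob(pattern)):
        yield np.load(path)


# Demo: three small temporary files stand in for real chunks on disk.
tmp_dir = tempfile.mkdtemp()
for i in range(3):
    chunk = np.full((4, 8), i, dtype=np.float32)
    np.save(os.path.join(tmp_dir, f"chunk_{i}.npy"), chunk)

chunks = iter_chunks(os.path.join(tmp_dir, "chunk_*.npy"))
first = next(chunks)  # would be passed to index.create_index(first)
rest = list(chunks)   # each would be passed to index.append_index(chunk)

total_rows = first.shape[0] + sum(c.shape[0] for c in rest)
print(total_rows)  # 12
```

Sorting the paths keeps the load order reproducible, which matters if the indices returned by a search are later mapped back to the source files. Note that plain string sorting places `chunk_10` before `chunk_2`, so zero-padded file names are safer for larger collections.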
ToDo:
- Add more distance metrics
- Optional GPU support
Download files
Source Distribution
tpu_index-0.0.5.tar.gz (3.4 kB)
Built Distribution
Hashes for tpu_index-0.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 099edbd3624e5a71a37385bf52cd65d10421fc0b6af934dd0b26bdbd0184bbd4
MD5 | 7b9d8ca5801597c68f8e9cb2649f779a
BLAKE2b-256 | e3a196ccb3198cd01363d7bb310e73165fa12d1c8dc18b6b020ae17a9464be3e