TPU Index is a package for fast similarity search over large collections of high-dimensional vectors on Google Cloud TPUs. It was built to support a project we developed for https://tfworld.devpost.com/.
- Handles large collections of vectors that do not fit in CPU memory. A TPU v2 has 8 cores x 8 GB = 64 GB of memory; a TPU v3 has 8 cores x 16 GB = 128 GB.
- Speeds up similarity searches. On a Colab TPU v2, a single cosine similarity search over 19.5 million vectors of dimension 512 takes ~1.017 seconds.
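
As a rough check on the numbers above, the footprint of an index can be estimated from the vector count and dimension. This is only a sketch: it assumes vectors are stored as float32 (4 bytes per value) and uses decimal gigabytes; the helper `index_size_gb` is hypothetical and not part of the package.

```python
def index_size_gb(num_vectors, dim, bytes_per_value=4):
    """Estimate raw storage for a dense vector collection, in decimal GB.

    Assumes float32 storage (4 bytes per value); this is an assumption,
    not something the package documents.
    """
    return num_vectors * dim * bytes_per_value / 1e9


# The benchmark collection: 19.5 million vectors of dimension 512.
size = index_size_gb(19_500_000, 512)
print(f"{size:.1f} GB")  # about 39.9 GB, under the 64 GB of a TPU v2
```

Under these assumptions the benchmark collection fits on a TPU v2 with room to spare, while a collection roughly twice that size would need a TPU v3.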
Link to our project: https://devpost.com/software/naturallanguagerecommendations
!pip install tpu-index
    from tpu_index import TPUIndex

    index = TPUIndex(num_tpu_cores=8)
    index.create_index(vectors)  # vectors = numpy array, shape == [None, None]
    ...
    D, I = index.search(xq, distance_metric='cosine', top_k=5)
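
To make the search call concrete, here is a small NumPy reference for what a cosine top-k search computes. It is a CPU sketch, not the package's TPU implementation. It assumes `D` holds similarity scores and `I` holds the row indices of the best matches, one row per query; the source does not state this, and the function `cosine_top_k` is hypothetical.

```python
import numpy as np


def cosine_top_k(vectors, queries, top_k=5):
    """Brute-force cosine similarity search (CPU reference sketch).

    Returns (scores, indices), each of shape [num_queries, top_k],
    ordered from most to least similar.
    """
    # Normalize rows to unit length so a dot product equals cosine similarity.
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = q @ v.T  # shape [num_queries, num_vectors]
    # Sort each row by descending similarity and keep the first top_k columns.
    idx = np.argsort(-sims, axis=1)[:, :top_k]
    scores = np.take_along_axis(sims, idx, axis=1)
    return scores, idx


rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64))
xq = vectors[[3, 42]]  # query with two vectors already in the collection
D, I = cosine_top_k(vectors, xq, top_k=5)
```

Because each query here is itself a member of the collection, its own row comes back as the top match with a similarity of 1.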
For collections of vectors too large to fit in CPU memory, add them to the index in chunks:
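    import numpy as np

    # Build the index from the first chunk
    index.create_index(vectorsChunk1)  # vectors = numpy array, shape == [None, None]

    # Load the remaining chunks one at a time and append each to the index
    for file in files:
        vectorChunk = np.load(file)
        index.append_index(vectorChunk)

    # Now perform search
    D, I = index.search(xq, distance_metric='cosine', top_k=5)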
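
The snippet above expects `files` to be a list of `.npy` paths. One possible way to produce such a list is sketched below using only NumPy and the standard library. The helper `save_in_chunks`, the file naming scheme, and the chunk count are all illustrative assumptions; in practice the chunks would usually be written by whatever process generates the vectors.

```python
import os
import tempfile

import numpy as np


def save_in_chunks(vectors, out_dir, num_chunks):
    """Split an array row-wise and save each piece as its own .npy file.

    Returns the file paths in row order, so the chunks can be loaded
    one at a time later.
    """
    files = []
    for i, chunk in enumerate(np.array_split(vectors, num_chunks, axis=0)):
        path = os.path.join(out_dir, f"chunk_{i:04d}.npy")
        np.save(path, chunk)
        files.append(path)
    return files


out_dir = tempfile.mkdtemp()
vectors = np.random.rand(1000, 16).astype(np.float32)
files = save_in_chunks(vectors, out_dir, num_chunks=4)
```

Keeping the paths in row order means the indices returned by a search line up with row positions in the original collection, assuming the index numbers vectors in the order they are added.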
- [ ] Add more distance metrics
- [ ] Optional GPU support