ChunkDot
Multi-threaded matrix multiplication and cosine similarity calculations. Suited to finding the K most similar items among a large number of items (~1 million): the item matrix representation (embeddings) is partitioned into chunks, and Numba is used to accelerate the calculations.
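The chunking idea can be illustrated with a minimal pure-NumPy sketch. It is not ChunkDot's implementation: it uses neither Numba nor multiple threads, and the function name `chunked_top_k_cosine` and its `chunk_size` parameter are hypothetical. Rows are normalized once, then the similarity matrix is built one block of rows at a time, so peak memory grows with `chunk_size * n_items` rather than `n_items ** 2`, and only the k best entries per row are kept.

```python
import numpy as np


def chunked_top_k_cosine(embeddings, top_k, chunk_size=1024):
    """Hypothetical helper (not ChunkDot's code): per-row top-k cosine similarity.

    A positive top_k keeps the most similar items, a negative top_k the most
    dissimilar ones. Returns (indices, values), each of shape
    (n_items, abs(top_k)), ordered from the best match to the worst.
    The item itself is not excluded, so for top_k > 0 it is usually its own
    first neighbour (similarity 1).
    """
    n_items = embeddings.shape[0]
    k = abs(top_k)
    if not 0 < k <= n_items:
        raise ValueError("abs(top_k) must be between 1 and the number of items")
    # L2-normalise the rows so that a plain dot product is the cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.where(norms == 0, 1.0, norms)
    indices = np.empty((n_items, k), dtype=np.int64)
    values = np.empty((n_items, k), dtype=unit.dtype)
    for start in range(0, n_items, chunk_size):
        stop = min(start + chunk_size, n_items)
        # Only a (stop - start) x n_items block of the similarity matrix exists at once.
        sims = unit[start:stop] @ unit.T
        # Smaller key = better match, so negate when we want the largest similarities.
        keys = -sims if top_k > 0 else sims
        # argpartition moves the k smallest keys to the front of each row (unordered) ...
        candidates = np.argpartition(keys, k - 1, axis=1)[:, :k]
        # ... and a small argsort orders just those k candidates.
        order = np.argsort(np.take_along_axis(keys, candidates, axis=1), axis=1)
        best = np.take_along_axis(candidates, order, axis=1)
        indices[start:stop] = best
        values[start:stop] = np.take_along_axis(sims, best, axis=1)
    return indices, values
```

For example, `chunked_top_k_cosine(np.random.randn(1000, 32), top_k=5, chunk_size=128)` returns two `(1000, 5)` arrays; `argpartition` is used so each row costs a linear-time selection plus a sort of only k values, instead of a full sort over all items.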
Usage
```bash
pip install -U chunkdot
```
Calculate the 50 most similar and the 50 most dissimilar items for each of 100K items:
```python
import numpy as np
from chunkdot import cosine_similarity_top_k

embeddings = np.random.randn(100000, 256)
# 50 most similar items, using all your system's memory
cosine_similarity_top_k(embeddings, top_k=50)
# 50 most dissimilar items, using at most 20GB of memory
cosine_similarity_top_k(embeddings, top_k=-50, max_memory=20E9)
```

Output of the last call:

```
<100000x100000 sparse matrix of type '<class 'numpy.float64'>'
    with 5000000 stored elements in Compressed Sparse Row format>
```
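The printed result is a 100000 x 100000 SciPy sparse matrix in Compressed Sparse Row (CSR) format with 5,000,000 = 100,000 × 50 stored entries, i.e. exactly 50 neighbours per item. The sketch below shows, with NumPy only, how per-row top-k results map onto the three CSR arrays. The helper name `top_k_to_csr_arrays` is hypothetical, and its input layout (two `(n_items, k)` arrays of column indices and similarities) is an assumption, not ChunkDot's internal format.

```python
import numpy as np


def top_k_to_csr_arrays(indices, values):
    """Hypothetical helper: pack per-row top-k results into CSR components.

    indices, values: arrays of shape (n_rows, k) from a top-k search.
    Returns (data, col_indices, indptr) in the layout accepted by
    scipy.sparse.csr_matrix((data, col_indices, indptr), shape=(n_rows, n_cols)).
    """
    n_rows, k = indices.shape
    # Row i owns the slice indptr[i]:indptr[i + 1], i.e. exactly k stored entries,
    # so the total number of stored elements is n_rows * k.
    indptr = np.arange(0, n_rows * k + 1, k, dtype=np.int64)
    # Row-major flattening keeps each row's k entries contiguous.
    return values.ravel(), indices.ravel(), indptr
```

With this layout the neighbours of item `i` are `col_indices[indptr[i]:indptr[i + 1]]`, and their similarities are the matching slice of `data`.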
Execution time, in seconds, for the 100K-item example with a 20GB memory limit:
```python
from timeit import timeit

import numpy as np
from chunkdot import cosine_similarity_top_k

embeddings = np.random.randn(100000, 256)
timeit(lambda: cosine_similarity_top_k(embeddings, top_k=50, max_memory=20E9), number=1)
```

```
58.611996899999994
```
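For a sense of scale, the arithmetic below estimates how a 20GB budget (`max_memory=20E9`, in bytes) could be split when 100,000 items are compared. It assumes, for illustration only, that memory is dominated by one float64 block of similarities of shape `(chunk_rows, n_items)`; ChunkDot's actual memory accounting may differ, and both helper names are hypothetical. Under that assumption each row of the block costs 8 × 100,000 = 800,000 bytes, so 25,000 rows fit and the 100,000 items are processed in 4 chunks.

```python
def rows_per_chunk(max_memory_bytes, n_items, bytes_per_value=8):
    """Hypothetical estimate: rows whose float64 similarity block fits the budget.

    Assumes one (rows x n_items) block dominates memory use; real usage also
    includes the embeddings themselves and the top-k results.
    """
    rows = int(max_memory_bytes // (bytes_per_value * n_items))
    if rows < 1:
        raise ValueError("memory budget too small for a single row of similarities")
    return min(rows, n_items)


def n_chunks(n_items, chunk_rows):
    # Ceiling division: the last chunk may be smaller than the others.
    return -(-n_items // chunk_rows)


chunk_rows = rows_per_chunk(20e9, 100_000)
print(chunk_rows, n_chunks(100_000, chunk_rows))  # 25000 4
```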
Download files
Source Distribution: chunkdot-0.1.4.tar.gz
Built Distribution: chunkdot-0.1.4-py3-none-any.whl
File details
Details for the file chunkdot-0.1.4.tar.gz.
File metadata
- Download URL: chunkdot-0.1.4.tar.gz
- Upload date:
- Size: 6.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.9.11 Darwin/21.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fede4607e80de1f24f58d04b77ea13ea1f94e9b4f7c6451bb9c46559134968ed |
| MD5 | f109ec46243a9facc6f1ad87dd7d6184 |
| BLAKE2b-256 | 1e6e039a170abcb001ba4f6d9600a38dea207a07bd0f3ec42b4e394778950cee |
File details
Details for the file chunkdot-0.1.4-py3-none-any.whl.
File metadata
- Download URL: chunkdot-0.1.4-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.9.11 Darwin/21.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6db9ad9a2fdfe922f053d7c8f521215b70c53423444cf5dd953ef73d8d96ddec |
| MD5 | d9ee175d0f8eab5e2f3676fa12874216 |
| BLAKE2b-256 | a06bb6b1d90a6344ef3aeaab64cbaba61c2b2789f39ae676e0a231ff34a2f5b8 |
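The SHA256 digests listed for version 0.1.4 can be used to check a downloaded file. A minimal sketch with Python's standard `hashlib` module follows; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical.

```python
import hashlib
import os

# SHA256 digests copied from the "File hashes" tables for chunkdot 0.1.4.
EXPECTED_SHA256 = {
    "chunkdot-0.1.4.tar.gz": "fede4607e80de1f24f58d04b77ea13ea1f94e9b4f7c6451bb9c46559134968ed",
    "chunkdot-0.1.4-py3-none-any.whl": "6db9ad9a2fdfe922f053d7c8f521215b70c53423444cf5dd953ef73d8d96ddec",
}


def sha256_of_file(path, block_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in blocks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path):
    """Hypothetical helper: compare a downloaded file with its published SHA256."""
    expected = EXPECTED_SHA256.get(os.path.basename(path))
    return expected is not None and sha256_of_file(path) == expected
```

pip can also enforce these digests itself in hash-checking mode, for example with a requirements line `chunkdot==0.1.4 --hash=sha256:<digest>` installed via `pip install --require-hashes -r requirements.txt`.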