Multi-threaded matrix multiplication and cosine similarity calculations.
ChunkDot
Multi-threaded matrix multiplication and cosine similarity calculations. ChunkDot is suited to calculating the K most similar items for a large number of items (~1 million): it partitions the item matrix representation (the embeddings) and uses Numba to accelerate the calculations.
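To illustrate the partitioning idea, here is a minimal sketch in plain NumPy and SciPy. It is not ChunkDot's implementation: it uses no Numba and no multi-threading, and the function name `chunked_cosine_top_k` and its `chunk_size` parameter are hypothetical. It also assumes no embedding row is all zeros.

```python
import numpy as np
from scipy.sparse import csr_matrix


def chunked_cosine_top_k(embeddings, top_k, chunk_size):
    """Illustrative only: keep the top_k most similar items per row, chunk by chunk."""
    n_items = embeddings.shape[0]
    # Normalize rows so that a dot product equals the cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms

    values, indices = [], []
    for start in range(0, n_items, chunk_size):
        chunk = normalized[start:start + chunk_size]
        # Only a (chunk rows x n_items) block of similarities exists at a time.
        similarities = chunk @ normalized.T
        # Column indices of the top_k largest values in each row, sorted by column.
        top = np.argpartition(similarities, -top_k, axis=1)[:, -top_k:]
        top = np.sort(top, axis=1)
        indices.append(top)
        values.append(np.take_along_axis(similarities, top, axis=1))

    indices = np.concatenate(indices).ravel()
    values = np.concatenate(values).ravel()
    # Every row stores exactly top_k entries.
    indptr = np.arange(0, n_items * top_k + 1, top_k)
    return csr_matrix((values, indices, indptr), shape=(n_items, n_items))


rng = np.random.default_rng(0)
small_embeddings = rng.standard_normal((1000, 16))
result = chunked_cosine_top_k(small_embeddings, top_k=5, chunk_size=256)
print(result.shape, result.nnz)
```

The full 1000 x 1000 similarity matrix is never materialized here; only one block of at most 256 rows is held at a time, and only 5 values per row are kept.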
Usage
pip install -U chunkdot
Calculate the 50 most similar and the 50 most dissimilar items for each of 100K items.
import numpy as np
from chunkdot import cosine_similarity_top_k
embeddings = np.random.randn(100000, 256)
# using all your system's memory
cosine_similarity_top_k(embeddings, top_k=50)
# most dissimilar items using 20GB
cosine_similarity_top_k(embeddings, top_k=-50, max_memory=20E9)
<100000x100000 sparse matrix of type '<class 'numpy.float64'>'
with 5000000 stored elements in Compressed Sparse Row format>
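The output above has the form of a SciPy sparse matrix in CSR format, so the stored neighbours of one item can be read from the row's slice of `indices` and `data`. The sketch below uses a small hand-built matrix as a stand-in for a real result, and the helper `neighbours` is hypothetical, not part of ChunkDot.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hand-built stand-in for a top-k similarity matrix: 4 items, 2 stored values per row.
data = np.array([1.0, 0.8, 1.0, 0.6, 0.9, 1.0, 0.7, 1.0])
indices = np.array([0, 2, 1, 3, 0, 2, 1, 3])
indptr = np.array([0, 2, 4, 6, 8])
similarities = csr_matrix((data, indices, indptr), shape=(4, 4))


def neighbours(matrix, item):
    """Return (item indices, similarity values) stored for one row, highest first."""
    start, end = matrix.indptr[item], matrix.indptr[item + 1]
    cols = matrix.indices[start:end]
    sims = matrix.data[start:end]
    order = np.argsort(sims)[::-1]
    return cols[order], sims[order]


cols, sims = neighbours(similarities, 2)
print(cols, sims)
```

An item's own index usually appears among its neighbours, because the cosine similarity of a vector with itself is 1.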
Measure the execution time in seconds:
from timeit import timeit
import numpy as np
from chunkdot import cosine_similarity_top_k
embeddings = np.random.randn(100000, 256)
timeit(lambda: cosine_similarity_top_k(embeddings, top_k=50, max_memory=20E9), number=1)
58.611996899999994
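The timing above caps memory at 20 GB. A back-of-envelope estimate shows why a cap matters for this input size. The chunk-size arithmetic below rests on a simplifying assumption (the whole budget goes to one block of float64 similarities, ignoring the embeddings and any intermediate buffers); how ChunkDot actually derives its chunk size from `max_memory` may differ.

```python
n_items = 100_000
bytes_per_value = 8  # float64

# Memory needed to hold the full dense similarity matrix at once.
dense_bytes = n_items * n_items * bytes_per_value

# Assumed budget model: one block of similarities fills the whole budget.
max_memory = 20e9
rows_per_chunk = int(max_memory // (n_items * bytes_per_value))
n_chunks = -(-n_items // rows_per_chunk)  # ceiling division

print(dense_bytes / 1e9, rows_per_chunk, n_chunks)
```

Under this model the dense matrix would need 80 GB, while a 20 GB budget allows blocks of 25,000 rows, so the 100K items are covered in 4 chunks.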
Download files
Source Distribution
chunkdot-0.1.4.tar.gz (6.5 kB)
Built Distribution
chunkdot-0.1.4-py3-none-any.whl (7.6 kB)
File details
Details for the file chunkdot-0.1.4.tar.gz.
File metadata
- Download URL: chunkdot-0.1.4.tar.gz
- Upload date:
- Size: 6.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.9.11 Darwin/21.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | fede4607e80de1f24f58d04b77ea13ea1f94e9b4f7c6451bb9c46559134968ed
MD5 | f109ec46243a9facc6f1ad87dd7d6184
BLAKE2b-256 | 1e6e039a170abcb001ba4f6d9600a38dea207a07bd0f3ec42b4e394778950cee
File details
Details for the file chunkdot-0.1.4-py3-none-any.whl.
File metadata
- Download URL: chunkdot-0.1.4-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.9.11 Darwin/21.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6db9ad9a2fdfe922f053d7c8f521215b70c53423444cf5dd953ef73d8d96ddec
MD5 | d9ee175d0f8eab5e2f3676fa12874216
BLAKE2b-256 | a06bb6b1d90a6344ef3aeaab64cbaba61c2b2789f39ae676e0a231ff34a2f5b8