

ChunkDot

Multi-threaded matrix multiplication and cosine similarity calculations. ChunkDot is suited to finding the K most similar items for each of a large number of items (~1 million). It partitions the item matrix representation (embeddings) into chunks and uses Numba to accelerate the calculations.
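To illustrate the partitioning idea, here is a minimal NumPy-only sketch. It is not ChunkDot's implementation: the helper name `chunked_cosine_top_k` and its `chunk_rows` parameter are hypothetical, it runs single-threaded without Numba, and it assumes no embedding row is all zeros. It computes the similarities for one block of rows at a time, so the full items-by-items matrix is never held in memory, and keeps only the top K entries per row.

```python
import numpy as np


def chunked_cosine_top_k(embeddings, top_k, chunk_rows):
    """Hypothetical helper: top_k most similar items per row, computed chunk by chunk.

    Returns (indices, values), each of shape (n_items, top_k), sorted from
    most to least similar. Each item is counted as similar to itself.
    """
    # Normalize rows once so a plain dot product equals the cosine similarity.
    # Assumes no all-zero rows (those would divide by zero).
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms

    n_items = normalized.shape[0]
    indices = np.empty((n_items, top_k), dtype=np.int64)
    values = np.empty((n_items, top_k), dtype=normalized.dtype)

    for start in range(0, n_items, chunk_rows):
        end = min(start + chunk_rows, n_items)
        # Similarities of this chunk against all items: shape (end - start, n_items).
        sims = normalized[start:end] @ normalized.T
        # argpartition finds the top_k columns per row without a full sort.
        part = np.argpartition(sims, -top_k, axis=1)[:, -top_k:]
        part_vals = np.take_along_axis(sims, part, axis=1)
        # Sort only the top_k candidates, in descending order of similarity.
        order = np.argsort(-part_vals, axis=1)
        indices[start:end] = np.take_along_axis(part, order, axis=1)
        values[start:end] = np.take_along_axis(part_vals, order, axis=1)

    return indices, values
```

With this sketch, peak memory is governed by `chunk_rows * n_items` similarity values rather than `n_items * n_items`, which is the trade-off that a memory limit controls.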

Usage

pip install -U chunkdot

Calculate the 50 most similar and the 50 most dissimilar items for each of 100K items.

import numpy as np
from chunkdot import cosine_similarity_top_k

embeddings = np.random.randn(100000, 256)
# using all your system's memory
cosine_similarity_top_k(embeddings, top_k=50)
# most dissimilar items using 20GB
cosine_similarity_top_k(embeddings, top_k=-50, max_memory=20E9)
<100000x100000 sparse matrix of type '<class 'numpy.float64'>'
 with 5000000 stored elements in Compressed Sparse Row format>
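The back-of-the-envelope arithmetic below shows why the work has to be split up for this example. It is only a rough estimate: it assumes `float64` values and that a chunk's similarity block is the dominant memory cost, and the variable names are illustrative. ChunkDot's actual chunk-size calculation may differ.

```python
n_items = 100_000
top_k = 50
bytes_per_value = 8  # float64

# A dense 100000 x 100000 similarity matrix would need 80 GB.
dense_bytes = n_items * n_items * bytes_per_value

# Rough number of rows whose similarities fit in a 20 GB budget
# (assumption: only the chunk's similarity block is counted).
max_memory = 20e9
rows_per_chunk = int(max_memory // (n_items * bytes_per_value))
n_chunks = -(-n_items // rows_per_chunk)  # ceiling division

# The sparse result keeps only top_k values per row.
stored_elements = n_items * top_k
stored_value_bytes = stored_elements * bytes_per_value

print(dense_bytes, rows_per_chunk, n_chunks, stored_elements, stored_value_bytes)
```

Under these assumptions the budget allows about 25,000 rows per chunk, so four chunks, and the 5,000,000 stored values occupy about 40 MB, excluding the sparse index arrays.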

Measuring the execution time:

from timeit import timeit
import numpy as np
from chunkdot import cosine_similarity_top_k

embeddings = np.random.randn(100000, 256)
timeit(lambda: cosine_similarity_top_k(embeddings, top_k=50, max_memory=20E9), number=1)
58.611996899999994

