
Efficient Pairwise Cosine Similarity Computation

The (i, j)-entry of the output matrix is the cosine similarity between the i-th row of A and the j-th row of B. This function is only a wrapper: it uses the implementation of cosine_similarity from scikit-learn and the implementation of awesome_cossim_topn from sparse_dot_topn. For more details, please check the documentation of those two packages.
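
To make the definition of the (i, j)-entry concrete, here is a minimal sketch in plain Python that computes the full matrix by hand for the same A and B used in the sample code below. The helper names cosine_similarity_of_rows and pairwise_by_hand are hypothetical and are not part of effcossim; the sketch also assumes that no row is the zero vector.

```python
from math import sqrt


def cosine_similarity_of_rows(u, v):
    """Cosine similarity of two non-zero vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def pairwise_by_hand(A, B):
    """Entry (i, j) compares the i-th row of A with the j-th row of B."""
    return [[cosine_similarity_of_rows(a, b) for b in B] for a in A]


A = [[1, 2, 3], [0, 1, 2], [5, 1, 1]]
B = [[1, 1, 2], [0, 1, 2], [5, 0, 1], [0, 0, 4]]

M = pairwise_by_hand(A, B)  # 3 rows (one per row of A), 4 columns (one per row of B)
```

Since A[1] and B[1] are the same vector, M[1][1] is 1 up to floating-point rounding. On real data the package functions are preferable, because this double loop is far slower.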

To install this package:

pip install effcossim

Sample code:

from numpy import array
from effcossim.pcs import pairwise_cosine_similarity, pp_pcs

A = array([
    [1, 2, 3], 
    [0, 1, 2],
    [5, 1, 1]
])

B = array([
    [1, 1, 2], 
    [0, 1, 2],
    [5, 0, 1], 
    [0, 0, 4]
])

# scikit-learn implementation
M1 = pairwise_cosine_similarity(
    A=A, B=B, 
    efficient=False, 
    dense_output=True
)

# sparse_dot_topn implementation
M2 = pairwise_cosine_similarity(
    A=A, B=B, 
    efficient=True, 
    n_top=4, 
    lower_bound=0.5, 
    n_jobs=2, 
    dense_output=True
)

When efficient=True, only the top n_top entries above lower_bound are retained in each row of the output matrix, which reduces memory usage. Furthermore, if n_jobs is larger than 1, the computation runs in parallel, which increases speed.
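
The retention rule can be illustrated on a small dense matrix. The following is only a sketch of the described behaviour, not the sparse_dot_topn implementation: the helper keep_top_n is hypothetical, and treating "above lower_bound" as a strict comparison is an assumption.

```python
def keep_top_n(row, n_top, lower_bound):
    """Zero out all entries except the n_top largest ones above lower_bound."""
    # Assumption: "above" is read as strictly greater than lower_bound.
    candidates = [(value, j) for j, value in enumerate(row) if value > lower_bound]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    kept = {j for _, j in candidates[:n_top]}
    return [value if j in kept else 0.0 for j, value in enumerate(row)]


# Made-up similarity values, chosen only for illustration.
similarities = [
    [0.95, 0.40, 0.70, 0.10],
    [0.20, 0.55, 0.30, 0.90],
]

filtered = [keep_top_n(row, n_top=2, lower_bound=0.5) for row in similarities]
# filtered == [[0.95, 0.0, 0.70, 0.0], [0.0, 0.55, 0.0, 0.90]]
```

In a sparse output the zeroed entries are simply not stored, which is where the memory saving comes from.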

If multiple comparisons are required, the parallel implementation pp_pcs can be used.

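# random is assumed to be scipy.sparse.random (it matches the m, n, density keywords)
from scipy.sparse import random

l1 = [random(m=10000, n=1000, density=0.3) for _ in range(6)]
l2 = [random(m=10000, n=1000, density=0.3) for _ in range(6)]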

L = pp_pcs(
    l1=l1, 
    l2=l2, 
    n_workers=2, 
    efficient=True, 
    n_top=10, 
    lower_bound=0.3, 
    n_jobs=2, 
    dense_output=False
)

The output is a list where the k-th element is the output of

pairwise_cosine_similarity(l1[k], l2[k])
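
The element-wise pairing can be sketched with the standard library. This is not how pp_pcs is implemented: the thread pool, the helper parallel_pairs, and the stand-in function compare are all assumptions made for illustration. Only the shape of the result (one output per index k, in order) follows the description above.

```python
from concurrent.futures import ThreadPoolExecutor


def compare(a, b):
    """Hypothetical stand-in for pairwise_cosine_similarity(a, b)."""
    return ("compared", a, b)


def parallel_pairs(l1, l2, n_workers, func=compare):
    """Apply func to (l1[k], l2[k]) for every k, using a pool of workers."""
    if len(l1) != len(l2):
        raise ValueError("l1 and l2 must have the same length")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in input order, so the k-th
        # result always belongs to the k-th pair.
        return list(pool.map(func, l1, l2))


inputs_1 = ["a0", "a1", "a2"]
inputs_2 = ["b0", "b1", "b2"]
results = parallel_pairs(inputs_1, inputs_2, n_workers=2)
```

Here results[k] equals compare(inputs_1[k], inputs_2[k]) for every k, mirroring the relationship between the output list and pairwise_cosine_similarity.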

For further examples, check the notebook.
