This package speeds up sparse matrix multiplication followed by selection of the top-n multiplication results.
# sparse\_dot\_topn
**sparse\_dot\_topn** provides a fast way to perform a sparse matrix multiplication followed by top-n selection of the multiplication results.
Comparing very large feature vectors and picking the best matches in practice often comes down to a sparse matrix multiplication followed by selection of the top-n multiplication results. This package implements a customized Cython function for exactly that purpose. Compared with performing the same operation using SciPy and NumPy functions, **our approach improves the speed by about 40% and reduces memory consumption.**
This package was made by the ING Wholesale Banking Advanced Analytics team. This [blog](https://medium.com/@ingwbaa/https-medium-com-ingwbaa-boosting-selection-of-the-most-similar-entities-in-large-scale-datasets-450b3242e618) explains how we implemented it.
## Example
``` python
from scipy.sparse import rand
from sparse_dot_topn import awesome_cossim_topn

N = 10
a = rand(100, 1000000, density=0.005, format='csr')
b = rand(1000000, 200, density=0.005, format='csr')

# for each row of a.dot(b), keep the top N results above the 0.01 threshold
c = awesome_cossim_topn(a, b, N, 0.01)
```
You can also find code comparing our boosted method against a direct SciPy + NumPy implementation in example/comparison.py.
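To make the comparison concrete, here is a minimal sketch of the kind of SciPy + NumPy baseline this package replaces: multiply the full matrices, then threshold and keep the top-n entries per row. The helper `scipy_cossim_topn` is a hypothetical name for illustration, not part of the package, and the dense per-row materialization is exactly the memory cost `awesome_cossim_topn` avoids.

``` python
import numpy as np
from scipy.sparse import rand

def scipy_cossim_topn(a, b, ntop, lower_bound=0.0):
    """Naive baseline: full sparse product, then per-row top-n selection."""
    c = a.dot(b).tocsr()  # the entire multiplication result is kept in memory
    rows = []
    for i in range(c.shape[0]):
        row = c.getrow(i).toarray().ravel()
        row[row < lower_bound] = 0.0  # drop results below the threshold
        if np.count_nonzero(row) > ntop:
            # zero out everything below the ntop-th largest remaining entry
            cutoff = np.sort(row)[-ntop]
            row[row < cutoff] = 0.0
        rows.append(row)
    return np.vstack(rows)

# small sizes for demonstration; the package targets much larger matrices
a = rand(10, 100, density=0.1, format='csr')
b = rand(100, 20, density=0.1, format='csr')
top5 = scipy_cossim_topn(a, b, 5, 0.01)
```

The Cython implementation fuses the multiplication and the selection, so rows never need to be materialized densely before the top-n cut is applied.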
## Dependency and Install
Install `numpy` and `cython` before installing this package. Then,
``` sh
pip install sparse_dot_topn
```
## Uninstall
``` sh
pip uninstall sparse_dot_topn
```