This package speeds up sparse matrix multiplication followed by selection of the top-n multiplication results.
# sparse\_dot\_topn
**sparse\_dot\_topn** provides a fast way to perform a sparse matrix multiplication followed by top-n selection of the multiplication results.
Comparing very large feature vectors and picking the best matches in practice often comes down to a sparse matrix multiplication followed by selection of the top-n multiplication results. This package implements a customized Cython function for exactly that purpose. Compared with performing the same operation using SciPy and NumPy functions, **our approach improves the speed by about 40% and reduces memory consumption.**
This package was made by the ING Wholesale Banking Advanced Analytics team. This [blog](https://medium.com/@ingwbaa/https-medium-com-ingwbaa-boosting-selection-of-the-most-similar-entities-in-large-scale-datasets-450b3242e618) explains how we implemented it.
## Example
``` python
from scipy.sparse import rand
from sparse_dot_topn import awesome_cossim_topn

N = 10
a = rand(100, 1000000, density=0.005, format='csr')
b = rand(1000000, 200, density=0.005, format='csr')

# for each row of a.dot(b), keep the top N results above the 0.01 threshold
c = awesome_cossim_topn(a, b, N, 0.01)
```
You can also find code comparing our boosted method against a direct SciPy + NumPy implementation in example/comparison.py.
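To make the comparison concrete, here is a minimal sketch of the kind of SciPy + NumPy baseline this package replaces: multiply the full matrices, then threshold and keep the top-n entries per row. The helper `scipy_cossim_topn` is a hypothetical name for illustration, not part of the package, and the dense per-row materialization is exactly the memory cost `awesome_cossim_topn` avoids.

``` python
import numpy as np
from scipy.sparse import rand

def scipy_cossim_topn(a, b, ntop, lower_bound=0.0):
    """Naive baseline: full sparse product, then per-row top-n selection."""
    c = a.dot(b).tocsr()  # the entire multiplication result is kept in memory
    rows = []
    for i in range(c.shape[0]):
        row = c.getrow(i).toarray().ravel()
        row[row < lower_bound] = 0.0  # drop results below the threshold
        if np.count_nonzero(row) > ntop:
            # zero out everything below the ntop-th largest remaining entry
            cutoff = np.sort(row)[-ntop]
            row[row < cutoff] = 0.0
        rows.append(row)
    return np.vstack(rows)

# small sizes for demonstration; the package targets much larger matrices
a = rand(10, 100, density=0.1, format='csr')
b = rand(100, 20, density=0.1, format='csr')
top5 = scipy_cossim_topn(a, b, 5, 0.01)
```

The Cython implementation fuses the multiplication and the selection, so rows never need to be materialized densely before the top-n cut is applied.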
## Dependency and Install
Install `numpy` and `cython` before installing this package. Then,
``` sh
pip install sparse_dot_topn
```
## Uninstall
``` sh
pip uninstall sparse_dot_topn
```