This package speeds up a sparse matrix multiplication followed by selecting the top-n multiplication results
Project description
sparse_dot_topn:
sparse_dot_topn provides a fast way to perform a sparse matrix multiplication followed by selection of the top-n multiplication results.
Comparing very large feature vectors and picking the best matches often comes down, in practice, to a sparse matrix multiplication followed by selecting the top-n multiplication results. In this package, we implement a customized Cython function for this purpose. Compared with doing the same with SciPy and NumPy functions, our Cython approach improves speed by about 40% and reduces memory consumption.
This package is made by the ING Wholesale Banking Advanced Analytics team. This blog explains how we implement it.
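To make the operation concrete, here is a minimal sketch of the SciPy and NumPy baseline that the description compares against: compute the full sparse product, then keep the largest values in each row. The function name `naive_dot_topn` and its parameters are hypothetical and not part of this package, and the strict `> lower_bound` filter is an assumption about how a threshold would be applied.

```python
import numpy as np
from scipy.sparse import csr_matrix


def naive_dot_topn(a, b, ntop, lower_bound=0.0):
    """Multiply sparse a by sparse b, then keep the ntop largest values per row.

    Hypothetical baseline helper, not part of sparse_dot_topn.
    Assumes ntop >= 1 and that a has at least one row.
    """
    product = (a @ b).tocsr()  # the full product is materialised first
    indptr = [0]
    col_chunks, val_chunks = [], []
    for row in range(product.shape[0]):
        start, end = product.indptr[row], product.indptr[row + 1]
        cols = product.indices[start:end]
        vals = product.data[start:end]
        # Drop entries that do not exceed the threshold.
        keep = vals > lower_bound
        cols, vals = cols[keep], vals[keep]
        # Partial selection of the ntop largest values in this row.
        if len(vals) > ntop:
            top = np.argpartition(vals, -ntop)[-ntop:]
            cols, vals = cols[top], vals[top]
        # Store the survivors in descending order of value.
        order = np.argsort(-vals)
        col_chunks.append(cols[order])
        val_chunks.append(vals[order])
        indptr.append(indptr[-1] + len(order))
    return csr_matrix(
        (np.concatenate(val_chunks), np.concatenate(col_chunks), indptr),
        shape=product.shape,
    )


# Tiny example: a is 2 x 3, b is 3 x 4, keep the 2 largest products per row.
a_small = csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))
b_small = csr_matrix(
    np.array([[1.0, 0.0, 0.0, 4.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.0, 3.0, 0.0]])
)
top2 = naive_dot_topn(a_small, b_small, ntop=2)
print(top2.toarray())
```

The baseline builds the whole product before discarding most of it, which is the kind of extra memory use that a function combining multiplication and selection can avoid.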
Example
```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import rand
from sparse_dot_topn import awesome_cossim_topn

N = 10
a = rand(100, 1000000, density=0.005, format='csr')
b = rand(1000000, 200, density=0.005, format='csr')

# Use standard implementation
c = awesome_cossim_topn(a, b, N, 0.01)

# Use parallel implementation with 4 threads
d = awesome_cossim_topn(a, b, N, 0.01, use_threads=True, n_jobs=4)
```
You can also find code that compares our method with calling the SciPy and NumPy functions directly in example/comparison.py.
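Once a top-n result is available, a common next step is to list the matches row by row. The sketch below assumes the result is a SciPy sparse matrix that can be converted to CSR; the helper name `top_matches` is hypothetical and not part of this package. It uses a small hand-built matrix in place of a real result so that it runs without the package installed.

```python
import numpy as np
from scipy.sparse import csr_matrix


def top_matches(result):
    """Return (row, column, score) triples, best score first within each row.

    Hypothetical helper: works on any SciPy sparse matrix.
    """
    result = result.tocsr()
    triples = []
    for row in range(result.shape[0]):
        start, end = result.indptr[row], result.indptr[row + 1]
        cols = result.indices[start:end]
        vals = result.data[start:end]
        # Stable sort on the negated scores gives descending order.
        for k in np.argsort(-vals, kind="stable"):
            triples.append((row, int(cols[k]), float(vals[k])))
    return triples


# Stand-in for a top-n result: 2 rows, 3 candidate columns.
scores = csr_matrix(np.array([[0.0, 0.2, 0.9], [0.5, 0.0, 0.0]]))
matches = top_matches(scores)
for row, col, score in matches:
    print(row, col, score)
```

Returning plain tuples keeps the helper independent of any particular downstream format; the triples can be passed to whatever tabular structure suits the application.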
Dependency and Install
Install numpy and cython before installing this package. Then run:
pip install sparse_dot_topn
Uninstall
pip uninstall sparse_dot_topn
Download files
File details
Details for the file sparse_dot_topn-0.2.8.tar.gz.
File metadata
- Download URL: sparse_dot_topn-0.2.8.tar.gz
- Upload date:
- Size: 106.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05d1d3384cd7c0b2c45aa988b2cbe28f11697b6fbe6f36e79788671064d4bf19 |
| MD5 | b573aceb1aaf554aaf65ba8c8dc92c02 |
| BLAKE2b-256 | 6b6e322880f13aeaae70754cd9d5c99c4bb9a61a3de5a9187d34079fbe0548a2 |
File details
Details for the file sparse_dot_topn-0.2.8-cp37-cp37m-macosx_10_14_x86_64.whl.
File metadata
- Download URL: sparse_dot_topn-0.2.8-cp37-cp37m-macosx_10_14_x86_64.whl
- Upload date:
- Size: 60.9 kB
- Tags: CPython 3.7m, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5dea8617aff2ed58154a97df37932a47ef85f5566d9e470ea70ad6993f5b9538 |
| MD5 | d3cd4d7a6e6b23cb7874ee7f392c3000 |
| BLAKE2b-256 | 692e20f605d0d54070036dbe39e15c40df2b54a9f4ec2a89dc309c32805290e6 |
File details
Details for the file sparse_dot_topn-0.2.8-cp27-cp27m-macosx_10_14_intel.whl.
File metadata
- Download URL: sparse_dot_topn-0.2.8-cp27-cp27m-macosx_10_14_intel.whl
- Upload date:
- Size: 69.0 kB
- Tags: CPython 2.7m, macOS 10.14+ Intel (x86-64, i386)
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a7a3fdd183540369a17a3db8da4fa49201e2cecf44e116684ccfd6946143c6d1 |
| MD5 | 57a6cc60d17c208f36ef361b1046f1ac |
| BLAKE2b-256 | f42be7d769665e6b2110aa22647193bfff27424c28adab88866d863c5d9457c7 |