This package speeds up a sparse matrix multiplication followed by selecting the top-n multiplication results

sparse_dot_topn:

sparse_dot_topn provides a fast way to perform a sparse matrix multiplication followed by selecting the top-n multiplication results.

Comparing very large feature vectors and picking the best matches often comes down, in practice, to a sparse matrix multiplication followed by selecting the top-n multiplication results. In this package, we implement a customized Cython function for this purpose. Compared with doing the same using SciPy and NumPy functions, our Cython approach improves speed by about 40% and reduces memory consumption.
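To make the operation concrete, here is a minimal pure-Python sketch of a sparse product with per-row top-n selection. It is an illustration only, not the package's Cython implementation: the function name sparse_dot_topn_rows, the dict-per-row representation, and the strict "greater than lower_bound" comparison are all assumptions made for this example.

```python
import heapq
from collections import defaultdict


def sparse_dot_topn_rows(a_rows, b_rows, ntop, lower_bound=0.0):
    """Illustrative helper (hypothetical, not part of sparse_dot_topn).

    a_rows and b_rows are lists of {column: value} dicts, one dict per row.
    For every row of A, returns up to ntop (column, value) pairs of A @ B
    whose value exceeds lower_bound, largest value first.
    """
    result = []
    for a_row in a_rows:
        # Accumulate one row of the product without building a dense row.
        acc = defaultdict(float)
        for k, a_val in a_row.items():
            for j, b_val in b_rows[k].items():
                acc[j] += a_val * b_val
        # Keep only values above the threshold, then take the n largest.
        candidates = [(j, v) for j, v in acc.items() if v > lower_bound]
        result.append(heapq.nlargest(ntop, candidates, key=lambda jv: jv[1]))
    return result


a = [{0: 1.0, 2: 2.0}, {1: 3.0}]
b = [{0: 0.5, 1: 0.1}, {1: 1.0, 2: 0.2}, {0: 0.25, 2: 1.0}]
top = sparse_dot_topn_rows(a, b, ntop=2, lower_bound=0.05)
print(top)
```

Selecting the top-n values row by row means the full product never has to be stored, which is the intuition behind the memory saving described above.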

This package was made by the ING Wholesale Banking Advanced Analytics team. Two blog posts explain how we implemented it.

Example

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import rand
from sparse_dot_topn import awesome_cossim_topn

N = 10
a = rand(100, 1000000, density=0.005, format='csr')
b = rand(1000000, 200, density=0.005, format='csr')

# Default precision type is np.float64, but you can downcast to np.float32 for a smaller memory footprint and faster execution
# Remark: these are the only 2 types supported for now; we assume float16 would be difficult to implement and slower, because C doesn't support a 16-bit float type on most PCs
a = a.astype(np.float32)
b = b.astype(np.float32)

# Use standard implementation
c = awesome_cossim_topn(a, b, N, 0.01)

# Use parallel implementation with 4 threads
d = awesome_cossim_topn(a, b, N, 0.01, use_threads=True, n_jobs=4)

# Use parallel implementation with 4 threads and also compute best_ntop: the value of ntop needed to capture all results above lower_bound
d, best_ntop = awesome_cossim_topn(a, b, N, 0.01, use_threads=True, n_jobs=4, return_best_ntop=True)

You can also find code that compares our method with calling SciPy and NumPy functions directly in example/comparison.py.
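For orientation, a direct SciPy and NumPy baseline might look like the sketch below. This is not the code in example/comparison.py; naive_topn is a hypothetical helper written for this illustration, and its best_ntop return value only mirrors the description given in the example above (the largest number of results above lower_bound in any row).

```python
import numpy as np
from scipy.sparse import csr_matrix, random as sparse_random


def naive_topn(a, b, ntop, lower_bound=0.0):
    """Hypothetical baseline: compute the full sparse product, then filter rows."""
    product = (a @ b).tocsr()
    indptr, indices, data = [0], [], []
    best_ntop = 0
    for i in range(product.shape[0]):
        start, end = product.indptr[i], product.indptr[i + 1]
        cols = product.indices[start:end]
        vals = product.data[start:end]
        # Drop results that do not exceed the threshold.
        keep = vals > lower_bound
        cols, vals = cols[keep], vals[keep]
        best_ntop = max(best_ntop, len(vals))
        # Keep the ntop largest remaining values of this row.
        order = np.argsort(-vals, kind="stable")[:ntop]
        indices.extend(cols[order])
        data.extend(vals[order])
        indptr.append(len(indices))
    result = csr_matrix(
        (
            np.array(data, dtype=product.dtype),
            np.array(indices, dtype=np.int32),
            np.array(indptr, dtype=np.int32),
        ),
        shape=product.shape,
    )
    return result, best_ntop


a_small = sparse_random(5, 50, density=0.2, format="csr", random_state=0)
b_small = sparse_random(50, 8, density=0.2, format="csr", random_state=1)
c_small, best = naive_topn(a_small, b_small, ntop=3, lower_bound=0.01)
```

The baseline materializes the whole product before discarding most of it, which is the extra work and memory that a fused multiply-and-select routine avoids.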

Dependency and Install

Install numpy and cython before installing this package. Then,

pip install sparse_dot_topn

From version 0.3.0 onward, we no longer proactively support Python 2.7. However, you should still be able to install this package in Python 2.7. If you encounter gcc compilation issues, please refer to these discussions and set the CFLAGS and CXXFLAGS variables.

Uninstall

pip uninstall sparse_dot_topn

Local development

python setup.py clean --all
python setup.py develop
pytest
python -m build
cd dist/
pip install sparse_dot_topn-*.tar.gz

Release strategy

From version 0.3.2, we use GitHub Actions to build wheels for different OS environments and release automatically. Hopefully this will solve many installation-related issues. The build and publish pipeline is configured in .github/workflows/wheels.yml. When a new release is needed, please follow these steps:

  1. Create a test branch named test/x.x.x from the main branch.
  2. In the test/x.x.x branch, update the version number to a release candidate such as x.x.x.rcx in setup.py, and update the changelog in the CHANGES.md file.
  3. Git push the test/x.x.x branch; the build and publish pipeline will be triggered automatically. The new release will be uploaded to PyPI test https://test.pypi.org/project/sparse-dot-topn/.
  4. Do a sanity check on the PyPI test release.
  5. Create a branch on top of the test branch.
  6. Modify the version number by removing the rcx suffix.
  7. Git push; the build and publish pipeline will be triggered automatically. The new release will be uploaded to PyPI https://pypi.org/project/sparse-dot-topn
  8. Merge the release branch back to master.
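As a small illustration of steps 2 and 6, the mapping from a release-candidate version to the final version can be sketched as follows. The helper final_version is hypothetical and not part of the repository; it assumes the x.x.x.rcx format used in the steps above.

```python
import re

# Matches versions such as "0.3.4.rc1" (assumed format: x.x.x.rcx).
_RC_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)\.rc\d+$")


def final_version(version):
    """Strip a trailing .rcx suffix; return other versions unchanged."""
    match = _RC_PATTERN.match(version)
    return match.group(1) if match else version


print(final_version("0.3.4.rc1"))
```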


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See the tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

sparse_dot_topn-0.3.3-cp310-cp310-win_amd64.whl (283.0 kB): CPython 3.10, Windows x86-64
sparse_dot_topn-0.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
sparse_dot_topn-0.3.3-cp310-cp310-macosx_11_0_arm64.whl (281.6 kB): CPython 3.10, macOS 11.0+ ARM64
sparse_dot_topn-0.3.3-cp310-cp310-macosx_10_9_x86_64.whl (321.4 kB): CPython 3.10, macOS 10.9+ x86-64
sparse_dot_topn-0.3.3-cp39-cp39-win_amd64.whl (284.7 kB): CPython 3.9, Windows x86-64
sparse_dot_topn-0.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
sparse_dot_topn-0.3.3-cp39-cp39-macosx_11_0_arm64.whl (278.4 kB): CPython 3.9, macOS 11.0+ ARM64
sparse_dot_topn-0.3.3-cp39-cp39-macosx_10_9_x86_64.whl (319.6 kB): CPython 3.9, macOS 10.9+ x86-64
sparse_dot_topn-0.3.3-cp38-cp38-win_amd64.whl (286.5 kB): CPython 3.8, Windows x86-64
sparse_dot_topn-0.3.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
sparse_dot_topn-0.3.3-cp38-cp38-macosx_11_0_arm64.whl (272.6 kB): CPython 3.8, macOS 11.0+ ARM64
sparse_dot_topn-0.3.3-cp38-cp38-macosx_10_9_x86_64.whl (314.7 kB): CPython 3.8, macOS 10.9+ x86-64
sparse_dot_topn-0.3.3-cp37-cp37m-win_amd64.whl (282.0 kB): CPython 3.7m, Windows x86-64
sparse_dot_topn-0.3.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
sparse_dot_topn-0.3.3-cp37-cp37m-macosx_10_9_x86_64.whl (312.7 kB): CPython 3.7m, macOS 10.9+ x86-64
sparse_dot_topn-0.3.3-cp36-cp36m-win_amd64.whl (282.0 kB): CPython 3.6m, Windows x86-64
sparse_dot_topn-0.3.3-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB): CPython 3.6m, manylinux: glibc 2.17+ x86-64
sparse_dot_topn-0.3.3-cp36-cp36m-macosx_10_9_x86_64.whl (310.6 kB): CPython 3.6m, macOS 10.9+ x86-64
