This package speeds up sparse matrix multiplication followed by selection of the top-n multiplication results

Project description

sparse_dot_topn:

sparse_dot_topn provides a fast way to perform a sparse matrix multiplication followed by top-n multiplication result selection.

Comparing very large feature vectors and picking the best matches often comes down, in practice, to a sparse matrix multiplication followed by selecting the top-n multiplication results. In this package, we implement a customized Cython function for this purpose. Compared with doing the same thing with SciPy and NumPy functions, our Cython approach improves speed by about 40% and reduces memory consumption.
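For reference, the operation described above can be sketched with plain SciPy and NumPy. This is not the package's Cython implementation, nor the contents of example/comparison.py. The function name naive_dot_topn and the strict "greater than" treatment of lower_bound are assumptions made for illustration only.

```python
import numpy as np
from scipy.sparse import csr_matrix


def naive_dot_topn(a, b, ntop, lower_bound=0.0):
    """Hypothetical baseline: multiply, then keep at most ntop values per row.

    Only values strictly greater than lower_bound are kept (an assumption).
    """
    # The full product is materialized first, which is what costs memory.
    product = csr_matrix(a.dot(b))
    rows, cols, vals = [], [], []
    for i in range(product.shape[0]):
        start, end = product.indptr[i], product.indptr[i + 1]
        row_vals = product.data[start:end]
        row_cols = product.indices[start:end]
        # Drop values at or below the threshold.
        keep = row_vals > lower_bound
        row_vals, row_cols = row_vals[keep], row_cols[keep]
        # Indices of the ntop largest remaining values, largest first.
        order = np.argsort(-row_vals, kind="stable")[:ntop]
        rows.extend([i] * len(order))
        cols.extend(row_cols[order])
        vals.extend(row_vals[order])
    data = np.array(vals, dtype=product.dtype)
    return csr_matrix((data, (rows, cols)), shape=product.shape)


a = csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))
b = csr_matrix(np.array([[1.0, 0.0, 0.0, 2.0],
                         [0.0, 1.0, 0.0, 0.0],
                         [0.0, 0.0, 1.0, 1.0]]))
result = naive_dot_topn(a, b, ntop=2)
print(result.toarray())
```

The sketch builds the whole product before discarding most of it. Selecting the top-n results during the multiplication avoids holding that intermediate matrix.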

This package is made by the ING Wholesale Banking Advanced Analytics team. Two blog posts explain how we implement it.

Example

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import rand
from sparse_dot_topn import awesome_cossim_topn

N = 10
a = rand(100, 1000000, density=0.005, format='csr')
b = rand(1000000, 200, density=0.005, format='csr')

# Default precision type is np.float64, but you can downcast to np.float32 for a smaller memory footprint and faster execution
# Remark: these are the only 2 types supported for now. We assume that float16 would be difficult to implement and slower, because C doesn't support a 16-bit float type on most PCs
a = a.astype(np.float32)
b = b.astype(np.float32)

# Use standard implementation
c = awesome_cossim_topn(a, b, N, 0.01)

# Use parallel implementation with 4 threads
d = awesome_cossim_topn(a, b, N, 0.01, use_threads=True, n_jobs=4)

# Use parallel implementation with 4 threads and compute best_ntop: the value of ntop needed to capture all results above lower_bound
d, best_ntop = awesome_cossim_topn(a, b, N, 0.01, use_threads=True, n_jobs=4, return_best_ntop=True)
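The downcasting remark in the example can be checked with SciPy alone. The following is a small sketch, not part of the package: the helper csr_nbytes is a hypothetical name, and the matrix is smaller than in the example so that it runs quickly.

```python
import numpy as np
from scipy.sparse import random as sparse_random


def csr_nbytes(m):
    """Hypothetical helper: bytes held by the three arrays of a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


a64 = sparse_random(100, 10000, density=0.005, format="csr", dtype=np.float64)
a32 = a64.astype(np.float32)

# Only the value array shrinks; the index arrays keep their size.
print("float64:", csr_nbytes(a64), "bytes")
print("float32:", csr_nbytes(a32), "bytes")
```

Since only the stored values are halved, the total saving is less than a factor of two.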

You can also find code that compares our method with calling SciPy and NumPy functions directly in example/comparison.py.

Dependency and Install

Install numpy and cython before installing this package. Then:

pip install sparse_dot_topn

From version 0.3.0 onward, we don't proactively support Python 2.7. However, you should still be able to install this package on Python 2.7. If you encounter gcc compilation issues, please refer to these discussions and set up the CFLAGS and CXXFLAGS variables.

Uninstall

pip uninstall sparse_dot_topn

Local development

python setup.py clean --all
python setup.py develop
pytest
python -m build
cd dist/
pip install sparse_dot_topn-*.tar.gz

Release strategy

From version 0.3.2, we use GitHub Actions with cibuildwheel to build wheels for different OS and Python environments and to release automatically. Hopefully this will solve many installation issues. The build and publish pipeline is configured in .github/workflows/wheels.yml. When a new release is needed, please follow these steps:

  1. Create a test branch named test/x.x.x from the main branch.
  2. In the test/x.x.x branch, update the version number to x.x.x.rcx (e.g. 0.3.4.rc0) in setup.py, and update the changelog in the CHANGES.md file.
  3. Push the test/x.x.x branch. The build and publish pipeline will be triggered automatically, and the new release will be uploaded to PyPI test (https://test.pypi.org/project/sparse-dot-topn/).
  4. Do a sanity check on the PyPI test release.
  5. Update the changelog in CHANGES.md.
  6. Create a branch on top of the test branch.
  7. Modify the version number by removing the rcx suffix in setup.py.
  8. Push the branch. The build and publish pipeline will be triggered automatically, and the new release will be uploaded to PyPI (https://pypi.org/project/sparse-dot-topn).
  9. Merge the release branch back to master.
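The version naming used in steps 2 and 7 can be illustrated with a few lines of Python. This is only a sketch of the convention: the helper final_version is hypothetical and is not part of the repository or its pipeline.

```python
import re

# Matches a trailing release-candidate suffix such as ".rc0" or "rc12".
RC_SUFFIX = re.compile(r"\.?rc\d+$")


def final_version(test_version: str) -> str:
    """Hypothetical helper: drop the rcx suffix from a test version string."""
    return RC_SUFFIX.sub("", test_version)


print(final_version("0.3.4.rc0"))
```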

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

sparse_dot_topn-0.3.4-cp311-cp311-win_amd64.whl (281.4 kB): CPython 3.11, Windows x86-64

sparse_dot_topn-0.3.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64

sparse_dot_topn-0.3.4-cp311-cp311-macosx_11_0_arm64.whl (280.3 kB): CPython 3.11, macOS 11.0+ ARM64

sparse_dot_topn-0.3.4-cp311-cp311-macosx_10_9_x86_64.whl (318.5 kB): CPython 3.11, macOS 10.9+ x86-64

sparse_dot_topn-0.3.4-cp310-cp310-win_amd64.whl (284.0 kB): CPython 3.10, Windows x86-64

sparse_dot_topn-0.3.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64

sparse_dot_topn-0.3.4-cp310-cp310-macosx_11_0_arm64.whl (282.7 kB): CPython 3.10, macOS 11.0+ ARM64

sparse_dot_topn-0.3.4-cp310-cp310-macosx_10_9_x86_64.whl (323.0 kB): CPython 3.10, macOS 10.9+ x86-64

sparse_dot_topn-0.3.4-cp39-cp39-win_amd64.whl (285.6 kB): CPython 3.9, Windows x86-64

sparse_dot_topn-0.3.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64

sparse_dot_topn-0.3.4-cp39-cp39-macosx_11_0_arm64.whl (282.1 kB): CPython 3.9, macOS 11.0+ ARM64

sparse_dot_topn-0.3.4-cp39-cp39-macosx_10_9_x86_64.whl (322.0 kB): CPython 3.9, macOS 10.9+ x86-64

sparse_dot_topn-0.3.4-cp38-cp38-win_amd64.whl (287.4 kB): CPython 3.8, Windows x86-64

sparse_dot_topn-0.3.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64

sparse_dot_topn-0.3.4-cp38-cp38-macosx_11_0_arm64.whl (277.3 kB): CPython 3.8, macOS 11.0+ ARM64

sparse_dot_topn-0.3.4-cp38-cp38-macosx_10_9_x86_64.whl (316.8 kB): CPython 3.8, macOS 10.9+ x86-64

sparse_dot_topn-0.3.4-cp37-cp37m-win_amd64.whl (283.0 kB): CPython 3.7m, Windows x86-64

sparse_dot_topn-0.3.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB): CPython 3.7m, manylinux: glibc 2.17+ x86-64

sparse_dot_topn-0.3.4-cp37-cp37m-macosx_10_9_x86_64.whl (317.3 kB): CPython 3.7m, macOS 10.9+ x86-64

sparse_dot_topn-0.3.4-cp36-cp36m-win_amd64.whl (283.0 kB): CPython 3.6m, Windows x86-64

sparse_dot_topn-0.3.4-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB): CPython 3.6m, manylinux: glibc 2.17+ x86-64

sparse_dot_topn-0.3.4-cp36-cp36m-macosx_10_9_x86_64.whl (314.8 kB): CPython 3.6m, macOS 10.9+ x86-64
