Skip to main content

Sparse convolution in python using Toeplitz convolution matrix multiplication.

Project description

sparse_convolution

Sparse 2D convolution in Python via Toeplitz matrix methods.

Fast when the kernel is small, the input is sparse, and/or many arrays share the same kernel.

Install

pip install sparse_convolution

Or from source:

git clone https://github.com/RichieHakim/sparse_convolution
cd sparse_convolution
pip install -e .

Usage

Single image

import numpy as np
import scipy.sparse
import sparse_convolution as sc

x = scipy.sparse.random(100, 100, density=0.01)
k = np.random.rand(5, 5)

conv = sc.Toeplitz_convolution2d(x_shape=x.shape, k=k, mode='same')
out = conv(x=x, batching=False).toarray()

Batched

Input: (n_images, H * W) sparse matrix. Output: (n_images, H_out * W_out).

x_batch = scipy.sparse.vstack([
    scipy.sparse.random(100, 100, density=0.01).reshape(1, -1)
    for _ in range(50)
]).tocsr()

conv = sc.Toeplitz_convolution2d(x_shape=(100, 100), k=k, mode='same')
out = conv(x=x_batch, batching=True)

Methods and backends

Four methods, each with selectable backends:

Method numpy numba torch
direct n/a yes n/a
precomputed yes yes yes
lazy yes n/a yes
gather_scatter yes yes yes
  • direct (default): Batch-parallel scatter convolution with thread-local dense buffers (numba only). For each image in parallel, scatters kernel-weighted input values into an L2-cache-sized accumulator buffer, then extracts nonzeros into CSR format. Uses a two-phase approach: a lightweight boolean counting pass (1-byte flags, no float arithmetic) determines exact output sizes, then the scatter pass writes directly to right-sized arrays with zero waste. Interior pixels (~92-100%) skip bounds checking entirely via precomputed safe regions. O(nnz × K) per image with no init overhead. Fastest method across nearly all configurations. Requires numba.
  • precomputed: Builds a sparse Toeplitz matrix at init; fast batched matmul. Best for large batches with the same kernel when numba is not available.
  • lazy: COO broadcasting, no init cost. Best for very sparse inputs with small batches.
  • gather_scatter: Per-kernel-position scatter into a dense accumulator. General-purpose method for sparse batched inputs.

Backend selection:

  • numpy: scipy/numpy ops. Always available.
  • numba: JIT-compiled parallel loops. Fastest on CPU for batched inputs. Requires numba.
  • torch: PyTorch ops with optional GPU. Requires torch.
conv = sc.Toeplitz_convolution2d(
    x_shape=(100, 100),
    k=k,
    mode='same',
    method='direct',       # default
    backend='numba',       # auto-selected for direct
)

If backend=None (default), auto-selects numba for direct and gather_scatter (if installed), numpy otherwise.

References

Benchmarks

All benchmarks run on CPU with 1s minimum measurement time per configuration (median reported). Nine method+backend combinations compared across six scaling sweeps.

Scaling overview

Six scaling sweeps varying batch size, density, image size, and kernel size. direct+numba (brown stars) is the fastest method in nearly all regimes.

Scaling overview

Grid search: fastest method per configuration

Each cell shows the winning method and total time (init + call) for that batch size × density combination. direct+numba wins 28 of 36 configurations with an average 4.75× speedup over the second-fastest method.

Grid winners

Individual scaling curves

Batch size scaling — 100×100, 5×5, density=0.01

Batch scaling

Density scaling — 100×100, 5×5, batch=100

Density scaling

Image size scaling — 5×5, density=0.01, batch=50

Image size scaling

Kernel size scaling — 100×100, density=0.01, batch=50

Kernel size scaling

Batch scaling — high density (0.1)

Batch scaling high density

Batch scaling — very sparse (density=0.001, 200×200)

Batch scaling very sparse

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sparse_convolution-0.4.0.tar.gz (27.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sparse_convolution-0.4.0-py3-none-any.whl (25.4 kB view details)

Uploaded Python 3

File details

Details for the file sparse_convolution-0.4.0.tar.gz.

File metadata

  • Download URL: sparse_convolution-0.4.0.tar.gz
  • Upload date:
  • Size: 27.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparse_convolution-0.4.0.tar.gz
Algorithm Hash digest
SHA256 8675b8e4023f653df98a8b09bc3fec778e916c4398688d321b7af811a0b0c5cc
MD5 fceb6c7e81fb1fa84352cbfcd870535a
BLAKE2b-256 08fac92cfafd3a98511665c0014f0077272352c21fbb09e6a8b024082acc401c

See more details on using hashes here.

Provenance

The following attestation bundles were made for sparse_convolution-0.4.0.tar.gz:

Publisher: pypi_release.yml on RichieHakim/sparse_convolution

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sparse_convolution-0.4.0-py3-none-any.whl.

File metadata

File hashes

Hashes for sparse_convolution-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2a49df61e2d7d359a52129fa09eb0f12db8f4579673fd98c6e748a153c241b80
MD5 dbc3aa6dd358a7cc5ab920acd96a92d4
BLAKE2b-256 ff8a665554aa104f8c1ccfcbec3d3da8047523ef7f4e6e1c30fa976c2cd6cf98

See more details on using hashes here.

Provenance

The following attestation bundles were made for sparse_convolution-0.4.0-py3-none-any.whl:

Publisher: pypi_release.yml on RichieHakim/sparse_convolution

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page