nvidia-cusparselt-cu12

NVIDIA cuSPARSELt

These details have not been verified by PyPI

Project links

Homepage

Project description

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a structured sparse matrix with 50% sparsity ratio:

\begin{equation*} D = \mathrm{Activation}(\alpha \, op(A) \cdot op(B) + \beta \, op(C) + bias) \end{equation*}

where \(op(A)/op(B)\) refers to in-place operations such as transpose/non-transpose, and \(\alpha, \beta\) are scalars or vectors.
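
As a rough illustration of what the expression above computes, the following pure-Python sketch evaluates it densely for small matrices. It is a reference for the arithmetic only and does not use the cuSPARSELt API. The function names (`transpose`, `relu`, `matmul_reference`), the choice of ReLU as the activation, scalar \(\alpha\) and \(\beta\), an identity \(op(C)\), and a bias with one value per output row are all assumptions made for this example.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def relu(x):
    """ReLU, used here only as an example activation."""
    return x if x > 0 else 0.0


def matmul_reference(a, b, c, alpha=1.0, beta=0.0, bias=None,
                     trans_a=False, trans_b=False, activation=relu):
    """Dense reference for D = Activation(alpha * op(A) @ op(B) + beta * C + bias).

    Assumptions for this sketch: alpha and beta are scalars, op(C) is the
    identity, and bias holds one value per output row.
    """
    op_a = transpose(a) if trans_a else a
    op_b = transpose(b) if trans_b else b
    rows, inner, cols = len(op_a), len(op_b), len(op_b[0])
    if any(len(row) != inner for row in op_a):
        raise ValueError("inner dimensions of op(A) and op(B) do not match")
    if bias is None:
        bias = [0.0] * rows
    d = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = sum(op_a[i][k] * op_b[k][j] for k in range(inner))
            out_row.append(activation(alpha * acc + beta * c[i][j] + bias[i]))
        d.append(out_row)
    return d


# A has two zeros in every group of four values, i.e. 50% sparsity.
A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, -1.0]]
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [2.0, -2.0]]
C = [[1.0, 1.0],
     [1.0, 1.0]]

D = matmul_reference(A, B, C, alpha=2.0, beta=0.5, bias=[0.0, 1.0])
print(D)  # [[6.5, 4.5], [0.0, 11.5]]
```

The negative entry in the second row is clamped to zero by the activation, which is the only step that makes the result differ from a plain scaled matrix product plus bias.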

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Download: developer.nvidia.com/cusparselt/downloads

Provide Feedback: Math-Libs-Feedback@nvidia.com

Examples: cuSPARSELt Example 1, cuSPARSELt Example 2

Blog post:

Key Features

NVIDIA Sparse MMA tensor core support

Mixed-precision computation support:

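Input A/B	Input C	Output D	Compute	Block scaled	Supported SM arch
FP32	FP32	FP32	FP32	No	8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
BF16	BF16	BF16	FP32	No	8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
FP16	FP16	FP16	FP32	No	8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
FP16	FP16	FP16	FP16	No	9.0
INT8	INT8	INT8	INT32	No	8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
INT8	INT32	INT32	INT32	No	8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
INT8	FP16	FP16	INT32	No	8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
INT8	BF16	BF16	INT32	No	8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E4M3	FP16	E4M3	FP32	No	9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E4M3	BF16	E4M3	FP32	No	9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E4M3	FP16	FP16	FP32	No	9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E4M3	BF16	BF16	FP32	No	9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E4M3	FP32	FP32	FP32	No	9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E5M2	FP16	E5M2	FP32	No	9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E5M2	BF16	E5M2	FP32	No	9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E5M2	FP16	FP16	FP32	No	9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E5M2	BF16	BF16	FP32	No	9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E5M2	FP32	FP32	FP32	No	9.0, 10.0, 10.1, 11.0, 12.0, 12.1
E4M3	FP16	E4M3	FP32	A/B/D_OUT_SCALE = VEC64_UE8M0; D_SCALE = 32F	10.0, 10.1, 11.0, 12.0, 12.1
E4M3	BF16	E4M3	FP32	A/B/D_OUT_SCALE = VEC64_UE8M0; D_SCALE = 32F	10.0, 10.1, 11.0, 12.0, 12.1
E4M3	FP16	FP16	FP32	A/B_SCALE = VEC64_UE8M0	10.0, 10.1, 11.0, 12.0, 12.1
E4M3	BF16	BF16	FP32	A/B_SCALE = VEC64_UE8M0	10.0, 10.1, 11.0, 12.0, 12.1
E4M3	FP32	FP32	FP32	A/B_SCALE = VEC64_UE8M0	10.0, 10.1, 11.0, 12.0, 12.1
E2M1	FP16	E2M1	FP32	A/B/D_SCALE = VEC32_UE4M3; D_SCALE = 32F	10.0, 10.1, 11.0, 12.0, 12.1
E2M1	BF16	E2M1	FP32	A/B/D_SCALE = VEC32_UE4M3; D_SCALE = 32F	10.0, 10.1, 11.0, 12.0, 12.1
E2M1	FP16	FP16	FP32	A/B_SCALE = VEC32_UE4M3	10.0, 10.1, 11.0, 12.0, 12.1
E2M1	BF16	BF16	FP32	A/B_SCALE = VEC32_UE4M3	10.0, 10.1, 11.0, 12.0, 12.1
E2M1	FP32	FP32	FP32	A/B_SCALE = VEC32_UE4M3	10.0, 10.1, 11.0, 12.0, 12.1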

Matrix pruning and compression functionalities
Activation functions, bias vector, and output scaling
Batched computation (multiple matrices in a single run)
GEMM Split-K mode
Auto-tuning functionality (see cusparseLtMatmulSearch())
NVTX ranging and Logging functionalities
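
To make the pruning and compression items concrete, here is a minimal pure-Python sketch of one common 50% structured-sparsity scheme: keep the two largest-magnitude values in every group of four consecutive row elements (often called 2:4), then store only the kept values together with their positions inside each group. The 2:4 group shape, the magnitude criterion, the storage layout, and the function names `prune_2_4` and `compress_2_4` are assumptions chosen for illustration. The library's own pruning and compression routines and their compressed format are described in the cuSPARSELt documentation and are not reproduced here.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes; ties keep the earlier index
        # because Python's sort is stable.
        keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def compress_2_4(pruned_row):
    """Split a 2:4-pruned row into kept values and their in-group indices."""
    if len(pruned_row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, indices = [], []
    for start in range(0, len(pruned_row), 4):
        group = pruned_row[start:start + 4]
        nonzero = [i for i, v in enumerate(group) if v != 0.0]
        if len(nonzero) > 2:
            raise ValueError("group has more than two non-zero values")
        # Pad with zero-valued positions so every group stores exactly two entries.
        for i in range(4):
            if len(nonzero) < 2 and i not in nonzero:
                nonzero.append(i)
        nonzero.sort()
        values.extend(group[i] for i in nonzero)
        indices.extend(nonzero)
    return values, indices


row = [0.5, -3.0, 0.1, 2.0, 1.0, 1.0, -1.0, 0.0]
pruned = prune_2_4(row)
values, indices = compress_2_4(pruned)
print(pruned)   # [0.0, -3.0, 0.0, 2.0, 1.0, 1.0, 0.0, 0.0]
print(values)   # [-3.0, 2.0, 1.0, 1.0]
print(indices)  # [1, 3, 0, 1]
```

The compressed form holds half as many values as the original row plus a small index per kept value, which is the general idea behind storing a 50% structured-sparse operand compactly.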

Support

Supported SM Architectures: SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0, SM 10.0, SM 10.1 (for CTK 12), SM 11.0 (for CTK 13), SM 12.0, SM 12.1
Supported CPU architectures and operating systems:

OS	CPU archs
Windows	x86_64
Linux	x86_64, Arm64
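
The support lists above can be encoded as a small lookup, which may be handy in an installer or a test harness. This is only a sketch: the names `SUPPORTED_SM`, `SUPPORTED_PLATFORMS`, `is_sm_supported`, and `is_platform_supported` are hypothetical, and reading "for CTK 12" and "for CTK 13" as "only with that CUDA Toolkit major version" is an interpretation of the list, not something the page states outright. How the compute capability of a device is obtained is outside the scope of the sketch.

```python
# Supported SM architectures as listed above. A value of None means the entry
# carries no CUDA Toolkit (CTK) note; otherwise it is the CTK major version
# that the list attaches to that architecture.
SUPPORTED_SM = {
    (8, 0): None, (8, 6): None, (8, 7): None, (8, 9): None,
    (9, 0): None, (10, 0): None,
    (10, 1): 12,   # listed as "for CTK 12"
    (11, 0): 13,   # listed as "for CTK 13"
    (12, 0): None, (12, 1): None,
}

# Operating system -> CPU architectures, from the table above.
SUPPORTED_PLATFORMS = {
    "Windows": {"x86_64"},
    "Linux": {"x86_64", "Arm64"},
}


def is_sm_supported(major, minor, ctk_major):
    """Check a compute capability against the list (assumed interpretation)."""
    if (major, minor) not in SUPPORTED_SM:
        return False
    required_ctk = SUPPORTED_SM[(major, minor)]
    return required_ctk is None or required_ctk == ctk_major


def is_platform_supported(os_name, cpu_arch):
    """Check an OS / CPU architecture pair against the table."""
    return cpu_arch in SUPPORTED_PLATFORMS.get(os_name, set())


print(is_sm_supported(8, 9, ctk_major=12))         # True
print(is_sm_supported(10, 1, ctk_major=13))        # False
print(is_platform_supported("Windows", "Arm64"))   # False
```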

Documentation

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

Installation

The cuSPARSELt wheel can be installed as follows:

pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version.
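
The naming rule above can be sketched in Python, together with a check of whether the wheel is already installed in the current environment. The helper names `wheel_name` and `installed_version` are hypothetical; `importlib.metadata` is part of the standard library. The sketch only builds the name from the stated pattern and does not confirm that a wheel exists on PyPI for every CUDA major version.

```python
from importlib import metadata


def wheel_name(cuda_major):
    """Build the pip package name for a given CUDA major version."""
    return f"nvidia-cusparselt-cu{int(cuda_major)}"


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


name = wheel_name(12)
print(f"pip install {name}")  # pip install nvidia-cusparselt-cu12
print("installed version:", installed_version(name))
```

In an environment where the wheel is not installed, the second line reports `None` instead of raising an exception.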

Project details

Release history

Version	Release date
0.8.1 (this version)	Sep 5, 2025
0.8.0	Aug 13, 2025
0.7.1	Feb 26, 2025
0.7.0	Jan 31, 2025
0.6.3	Oct 15, 2024
0.6.2	Jul 23, 2024
0.0.1.dev1 (pre-release)	Apr 16, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

File	Size	Uploaded	Tags
nvidia_cusparselt_cu12-0.8.1-py3-none-win_amd64.whl	225.7 MB	Sep 5, 2025	Python 3, Windows x86-64
nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_x86_64.whl	239.3 MB	Sep 5, 2025	Python 3
nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_aarch64.whl	236.0 MB	Sep 5, 2025	Python 3

File details

Details for the file nvidia_cusparselt_cu12-0.8.1-py3-none-win_amd64.whl.

File metadata

Download URL: nvidia_cusparselt_cu12-0.8.1-py3-none-win_amd64.whl
Upload date: Sep 5, 2025
Size: 225.7 MB
Tags: Python 3, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.10.18

File hashes

Hashes for nvidia_cusparselt_cu12-0.8.1-py3-none-win_amd64.whl
Algorithm	Hash digest
SHA256	`2607ec058d53967c9caf0b7a3904ced34bbceaf7944cf9fef6d7f4ec6dab5e3a`
MD5	`d5f8cd23ed53e5cce28cb8a9e58d7709`
BLAKE2b-256	`64f59eefe50ee49fda0657aaa061a56600a519dbc1c772d0df701f80e676c818`

File details

Details for the file nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_x86_64.whl.

File metadata

Download URL: nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_x86_64.whl
Upload date: Sep 5, 2025
Size: 239.3 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.10.18

File hashes

Hashes for nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`cd1b1dc9e1ad31ea3353c1f985e2bd6f9e7ae0e797d7e6ce879d7b2ace5e80e8`
MD5	`4c02b8c4e7d06d2cbfeb9305dab5c522`
BLAKE2b-256	`bb14e46964290aa587cb9fb7df20efdc60528ddd00d291ccffec47617fb06ca3`

File details

Details for the file nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_aarch64.whl.

File metadata

Download URL: nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_aarch64.whl
Upload date: Sep 5, 2025
Size: 236.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.10.18

File hashes

Hashes for nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`5c72f727722f74762380e5f8755557c788b26d8fdcc49df1641c1b08e16d256c`
MD5	`21432ed00954d8546800e46b302075ed`
BLAKE2b-256	`fdf8a809966c96e824b92df09ee3b7032442f5e975d873d7dadfef818d527f48`
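
A downloaded wheel can be checked against the SHA256 digests listed in the three "File hashes" tables. The sketch below uses only the standard library. The names `EXPECTED_SHA256`, `sha256_of`, and `verify_wheel` are hypothetical, and the code assumes the wheel has already been downloaded and kept under its original file name.

```python
import hashlib
import os

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "nvidia_cusparselt_cu12-0.8.1-py3-none-win_amd64.whl":
        "2607ec058d53967c9caf0b7a3904ced34bbceaf7944cf9fef6d7f4ec6dab5e3a",
    "nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_x86_64.whl":
        "cd1b1dc9e1ad31ea3353c1f985e2bd6f9e7ae0e797d7e6ce879d7b2ace5e80e8",
    "nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_aarch64.whl":
        "5c72f727722f74762380e5f8755557c788b26d8fdcc49df1641c1b08e16d256c",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path):
    """Return True if the file matches the digest listed for its file name."""
    name = os.path.basename(path)
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no digest listed for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

A typical call would be `verify_wheel("nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_x86_64.whl")` from the directory holding the download. pip can perform the same kind of check itself when a requirements file pins hashes.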
