nvidia-cusparselt-cu12

NVIDIA cuSPARSELt

These details have not been verified by PyPI

Project links

Homepage

Project description

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

\begin{equation*} D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \cdot scale \end{equation*}

where \(op(A)/op(B)\) refers to in-place operations such as transpose/non-transpose, and \(\alpha, \beta, scale\) are scalars.
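As an illustration of the formula above, the following pure-Python sketch evaluates it on small dense matrices. It is a reference computation only and does not call cuSPARSELt. The function name `reference_matmul`, its parameters, the use of ReLU as the activation, the one-bias-value-per-output-row broadcast, and treating \(op(C)\) as \(C\) are all assumptions made for this example.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Dense product of two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def relu(x):
    """Example activation: max(x, 0)."""
    return x if x > 0 else 0.0


def reference_matmul(a, b, c, alpha=1.0, beta=0.0, bias=None, scale=1.0,
                     trans_a=False, trans_b=False, activation=None):
    """Evaluate D = activation(alpha * op(A) * op(B) + beta * C + bias) * scale."""
    op_a = transpose(a) if trans_a else a
    op_b = transpose(b) if trans_b else b
    ab = matmul(op_a, op_b)
    act = activation if activation is not None else (lambda x: x)
    rows, cols = len(ab), len(ab[0])
    # Assumption: the bias vector holds one value per output row.
    bias = bias if bias is not None else [0.0] * rows
    return [[act(alpha * ab[i][j] + beta * c[i][j] + bias[i]) * scale
             for j in range(cols)] for i in range(rows)]


# A has one zero per row, standing in for the sparse operand.
A = [[1, 0], [0, 2]]
B = [[1, 2], [3, 4]]
C = [[1, 1], [1, 1]]
D = reference_matmul(A, B, C, alpha=2.0, beta=-10.0, bias=[0.0, 1.0],
                     scale=0.5, activation=relu)
print(D)  # [[0.0, 0.0], [1.5, 3.5]]
```

The activation is applied before the final multiplication by `scale`, matching the order of operations in the equation.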

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Download: developer.nvidia.com/cusparselt/downloads

Provide Feedback: Math-Libs-Feedback@nvidia.com

Examples: cuSPARSELt Example 1, cuSPARSELt Example 2

Blog post:

Key Features

NVIDIA Sparse MMA tensor core support

Mixed-precision computation support:

Input A/B	Input C	Output D	Compute
FP32	FP32	FP32	FP32
FP16	FP16	FP16	FP32
FP16	FP16	FP16	FP16
BF16	BF16	BF16	FP32
INT8	INT8	INT8	INT32
INT8	INT32	INT32	INT32
INT8	FP16	FP16	INT32
INT8	BF16	BF16	INT32
E4M3	FP16	E4M3	FP32
E4M3	BF16	E4M3	FP32
E4M3	FP16	FP16	FP32
E4M3	BF16	BF16	FP32
E4M3	FP32	FP32	FP32
E5M2	FP16	E5M2	FP32
E5M2	BF16	E5M2	FP32
E5M2	FP16	FP16	FP32
E5M2	BF16	BF16	FP32
E5M2	FP32	FP32	FP32

Matrix pruning and compression functionalities
Activation functions, bias vector, and output scaling
Batched computation (multiple matrices in a single run)
GEMM Split-K mode
Auto-tuning functionality (see cusparseLtMatmulSearch())
NVTX ranging and Logging functionalities
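To make the pruning and compression feature more concrete, here is a minimal pure-Python sketch of the idea. It assumes the 2:4 structured sparsity pattern (at most two non-zero values in every group of four consecutive elements) that Sparse Tensor Cores operate on; this page does not state the pattern. The helper names `prune_2_4` and `compress_2_4` are hypothetical, the magnitude-based selection is only one possible pruning criterion, and the values-plus-positions layout is illustrative rather than the library's actual compressed storage format.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest magnitudes within the group.
        by_magnitude = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        keep = set(by_magnitude[:2])
        pruned.extend(v if i in keep else 0 for i, v in enumerate(group))
    return pruned


def compress_2_4(pruned_row):
    """Split a 2:4-pruned row into kept values and their in-group positions."""
    if len(pruned_row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(pruned_row), 4):
        group = pruned_row[start:start + 4]
        nonzero = [i for i, v in enumerate(group) if v != 0]
        if len(nonzero) > 2:
            raise ValueError("group has more than two non-zero values")
        # Pad with zero positions so each group stores exactly two entries.
        padding = [i for i in range(4) if i not in nonzero]
        keep = sorted((nonzero + padding)[:2])
        values.extend(group[i] for i in keep)
        positions.extend(keep)
    return values, positions


row = [0.1, -0.9, 0.4, 0.05, 3, 0, 0, -2]
pruned = prune_2_4(row)
values, positions = compress_2_4(pruned)
print(pruned)     # [0, -0.9, 0.4, 0, 3, 0, 0, -2]
print(values)     # [-0.9, 0.4, 3, -2]
print(positions)  # [1, 2, 0, 3]
```

The compressed form stores half as many values as the dense row plus a small position index per kept value, which is the general reason a structured-sparse operand needs less memory and bandwidth.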

Support

Supported SM Architectures: SM 8.0, SM 8.6, SM 8.9, SM 9.0
Supported CPU architectures and operating systems:

OS	CPU archs
Windows	x86_64
Linux	x86_64, Arm64

Documentation

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

Installation

The cuSPARSELt wheel can be installed as follows:

pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version (currently, only CUDA 12 is supported).
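After installing, one way to confirm that the wheel is present is to query its metadata from Python. The sketch below uses only the standard library and assumes the distribution name `nvidia-cusparselt-cu12` taken from the page title. The helper name `cusparselt_files` is hypothetical, and no particular directory layout inside the wheel is assumed; the function simply lists installed files whose name mentions cuSPARSELt.

```python
from importlib import metadata


def cusparselt_files(dist_name="nvidia-cusparselt-cu12"):
    """Return installed file paths whose name mentions cuSPARSELt.

    Returns an empty list when the distribution is not installed.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return []
    files = dist.files or []  # dist.files is None when no file list is recorded
    return [str(f.locate()) for f in files if "cusparselt" in f.name.lower()]


if __name__ == "__main__":
    found = cusparselt_files()
    if found:
        print("\n".join(found))
    else:
        print("nvidia-cusparselt-cu12 does not appear to be installed")
```

An empty result means either that the package is missing from the active environment or that its file list was not recorded at install time.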

Project details

Release history

Version	Released
0.6.2 (this version)	Jul 23, 2024
0.0.1.dev1 (pre-release)	Apr 16, 2024

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

nvidia_cusparselt_cu12-0.6.2-py3-none-win_amd64.whl (148.8 MB)

Uploaded Jul 23, 2024 Python 3 Windows x86-64

nvidia_cusparselt_cu12-0.6.2-py3-none-manylinux2014_x86_64.whl (150.1 MB)

Uploaded Jul 23, 2024 Python 3

nvidia_cusparselt_cu12-0.6.2-py3-none-manylinux2014_aarch64.whl (149.4 MB)

Uploaded Jul 23, 2024 Python 3

Hashes for nvidia_cusparselt_cu12-0.6.2-py3-none-win_amd64.whl

Algorithm	Hash digest
SHA256	`0057c91d230703924c0422feabe4ce768841f9b4b44d28586b6f6d2eb86fbe70`
MD5	`f61eb02aaead7e1b7b5803d287bd7cb8`
BLAKE2b-256	`568f2c33082238b6c5e783a877dc8786ab62619e3e6171c083bd3bba6e3fe75e`

Hashes for nvidia_cusparselt_cu12-0.6.2-py3-none-manylinux2014_x86_64.whl

Algorithm	Hash digest
SHA256	`df2c24502fd76ebafe7457dbc4716b2fec071aabaed4fb7691a201cde03704d9`
MD5	`a70d0fe7cd4f14bcfcb36155f42cd130`
BLAKE2b-256	`78a8bcbb63b53a4b1234feeafb65544ee55495e1bb37ec31b999b963cbccfd1d`

Hashes for nvidia_cusparselt_cu12-0.6.2-py3-none-manylinux2014_aarch64.whl

Algorithm	Hash digest
SHA256	`067a7f6d03ea0d4841c85f0c6f1991c5dda98211f6302cb83a4ab234ee95bef8`
MD5	`ddaf3383e24d67aa1691f79a35d3f9c6`
BLAKE2b-256	`988e675498726c605c9441cf46653bd29cb1b8666da1fb1469ffa25f67f20c58`
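The digests listed above can be used to check a manually downloaded wheel before installing it. The following is a minimal sketch using the standard `hashlib` module; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path in the usage comment assumes the wheel was saved to the current directory under its listed file name.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the Linux x86_64 wheel is in the current directory):
# expected = "df2c24502fd76ebafe7457dbc4716b2fec071aabaed4fb7691a201cde03704d9"
# ok = matches_published_hash(
#     "nvidia_cusparselt_cu12-0.6.2-py3-none-manylinux2014_x86_64.whl", expected)
```

Reading in chunks keeps memory use low, which matters here because each wheel is roughly 150 MB.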