NVIDIA cuSPARSELt

Project description

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a structured sparse matrix with 50% sparsity ratio:

\begin{equation*} D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \end{equation*}

where \(op(A)/op(B)\) refers to in-place operations such as transpose/non-transpose, and \(\alpha, \beta\) are scalars or vectors.

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Download: developer.nvidia.com/cusparselt/downloads

Provide Feedback: Math-Libs-Feedback@nvidia.com

Examples: cuSPARSELt Example 1, cuSPARSELt Example 2

Blog post:

Key Features

  • NVIDIA Sparse MMA tensor core support

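  • Mixed-precision computation support:

    | Input A/B | Input C | Output D | Compute | Block scaled | Supported SM arch |
    |-----------|---------|----------|---------|--------------|-------------------|
    | FP32 | FP32 | FP32 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | BF16 | BF16 | BF16 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | FP16 | FP16 | FP16 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | FP16 | FP16 | FP16 | FP16 | No | 9.0 |
    | INT8 | INT8 | INT8 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 11.0, 12.0, 12.1 |
    | INT8 | INT32 | INT32 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 11.0, 12.0, 12.1 |
    | INT8 | FP16 | FP16 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 11.0, 12.0, 12.1 |
    | INT8 | BF16 | BF16 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 11.0, 12.0, 12.1 |
    | E4M3 | FP16 | E4M3 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E4M3 | BF16 | E4M3 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E4M3 | FP16 | FP16 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E4M3 | BF16 | BF16 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E4M3 | FP32 | FP32 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E5M2 | FP16 | E5M2 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E5M2 | BF16 | E5M2 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E5M2 | FP16 | FP16 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E5M2 | BF16 | BF16 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E5M2 | FP32 | FP32 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E4M3 | FP16 | E4M3 | FP32 | A/B/D_OUT_SCALE = VEC64_UE8M0, D_SCALE = 32F | 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E4M3 | BF16 | E4M3 | FP32 | A/B/D_OUT_SCALE = VEC64_UE8M0, D_SCALE = 32F | 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E4M3 | FP16 | FP16 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E4M3 | BF16 | BF16 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E4M3 | FP32 | FP32 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E2M1 | FP16 | E2M1 | FP32 | A/B/D_OUT_SCALE = VEC32_UE4M3, D_SCALE = 32F | 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E2M1 | BF16 | E2M1 | FP32 | A/B/D_OUT_SCALE = VEC32_UE4M3, D_SCALE = 32F | 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E2M1 | FP16 | FP16 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E2M1 | BF16 | BF16 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.3, 11.0, 12.0, 12.1 |
    | E2M1 | FP32 | FP32 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.3, 11.0, 12.0, 12.1 |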

  • Matrix pruning and compression functionalities

  • Activation functions, bias vector, and output scaling

  • Batched computation (multiple matrices in a single run)

  • GEMM Split-K mode

  • Auto-tuning functionality (see cusparseLtMatmulSearch())

  • NVTX ranging and logging functionalities
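The pruning and compression feature can be illustrated with a small sketch. It assumes the common 2:4 pattern (at most two nonzeros in every group of four consecutive values, which gives the 50% sparsity ratio mentioned above) and magnitude-based selection. The function names `prune_2_4` and `compress_2_4` are hypothetical, and the packed layout shown here is purely illustrative; the layout the library actually produces is hardware-specific and should be treated as opaque.

```python
from typing import List, Tuple


def prune_2_4(row: List[float]) -> List[float]:
    """Zero the two smallest-magnitude entries in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes; ties keep the earlier index.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def compress_2_4(row: List[float]) -> Tuple[List[float], List[int]]:
    """Pack a 2:4 row into kept values plus their positions within each group."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values: List[float] = []
    positions: List[int] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        slots = [i for i, v in enumerate(group) if v != 0.0]
        if len(slots) > 2:
            raise ValueError("group has more than two nonzeros; prune first")
        # Pad with zero-valued slots so every group stores exactly two entries.
        for i in range(4):
            if len(slots) == 2:
                break
            if i not in slots:
                slots.append(i)
        for i in sorted(slots):
            values.append(group[i])
            positions.append(i)
    return values, positions


dense = [0.1, -3.0, 2.0, 0.5, 4.0, 0.0, 0.0, -1.0]
pruned = prune_2_4(dense)
values, positions = compress_2_4(pruned)
print(pruned)
print(values, positions)
```

The compressed form stores half as many values as the dense row, plus a two-bit position per kept value, which is where the memory and bandwidth savings of structured sparsity come from.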

Support

  • Supported SM Architectures: SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0, SM 10.0, SM 10.3, SM 11.0, SM 12.0, SM 12.1

  • Supported CPU architectures and operating systems:

    | OS | CPU archs |
    |---------|---------------|
    | Windows | x86_64 |
    | Linux | x86_64, Arm64 |
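As a convenience, the support lists above can be encoded as a small lookup. This is only a sketch: the names `SUPPORTED_SM`, `SUPPORTED_PLATFORMS`, `parse_sm`, and `is_supported` are hypothetical helpers, not part of cuSPARSELt, and the check does not account for the per-data-type restrictions in the mixed-precision table.

```python
from typing import Tuple

# Copied from the "Supported SM Architectures" list in this section.
SUPPORTED_SM = {
    (8, 0), (8, 6), (8, 7), (8, 9), (9, 0),
    (10, 0), (10, 3), (11, 0), (12, 0), (12, 1),
}

# Copied from the OS / CPU architecture table in this section.
SUPPORTED_PLATFORMS = {
    ("Windows", "x86_64"),
    ("Linux", "x86_64"),
    ("Linux", "Arm64"),
}


def parse_sm(text: str) -> Tuple[int, int]:
    """Parse strings such as 'SM 9.0' or '8.6' into (major, minor)."""
    text = text.strip()
    if text.upper().startswith("SM"):
        text = text[2:].strip()
    major, minor = text.split(".")
    return int(major), int(minor)


def is_supported(sm: str, os_name: str, cpu_arch: str) -> bool:
    """Return True if both the GPU architecture and the host platform are listed."""
    return parse_sm(sm) in SUPPORTED_SM and (os_name, cpu_arch) in SUPPORTED_PLATFORMS


print(is_supported("SM 9.0", "Linux", "x86_64"))
print(is_supported("7.5", "Linux", "x86_64"))
```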

Documentation

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

Installation

The cuSPARSELt wheel can be installed as follows:

pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version.
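After installation, it can be handy to find where the wheel placed the shared library, for example to pass the path to a loader. The sketch below uses only the standard `importlib.metadata` module. The helper name `find_cusparselt_libraries` is hypothetical, the default distribution name is taken from the wheel file name listed on this page, and the assumption that the library file name contains "cusparselt" and ends in `.dll` or contains `.so` is a guess about the wheel contents rather than something this page states.

```python
from importlib import metadata
from typing import List


def find_cusparselt_libraries(dist_name: str = "nvidia-cusparselt-cu13") -> List[str]:
    """Return absolute paths of shared-library files recorded for the wheel.

    Returns an empty list when the distribution is not installed or when
    it has no recorded file list.
    """
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return []
    if files is None:
        return []
    found = []
    for entry in files:
        name = entry.name.lower()
        # Assumption: the library file name contains "cusparselt".
        if "cusparselt" in name and (".so" in name or name.endswith(".dll")):
            found.append(str(entry.locate()))
    return found


print(find_cusparselt_libraries())
```

If the list is empty, the wheel is either not installed in the active environment or uses file names that differ from the assumption above.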

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

nvidia_cusparselt_cu13-0.9.1-py3-none-win_amd64.whl (157.9 MB)

Uploaded for: Python 3, Windows x86-64

File hashes

Hashes for nvidia_cusparselt_cu13-0.9.1-py3-none-win_amd64.whl:

    | Algorithm | Hash digest |
    |-------------|-------------|
    | SHA256 | a7dd536f0fd2ea7c65395a6d63568bcd98c2522349112e1e61f8b1d19d559528 |
    | MD5 | 210f0663a55f0a15d3786dc29098017c |
    | BLAKE2b-256 | a10222ee8243eb74188a717e151ca6c6429daf4a4b64fdf389e82fa2c6c21183 |

Hashes for nvidia_cusparselt_cu13-0.9.1-py3-none-manylinux2014_x86_64.whl:

    | Algorithm | Hash digest |
    |-------------|-------------|
    | SHA256 | 514a84d71b4edb8609108253d3d9ea283542162a0861787f600b704ed663e58c |
    | MD5 | e66e1ccab19e6375662cd21fe038d92a |
    | BLAKE2b-256 | c6904a85f423459c4191c65d5ac2fbb2f3de43c86c75fb01795dfbb0779e8009 |

Hashes for nvidia_cusparselt_cu13-0.9.1-py3-none-manylinux2014_aarch64.whl:

    | Algorithm | Hash digest |
    |-------------|-------------|
    | SHA256 | 49e54b7cecb049b039381d4cb8389afd2f18d83cdca285ba8231ee0ff4341dc1 |
    | MD5 | 4068bdeccdac9002b5bddc04639e92a8 |
    | BLAKE2b-256 | 6c17820aec6a5a3de1eff9e45739f0bc8f669be865b007fe9fa68dbc51321ed3 |
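The published digests can be used to check a downloaded wheel before installing it. A minimal sketch with the standard `hashlib` module follows; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path in the usage comment is whatever location the wheel was saved to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Wheels are large (over 100 MB), so avoid loading the file at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (path is wherever the wheel was downloaded to):
# matches_published_hash(
#     "nvidia_cusparselt_cu13-0.9.1-py3-none-win_amd64.whl",
#     "a7dd536f0fd2ea7c65395a6d63568bcd98c2522349112e1e61f8b1d19d559528",
# )
```

SHA256 is the digest to prefer for this check; MD5 is listed for completeness but is not suitable for verifying integrity against tampering.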
