
NVIDIA cuSPARSELt

Project description

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a structured sparse matrix with a 50% sparsity ratio:

\begin{equation*} D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \end{equation*}

where \(op(A)\) and \(op(B)\) denote in-place operations such as transpose/non-transpose, and \(\alpha\), \(\beta\) are scalars or vectors.

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
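As a reference for the semantics above, here is a dense NumPy sketch of the computed operation. The function name, the ReLU activation choice, and the bias-broadcast convention are illustrative assumptions for this sketch, not the library's C API:

```python
import numpy as np

def matmul_epilogue(A, B, C, bias, alpha=1.0, beta=0.0,
                    transpose_a=False, transpose_b=False,
                    activation=lambda x: np.maximum(x, 0.0)):  # ReLU as an example
    """Dense reference for D = Activation(alpha*op(A)@op(B) + beta*op(C) + bias)."""
    opA = A.T if transpose_a else A
    opB = B.T if transpose_b else B
    # How the bias vector broadcasts here (per-column) is an illustrative choice.
    return activation(alpha * (opA @ opB) + beta * C + bias)

# Tiny check: with A = I, alpha = 1, beta = 0, zero bias, and non-negative B,
# the result is just B.
A = np.eye(4, dtype=np.float32)
B = np.arange(16, dtype=np.float32).reshape(4, 4)
C = np.zeros((4, 4), dtype=np.float32)
D = matmul_epilogue(A, B, C, bias=np.zeros(4, dtype=np.float32))
```

The real library expresses the same computation through descriptor-based APIs (matmul descriptor, epilogue selection, plan, and execution) rather than a single call.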

Download: developer.nvidia.com/cusparselt/downloads

Provide Feedback: Math-Libs-Feedback@nvidia.com

Examples: cuSPARSELt Example 1, cuSPARSELt Example 2


Key Features

  • NVIDIA Sparse MMA tensor core support

  • Mixed-precision computation support:

    Input A/B | Input C | Output D | Compute | Block scaled | Supported SM archs
    FP32 | FP32 | FP32 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    BF16 | BF16 | BF16 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    FP16 | FP16 | FP16 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    FP16 | FP16 | FP16 | FP16 | No | 9.0
    INT8 | INT8 | INT8 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 11.0, 12.0, 12.1
    INT8 | INT32 | INT32 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 11.0, 12.0, 12.1
    INT8 | FP16 | FP16 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 11.0, 12.0, 12.1
    INT8 | BF16 | BF16 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 11.0, 12.0, 12.1
    E4M3 | FP16 | E4M3 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    E4M3 | BF16 | E4M3 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    E4M3 | FP16 | FP16 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    E4M3 | BF16 | BF16 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    E4M3 | FP32 | FP32 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    E5M2 | FP16 | E5M2 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    E5M2 | BF16 | E5M2 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    E5M2 | FP16 | FP16 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    E5M2 | BF16 | BF16 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    E5M2 | FP32 | FP32 | FP32 | No | 9.0, 10.0, 10.3, 11.0, 12.0, 12.1
    E4M3 | FP16 | E4M3 | FP32 | A/B/D_OUT_SCALE = VEC64_UE8M0, D_SCALE = 32F | 10.0, 10.3, 11.0, 12.0, 12.1
    E4M3 | BF16 | E4M3 | FP32 | A/B/D_OUT_SCALE = VEC64_UE8M0, D_SCALE = 32F | 10.0, 10.3, 11.0, 12.0, 12.1
    E4M3 | FP16 | FP16 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.3, 11.0, 12.0, 12.1
    E4M3 | BF16 | BF16 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.3, 11.0, 12.0, 12.1
    E4M3 | FP32 | FP32 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.3, 11.0, 12.0, 12.1
    E2M1 | FP16 | E2M1 | FP32 | A/B/D_SCALE = VEC32_UE4M3, D_SCALE = 32F | 10.0, 10.3, 11.0, 12.0, 12.1
    E2M1 | BF16 | E2M1 | FP32 | A/B/D_SCALE = VEC32_UE4M3, D_SCALE = 32F | 10.0, 10.3, 11.0, 12.0, 12.1
    E2M1 | FP16 | FP16 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.3, 11.0, 12.0, 12.1
    E2M1 | BF16 | BF16 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.3, 11.0, 12.0, 12.1
    E2M1 | FP32 | FP32 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.3, 11.0, 12.0, 12.1

  • Matrix pruning and compression functionalities

  • Activation functions, bias vector, and output scaling

  • Batched computation (multiple matrices in a single run)

  • GEMM Split-K mode

  • Auto-tuning functionality (see cusparseLtMatmulSearch())

  • NVTX ranging and logging functionalities
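The 50% sparsity ratio above is the 2:4 structured pattern: every group of four consecutive elements along a row holds at most two nonzeros. Below is a minimal NumPy sketch of magnitude-based 2:4 pruning; the function name and the keep-the-two-largest policy are illustrative assumptions, while in the library pruning and compression are performed on device by the dedicated APIs (e.g. cusparseLtSpMMAPrune()):

```python
import numpy as np

def prune_2_4(M):
    """Sketch of magnitude-based 2:4 pruning: in each group of four consecutive
    elements along a row, keep the two largest magnitudes and zero the rest.
    Assumes the row length is a multiple of 4; illustrative only."""
    out = M.copy()
    rows, cols = out.shape
    for r in range(rows):
        for g in range(0, cols, 4):
            group = out[r, g:g+4]                 # view into the output matrix
            drop = np.argsort(np.abs(group))[:2]  # two smallest magnitudes
            group[drop] = 0.0                     # zeroed in place
    return out

M = np.array([[4.0, -1.0, 3.0, 0.5, 2.0, 2.5, -6.0, 1.0]])
P = prune_2_4(M)
# Each group of four retains exactly two nonzeros -> 50% sparsity overall.
```

The compressed representation the library builds from such a matrix stores only the nonzero values plus per-group index metadata, which is what the Sparse MMA tensor cores consume.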

Support

  • Supported SM Architectures: SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0, SM 10.0, SM 10.3, SM 11.0, SM 12.0, SM 12.1

  • Supported CPU architectures and operating systems:

OS      | CPU archs
Windows | x86_64
Linux   | x86_64, Arm64

Documentation

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

Installation

The cuSPARSELt wheel can be installed as follows:

pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


nvidia_cusparselt_cu13-0.9.0-py3-none-win_amd64.whl (154.3 MB)

Uploaded: Python 3, Windows x86-64

File details

Details for the file nvidia_cusparselt_cu13-0.9.0-py3-none-win_amd64.whl.

File metadata

File hashes

Hashes for nvidia_cusparselt_cu13-0.9.0-py3-none-win_amd64.whl
Algorithm   | Hash digest
SHA256      | 4cb5e609fb29ed2271c693ba22048d39fa4d354d6b69bc48e4c228aadd8a4261
MD5         | 452e091cf4ee35ae23742393c9bd971e
BLAKE2b-256 | 27309c1d9ecdfb9d31e5a4bb7108c1b712c468db5629c68e9ee9c26c20789780


File details

Details for the file nvidia_cusparselt_cu13-0.9.0-py3-none-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for nvidia_cusparselt_cu13-0.9.0-py3-none-manylinux2014_x86_64.whl
Algorithm   | Hash digest
SHA256      | ff93d61138185dd8ba3ee5ac8d508f44241fd5d255d625fb63a313a73d476cdd
MD5         | 5672b7428662bb85e1d432fff51f78f2
BLAKE2b-256 | 16453f992a8e12b64a83f63f98ce87e84b5b8c14caf482c42eeedcb82e94df39


File details

Details for the file nvidia_cusparselt_cu13-0.9.0-py3-none-manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for nvidia_cusparselt_cu13-0.9.0-py3-none-manylinux2014_aarch64.whl
Algorithm   | Hash digest
SHA256      | b6ac6c68b57949308c62d42f12e1701b2f95c6d611935b52ca826391a6d8d60e
MD5         | ba7277d7c80bb0a11b5beb0d0f526250
BLAKE2b-256 | 5a63a75afcfd223f30c9c29ce6933a6b99aac1a3e22bbe47c1c1a62cf4255fe2

