NVIDIA cuSPARSELt

Project description

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a structured sparse matrix with 50% sparsity ratio:

\begin{equation*} D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \end{equation*}

where \(op(A)\) and \(op(B)\) denote an in-place transpose or non-transpose operation, and \(\alpha, \beta\) are scalars or vectors.
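The 50% sparsity ratio refers to the 2:4 structured-sparsity pattern: along each row of the sparse operand, every group of four consecutive values contains at most two nonzeros. cuSPARSELt prunes dense matrices into this pattern with cusparseLtSpMMAPrune(); the NumPy sketch below only illustrates the pattern itself via simple magnitude-based pruning (the helper name is ours, not a library API):

```python
import numpy as np

def prune_2_to_4(mat):
    """Zero out the two smallest-magnitude values in every group of four
    along each row, producing the 2:4 (50%) structured-sparsity pattern
    that Sparse MMA tensor cores expect."""
    out = mat.copy()
    rows, cols = out.shape
    assert cols % 4 == 0, "row length must be a multiple of 4"
    for r in range(rows):
        for c in range(0, cols, 4):
            group = out[r, c:c + 4]          # view into `out`
            # indices of the two smallest magnitudes in this group of four
            drop = np.argsort(np.abs(group))[:2]
            group[drop] = 0.0
    return out

a = np.arange(1.0, 17.0).reshape(2, 8)
pruned = prune_2_to_4(a)
sparsity = float(np.mean(pruned == 0.0))
print(sparsity)  # 0.5 -> exactly half the entries are zero
```

Magnitude-based pruning is only one possible selection rule; the point is the layout constraint, which lets the compressed matrix store just the two kept values plus 2-bit indices per group.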

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Download: developer.nvidia.com/cusparselt/downloads

Provide Feedback: Math-Libs-Feedback@nvidia.com

Examples: cuSPARSELt Example 1, cuSPARSELt Example 2

Key Features

  • NVIDIA Sparse MMA tensor core support

  • Mixed-precision computation support:

    Input A/B | Input C | Output D | Compute | Block scaled                                 | Supported SM archs
    ----------|---------|----------|---------|----------------------------------------------|--------------------------------------------------
    FP32      | FP32    | FP32     | FP32    | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    BF16      | BF16    | BF16     | FP32    | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    FP16      | FP16    | FP16     | FP32    | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    FP16      | FP16    | FP16     | FP16    | No                                           | 9.0
    INT8      | INT8    | INT8     | INT32   | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    INT8      | INT32   | INT32    | INT32   | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    INT8      | FP16    | FP16     | INT32   | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    INT8      | BF16    | BF16     | INT32   | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP16    | E4M3     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | BF16    | E4M3     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP16    | FP16     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | BF16    | BF16     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP32    | FP32     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E5M2      | FP16    | E5M2     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E5M2      | BF16    | E5M2     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E5M2      | FP16    | FP16     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E5M2      | BF16    | BF16     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E5M2      | FP32    | FP32     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP16    | E4M3     | FP32    | A/B/D_OUT_SCALE = VEC64_UE8M0; D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | BF16    | E4M3     | FP32    | A/B/D_OUT_SCALE = VEC64_UE8M0; D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP16    | FP16     | FP32    | A/B_SCALE = VEC64_UE8M0                      | 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | BF16    | BF16     | FP32    | A/B_SCALE = VEC64_UE8M0                      | 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP32    | FP32     | FP32    | A/B_SCALE = VEC64_UE8M0                      | 10.0, 10.1, 11.0, 12.0, 12.1
    E2M1      | FP16    | E2M1     | FP32    | A/B/D_SCALE = VEC32_UE4M3; D_SCALE = 32F     | 10.0, 10.1, 11.0, 12.0, 12.1
    E2M1      | BF16    | E2M1     | FP32    | A/B/D_SCALE = VEC32_UE4M3; D_SCALE = 32F     | 10.0, 10.1, 11.0, 12.0, 12.1
    E2M1      | FP16    | FP16     | FP32    | A/B_SCALE = VEC32_UE4M3                      | 10.0, 10.1, 11.0, 12.0, 12.1
    E2M1      | BF16    | BF16     | FP32    | A/B_SCALE = VEC32_UE4M3                      | 10.0, 10.1, 11.0, 12.0, 12.1
    E2M1      | FP32    | FP32     | FP32    | A/B_SCALE = VEC32_UE4M3                      | 10.0, 10.1, 11.0, 12.0, 12.1

  • Matrix pruning and compression functionalities

  • Activation functions, bias vector, and output scaling

  • Batched computation (multiple matrices in a single run)

  • GEMM Split-K mode

  • Auto-tuning functionality (see cusparseLtMatmulSearch())

  • NVTX ranging and logging functionalities
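Several of the features above (activation functions, bias vector, output scaling) fold into the single fused formula given at the top of this page. The NumPy reference below shows what one cuSPARSELt matmul call evaluates in a single pass; ReLU is chosen as the example activation, and the function name is illustrative, not a library API:

```python
import numpy as np

def matmul_epilogue(a, b, c, alpha=1.0, beta=0.0, bias=None,
                    activation=lambda x: np.maximum(x, 0.0)):
    """NumPy reference for D = Activation(alpha * A @ B + beta * C + bias).
    cuSPARSELt fuses this whole epilogue into the matmul kernel;
    here it is spelled out step by step for clarity."""
    d = alpha * (a @ b) + beta * c
    if bias is not None:
        d = d + bias[:, None]   # bias is a per-row vector, broadcast over columns
    return activation(d)

a = np.array([[1.0, -2.0], [0.0, 3.0]])
b = np.eye(2)
c = np.zeros((2, 2))
bias = np.array([0.5, -10.0])
d = matmul_epilogue(a, b, c, bias=bias)
print(d)  # ReLU clamps every negative entry to zero
```

Running the epilogue inside the matmul kernel avoids a separate pass over D in global memory, which is the practical reason the library exposes it as a fused option rather than leaving it to the caller.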

Support

  • Supported SM Architectures: SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0, SM 10.0, SM 10.1 (for CTK 12), SM 11.0 (for CTK 13), SM 12.0, SM 12.1

  • Supported CPU architectures and operating systems:

OS      | CPU archs
--------|---------------
Windows | x86_64
Linux   | x86_64, Arm64

Documentation

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

Installation

The cuSPARSELt wheel can be installed as follows:

pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version.
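After installation the cuSPARSELt shared library ships inside the wheel as package data rather than on the system library path. Assuming the layout NVIDIA's CUDA wheels commonly use (site-packages/nvidia/cusparselt/lib — an assumption, verify against your installation), a small sketch to locate it:

```python
from importlib.util import find_spec
from pathlib import Path

def find_cusparselt_libs(pkg="nvidia"):
    """Return the cuSPARSELt shared-library files shipped by the
    nvidia-cusparselt-cuXX wheel, or [] when the wheel is not installed.
    The nvidia/cusparselt/lib layout is an assumption about the wheel."""
    spec = find_spec(pkg)
    if spec is None or not spec.submodule_search_locations:
        return []
    root = Path(list(spec.submodule_search_locations)[0]) / "cusparselt" / "lib"
    return sorted(root.glob("*cusparseLt*")) if root.is_dir() else []

print(find_cusparselt_libs())  # [] when the wheel is absent
```

This is mainly useful when pointing a loader (ctypes, a build system, LD_LIBRARY_PATH) at the wheel-provided library instead of a system-wide CUDA installation.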

Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


nvidia_cusparselt_cu13-0.8.1-py3-none-win_amd64.whl (156.9 MB)

Uploaded: Python 3, Windows x86-64

File details

Details for the file nvidia_cusparselt_cu13-0.8.1-py3-none-win_amd64.whl.

File hashes

Hashes for nvidia_cusparselt_cu13-0.8.1-py3-none-win_amd64.whl
Algorithm   | Hash digest
------------|------------
SHA256      | dccbd362f91a7b9024d1f55ee9f548ac065027ff15d8c8b0db889ab3a8f31215
MD5         | 9761d65ff55cbd70eb190267a5758382
BLAKE2b-256 | 3183f3647ce26916c94a6ca4ff1810623e2c405cff2dea6e78d29516b2514df9


File details

Details for the file nvidia_cusparselt_cu13-0.8.1-py3-none-manylinux2014_x86_64.whl.

File hashes

Hashes for nvidia_cusparselt_cu13-0.8.1-py3-none-manylinux2014_x86_64.whl
Algorithm   | Hash digest
------------|------------
SHA256      | 786ce87568c303fadb5afcc7102d454cd3040d75f6f8626f5db460d1871f4dd0
MD5         | 9f134a0dd65345b9307a2c4a7533a7fd
BLAKE2b-256 | 347d2661f2fb3ac4302f3a246f5fc030213ac60c1fe0bce84f9783dbd831dbb7


File details

Details for the file nvidia_cusparselt_cu13-0.8.1-py3-none-manylinux2014_aarch64.whl.

File hashes

Hashes for nvidia_cusparselt_cu13-0.8.1-py3-none-manylinux2014_aarch64.whl
Algorithm   | Hash digest
------------|------------
SHA256      | 4dca476c50bf4780d46cd0bfbd82e2bc10a08e4fef7950917ce8d7578d22a23f
MD5         | 972fc54f4136dddc5b3c865aa461c8c1
BLAKE2b-256 | 46e1cdc1797eadf82d3a9a575a19b33fdc871a97edbec42c00b5b5e914f4aff4

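The digests listed above can be checked locally with Python's standard hashlib before installing a manually downloaded wheel; the file path in the usage comment is hypothetical:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file in 1 MiB chunks and return its hex SHA-256 digest,
    suitable for checking a downloaded wheel against the published hash."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# Usage (hypothetical local path):
# assert sha256_of("nvidia_cusparselt_cu13-0.8.1-py3-none-win_amd64.whl") == \
#     "dccbd362f91a7b9024d1f55ee9f548ac065027ff15d8c8b0db889ab3a8f31215"
```

Streaming in chunks keeps memory flat even for the ~150 MB wheels listed here; prefer the SHA256 digest over MD5, which is published only for legacy tooling.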
