NVIDIA cuSPARSELt

Project description

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a structured sparse matrix with a 50% sparsity ratio:

\begin{equation*} D = \mathrm{Activation}\left(\alpha \, op(A) \cdot op(B) + \beta \, op(C) + \mathrm{bias}\right) \end{equation*}

where \(op(A)/op(B)\) refers to in-place operations such as transpose/non-transpose, and \(\alpha, \beta\) are scalars or vectors.
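To make the formula concrete, here is a minimal pure-Python reference that evaluates it on small dense matrices. It does not call cuSPARSELt; the names `reference_matmul`, `trans_a`, `trans_b`, and `relu` are hypothetical, C is used as given (no transpose), and the bias is assumed to hold one value per output row.

```python
def transpose(m):
    """Return the transpose of a row-major list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def relu(x):
    """Example activation: clamp negative values to zero."""
    return x if x > 0 else 0.0


def reference_matmul(a, b, c, alpha=1.0, beta=0.0, bias=None,
                     trans_a=False, trans_b=False, activation=None):
    """Evaluate D = activation(alpha * op(A) @ op(B) + beta * C + bias)."""
    op_a = transpose(a) if trans_a else a
    op_b = transpose(b) if trans_b else b
    ab = matmul(op_a, op_b)
    rows, cols = len(ab), len(ab[0])
    if bias is None:
        bias = [0.0] * rows  # assumption: one bias value per output row
    act = activation or (lambda x: x)
    return [[act(alpha * ab[i][j] + beta * c[i][j] + bias[i])
             for j in range(cols)]
            for i in range(rows)]


# A has two nonzeros in every group of four values (50% sparsity).
A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 4.0]]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]

D = reference_matmul(A, B, C, alpha=1.0, beta=-20.0,
                     bias=[0.0, 1.0], activation=relu)
print(D)  # [[0.0, 0.0], [18.0, 25.0]]
```

A reference like this is mainly useful for checking the output of a GPU run on small inputs.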

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Download: developer.nvidia.com/cusparselt/downloads

Provide Feedback: Math-Libs-Feedback@nvidia.com

Examples: cuSPARSELt Example 1, cuSPARSELt Example 2

Blog post:

Key Features

  • NVIDIA Sparse MMA tensor core support

  • Mixed-precision computation support:

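    Input A/B | Input C | Output D | Compute | Block scaled                                 | Supported SM arch
    ----------|---------|----------|---------|----------------------------------------------|--------------------------------------------------
    FP32      | FP32    | FP32     | FP32    | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    BF16      | BF16    | BF16     | FP32    | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    FP16      | FP16    | FP16     | FP32    | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    FP16      | FP16    | FP16     | FP16    | No                                           | 9.0
    INT8      | INT8    | INT8     | INT32   | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    INT8      | INT32   | INT32    | INT32   | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    INT8      | FP16    | FP16     | INT32   | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    INT8      | BF16    | BF16     | INT32   | No                                           | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP16    | E4M3     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | BF16    | E4M3     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP16    | FP16     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | BF16    | BF16     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP32    | FP32     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E5M2      | FP16    | E5M2     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E5M2      | BF16    | E5M2     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E5M2      | FP16    | FP16     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E5M2      | BF16    | BF16     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E5M2      | FP32    | FP32     | FP32    | No                                           | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP16    | E4M3     | FP32    | A/B/D_OUT_SCALE = VEC64_UE8M0; D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | BF16    | E4M3     | FP32    | A/B/D_OUT_SCALE = VEC64_UE8M0; D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP16    | FP16     | FP32    | A/B_SCALE = VEC64_UE8M0                      | 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | BF16    | BF16     | FP32    | A/B_SCALE = VEC64_UE8M0                      | 10.0, 10.1, 11.0, 12.0, 12.1
    E4M3      | FP32    | FP32     | FP32    | A/B_SCALE = VEC64_UE8M0                      | 10.0, 10.1, 11.0, 12.0, 12.1
    E2M1      | FP16    | E2M1     | FP32    | A/B/D_SCALE = VEC32_UE4M3; D_SCALE = 32F     | 10.0, 10.1, 11.0, 12.0, 12.1
    E2M1      | BF16    | E2M1     | FP32    | A/B/D_SCALE = VEC32_UE4M3; D_SCALE = 32F     | 10.0, 10.1, 11.0, 12.0, 12.1
    E2M1      | FP16    | FP16     | FP32    | A/B_SCALE = VEC32_UE4M3                      | 10.0, 10.1, 11.0, 12.0, 12.1
    E2M1      | BF16    | BF16     | FP32    | A/B_SCALE = VEC32_UE4M3                      | 10.0, 10.1, 11.0, 12.0, 12.1
    E2M1      | FP32    | FP32     | FP32    | A/B_SCALE = VEC32_UE4M3                      | 10.0, 10.1, 11.0, 12.0, 12.1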

  • Matrix pruning and compression functionalities

  • Activation functions, bias vector, and output scaling

  • Batched computation (multiple matrices in a single run)

  • GEMM Split-K mode

  • Auto-tuning functionality (see cusparseLtMatmulSearch())

  • NVTX ranging and logging functionalities
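The pruning and compression feature listed above can be illustrated with a small pure-Python sketch. It assumes the 2:4 pattern (two nonzeros kept in every group of four consecutive values), which is one way to reach the 50% sparsity ratio stated earlier. The helpers `prune_2_4` and `compress_2_4` are hypothetical, keep values by largest magnitude, and do not reproduce the library's own pruning criteria or its internal compressed layout.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in every group of four."""
    if len(row) % 4:
        raise ValueError("row length must be a multiple of 4")
    out = []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        # Indices of the two largest magnitudes; ties go to the lower index
        # because sorted() is stable.
        keep = sorted(range(4), key=lambda i: -abs(group[i]))[:2]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


def compress_2_4(pruned_row):
    """Split a 2:4-pruned row into kept values and their in-group positions."""
    values, positions = [], []
    for g in range(0, len(pruned_row), 4):
        group = pruned_row[g:g + 4]
        nz = [i for i, v in enumerate(group) if v != 0]
        if len(nz) > 2:
            raise ValueError("group has more than two nonzeros")
        # Pad with zero-valued slots so every group stores exactly two entries.
        pad = [i for i in range(4) if i not in nz]
        idx = sorted(nz + pad[:2 - len(nz)])
        values.extend(group[i] for i in idx)
        positions.extend(idx)
    return values, positions


row = [0.5, -3.0, 0.1, 2.0, 1.0, 1.0, -4.0, 0.0]
pruned = prune_2_4(row)
print(pruned)                # [0.0, -3.0, 0.0, 2.0, 1.0, 0.0, -4.0, 0.0]
print(compress_2_4(pruned))  # ([-3.0, 2.0, 1.0, -4.0], [1, 3, 0, 2])
```

The compressed form stores half as many values plus a small position index per value, which is where the memory and bandwidth savings of structured sparsity come from.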

Support

  • Supported SM Architectures: SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0, SM 10.0, SM 10.1 (for CTK 12), SM 11.0 (for CTK 13), SM 12.0, SM 12.1

  • Supported CPU architectures and operating systems:

    OS      | CPU archs
    --------|--------------
    Windows | x86_64
    Linux   | x86_64, Arm64
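The architecture lists in the mixed-precision table under Key Features can be turned into a small lookup, which may help when deciding at run time whether a data-type combination is worth attempting. The sketch below is a transcription of that table keyed by input A/B type, compute type, and whether block scaling is used; `SUPPORT` and `is_supported` are hypothetical names, not part of cuSPARSELt. Note that the table does not list SM 8.9, so this helper reports it as unsupported for every combination.

```python
SM_80_UP = ("8.0", "8.6", "8.7", "9.0", "10.0", "10.1", "11.0", "12.0", "12.1")
SM_90_UP = SM_80_UP[3:]    # 9.0 and newer
SM_100_UP = SM_80_UP[4:]   # 10.0 and newer

# (A/B type, compute type, block scaled) -> SM architectures from the table
SUPPORT = {
    ("FP32", "FP32", False): SM_80_UP,
    ("BF16", "FP32", False): SM_80_UP,
    ("FP16", "FP32", False): SM_80_UP,
    ("FP16", "FP16", False): ("9.0",),
    ("INT8", "INT32", False): SM_80_UP,
    ("E4M3", "FP32", False): SM_90_UP,
    ("E5M2", "FP32", False): SM_90_UP,
    ("E4M3", "FP32", True): SM_100_UP,
    ("E2M1", "FP32", True): SM_100_UP,
}


def is_supported(ab_type, compute_type, sm, block_scaled=False):
    """Return True if the table lists this combination for the given SM."""
    key = (ab_type.upper(), compute_type.upper(), bool(block_scaled))
    return str(sm) in SUPPORT.get(key, ())


print(is_supported("FP16", "FP16", "9.0"))                     # True
print(is_supported("FP16", "FP16", "8.0"))                     # False
print(is_supported("E2M1", "FP32", "10.0", block_scaled=True))  # True
```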

Documentation

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

Installation

The cuSPARSELt wheel can be installed as follows:

pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version.
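As a quick check after installation, the following sketch builds the wheel name from a CUDA major version and asks the standard library whether that distribution is installed. The helper names `wheel_name` and `installed_version` are hypothetical; only `importlib.metadata` from the Python standard library is used.

```python
from importlib import metadata


def wheel_name(cuda_major):
    """Build the pip distribution name for a given CUDA major version."""
    return f"nvidia-cusparselt-cu{int(cuda_major)}"


def installed_version(cuda_major):
    """Return the installed wheel version, or None if it is not installed."""
    try:
        return metadata.version(wheel_name(cuda_major))
    except metadata.PackageNotFoundError:
        return None


print(wheel_name(12))         # nvidia-cusparselt-cu12
print(installed_version(12))  # a version string if installed, otherwise None
```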


Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

nvidia_cusparselt_cu12-0.8.1-py3-none-win_amd64.whl (225.7 MB)

Python 3, Windows x86-64

File details

Details for the file nvidia_cusparselt_cu12-0.8.1-py3-none-win_amd64.whl.

File hashes

Hashes for nvidia_cusparselt_cu12-0.8.1-py3-none-win_amd64.whl:

    Algorithm   | Hash digest
    ------------|-----------------------------------------------------------------
    SHA256      | 2607ec058d53967c9caf0b7a3904ced34bbceaf7944cf9fef6d7f4ec6dab5e3a
    MD5         | d5f8cd23ed53e5cce28cb8a9e58d7709
    BLAKE2b-256 | 64f59eefe50ee49fda0657aaa061a56600a519dbc1c772d0df701f80e676c818

See more details on using hashes here.

File details

Details for the file nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_x86_64.whl.

File hashes

Hashes for nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_x86_64.whl:

    Algorithm   | Hash digest
    ------------|-----------------------------------------------------------------
    SHA256      | cd1b1dc9e1ad31ea3353c1f985e2bd6f9e7ae0e797d7e6ce879d7b2ace5e80e8
    MD5         | 4c02b8c4e7d06d2cbfeb9305dab5c522
    BLAKE2b-256 | bb14e46964290aa587cb9fb7df20efdc60528ddd00d291ccffec47617fb06ca3

File details

Details for the file nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_aarch64.whl.

File hashes

Hashes for nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_aarch64.whl:

    Algorithm   | Hash digest
    ------------|-----------------------------------------------------------------
    SHA256      | 5c72f727722f74762380e5f8755557c788b26d8fdcc49df1641c1b08e16d256c
    MD5         | 21432ed00954d8546800e46b302075ed
    BLAKE2b-256 | fdf8a809966c96e824b92df09ee3b7032442f5e975d873d7dadfef818d527f48
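
The published digests can be used to confirm that a downloaded wheel is intact. A minimal sketch using only the standard library follows; the helper names `sha256_of` and `matches` are hypothetical, and the file path in the usage comment is a placeholder for wherever the wheel was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (placeholder path; digest copied from the aarch64 entry above):
# matches("nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_aarch64.whl",
#         "5c72f727722f74762380e5f8755557c788b26d8fdcc49df1641c1b08e16d256c")
```

Reading in chunks keeps memory use flat, which matters for wheels of this size.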
