NVIDIA cuSPARSELt
Project description
NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:
\[D = Activation(\alpha \, op(A) \cdot op(B) + \beta \, op(C) + bias) \cdot scale\]
where \(op(A)/op(B)\) refers to in-place operations such as transpose/non-transpose, and \(\alpha, \beta, scale\) are scalars.
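To make the semantics of the operation concrete, here is a minimal pure-Python sketch of a dense reference computation, assuming the form D = Activation(alpha * op(A) * op(B) + beta * C + bias) * scale. It is not the cuSPARSELt API: the function name `reference_matmul`, its parameters, and the per-row bias broadcast are assumptions made for illustration, and no operation is applied to C for simplicity.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major nested-list matrix."""
    return [list(col) for col in zip(*m)]


def reference_matmul(
    a: Matrix,
    b: Matrix,
    c: Matrix,
    alpha: float = 1.0,
    beta: float = 0.0,
    bias: Optional[List[float]] = None,  # assumption: one bias value per output row
    scale: float = 1.0,
    activation: Callable[[float], float] = lambda x: x,
    trans_a: bool = False,
    trans_b: bool = False,
) -> Matrix:
    """Dense reference for D = activation(alpha*op(A)*op(B) + beta*C + bias) * scale."""
    op_a = transpose(a) if trans_a else a
    op_b = transpose(b) if trans_b else b
    rows, inner, cols = len(op_a), len(op_b), len(op_b[0])
    d: Matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(op_a[i][k] * op_b[k][j] for k in range(inner))
            value = alpha * acc + beta * c[i][j]
            if bias is not None:
                value += bias[i]
            row.append(activation(value) * scale)
        d.append(row)
    return d


def relu(x: float) -> float:
    return max(x, 0.0)


# A is the sparse operand here (half of its entries are zero).
A = [[1.0, 0.0], [0.0, 2.0]]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
D = reference_matmul(A, B, C, alpha=2.0, beta=1.0, scale=0.5, activation=relu)
```

A reference like this is only useful for checking results on small inputs; the library performs the same computation on the GPU using a compressed representation of the sparse operand.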
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
Download: developer.nvidia.com/cusparselt/downloads
Provide Feedback: Math-Libs-Feedback@nvidia.com
Examples: cuSPARSELt Example 1, cuSPARSELt Example 2
Blog posts:
Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt
Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines
Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture
Key Features
NVIDIA Sparse MMA tensor core support
Mixed-precision computation support:
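Input A/B | Input C | Output D | Compute |
---|---|---|---|
FP32 | FP32 | FP32 | FP32 |
FP16 | FP16 | FP16 | FP32 |
FP16 | FP16 | FP16 | FP16 |
BF16 | BF16 | BF16 | FP32 |
INT8 | INT8 | INT8 | INT32 |
INT8 | INT32 | INT32 | INT32 |
INT8 | FP16 | FP16 | INT32 |
INT8 | BF16 | BF16 | INT32 |
E4M3 | FP16 | E4M3 | FP32 |
E4M3 | BF16 | E4M3 | FP32 |
E4M3 | FP16 | FP16 | FP32 |
E4M3 | BF16 | BF16 | FP32 |
E4M3 | FP32 | FP32 | FP32 |
E5M2 | FP16 | E5M2 | FP32 |
E5M2 | BF16 | E5M2 | FP32 |
E5M2 | FP16 | FP16 | FP32 |
E5M2 | BF16 | BF16 | FP32 |
E5M2 | FP32 | FP32 | FP32 |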
Matrix pruning and compression functionalities
Activation functions, bias vector, and output scaling
Batched computation (multiple matrices in a single run)
GEMM Split-K mode
Auto-tuning functionality (see cusparseLtMatmulSearch())
NVTX ranging and logging functionalities
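The pruning and compression features listed above can be illustrated with a small sketch, assuming the 2:4 structured-sparsity pattern discussed in the linked Ampere posts (at most two nonzero values in every group of four consecutive elements). The helper names `prune_2_4` and `compress_2_4` are hypothetical, the magnitude-based selection is just one possible pruning criterion, and the output layout is illustrative only; it does not reproduce the library's internal compressed format.

```python
from typing import List, Tuple


def prune_2_4(row: List[float]) -> List[float]:
    """Zero the two smallest-magnitude values in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes; on ties the earlier index wins
        # because Python's sort is stable.
        keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def compress_2_4(pruned: List[float]) -> Tuple[List[float], List[int]]:
    """Store two values plus their in-group positions for every group of four."""
    if len(pruned) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values: List[float] = []
    indices: List[int] = []
    for start in range(0, len(pruned), 4):
        group = pruned[start:start + 4]
        nonzero = [i for i, v in enumerate(group) if v != 0.0]
        if len(nonzero) > 2:
            raise ValueError("group violates the 2:4 pattern")
        # Pad with zero positions so every group stores exactly two entries.
        padding = [i for i in range(4) if i not in nonzero]
        kept = sorted(nonzero + padding[:2 - len(nonzero)])
        values.extend(group[i] for i in kept)
        indices.extend(kept)
    return values, indices


row = [0.1, -2.0, 0.3, 4.0, 1.0, 1.0, 1.0, 1.0]
pruned_row = prune_2_4(row)
values, indices = compress_2_4(pruned_row)
```

The compressed form holds half as many values as the original row plus a small position index per value, which is what lets sparse tensor cores skip the zeroed elements.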
Support
Supported SM Architectures: SM 8.0, SM 8.6, SM 8.9, SM 9.0
Supported CPU architectures and operating systems:
OS | CPU archs |
---|---|
Windows | x86_64 |
Linux | x86_64, Arm64 |
Documentation
Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.
Installation
The cuSPARSELt wheel can be installed as follows:
pip install nvidia-cusparselt-cuXX
where XX is the CUDA major version (currently, only CUDA 12 is supported).
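After installing, it can be useful to confirm which shared-library files the wheel placed in the environment. The following is a minimal sketch using only the standard library; the helper name `find_wheel_libraries` is hypothetical, and the assumption that the libraries can be recognized by `.so` or `.dll` in their file names is a heuristic, not something the package documents.

```python
from importlib import metadata
from typing import List


def find_wheel_libraries(dist_name: str = "nvidia-cusparselt-cu12") -> List[str]:
    """Return paths of shared-library files shipped by an installed wheel.

    Returns an empty list when the distribution is not installed.
    """
    try:
        files = metadata.files(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    # Heuristic: match versioned names such as "lib<name>.so.0" as well as "*.dll".
    markers = (".so", ".dll")
    return [str(f.locate()) for f in files if any(m in f.name for m in markers)]


if __name__ == "__main__":
    for path in find_wheel_libraries():
        print(path)
```

An empty result means either that the wheel is not installed in the active environment or that its file list was not recorded.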
Download files
Built Distributions
Hashes for nvidia_cusparselt_cu12-0.6.3-py3-none-win_amd64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 3b325bcbd9b754ba43df5a311488fca11a6b5dc3d11df4d190c000cf1a0765c7 |
MD5 | 9aee2464322ac34bc24cf9cdd49e27e9 |
BLAKE2b-256 | 463e9e1e394a02a06f694be2c97bbe47288bb7c90ea84c7e9cf88f7b28afe165 |
Hashes for nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | e5c8a26c36445dd2e6812f1177978a24e2d37cacce7e090f297a688d1ec44f46 |
MD5 | 7f9f32cf1080300ace5f4fe061d2e3dd |
BLAKE2b-256 | 3b9a72ef35b399b0e183bc2e8f6f558036922d453c4d8237dab26c666a04244b |
Hashes for nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 8371549623ba601a06322af2133c4a44350575f5a3108fb75f3ef20b822ad5f1 |
MD5 | 414d6b93245bd57f24646f4a19f59669 |
BLAKE2b-256 | 62da4de092c61c6dea1fc9c936e69308a02531d122e12f1f649825934ad651b5 |
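A downloaded wheel can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_hash` and the file path in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (hypothetical local path to the Linux x86_64 wheel):
# ok = matches_published_hash(
#     "nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl",
#     "e5c8a26c36445dd2e6812f1177978a24e2d37cacce7e090f297a688d1ec44f46",
# )
```

Reading in chunks keeps memory use flat for large wheels. pip can also enforce digests at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).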