NVIDIA cuSPARSELt
Project description
NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a structured sparse matrix with a 50% sparsity ratio:
\[D = Activation(\alpha \, op(A) \cdot op(B) + \beta \, op(C) + bias) \cdot scale\]
where \(op(A)/op(B)\) refers to in-place operations such as transpose/non-transpose, and \(\alpha, \beta\) are scalars or vectors.
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
Download: developer.nvidia.com/cusparselt/downloads
Provide Feedback: Math-Libs-Feedback@nvidia.com
Examples: cuSPARSELt Example 1, cuSPARSELt Example 2
Blog posts:
- Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt
- Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines
- Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture
Key Features
NVIDIA Sparse MMA tensor core support
Mixed-precision computation support:
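| Input A/B | Input C | Output D | Compute | Block scaled | Supported SM arch |
|---|---|---|---|---|---|
| FP32 | FP32 | FP32 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| BF16 | BF16 | BF16 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| FP16 | FP16 | FP16 | FP32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| FP16 | FP16 | FP16 | FP16 | No | 9.0 |
| INT8 | INT8 | INT8 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| INT8 | INT32 | INT32 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| INT8 | FP16 | FP16 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| INT8 | BF16 | BF16 | INT32 | No | 8.0, 8.6, 8.7, 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP16 | E4M3 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | BF16 | E4M3 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP16 | FP16 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | BF16 | BF16 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP32 | FP32 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E5M2 | FP16 | E5M2 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E5M2 | BF16 | E5M2 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E5M2 | FP16 | FP16 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E5M2 | BF16 | BF16 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E5M2 | FP32 | FP32 | FP32 | No | 9.0, 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP16 | E4M3 | FP32 | A/B/D_OUT_SCALE = VEC64_UE8M0, D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | BF16 | E4M3 | FP32 | A/B/D_OUT_SCALE = VEC64_UE8M0, D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP16 | FP16 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | BF16 | BF16 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E4M3 | FP32 | FP32 | FP32 | A/B_SCALE = VEC64_UE8M0 | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E2M1 | FP16 | E2M1 | FP32 | A/B/D_SCALE = VEC32_UE4M3, D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E2M1 | BF16 | E2M1 | FP32 | A/B/D_SCALE = VEC32_UE4M3, D_SCALE = 32F | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E2M1 | FP16 | FP16 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E2M1 | BF16 | BF16 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.1, 11.0, 12.0, 12.1 |
| E2M1 | FP32 | FP32 | FP32 | A/B_SCALE = VEC32_UE4M3 | 10.0, 10.1, 11.0, 12.0, 12.1 |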
Matrix pruning and compression functionalities
Activation functions, bias vector, and output scaling
Batched computation (multiple matrices in a single run)
GEMM Split-K mode
Auto-tuning functionality (see cusparseLtMatmulSearch())
NVTX ranging and Logging functionalities
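As an illustration of what pruning and compression mean for 50% structured sparsity, the sketch below assumes the 2:4 pattern of the Ampere sparse tensor cores (at most two nonzeros in every group of four consecutive values). It keeps the two largest-magnitude values per group and stores them together with their positions. The helper names are hypothetical, and neither the selection rule nor the storage layout is claimed to match what cuSPARSELt does internally.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    assert len(row) % 4 == 0, "row length must be a multiple of 4"
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest magnitudes; ties keep the earlier index.
        keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def compress_2_4(pruned_row):
    """Split a row that already satisfies 2:4 into values and positions."""
    values, indices = [], []
    for start in range(0, len(pruned_row), 4):
        group = pruned_row[start:start + 4]
        nonzero = [i for i in range(4) if group[i] != 0.0]
        # Pad with unused positions so every group stores exactly two slots.
        padded = nonzero + [i for i in range(4) if i not in nonzero]
        slots = sorted(padded[:2])
        values.extend(group[i] for i in slots)
        indices.extend(slots)
    return values, indices


def decompress_2_4(values, indices):
    """Rebuild the dense row from values and in-group positions."""
    row = []
    for g in range(0, len(values), 2):
        group = [0.0] * 4
        for v, i in zip(values[g:g + 2], indices[g:g + 2]):
            group[i] = v
        row.extend(group)
    return row


dense = [0.1, -2.0, 0.3, 4.0, 5.0, 0.0, -0.5, 0.2]
pruned = prune_2_4(dense)
values, indices = compress_2_4(pruned)
print(pruned)
print(values, indices)
```

The compressed form holds half as many values as the dense row plus a small position index per value, which is the storage saving that structured sparsity provides.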
Support
Supported SM Architectures: SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0, SM 10.0, SM 10.1 (for CTK 12), SM 11.0 (for CTK 13), SM 12.0, SM 12.1
Supported CPU architectures and operating systems:
| OS | CPU archs |
|---|---|
| Windows | x86_64 |
| Linux | x86_64, Arm64 |
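The architecture list above can be turned into a simple lookup. The sketch below is a hypothetical helper that only compares a compute capability, supplied by the caller, against that list. It does not query the GPU and does not model the CTK 12 / CTK 13 distinction noted for SM 10.1 and SM 11.0.

```python
# Architectures copied from the "Supported SM Architectures" list above.
SUPPORTED_SM = {(8, 0), (8, 6), (8, 7), (8, 9), (9, 0),
                (10, 0), (10, 1), (11, 0), (12, 0), (12, 1)}


def is_supported_sm(major, minor):
    """Return True if compute capability major.minor is in the listed set."""
    return (major, minor) in SUPPORTED_SM


print(is_supported_sm(8, 6))
print(is_supported_sm(7, 5))
```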
Documentation
Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.
Installation
The cuSPARSELt wheel can be installed as follows:
pip install nvidia-cusparselt-cuXX
where XX is the CUDA major version.
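After installing, one way to confirm that the wheel is present is to query the package metadata from Python. This is a minimal sketch using the standard-library `importlib.metadata`. The helper name is hypothetical, and the CUDA major versions tried (12 and 13) are taken from the CTK versions mentioned on this page.

```python
from importlib import metadata


def cusparselt_wheel_version(cuda_major):
    """Return the installed nvidia-cusparselt-cuXX version, or None."""
    name = f"nvidia-cusparselt-cu{cuda_major}"
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


for major in (12, 13):
    print(major, cusparselt_wheel_version(major))
```

A result of `None` means no wheel for that CUDA major version was found in the current environment.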
Download files
Built Distributions
File details
Details for the file nvidia_cusparselt_cu12-0.8.1-py3-none-win_amd64.whl.
File metadata
- Download URL: nvidia_cusparselt_cu12-0.8.1-py3-none-win_amd64.whl
- Upload date:
- Size: 225.7 MB
- Tags: Python 3, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2607ec058d53967c9caf0b7a3904ced34bbceaf7944cf9fef6d7f4ec6dab5e3a |
| MD5 | d5f8cd23ed53e5cce28cb8a9e58d7709 |
| BLAKE2b-256 | 64f59eefe50ee49fda0657aaa061a56600a519dbc1c772d0df701f80e676c818 |
File details
Details for the file nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_x86_64.whl.
File metadata
- Download URL: nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_x86_64.whl
- Upload date:
- Size: 239.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cd1b1dc9e1ad31ea3353c1f985e2bd6f9e7ae0e797d7e6ce879d7b2ace5e80e8 |
| MD5 | 4c02b8c4e7d06d2cbfeb9305dab5c522 |
| BLAKE2b-256 | bb14e46964290aa587cb9fb7df20efdc60528ddd00d291ccffec47617fb06ca3 |
File details
Details for the file nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_aarch64.whl.
File metadata
- Download URL: nvidia_cusparselt_cu12-0.8.1-py3-none-manylinux2014_aarch64.whl
- Upload date:
- Size: 236.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c72f727722f74762380e5f8755557c788b26d8fdcc49df1641c1b08e16d256c |
| MD5 | 21432ed00954d8546800e46b302075ed |
| BLAKE2b-256 | fdf8a809966c96e824b92df09ee3b7032442f5e975d873d7dadfef818d527f48 |