NVIDIA cuTENSOR
cuTENSOR is a high-performance CUDA library for tensor primitives.
Key Features
- Extensive mixed-precision support:
  - FP64 inputs with FP32 compute.
  - FP32 inputs with FP16, BF16, or TF32 compute.
  - Complex-times-real operations.
  - Conjugate (without transpose) support.
- Support for up to 64-dimensional tensors.
- Arbitrary data layouts.
- Trivially serializable data structures.
- Main computational routines:
  - Direct (i.e., transpose-free) tensor contractions.
  - Tensor reductions (including partial reductions).
  - Element-wise tensor operations:
    - Support for various activation functions.
    - Arbitrary tensor permutations.
    - Conversion between different data types.
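To make "tensor contraction" concrete, here is a small pure-Python reference of the operation in the general form the cuTENSOR documentation uses, D = alpha * A * B + beta * C, where modes shared by A and B but absent from C are summed over. It is a hypothetical CPU illustration only. The function name `contract`, the dict-of-index-tuples tensor representation, and the restriction that every input mode is either an output mode or shared by A and B are assumptions of this sketch, not cuTENSOR's API, and it says nothing about how cuTENSOR executes contractions on the GPU.

```python
from itertools import product


def contract(alpha, A, modes_a, B, modes_b, beta, C, modes_c, extent):
    """Reference contraction D = alpha * sum(A * B) + beta * C (illustrative only).

    Tensors are dicts mapping index tuples to numbers; modes_* give the mode
    label of each index position, and extent maps each label to its size.
    Modes shared by A and B but absent from C are summed over.
    """
    contracted = [m for m in modes_a if m in modes_b and m not in modes_c]
    # This sketch only supports modes that are outputs or shared by both inputs.
    for m in (*modes_a, *modes_b):
        if m not in modes_c and m not in contracted:
            raise ValueError(f"mode {m!r} is neither an output mode nor shared by A and B")

    D = {}
    for out_idx in product(*(range(extent[m]) for m in modes_c)):
        env = dict(zip(modes_c, out_idx))  # current value of every output mode
        acc = 0
        for k_idx in product(*(range(extent[m]) for m in contracted)):
            env.update(zip(contracted, k_idx))  # current value of every summed mode
            acc += A[tuple(env[m] for m in modes_a)] * B[tuple(env[m] for m in modes_b)]
        D[out_idx] = alpha * acc + beta * C[out_idx]
    return D
```

An ordinary matrix product is the special case modes_a=("m", "k"), modes_b=("k", "n"), modes_c=("m", "n"); the same call pattern covers any arrangement of modes, which is the situation the "transpose-free" contraction routine targets without first permuting operands into matrix form.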
Documentation
Please refer to https://docs.nvidia.com/cuda/cutensor/index.html for the cuTENSOR documentation.
Installation
The cuTENSOR wheel can be installed as follows:
pip install cutensor-cuXX
where XX is the CUDA major version (currently CUDA 11 and 12 are supported, i.e. cutensor-cu11 or cutensor-cu12). The package cutensor (without the -cuXX suffix) is deprecated. If cutensor is installed, remove it (for example with pip uninstall cutensor) before installing cutensor-cuXX.
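The naming rule above can be scripted. The following is a minimal sketch, assuming the CUDA toolkit reported by `nvcc --version` (whose output contains a line such as `Cuda compilation tools, release 12.2, V12.2.140`) is the one you target; the helper names are hypothetical. It maps a CUDA major version to a wheel name and checks whether the deprecated `cutensor` distribution is still installed.

```python
import re
from importlib import metadata

# CUDA major versions with a cutensor-cuXX wheel, as stated on this page.
SUPPORTED_CUDA_MAJORS = (11, 12)


def cuda_major_from_nvcc(version_text: str) -> int:
    """Extract the CUDA major version from `nvcc --version` output text."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("could not find a CUDA release number in the text")
    return int(match.group(1))


def cutensor_package_name(cuda_major: int) -> str:
    """Return the pip package name for the given CUDA major version."""
    if cuda_major not in SUPPORTED_CUDA_MAJORS:
        raise ValueError(f"no cutensor wheel listed for CUDA {cuda_major}")
    return f"cutensor-cu{cuda_major}"


def deprecated_package_installed() -> bool:
    """True if the unsuffixed, deprecated `cutensor` distribution is present."""
    try:
        metadata.distribution("cutensor")
    except metadata.PackageNotFoundError:
        return False
    return True
```

For example, pass the stdout of `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` to `cuda_major_from_nvcc`, give the result to `cutensor_package_name`, and run `pip uninstall cutensor` first if `deprecated_package_installed()` returns True.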
Built Distributions
Hashes for cutensor_cu12-1.6.2-py3-none-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 7bb044f32c408b8fe9020dda606dc75f1a6eff09d35a0c35832400ab6cb7233c
MD5 | bc58d8eff90ab70cf27378854d8213e4
BLAKE2b-256 | 58657210cebebe46dfc2faeae6441cdf1bdfe73bb968340f486097a4ebb544f2
Hashes for cutensor_cu12-1.6.2-py3-none-manylinux2014_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | b05d797bf674d46e3bec0e134985dc5acbe98b3b0a0c249d3cb02132d350a46c
MD5 | 2059a0ec1205ed7e997f562a67df13ba
BLAKE2b-256 | 89c6283800aa459fa8eda985a7ba258965752c6cc3c69d51ec7c2a8f03e80d57
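To check a downloaded wheel against the SHA256 digests listed above, Python's standard `hashlib` is enough. This is a sketch: the function names are hypothetical, and the dictionary only covers the two cuTENSOR 1.6.2 CUDA 12 wheels listed on this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the tables on this page (cutensor-cu12 1.6.2).
EXPECTED_SHA256 = {
    "cutensor_cu12-1.6.2-py3-none-manylinux2014_x86_64.whl":
        "7bb044f32c408b8fe9020dda606dc75f1a6eff09d35a0c35832400ab6cb7233c",
    "cutensor_cu12-1.6.2-py3-none-manylinux2014_aarch64.whl":
        "b05d797bf674d46e3bec0e134985dc5acbe98b3b0a0c249d3cb02132d350a46c",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large wheels are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path) -> bool:
    """Compare a wheel's SHA256 with the published digest for its file name."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest listed for {name}")
    return sha256_of(path) == expected
```

pip can also enforce the check itself in hash-checking mode: list `cutensor-cu12==1.6.2` in a requirements file with one `--hash=sha256:<digest>` option per acceptable file, and install with `--require-hashes`.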