NVIDIA cuTENSOR
Project description
cuTENSOR is a high-performance CUDA library for tensor primitives.
Key Features
- Extensive mixed-precision support:
  - FP64 inputs with FP32 compute.
  - FP32 inputs with FP16, BF16, or TF32 compute.
  - Complex-times-real operations.
  - Conjugate (without transpose) support.
- Support for up to 64-dimensional tensors.
- Arbitrary data layouts.
- Trivially serializable data structures.
- Main computational routines:
  - Direct (i.e., transpose-free) tensor contractions.
    - Support for just-in-time compilation of dedicated kernels.
  - Tensor reductions (including partial reductions).
  - Element-wise tensor operations:
    - Support for various activation functions.
    - Support for padding of the output tensor.
    - Arbitrary tensor permutations.
    - Conversion between different data types.
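To make the term "tensor contraction" concrete, the following pure-Python sketch spells out the arithmetic for small nested lists. It is not the cuTENSOR API and says nothing about how the library is implemented; the function `contract` and its parameters are hypothetical names chosen for illustration. It only shows that the summed mode may sit at any position in either input, which is the situation a transpose-free contraction handles without reordering the data first.

```python
from itertools import product


def contract(a, a_modes, b, b_modes, c_modes, extents):
    """Reference contraction over nested lists (illustration only).

    Modes are strings such as "km"; every mode that appears in an input
    but not in c_modes is summed over. Returns a dict that maps each
    output index tuple to its value.
    """
    def at(tensor, modes, idx):
        # Walk the nested lists in the order the tensor stores its modes.
        for mode in modes:
            tensor = tensor[idx[mode]]
        return tensor

    summed = [m for m in dict.fromkeys(a_modes + b_modes) if m not in c_modes]
    result = {}
    for free in product(*(range(extents[m]) for m in c_modes)):
        idx = dict(zip(c_modes, free))
        total = 0
        for inner in product(*(range(extents[m]) for m in summed)):
            idx.update(zip(summed, inner))
            total += at(a, a_modes, idx) * at(b, b_modes, idx)
        result[free] = total
    return result


# C[m, n] = sum over k of A[k, m] * B[n, k]; k leads in A and trails in B.
A = [[1, 2, 3], [4, 5, 6]]   # stored as A[k][m]
B = [[1, 2], [3, 4]]         # stored as B[n][k]
C = contract(A, "km", B, "nk", "mn", {"k": 2, "m": 3, "n": 2})
```

Here `C[(0, 0)]` is `1*1 + 4*2` and `C[(2, 1)]` is `3*3 + 6*4`. A partial reduction is the same loop with a mode that appears in an input but not in the output.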
Documentation
Please refer to https://docs.nvidia.com/cuda/cutensor/index.html for the cuTENSOR documentation.
Installation
The cuTENSOR wheel can be installed as follows:
pip install cutensor-cuXX
where XX is the CUDA major version (CUDA 11 and 12 are currently supported). The package cutensor (without the -cuXX suffix) is deprecated. If you have cutensor installed, remove it before installing cutensor-cuXX.
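The naming rule and the deprecation note above can be checked from Python before installing. The sketch below uses only the standard library; the helper names `cutensor_requirement` and `deprecated_package_installed` are hypothetical and not part of cuTENSOR, and the supported versions are taken from the text above.

```python
from importlib import metadata

# CUDA major versions named as supported in the installation note above.
SUPPORTED_CUDA_MAJORS = (11, 12)


def cutensor_requirement(cuda_major):
    """Return the pip package name for a given CUDA major version."""
    if cuda_major not in SUPPORTED_CUDA_MAJORS:
        raise ValueError(f"unsupported CUDA major version: {cuda_major}")
    return f"cutensor-cu{cuda_major}"


def deprecated_package_installed():
    """Report whether the deprecated, unsuffixed 'cutensor' package is present."""
    try:
        metadata.version("cutensor")
    except metadata.PackageNotFoundError:
        return False
    return True


if deprecated_package_installed():
    print("Remove the deprecated package first: pip uninstall cutensor")
print("pip install " + cutensor_requirement(12))
```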
Built Distributions
Hashes for cutensor_cu12-2.0.0-py3-none-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | b54f64885686f53f7c86458e30fc9a212efba67a7dcfc0647829f16c62056cb8
MD5 | 1e00506fa886ad66244c67e7543de432
BLAKE2b-256 | 01fca015f378a2d4d9c90afd93f9f42b803f3fe50343fd2f4f914766b9cd7f99
Hashes for cutensor_cu12-2.0.0-py3-none-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | ae0d3fa6700ba1f2a95d02bfe2ba9959a704b59606d5ceccbaf44b097759019a
MD5 | c4ba0d727e7c9805a0c56ff1ad57275c
BLAKE2b-256 | aa4555593f4ea755c259a58a7bc7a05f70d153e5a341b2da0219406be7930612
Hashes for cutensor_cu12-2.0.0-py3-none-manylinux2014_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | f485a80fd088a3c70b09c6de936bb4c7581a6dbf6e6f2de4fbca59ea03fb7c96
MD5 | b46b1d5bf21a9d3bd22e2ac7e560c064
BLAKE2b-256 | 85179c9c31aa143b84555649b2bc0c153f524531d2110bb0fa8b82c38f752ddc
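A downloaded wheel can be compared against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names and the local file path in the usage comment are assumptions for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (hypothetical local path; digest copied from the x86_64 table above):
# matches_published_digest(
#     "cutensor_cu12-2.0.0-py3-none-manylinux2014_x86_64.whl",
#     "ae0d3fa6700ba1f2a95d02bfe2ba9959a704b59606d5ceccbaf44b097759019a",
# )
```

Reading in chunks keeps memory use constant regardless of the wheel's size.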