NVIDIA cuTENSOR
Project description
cuTENSOR is a high-performance CUDA library for tensor primitives.
Key Features
Extensive mixed-precision support:
FP64 inputs with FP32 compute.
FP32 inputs with FP16, BF16, or TF32 compute.
Complex-times-real operations.
Conjugate (without transpose) support.
Support for up to 64-dimensional tensors.
Arbitrary data layouts.
Trivially serializable data structures.
Main computational routines:
Direct (i.e., transpose-free) tensor contractions.
Support for just-in-time compilation of dedicated kernels.
Tensor reductions (including partial reductions).
Element-wise tensor operations:
Support for various activation functions.
Support for padding of the output tensor.
Arbitrary tensor permutations.
Conversion between different data types.
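To make "direct (i.e., transpose-free) tensor contraction" concrete, here is a minimal pure-Python sketch of the mode-labelled semantics such a routine computes. It is an illustration only and does not use the cuTENSOR API: the function `contract`, the dict-based tensor representation, and the single-letter mode labels are all hypothetical names chosen for this example.

```python
from itertools import product


def contract(a, a_modes, b, b_modes, c_modes, extent):
    """Reference semantics for C[c_modes] = sum A[a_modes] * B[b_modes].

    Tensors are dicts keyed by index tuples; modes are single-letter labels.
    Any mode that appears in A or B but not in C is summed over.
    This is a hypothetical helper, not part of cuTENSOR.
    """
    all_modes = sorted(set(a_modes) | set(b_modes))
    c = {}
    # Visit every combination of indices over all modes involved.
    for idx in product(*(range(extent[m]) for m in all_modes)):
        pos = dict(zip(all_modes, idx))
        key_a = tuple(pos[m] for m in a_modes)
        key_b = tuple(pos[m] for m in b_modes)
        key_c = tuple(pos[m] for m in c_modes)
        c[key_c] = c.get(key_c, 0) + a[key_a] * b[key_b]
    return c


# C[n, m] = sum_k A[m, k] * B[k, n] with 2x2 operands.
extent = {"m": 2, "k": 2, "n": 2}
A = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
B = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
# The output is requested with modes (n, m), so no separate transpose is needed.
C = contract(A, "mk", B, "kn", "nm", extent)
```

Because operands are addressed by mode labels rather than by fixed matrix roles, the output layout is chosen simply by reordering `c_modes`. That is the idea behind a transpose-free contraction, shown here without any of the performance considerations a GPU library has to handle.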
Documentation
Please refer to https://docs.nvidia.com/cuda/cutensor/index.html for the cuTENSOR documentation.
Installation
The cuTENSOR wheel can be installed as follows:
pip install cutensor-cuXX
where XX is the CUDA major version (currently CUDA 11 and 12 are supported). The package cutensor (without the -cuXX suffix) is deprecated. If you have cutensor installed, please remove it before installing cutensor-cuXX.
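As a sketch of the steps above, the following Python snippet picks the package name for a given CUDA major version and checks whether the deprecated cutensor distribution is still present. The helper names `package_name`, `installed_version`, and `install_plan` are hypothetical and exist only for this example. It uses the standard-library `importlib.metadata` module and only builds the command strings; it does not run pip.

```python
from importlib import metadata

# CUDA major versions named as supported in the text above.
SUPPORTED_CUDA_MAJORS = (11, 12)


def package_name(cuda_major):
    """Return the wheel name for a CUDA major version, e.g. cutensor-cu12."""
    if cuda_major not in SUPPORTED_CUDA_MAJORS:
        raise ValueError(f"unsupported CUDA major version: {cuda_major}")
    return f"cutensor-cu{cuda_major}"


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def install_plan(cuda_major):
    """List the pip commands to run, removing the deprecated package first."""
    commands = []
    if installed_version("cutensor") is not None:
        commands.append("pip uninstall cutensor")
    commands.append(f"pip install {package_name(cuda_major)}")
    return commands
```

For example, `install_plan(12)` always ends with `pip install cutensor-cu12`, preceded by an uninstall step only when the deprecated package is detected in the current environment.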
Hashes for cutensor_cu12-2.0.2-py3-none-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | e2ae37dc9e4a1643dee9318ffdbd212097660e69826328953830cead567fd543
MD5 | f413c9a16db6dc129c90c44beeb47ee4
BLAKE2b-256 | 08a13fb72bd0593dc4e451d5e6f81c43562b38622a24d68642ff9bda8df35ac0
Hashes for cutensor_cu12-2.0.2-py3-none-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 18c96a4f1e8a559eec626527f5928d5f5b575f6c2b9c45e87309a025ae682334
MD5 | 1cc1e67fe05b55aae6f604f5518efc44
BLAKE2b-256 | edd661fc3511bc9e4cdb423b69964e3d344090b4093cbf9d3c8cc469ef4642d0
Hashes for cutensor_cu12-2.0.2-py3-none-manylinux2014_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | 1db559bdfe4345ac19ee66ab7ee49a54e98b1529fc96de812ade3dbc0a90ef47
MD5 | 6fb2971ae31c6dbb75a284618de6355f
BLAKE2b-256 | f751786c275bc675e3f5d8d207c378652bfbd4c4103174ce857f1a04ff194211
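The digests listed above can be used to check a downloaded wheel. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `file_digests` and `matches_sha256` are hypothetical, and it assumes that the BLAKE2b-256 entries correspond to `hashlib.blake2b` with a 32-byte digest.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Compute SHA256 and BLAKE2b-256 hex digests of a file, read in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        # Assumption: BLAKE2b-256 means BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def matches_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return file_digests(path)["SHA256"] == expected_hex.strip().lower()
```

A typical use would be to call `matches_sha256` with the local path of the downloaded wheel and the SHA256 value from the table for that platform; a `False` result suggests the file is corrupted or not the one listed.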