cuDNN FrontEnd Python library

cuDNN FrontEnd (FE)

cuDNN FE is the modern, open-source entry point to the NVIDIA cuDNN library and to high-performance open-source kernels. It provides a header-only C++ library and a Python interface for accessing the cuDNN Graph API and those open-source kernels.

🚀 Latest news:

We will begin open-sourcing kernels based on customer needs, with the goal of educating developers and enabling them to customize the kernels as needed.

We are now shipping OSS kernels, allowing you to inspect, modify, and contribute to the core logic. Check out our latest implementations:

  • GEMM + Amax: Optimized FP8 matrix multiplication with absolute maximum calculation.
  • GEMM + SwiGLU: High-performance implementation of the SwiGLU activation fused with GEMM.
  • Grouped GEMM + GLU: Unified grouped GEMM GLU API supporting dense and discrete MoE weight layouts.
  • Grouped GEMM + dGLU: Unified grouped GEMM dGLU backward API supporting dense and discrete MoE weight layouts.
  • Grouped GEMM + SwiGLU: SwiGLU activation fused with Grouped GEMM.
  • Grouped GEMM + dSwiGLU: dSwiGLU activation fused with Grouped GEMM.
  • Discrete Grouped GEMM + SwiGLU: Per-expert-pointer SwiGLU grouped GEMM for MoE workloads without weight packing.
  • Discrete Grouped GEMM + dSwiGLU: Per-expert-pointer dSwiGLU backward grouped GEMM for MoE workloads without weight packing.
  • Grouped GEMM + Quant: Legacy dense-only grouped GEMM quant API for MoE FC2/dFC1 workloads.
  • Grouped GEMM + Quant (Unified): Unified grouped GEMM quant API with per-row gating for MoE FC2/dFC1 workloads.
  • Grouped GEMM + Wgrad: Unified grouped GEMM weight-gradient API supporting dense and discrete output layouts for MoE workloads.
  • NSA: Native Sparse Attention, as described in the paper "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention".
  • SDPA Backward: SM100, D=256: SDPA Backward pass for D=256 on SM100.
  • cuDNN SDPA Fprop: Open-sourced Hopper and Blackwell fprop kernels with stats.
  • Fused RMSNorm + SiLU: Implementation of a fused kernel of RMS normalization followed by SiLU (Swish) activation.
  • SDPA PyTorch Op: PyTorch custom operator for cuDNN-accelerated Scaled Dot-Product Attention with autograd and torch.compile support.

🔥🔥🔥 SOTA Attention Kernels from the cuDNN backend

Llama 3.1 style Forward and Bprop with causal masking (GB300)

[Figure: Llama 3.1 SDPA Benchmark on GB300 (only cuDNN)]

DeepSeek v3 style Forward and Bprop with causal masking (GB300)

[Figure: DSv3 SDPA Benchmark on GB300 (only cuDNN)]

Key Features

  • Unified Graph API: Create reusable, persistent cudnn_frontend::graph::Graph objects to describe complex subgraphs.
  • Ease of Use: Simplified C++ and Python bindings (via pybind11) that abstract away the boilerplate of the backend API.
  • Performance: Built-in autotuning and support for the latest NVIDIA GPU architectures.

Installation

🐍 Python

The easiest way to get started is via pip:

pip install nvidia-cudnn-frontend

Requirements:

  • Python 3.8+
  • NVIDIA driver and CUDA Toolkit
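After installing, it can be useful to confirm that pip registered the package and that the interpreter meets the stated minimum. The following is a minimal sketch using only the standard library. The helper names `installed_version` and `python_is_supported` are hypothetical and not part of cuDNN FE; the distribution name and minimum version are taken from the instructions above.

```python
import sys
from importlib import metadata
from typing import Optional

MIN_PYTHON = (3, 8)                  # from the Requirements list above
DIST_NAME = "nvidia-cudnn-frontend"  # the name passed to pip above


def installed_version(dist_name: str = DIST_NAME) -> Optional[str]:
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the stated minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Installed version:", installed_version() or "not installed")
```

This only inspects package metadata; it does not check that a GPU, driver, or CUDA Toolkit is present.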

⚙️ C++ (Header Only)

Because the C++ API is header-only, the frontend itself needs no separate build step. Include the header in your translation unit:

#include <cudnn_frontend.h>

Ensure your include path points to the include/ directory of this repository.

Building from Source

If you want to build the Python bindings from source or run the C++ samples:

1. Dependencies

  • python-dev (e.g., apt-get install python-dev)
  • Dependencies listed in requirements.txt (pip install -r requirements.txt)

2. Python Source Build

pip install -v git+https://github.com/NVIDIA/cudnn-frontend.git

Environment variables CUDAToolkit_ROOT and CUDNN_PATH can be used to override default paths.
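As an illustration of the override mechanism, the sketch below assembles the same pip command together with an environment that sets the two variables. The function `source_build_command` is a hypothetical helper, and the paths in the usage example are placeholders; nothing is executed unless you pass the result to `subprocess.run` yourself.

```python
import os
import sys
from typing import Dict, List, Mapping, Optional, Tuple

# Repository URL from the pip command above.
REPO_URL = "git+https://github.com/NVIDIA/cudnn-frontend.git"


def source_build_command(
    cuda_root: Optional[str] = None,
    cudnn_path: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a verbose source install with optional path overrides."""
    env = dict(os.environ if base_env is None else base_env)
    if cuda_root is not None:
        env["CUDAToolkit_ROOT"] = cuda_root
    if cudnn_path is not None:
        env["CUDNN_PATH"] = cudnn_path
    argv = [sys.executable, "-m", "pip", "install", "-v", REPO_URL]
    return argv, env


if __name__ == "__main__":
    # Placeholder paths; substitute your own installation directories.
    argv, env = source_build_command("/path/to/cuda", "/path/to/cudnn")
    print(" ".join(argv))
    # To run the build: subprocess.run(argv, env=env, check=True)
```

Returning the command and environment instead of running them keeps the helper safe to call on machines without a CUDA installation.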

3. C++ Samples Build

mkdir build && cd build
cmake -DCUDNN_PATH=/path/to/cudnn -DCUDAToolkit_ROOT=/path/to/cuda ../
cmake --build . -j16
./bin/samples

Documentation & Examples

  • Developer Guide: Official NVIDIA Documentation
  • C++ Samples: See samples/cpp for comprehensive usage examples.
  • Python Samples: See samples/python for pythonic implementations.

🤝 Contributing

We warmly welcome contributions! Whether you are fixing a bug, improving documentation, or optimizing one of our new OSS kernels, your help makes cuDNN better for everyone.

  1. Check the Contribution Guide for details.
  2. Fork the repo and create your branch.
  3. Submit a Pull Request.

Debugging

To view the execution flow and debug issues, you can enable logging via environment variables:

# Log to stdout
export CUDNN_FRONTEND_LOG_INFO=1
export CUDNN_FRONTEND_LOG_FILE=stdout

# Log to a file
export CUDNN_FRONTEND_LOG_INFO=1
export CUDNN_FRONTEND_LOG_FILE=execution_log.txt

Logging Levels:

  • CUDNN_FRONTEND_LOG_INFO=0: No logging
  • CUDNN_FRONTEND_LOG_INFO=1: Full logging with tensor dumps
  • CUDNN_FRONTEND_LOG_INFO=10: Basic logging (safe for CUDA graph capture)

Alternatively, you can control logging programmatically via cudnn_frontend::isLoggingEnabled().
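From a Python script, the same environment variables can be set in-process. The sketch below uses a hypothetical helper, `configure_cudnn_frontend_logging`, which is not part of the library. It assumes the variables are read when the library initializes its logging, so it is safest to call it before importing or using cuDNN FE.

```python
import os
from typing import Optional


def configure_cudnn_frontend_logging(
    level: int = 1, destination: Optional[str] = "stdout"
) -> None:
    """Set the logging environment variables described above for this process.

    level: 0 (off), 1 (full logging with tensor dumps), or 10 (basic logging).
    destination: "stdout" or a file path; None leaves the destination unset.
    """
    os.environ["CUDNN_FRONTEND_LOG_INFO"] = str(level)
    if level == 0 or destination is None:
        # No destination is needed when logging is disabled.
        os.environ.pop("CUDNN_FRONTEND_LOG_FILE", None)
    else:
        os.environ["CUDNN_FRONTEND_LOG_FILE"] = destination


if __name__ == "__main__":
    # Basic logging to a file, the level described as safe for CUDA graph capture.
    configure_cudnn_frontend_logging(level=10, destination="execution_log.txt")
    print(os.environ["CUDNN_FRONTEND_LOG_INFO"], os.environ["CUDNN_FRONTEND_LOG_FILE"])
```

The helper does not validate the level; only the three values listed above are documented here.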

License

This project is licensed under the MIT License.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • nvidia_cudnn_frontend-1.22.1-cp314-cp314-win_amd64.whl (2.3 MB): CPython 3.14, Windows x86-64
  • nvidia_cudnn_frontend-1.22.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.9 MB): CPython 3.14, manylinux (glibc 2.27+, glibc 2.28+) x86-64
  • nvidia_cudnn_frontend-1.22.1-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.14, manylinux (glibc 2.27+, glibc 2.28+) ARM64
  • nvidia_cudnn_frontend-1.22.1-cp313-cp313-win_amd64.whl (2.3 MB): CPython 3.13, Windows x86-64
  • nvidia_cudnn_frontend-1.22.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.9 MB): CPython 3.13, manylinux (glibc 2.27+, glibc 2.28+) x86-64
  • nvidia_cudnn_frontend-1.22.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.13, manylinux (glibc 2.27+, glibc 2.28+) ARM64
  • nvidia_cudnn_frontend-1.22.1-cp312-cp312-win_amd64.whl (2.3 MB): CPython 3.12, Windows x86-64
  • nvidia_cudnn_frontend-1.22.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.9 MB): CPython 3.12, manylinux (glibc 2.27+, glibc 2.28+) x86-64
  • nvidia_cudnn_frontend-1.22.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.12, manylinux (glibc 2.27+, glibc 2.28+) ARM64
  • nvidia_cudnn_frontend-1.22.1-cp311-cp311-win_amd64.whl (2.3 MB): CPython 3.11, Windows x86-64
  • nvidia_cudnn_frontend-1.22.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.9 MB): CPython 3.11, manylinux (glibc 2.27+, glibc 2.28+) x86-64
  • nvidia_cudnn_frontend-1.22.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.11, manylinux (glibc 2.27+, glibc 2.28+) ARM64
  • nvidia_cudnn_frontend-1.22.1-cp310-cp310-win_amd64.whl (2.3 MB): CPython 3.10, Windows x86-64
  • nvidia_cudnn_frontend-1.22.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.9 MB): CPython 3.10, manylinux (glibc 2.27+, glibc 2.28+) x86-64
  • nvidia_cudnn_frontend-1.22.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.10, manylinux (glibc 2.27+, glibc 2.28+) ARM64
  • nvidia_cudnn_frontend-1.22.1-cp39-cp39-win_amd64.whl (2.3 MB): CPython 3.9, Windows x86-64
  • nvidia_cudnn_frontend-1.22.1-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.9 MB): CPython 3.9, manylinux (glibc 2.27+, glibc 2.28+) x86-64
  • nvidia_cudnn_frontend-1.22.1-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.9, manylinux (glibc 2.27+, glibc 2.28+) ARM64
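The wheel file names above follow the standard wheel naming convention: distribution, version, an optional build tag, then Python, ABI, and platform tags separated by hyphens, with dots separating alternatives inside a tag. The sketch below splits a name into those parts using only the standard library. `parse_wheel_filename` here is a simplified, hypothetical helper for illustration, not the `packaging` library's function of the same name.

```python
from typing import Dict, List, Optional, Union

WheelInfo = Dict[str, Union[str, None, List[str]]]


def parse_wheel_filename(filename: str) -> WheelInfo:
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    build: Optional[str] = None
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return {
        "distribution": name,
        "version": version,
        "build": build,
        # A tag may list several alternatives separated by dots.
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platforms": platform_tag.split("."),
    }


if __name__ == "__main__":
    info = parse_wheel_filename(
        "nvidia_cudnn_frontend-1.22.1-cp312-cp312-"
        "manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl"
    )
    print(info["version"], info["python"], info["platforms"])
```

For the example name, the platform component yields two entries, which is why a single Linux wheel is listed with both glibc 2.27+ and glibc 2.28+.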
