cuDNN FrontEnd Python library

Project description

cuDNN FrontEnd(FE)

cuDNN FE is the modern, open-source entry point to the NVIDIA cuDNN library and to high-performance open-source kernels. It provides a C++ header-only library and a Python interface to access the cuDNN Graph API and the open-source kernels.

🚀 Latest news:

We are open-sourcing kernels based on customer needs, with the goal of educating developers and enabling them to customize the kernels as needed.

We are now shipping OSS kernels, allowing you to inspect, modify, and contribute to the core logic. Check out our latest implementations:

  • GEMM + Amax: Optimized FP8 matrix multiplication with absolute maximum calculation.
  • GEMM + SwiGLU: High-performance implementation of the SwiGLU activation fused with GEMM.
  • Grouped GEMM + GLU: Unified grouped GEMM GLU API supporting dense and discrete MoE weight layouts.
  • Grouped GEMM + dGLU: Unified grouped GEMM dGLU backward API supporting dense and discrete MoE weight layouts.
  • Grouped GEMM + SwiGLU: SwiGLU activation fused with Grouped GEMM.
  • Grouped GEMM + dSwiGLU: dSwiGLU (SwiGLU backward) activation fused with Grouped GEMM.
  • Discrete Grouped GEMM + SwiGLU: Per-expert-pointer SwiGLU grouped GEMM for MoE workloads without weight packing.
  • Discrete Grouped GEMM + dSwiGLU: Per-expert-pointer dSwiGLU backward grouped GEMM for MoE workloads without weight packing.
  • Grouped GEMM + Quant: Legacy dense-only grouped GEMM quant API for MoE FC2/dFC1 workloads.
  • Grouped GEMM + Quant (Unified): Unified grouped GEMM quant API with per-row gating for MoE FC2/dFC1 workloads.
  • NSA: Native Sparse Attention, as described in the paper "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention".
  • SDPA Backward (SM100, D=256): SDPA backward pass for head dimension D=256 on SM100.
  • cuDNN SDPA Fprop: Open-source Hopper and Blackwell forward-propagation (fprop) kernels, with stats output.
  • Fused RMSNorm + SiLU: Implementation of a fused kernel of RMS normalization followed by SiLU (Swish) activation.
  • SDPA PyTorch Op: PyTorch custom operator for cuDNN-accelerated Scaled Dot-Product Attention with autograd and torch.compile support.
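Several of the fused kernels above compute well-known math. As a hedged reference for checking a kernel's output on tiny inputs (this is not the kernels' code and not a cuDNN FE API), the following pure-Python sketch spells out what GEMM + Amax, GEMM + SwiGLU, and RMSNorm + SiLU compute. The gate/up column split in `gemm_swiglu` and the `eps` default in `rmsnorm_silu` are assumptions; the real kernels' layouts, data types, and scaling options may differ.

```python
import math


def matmul(a, b):
    """Row-major matrix multiply: (M x K) @ (K x N) -> (M x N)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def gemm_amax(a, b):
    """GEMM + Amax reference: the product C and max(|C|), e.g. for deriving an FP8 scale."""
    c = matmul(a, b)
    amax = max(abs(x) for row in c for x in row)
    return c, amax


def silu(x):
    """SiLU (Swish): x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def gemm_swiglu(a, b):
    """GEMM + SwiGLU reference.

    Assumption: the GEMM output has 2*H columns; the first H are the gate and
    the last H are the linear ("up") branch. Real kernels may order or
    interleave these columns differently.
    """
    c = matmul(a, b)
    h = len(c[0]) // 2
    return [[silu(row[j]) * row[h + j] for j in range(h)] for row in c]


def rmsnorm_silu(x, weight, eps=1e-6):
    """RMSNorm (per row, with a learned scale) followed by SiLU."""
    out = []
    for row in x:
        rms = math.sqrt(sum(v * v for v in row) / len(row) + eps)
        out.append([silu(v / rms * w) for v, w in zip(row, weight)])
    return out
```

Fusing these steps into one kernel avoids writing the intermediate GEMM or normalization result back to global memory; the reference above deliberately materializes every intermediate so each step is easy to inspect.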

🔥🔥🔥 SOTA Attention Kernels from the cuDNN backend

Llama 3.1 style Forward and Bprop with causal masking

Llama 3.1 SDPA Benchmark on GB300 (only cuDNN)

Deepseek v3 style Forward and Bprop with causal masking

DSv3 SDPA Benchmark on GB300 (only cuDNN)
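Causal masking means that query position i attends only to key positions 0..i. The sketch below is a minimal, unoptimized single-head reference of the forward pass, useful for sanity-checking small cases; it is not the cuDNN kernel and makes no attempt at the tiling or online softmax that fast kernels use. Returning the per-row log-sum-exp is an assumption about the kind of softmax statistic a forward pass typically saves for the backward pass; the exact stats layout cuDNN produces is not described here.

```python
import math


def causal_sdpa(q, k, v, scale=None):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of rows, shape (seq_len x d). Query i attends to keys 0..i only.
    Returns (output, lse) where lse[i] is the log-sum-exp of row i's scaled scores.
    """
    if scale is None:
        scale = 1.0 / math.sqrt(len(q[0]))
    out, lse = [], []
    for i, qi in enumerate(q):
        # Causal mask: only keys j <= i contribute, so later positions are skipped.
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(i + 1)]
        m = max(scores)  # subtract the row max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) / total
                    for c in range(len(v[0]))])
        lse.append(m + math.log(total))
    return out, lse
```

Because row i never reads keys or values beyond position i, changing a later value row leaves earlier outputs untouched, which is a quick property to check against any causal attention implementation.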

Key Features

  • Unified Graph API: Create reusable, persistent cudnn_frontend::graph::Graph objects to describe complex subgraphs.
  • Ease of Use: Simplified C++ and Python bindings (via pybind11) that abstract away the boilerplate of the backend API.
  • Performance: Built-in autotuning and support for the latest NVIDIA GPU architectures.

Installation

🐍 Python

The easiest way to get started is via pip:

pip install nvidia_cudnn_frontend

Requirements:

  • Python 3.8+
  • NVIDIA driver and CUDA Toolkit
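To confirm that the wheel installed, a small standard-library check can read the package metadata. This is a sketch that only looks up the distribution version; it does not import the library or touch the GPU, and the distribution name is the one used in the pip command for this package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "nvidia_cudnn_frontend") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "nvidia_cudnn_frontend is not installed")
```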

⚙️ C++ (Header Only)

Since the C++ API is header-only, there is no separate frontend library to build or link. Include the header in your compilation unit:

#include <cudnn_frontend.h>

Ensure your include path points to the include/ directory of this repository.

Building from Source

If you want to build the Python bindings from source or run the C++ samples:

1. Dependencies

  • python-dev (e.g., apt-get install python-dev)
  • Dependencies listed in requirements.txt (pip install -r requirements.txt)

2. Python Source Build

pip install -v git+https://github.com/NVIDIA/cudnn-frontend.git

The environment variables CUDAToolkit_ROOT and CUDNN_PATH can be used to override the default CUDA Toolkit and cuDNN locations.
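When the source build is driven from a script, one option is to pass the overrides only to the pip child process rather than exporting them globally. The sketch below assumes nothing beyond the two variable names and the pip command given in this section; the helper names are hypothetical.

```python
import os
import sys

REPO_URL = "git+https://github.com/NVIDIA/cudnn-frontend.git"


def source_build_env(cuda_root=None, cudnn_path=None):
    """Copy of the current environment with optional CUDA Toolkit / cuDNN path overrides."""
    env = dict(os.environ)
    if cuda_root:
        env["CUDAToolkit_ROOT"] = cuda_root
    if cudnn_path:
        env["CUDNN_PATH"] = cudnn_path
    return env


def source_install_command():
    """The pip source-build command from this section, using the current interpreter."""
    return [sys.executable, "-m", "pip", "install", "-v", REPO_URL]
```

To run it: `subprocess.run(source_install_command(), env=source_build_env("/usr/local/cuda", "/opt/cudnn"), check=True)`, where both paths are placeholders for the locations on your system. Copying `os.environ` keeps the current process environment unchanged.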

3. C++ Samples Build

mkdir build && cd build
cmake -DCUDNN_PATH=/path/to/cudnn -DCUDAToolkit_ROOT=/path/to/cuda ../
cmake --build . -j16
./bin/samples

Documentation & Examples

  • Developer Guide: Official NVIDIA Documentation
  • C++ Samples: See samples/cpp for comprehensive usage examples.
  • Python Samples: See samples/python for pythonic implementations.

🤝 Contributing

We welcome contributions! Whether you are fixing a bug, improving documentation, or optimizing one of our new OSS kernels, your help makes cuDNN better for everyone.

  1. Check the Contribution Guide for details.
  2. Fork the repo and create your branch.
  3. Submit a Pull Request.

Debugging

To view the execution flow and debug issues, you can enable logging via environment variables:

# Log to stdout
export CUDNN_FRONTEND_LOG_INFO=1
export CUDNN_FRONTEND_LOG_FILE=stdout

# Log to a file
export CUDNN_FRONTEND_LOG_INFO=1
export CUDNN_FRONTEND_LOG_FILE=execution_log.txt

Logging Levels:

  • CUDNN_FRONTEND_LOG_INFO=0: No logging
  • CUDNN_FRONTEND_LOG_INFO=1: Full logging with tensor dumps
  • CUDNN_FRONTEND_LOG_INFO=10: Basic logging (safe for CUDA graph capture)

Alternatively, you can control logging programmatically via cudnn_frontend::isLoggingEnabled().
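For Python workflows, a hedged sketch of scoping the logging variables to a single run is to set them in a child process's environment, so the library sees them from startup and the parent shell stays clean. The helper names and the script path are hypothetical, and the level check deliberately accepts only the three levels listed above.

```python
import os
import subprocess
import sys

# Levels listed above: 0 = no logging, 1 = full logging, 10 = basic logging.
DOCUMENTED_LOG_LEVELS = {0, 1, 10}


def cudnn_logging_env(level=1, log_file="stdout"):
    """Copy of os.environ with the cuDNN FE logging variables set."""
    if level not in DOCUMENTED_LOG_LEVELS:
        raise ValueError(f"unsupported CUDNN_FRONTEND_LOG_INFO level: {level}")
    env = dict(os.environ)
    env["CUDNN_FRONTEND_LOG_INFO"] = str(level)
    env["CUDNN_FRONTEND_LOG_FILE"] = log_file
    return env


def run_with_logging(script, level=1, log_file="execution_log.txt"):
    """Run a Python script in a child process with cuDNN FE logging enabled."""
    return subprocess.run([sys.executable, script],
                          env=cudnn_logging_env(level, log_file), check=True)
```

For example, `run_with_logging("my_model.py", level=10, log_file="stdout")` would run a hypothetical script with basic logging printed to stdout.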

License

This project is licensed under the MIT License.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • nvidia_cudnn_frontend-1.22.0-cp314-cp314-win_amd64.whl (2.3 MB): CPython 3.14, Windows x86-64
  • nvidia_cudnn_frontend-1.22.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.14, manylinux glibc 2.27+ / 2.28+ x86-64
  • nvidia_cudnn_frontend-1.22.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.14, manylinux glibc 2.27+ / 2.28+ ARM64
  • nvidia_cudnn_frontend-1.22.0-cp313-cp313-win_amd64.whl (2.3 MB): CPython 3.13, Windows x86-64
  • nvidia_cudnn_frontend-1.22.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.13, manylinux glibc 2.27+ / 2.28+ x86-64
  • nvidia_cudnn_frontend-1.22.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.13, manylinux glibc 2.27+ / 2.28+ ARM64
  • nvidia_cudnn_frontend-1.22.0-cp312-cp312-win_amd64.whl (2.3 MB): CPython 3.12, Windows x86-64
  • nvidia_cudnn_frontend-1.22.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.12, manylinux glibc 2.27+ / 2.28+ x86-64
  • nvidia_cudnn_frontend-1.22.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.12, manylinux glibc 2.27+ / 2.28+ ARM64
  • nvidia_cudnn_frontend-1.22.0-cp311-cp311-win_amd64.whl (2.3 MB): CPython 3.11, Windows x86-64
  • nvidia_cudnn_frontend-1.22.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.11, manylinux glibc 2.27+ / 2.28+ x86-64
  • nvidia_cudnn_frontend-1.22.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.11, manylinux glibc 2.27+ / 2.28+ ARM64
  • nvidia_cudnn_frontend-1.22.0-cp310-cp310-win_amd64.whl (2.3 MB): CPython 3.10, Windows x86-64
  • nvidia_cudnn_frontend-1.22.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.10, manylinux glibc 2.27+ / 2.28+ x86-64
  • nvidia_cudnn_frontend-1.22.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.10, manylinux glibc 2.27+ / 2.28+ ARM64
  • nvidia_cudnn_frontend-1.22.0-cp39-cp39-win_amd64.whl (2.2 MB): CPython 3.9, Windows x86-64
  • nvidia_cudnn_frontend-1.22.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.9, manylinux glibc 2.27+ / 2.28+ x86-64
  • nvidia_cudnn_frontend-1.22.0-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.9, manylinux glibc 2.27+ / 2.28+ ARM64

File hashes

Hashes for nvidia_cudnn_frontend-1.22.0-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 5994400a7f76a1be5e327a9ac1a4a635ee734d2ac8a5875e52481c52cf2b0922
MD5 85e0f3df858e847125ff7cd4848b15d5
BLAKE2b-256 a5fdbdec32a32b44f52b60a03f43e8619552ea0eb90a61de06632a054bf17d6a

Hashes for nvidia_cudnn_frontend-1.22.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 90f30b0d6563d050ca1972efa594a31d5affe5c3eeb467542e715d7ee73e3b5b
MD5 02d8ec242325000e687ffb59c7ecc7d5
BLAKE2b-256 89bd3464d181ec2d94085cab98fd5ea4d312478aa6cb16ff38994a9188ac9f05

Hashes for nvidia_cudnn_frontend-1.22.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 0f650058bda46a6542dfc3d021803021e7932e1cd6bb78cf46e81fa219717b5e
MD5 9243e234a5f42c8f45e71702a862b41c
BLAKE2b-256 27ec8c9b53a9174cca2d0062cbd8cb7c31403a38cb4c79984a9c554830cac5e9

Hashes for nvidia_cudnn_frontend-1.22.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 4906a38954725e35bc8431874f4d9db60d50e0d9dbc40ecaf8e5f40df545350b
MD5 d40552e410bf6fa42c562b42d77ec510
BLAKE2b-256 ba1d3a15b719817ca6241e5f3a7a38608af21a3259e550a5dee5520e29adac00

Hashes for nvidia_cudnn_frontend-1.22.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d02c4b4aae3e243ddb08ad4eb939988bcf7b1aefe25f5d400f6858c7276a6631
MD5 5fcfc8ceb7fa6360523ae9eaace53762
BLAKE2b-256 9b5baf9da5a455064380e68a441b9cfa1f1212dd6363bd02b5aa696d319bd211

Hashes for nvidia_cudnn_frontend-1.22.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c9bdf48cf989b2a77f8b52623fc31c078362fd34389207d11cdb0b5624a7b311
MD5 cc09776866f3d296070d04b74014a8f1
BLAKE2b-256 c79343541b581207024824cb740f429bf882aaf3bde3633bd4099393dd9c0c16

Hashes for nvidia_cudnn_frontend-1.22.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 81fde93d9b86ad631e17da1e2c103c4a7a541ec7abcb7f9a121cbd018c8eff26
MD5 9632b4bed998b9407f7d4b582e8ace24
BLAKE2b-256 6147522e84a37eedb1f680e74df449d39fe6f8641779523313d1a8522d449766

Hashes for nvidia_cudnn_frontend-1.22.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 98ffa05699d71795372f112fa2361c13be716fa3fda911c1e809903163ea5d11
MD5 a0968025d8599cb849c8edde06272981
BLAKE2b-256 0e4695b7779a2f71dfccce1783cc5ac210dda0124b93f8bf66cf62ed3d9ce0a5

Hashes for nvidia_cudnn_frontend-1.22.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 bc9c12891d5427ef49b72b26df2b7889d623086d77c9e33b021c2de417d3e4dc
MD5 adec9feb168d5d2d23a0fe131383a13e
BLAKE2b-256 7ef167681e585abd98f968298c771b72830ce984a90fd0d787098d2ea2ba55c7

Hashes for nvidia_cudnn_frontend-1.22.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 49f817377a19e10e4aafa5797cd68315739dfdb2fc6a67dd1052b64c805d24ec
MD5 f59e0d0a073bb2f61d87bee48543ff67
BLAKE2b-256 d54fde06583ec21313f31d8b83bc2164e88fc22f5b48d8eb5cb45490fcf7c262

Hashes for nvidia_cudnn_frontend-1.22.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 bb50bd2758c6d47c6210451c5c1932ed16e7563d7629228f4cc97edc0e01d0c5
MD5 5bd61541200328a0978eeb5d977128cc
BLAKE2b-256 522762fc6e2cddff7d6396be3685342ceec1c12fe2ee50e6f31d270887ecb5ad

Hashes for nvidia_cudnn_frontend-1.22.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 cdff54c945fbabf9da06fd64ded60cf1ec94d580474f5746786c0effd759fedc
MD5 1fbd4b9f44647e1f5ccbccd6f644d0b9
BLAKE2b-256 bfffe4955b6fdff929ddf04a1252facae6201b308e001c91c690e96f65c4e90a

Hashes for nvidia_cudnn_frontend-1.22.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 22748b41049d02c029719467924ea20d928517dd8f35e204a390f97407298eb2
MD5 8dcb7dfb7a893085375041e8699e3d05
BLAKE2b-256 2b56755412cf4ce5ad95bcb00be3144c8e1fa07cbbae073f31a7b75ddec96ca0

Hashes for nvidia_cudnn_frontend-1.22.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 62bf9c8569caf4d9518dae0755507ad36a4e311726aa015fde104c38a1630f76
MD5 eac82d6faf6a7f439b86249bd1314be8
BLAKE2b-256 8bb4976996f1ab721bbcae4b7379652949ddcd41803817d4b65b9bd0d726aa60

Hashes for nvidia_cudnn_frontend-1.22.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 dbd3100ae212dd1f4691f8c096fe3aded46491f9a6cb258bfb802d07ca1a88fc
MD5 c0f21d2ceeb212c42926b3191adebf87
BLAKE2b-256 407d28ab9cb9119fc6a3a383d943448ab310fe787daf784869b167dc7269969f

Hashes for nvidia_cudnn_frontend-1.22.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 007f569d38be5c921c42165080eb02e8e687a08a7e4a25dac1ee02e36dbae66e
MD5 89e61b4b7b08f69baf3a7c907af0d1a8
BLAKE2b-256 27eca96a76b58da8fd92d8d7d1b6d51efbe0a5a4a74d379b28dce3b0cfcd74b2

Hashes for nvidia_cudnn_frontend-1.22.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 596319a2503ee39b21688d33ac12966d2149d407573726aa2065f50b53b05a5d
MD5 35575fe8d1bd1f4adb95ba718e5c3d2e
BLAKE2b-256 47a83390950f63a5ef3f9d25fd3113a4d625183b8ba1e08ceeda8d52b60bdb30

Hashes for nvidia_cudnn_frontend-1.22.0-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 57e3ea7ecc6a69b5ed548e50f006180236c964ff63e4ac1a66aab6621cacb430
MD5 0bc73b8dd5d1581d9a67d47a0da5fce8
BLAKE2b-256 d13daba4d8b5adf234890bec897d048574964e640e6883b22659db628b86ca5a
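To check a downloaded wheel against the SHA256 digests listed above, a short standard-library sketch is enough. It streams the file in chunks so large files are not loaded into memory at once; the helper names are illustrative, not part of any packaging tool.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected_sha256):
    """Return True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `verify_wheel("nvidia_cudnn_frontend-1.22.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", "98ffa05699d71795372f112fa2361c13be716fa3fda911c1e809903163ea5d11")` should return True for an intact download of that file. pip can also enforce digests itself through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).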
