cuDNN FrontEnd Python library

cuDNN FrontEnd (FE)

cuDNN FE is the modern, open-source entry point to the NVIDIA cuDNN library and to high-performance open-source kernels. It provides a header-only C++ library and a Python interface for accessing the cuDNN Graph API and those kernels.

🚀 Embracing Open Source

We are open-sourcing kernels based on customer needs, with the goal of educating developers and enabling them to customize the kernels as needed.

We are now shipping OSS kernels, allowing you to inspect, modify, and contribute to the core logic. Check out our latest implementations:

  • GEMM + Amax: Optimized FP8 matrix multiplication with absolute maximum calculation.
  • GEMM + SwiGLU: High-performance implementation of the SwiGLU activation fused with GEMM.
  • Grouped GEMM + GLU: Unified grouped GEMM GLU API supporting dense and discrete MoE weight layouts.
  • Grouped GEMM + dGLU: Unified grouped GEMM dGLU backward API supporting dense and discrete MoE weight layouts.
  • Grouped GEMM + SwiGLU: Legacy contiguous-only grouped GEMM SwiGLU API.
  • Grouped GEMM + dSwiGLU: Legacy contiguous-only grouped GEMM dSwiGLU API.
  • Discrete Grouped GEMM + SwiGLU: Per-expert-pointer SwiGLU grouped GEMM for MoE workloads without weight packing.
  • Discrete Grouped GEMM + dSwiGLU: Per-expert-pointer dSwiGLU backward grouped GEMM for MoE workloads without weight packing.
  • Grouped GEMM + Quant: Legacy dense-only grouped GEMM quant API for MoE FC2/dFC1 workloads.
  • Grouped GEMM + Quant (Unified): Unified grouped GEMM quant API with per-row gating for MoE FC2/dFC1 workloads.
  • NSA: Native Sparse Attention, as described in the paper "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention".

Key Features

  • Unified Graph API: Create reusable, persistent cudnn_frontend::graph::Graph objects to describe complex subgraphs.
  • Ease of Use: Simplified C++ and Python bindings (via pybind11) that abstract away the boilerplate of the backend API.
  • Performance: Built-in autotuning and support for the latest NVIDIA GPU architectures.

Installation

🐍 Python

The easiest way to get started is via pip:

pip install nvidia_cudnn_frontend

Requirements:

  • Python 3.8+
  • NVIDIA driver and CUDA Toolkit
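
After installing, one quick way to confirm that pip registered the package is to query the installed distribution metadata. The sketch below uses only the standard library (`importlib.metadata` is available from Python 3.8 onward). The helper name `installed_version` is hypothetical, and the distribution name is copied from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name as used in the pip command above.
    version = installed_version("nvidia_cudnn_frontend")
    if version is None:
        print("nvidia_cudnn_frontend is not installed in this environment")
    else:
        print(f"nvidia_cudnn_frontend {version} is installed")
```

This only checks that the package metadata is present. It does not load the library or verify that a compatible driver and CUDA Toolkit are available. On older interpreters the lookup may be sensitive to hyphens versus underscores in the name.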

⚙️ C++ (Header Only)

Because the C++ API is header-only, there is nothing to build for the frontend itself. Include the header in your translation unit:

#include <cudnn_frontend.h>

Ensure your include path points to the include/ directory of this repository.

Building from Source

If you want to build the Python bindings from source or run the C++ samples:

1. Dependencies

  • python-dev (e.g., apt-get install python-dev)
  • Dependencies listed in requirements.txt (pip install -r requirements.txt)

2. Python Source Build

pip install -v git+https://github.com/NVIDIA/cudnn-frontend.git

Environment variables CUDAToolkit_ROOT and CUDNN_PATH can be used to override default paths.
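
When the build needs non-default locations, the two variables can be set for just the pip invocation. The following sketch assembles the environment and the command. The helper name `source_build_command` and the example paths are hypothetical, and nothing is executed until the result is passed to `subprocess.run`.

```python
import os
import sys
from typing import Dict, List, Optional, Tuple

# Repository URL taken from the source-build command above.
REPO_URL = "git+https://github.com/NVIDIA/cudnn-frontend.git"


def source_build_command(
    cuda_root: Optional[str] = None,
    cudnn_path: Optional[str] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build the pip command and an environment with optional path overrides."""
    env = dict(os.environ)  # copy, so the current process is left unchanged
    if cuda_root is not None:
        env["CUDAToolkit_ROOT"] = cuda_root
    if cudnn_path is not None:
        env["CUDNN_PATH"] = cudnn_path
    cmd = [sys.executable, "-m", "pip", "install", "-v", REPO_URL]
    return cmd, env


if __name__ == "__main__":
    # Example paths are placeholders; substitute your own install locations.
    cmd, env = source_build_command("/path/to/cuda", "/path/to/cudnn")
    print(" ".join(cmd))
    # To actually run the build:
    #   import subprocess
    #   subprocess.run(cmd, env=env, check=True)
```

Using `sys.executable -m pip` keeps the build tied to the interpreter that runs the script, which helps when several Python versions are installed.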

3. C++ Samples Build

mkdir build && cd build
cmake -DCUDNN_PATH=/path/to/cudnn -DCUDAToolkit_ROOT=/path/to/cuda ../
cmake --build . -j16
./bin/samples

Documentation & Examples

  • Developer Guide: Official NVIDIA Documentation
  • C++ Samples: See samples/cpp for comprehensive usage examples.
  • Python Samples: See samples/python for Pythonic implementations.

🤝 Contributing

We warmly welcome contributions! Whether you are fixing a bug, improving documentation, or optimizing one of our new OSS kernels, your help makes cuDNN better for everyone.

  1. Check the Contribution Guide for details.
  2. Fork the repo and create your branch.
  3. Submit a Pull Request.

Debugging

To view the execution flow and debug issues, you can enable logging via environment variables:

# Log to stdout
export CUDNN_FRONTEND_LOG_INFO=1
export CUDNN_FRONTEND_LOG_FILE=stdout

# Log to a file
export CUDNN_FRONTEND_LOG_INFO=1
export CUDNN_FRONTEND_LOG_FILE=execution_log.txt

Logging Levels:

  • CUDNN_FRONTEND_LOG_INFO=0: No logging
  • CUDNN_FRONTEND_LOG_INFO=1: Full logging with tensor dumps
  • CUDNN_FRONTEND_LOG_INFO=10: Basic logging (safe for CUDA graph capture)

Alternatively, you can control logging programmatically via cudnn_frontend::isLoggingEnabled().
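
From Python, the same environment variables can be set through `os.environ`. This is a minimal sketch that assumes the variables should be in place before the library is first imported (the text above does not say when they are read). The helper `configure_frontend_logging` is a hypothetical name, and restricting the level to the three documented values is a choice made here for safety.

```python
import os
from typing import Dict

# Levels documented above: 0 = off, 1 = full logging, 10 = basic logging.
DOCUMENTED_LEVELS = (0, 1, 10)


def configure_frontend_logging(level: int = 1, destination: str = "stdout") -> Dict[str, str]:
    """Set the cuDNN FE logging variables in this process and return them."""
    if level not in DOCUMENTED_LEVELS:
        raise ValueError(f"level must be one of {DOCUMENTED_LEVELS}, got {level}")
    settings = {"CUDNN_FRONTEND_LOG_INFO": str(level)}
    if level != 0:
        # "stdout" or a file name such as "execution_log.txt".
        settings["CUDNN_FRONTEND_LOG_FILE"] = destination
    os.environ.update(settings)
    return settings


if __name__ == "__main__":
    # Basic logging to a file; call this before importing the library.
    print(configure_frontend_logging(10, "execution_log.txt"))
```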

License

This project is licensed under the MIT License.

Download files

Download the file for your platform.

Source Distributions

No source distribution files available for this release.

Built Distributions

  • nvidia_cudnn_frontend-1.21.0-cp314-cp314-win_amd64.whl (2.3 MB): CPython 3.14, Windows x86-64
  • nvidia_cudnn_frontend-1.21.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.14, manylinux (glibc 2.27+ and 2.28+), x86-64
  • nvidia_cudnn_frontend-1.21.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.14, manylinux (glibc 2.27+ and 2.28+), ARM64
  • nvidia_cudnn_frontend-1.21.0-cp313-cp313-win_amd64.whl (2.3 MB): CPython 3.13, Windows x86-64
  • nvidia_cudnn_frontend-1.21.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.13, manylinux (glibc 2.27+ and 2.28+), x86-64
  • nvidia_cudnn_frontend-1.21.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.13, manylinux (glibc 2.27+ and 2.28+), ARM64
  • nvidia_cudnn_frontend-1.21.0-cp312-cp312-win_amd64.whl (2.3 MB): CPython 3.12, Windows x86-64
  • nvidia_cudnn_frontend-1.21.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.12, manylinux (glibc 2.27+ and 2.28+), x86-64
  • nvidia_cudnn_frontend-1.21.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.12, manylinux (glibc 2.27+ and 2.28+), ARM64
  • nvidia_cudnn_frontend-1.21.0-cp311-cp311-win_amd64.whl (2.3 MB): CPython 3.11, Windows x86-64
  • nvidia_cudnn_frontend-1.21.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.11, manylinux (glibc 2.27+ and 2.28+), x86-64
  • nvidia_cudnn_frontend-1.21.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.11, manylinux (glibc 2.27+ and 2.28+), ARM64
  • nvidia_cudnn_frontend-1.21.0-cp310-cp310-win_amd64.whl (2.3 MB): CPython 3.10, Windows x86-64
  • nvidia_cudnn_frontend-1.21.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.10, manylinux (glibc 2.27+ and 2.28+), x86-64
  • nvidia_cudnn_frontend-1.21.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.10, manylinux (glibc 2.27+ and 2.28+), ARM64
  • nvidia_cudnn_frontend-1.21.0-cp39-cp39-win_amd64.whl (2.2 MB): CPython 3.9, Windows x86-64
  • nvidia_cudnn_frontend-1.21.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (2.8 MB): CPython 3.9, manylinux (glibc 2.27+ and 2.28+), x86-64
  • nvidia_cudnn_frontend-1.21.0-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (2.7 MB): CPython 3.9, manylinux (glibc 2.27+ and 2.28+), ARM64
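
Each wheel file name encodes the interpreter and platform it targets. As a rough illustration of the standard wheel naming scheme (distribution, version, Python tag, ABI tag, platform tag), the following sketch splits one of the names listed above. `parse_wheel_name` is a hypothetical helper, and it ignores the optional build tag, which none of these files use.

```python
from typing import List, NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platforms: List[str]


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel file name (without a build tag) into its components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated fields, got {len(parts)}")
    distribution, version, python_tag, abi_tag, platform_tag = parts
    # A compressed platform tag lists several platforms separated by dots.
    return WheelTags(distribution, version, python_tag, abi_tag, platform_tag.split("."))


if __name__ == "__main__":
    name = (
        "nvidia_cudnn_frontend-1.21.0-cp312-cp312-"
        "manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl"
    )
    print(parse_wheel_name(name))
```

In practice pip selects the matching wheel automatically, so parsing names by hand is mainly useful when mirroring or auditing files.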

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp314-cp314-win_amd64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 304d150ca768837eaca9f2ad8a94bb80c1f161eeeb71b696c5807d375bc27915
MD5 e5cf2106cba80e92e308af53dc647465
BLAKE2b-256 684084e86ed1f3ecc408492471e70aa71efc595e641d698df3cc57320cd0a3a1

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 a2c9101c7c8845225505b446be49b6eded60c9718f9befceb00c8be34cf3f954
MD5 46989ae4e3d83723dff918559b0f1caa
BLAKE2b-256 c2c933ae2104d964ee3c9f30ec59e577f961a28293253db1946a55cceb96ef8e

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 eee3f650d8573c39ae7add45881b2d80d1a1b540fae75a9fecead3d756bc2640
MD5 ea61ce137279fdfdaa2c1dfe8138dced
BLAKE2b-256 37174ce01e6c0e50968acc610962b9f1fa397f8c6437d235edb662e61a75a6db

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp313-cp313-win_amd64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 2a787d4d44ecce7542fae4715630e3427ee4c1feda62069643c010b81e72aa35
MD5 5c450e1c61a053e7bcfd9acbde3ec7df
BLAKE2b-256 a63d3b505280001ac79d909ff3524236a46d16947f1ee487a049a4f257059731

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 69ad047e47580e76f3694d8e6c74c6d939ce2fff199145f3ea1e6a494c8ea84f
MD5 70550b94aaccb9d2230248713ec3dd14
BLAKE2b-256 b9f25eefffb126d0bb8b81deade2b9cba466cbabf8e2ae9422cce56834fcb3fa

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c3153affbefad49dba1e8643dad59642b54f7d2724bf7286acc5830619ada5ba
MD5 a1bdbdc7b6e9b70ec09df405a89fec53
BLAKE2b-256 8314320fbc455599073990d2b76e0658aaa9b10d4c11cd87abd591925d7565b4

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp312-cp312-win_amd64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 3e5f28d29cef9a57568942ff66f6c8814a6815b369b3bf850bdcdd67c9cd41cd
MD5 3235d3cd75a7a7a4469c1a445f95ec00
BLAKE2b-256 c4c843ca12a36a1a7156ecff413cc1e4050a5eb5205c1fcfc92481dc4370257f

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ae3feffb22a9c90007fc54d27642e160dab750401ac5399589dd450e8d01780f
MD5 560116cf11d7069097cc499b967f9c42
BLAKE2b-256 879d669525d19be3ea3e3e815ab0ecb21508ed0f57744a06e1e71bf9f73dda5e

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 22f8abb69124f2bc6750e7d7e8625680964cb61caab4c3aaee641d5ad295f37b
MD5 eb65b9c43fef8e7e92e9fc33ace01f8a
BLAKE2b-256 b661c471907880a77d4ad88c9a2a5b22348f7ebb767d0ac78b2efb43908bf085

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp311-cp311-win_amd64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 21a1a9663e307db0da1de6c9666972cf7f36bea0dc7d214a83a1379249747352
MD5 d925f9febb45beb0608dfd59f39b3fb6
BLAKE2b-256 15218ef163747bc4b48113f528db75174581b44d33747b55c31b14c8fea16aa2

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 247a4c4d79f7c9243da01713189ee777a56142516df9d3702085fe4ace089ee2
MD5 7dbbf72e0714d3b1ef76bb5e084bef9f
BLAKE2b-256 3ad206ccc17fa799e508f576e707e645ecd9e840f7a1bf0269ae6dc0b06a800b

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 301cc20fdc90936252c986d71feacf74981bc5cb4fa703e091d5c3550f3070fa
MD5 1c76f7771388f61ebf3f5723c45e096e
BLAKE2b-256 a5f6a8f65ba45a503b9b3609ab938ba1c3d16f76409356861a02231ac28af602

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp310-cp310-win_amd64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 7a464a73f3d6cb7be268739237078c2677df96623d7c053c0fc89bbdc41132cd
MD5 f0d66440334929f8a220b9720826a3c8
BLAKE2b-256 7e8ab1527e611ae3e3ef74c50ed28826fa01ab642400bfb0d7a13dcec28fcf42

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 43938ddba2ce3e6be6e28f340dcc97edbe3e61b97847ba5ecc6d5cbdb9ea0dd2
MD5 90c6b95e51540b531d04dbfd22c7b681
BLAKE2b-256 e424e90916a533fb17ade0872d0dc950cac54acf7c57ce962e3e4cf7ef669bfb

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 23dddbdc42d01fbfbfffc8a79f8ac2c1c99cc84382a39d442c9fd44d38d119c5
MD5 4859573c67998a234fb71007c6b5ff4f
BLAKE2b-256 0863f1e8a18f5f5a83270fed598a684fd28a8b591fbfdcdb7e248fde98b36fdf

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp39-cp39-win_amd64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 9694e436406704d7e41ef5e8f0f5457d4d91f11e4a3e9405f50b86b0ecb0ae72
MD5 82813ed205b215d57db0c5db279268ef
BLAKE2b-256 2e3a799560f08cd14624acb35360ddf774aacd4e5205852236745e823f22e8c2

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f3782a55d18992d854869cb1cb310d0042bc2262cc744e36c19399677d621295
MD5 ebb701ff693cf76f8c1c0cc0c2cf96c8
BLAKE2b-256 385b664fb7644bfd2d8f71e230684226d296ad1dcf767affdf4195a2a8451556

File details

Details for the file nvidia_cudnn_frontend-1.21.0-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for nvidia_cudnn_frontend-1.21.0-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 dbd5a7a0f45c6f94fc1453f7a0f99fa6e465c011e25576cfd53ec7d024587a12
MD5 0e7bf581221f246bc89ba815b17c8057
BLAKE2b-256 6197608704e4686afdf55e1b8845c10826e3a818a96efbec0f13326a9682fcb2
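
The digests listed above can be compared against a downloaded wheel to confirm the file is intact. The following is a minimal sketch using the standard `hashlib` module. The helper names `file_digests` and `matches` are hypothetical, and it assumes that the "BLAKE2b-256" column is BLAKE2b with a 32-byte digest.

```python
import hashlib
from typing import Dict


def file_digests(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # Assumption: BLAKE2b-256 means BLAKE2b with a 32-byte digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as handle:
        # Read in chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches(path: str, algorithm: str, expected: str) -> bool:
    """Return True if the file's digest equals the expected hex string."""
    return file_digests(path)[algorithm] == expected.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify.py <wheel-file> <expected-sha256>
    if len(sys.argv) == 3:
        ok = matches(sys.argv[1], "SHA256", sys.argv[2])
        print("SHA256 matches" if ok else "SHA256 MISMATCH")
```

SHA256 is the digest to rely on for integrity checks. MD5 is included only because it appears in the listing.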
