Fast Kernel Library for ComfyUI with multiple compute backends

Comfy Kitchen

Fast kernel library for Diffusion inference with multiple compute backends.

Backend Capabilities Matrix

Function eager cuda triton
quantize_per_tensor_fp8
dequantize_per_tensor_fp8
quantize_nvfp4
dequantize_nvfp4
scaled_mm_nvfp4
quantize_mxfp8
dequantize_mxfp8
scaled_mm_mxfp8
apply_rope
apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.

Layout | Format | HW Requirement | Description
TensorCoreFP8Layout | FP8 E4M3 | SM ≥ 8.9 (Ada) | Per-tensor scaling, 1:1 element mapping
TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks
TensorCoreMXFP8Layout | MXFP8 E4M3 | SM ≥ 10.0 (Blackwell) | Block quantization with 32-element blocks, E8M0 scales

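import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize an activation tensor and a weight matrix
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
weight = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)  # (out_features, in_features)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)
weight_qt = QuantizedTensor.from_float(weight, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()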
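
To make the scaling schemes in the layout table concrete, here is a minimal pure-Python sketch of per-tensor FP8 scaling and of per-block power-of-two (E8M0-style) scales. It is an illustration under stated assumptions, not the library's kernels: all helper names are hypothetical, the E4M3 rounding is simulated in software, the quantize step is assumed to divide by the scale, and the block-scale rule (the smallest power of two that keeps the block within the E4M3 range) is one simple choice among several.

```python
import math

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value


def round_to_e4m3(v):
    """Simulate rounding to the nearest E4M3 value (saturating, ties to even)."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), E4M3_MAX)
    exp = max(math.frexp(a)[1] - 1, -6)  # floor(log2(a)); -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)              # 3 mantissa bits give 8 steps per power of two
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize_per_tensor(values, scale):
    """One scale for the whole tensor, one FP8 value per element (assumed: divide by scale)."""
    return [round_to_e4m3(v / scale) for v in values]


def dequantize_per_tensor(quantized, scale):
    """Undo the scaling; the rounding error from quantization remains."""
    return [q * scale for q in quantized]


def power_of_two_block_scales(values, block_size=32):
    """One power-of-two scale per block, since an E8M0 scale stores only an exponent."""
    scales = []
    for start in range(0, len(values), block_size):
        amax = max(abs(v) for v in values[start:start + block_size])
        # Smallest exponent such that amax / 2**exp still fits in the E4M3 range.
        exp = math.ceil(math.log2(amax / E4M3_MAX)) if amax > 0 else 0
        scales.append(2.0 ** exp)
    return scales


x = [0.1, -2.5, 7.0]
scale = max(abs(v) for v in x) / E4M3_MAX  # map the largest magnitude to 448
roundtrip = dequantize_per_tensor(quantize_per_tensor(x, scale), scale)

block_scales = power_of_two_block_scales([0.5] * 32 + [1000.0] * 32)
```

With these inputs the per-tensor round trip reproduces -2.5 and 7.0 exactly and leaves 0.1 with a small rounding error, while the two blocks receive different scales (2**-9 and 4.0) because their magnitudes differ.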

Installation

From PyPI

# Install default (Linux/Windows/MacOS)
pip install comfy-kitchen

# Install with cuBLAS for NVFP4 (requires Blackwell or newer)
pip install "comfy-kitchen[cublas]"

Package Variants

  • CUDA wheels: Linux x86_64 and Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using Stable ABI for 3.12+).

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require using setup.py directly (not pip install):

Option | Command | Description | Default
--no-cuda | python setup.py bdist_wheel --no-cuda | Build CPU-only wheel (py3-none-any) | Enabled (build with CUDA)
--cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows)
--debug-build | python setup.py build_ext --debug-build | Build in debug mode with symbols | Disabled (Release)
--lineinfo | python setup.py build_ext --lineinfo | Enable NVCC line info for profiling | Disabled

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and CUDA_HOME environment variable
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)
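
Before building from source, it can help to check the basic prerequisites programmatically. The following is a minimal sketch using only the standard library. The function name `missing_build_prereqs` is hypothetical and not part of comfy_kitchen, and it covers only the Python version and the CUDA_HOME variable, not the CUDA Toolkit version, nanobind, or CMake.

```python
import os
import sys

MIN_PYTHON = (3, 10)  # taken from the requirements list


def missing_build_prereqs(env=None, py_version=None):
    """Return human-readable problems; an empty list means the basic checks passed.

    Hypothetical helper, not part of comfy_kitchen.
    """
    env = os.environ if env is None else env
    py_version = sys.version_info[:2] if py_version is None else tuple(py_version[:2])

    problems = []
    if py_version < MIN_PYTHON:
        problems.append(f"Python {py_version[0]}.{py_version[1]} is older than 3.10")
    if not env.get("CUDA_HOME"):
        problems.append("CUDA_HOME is not set (needed to build the CUDA backend)")
    return problems


if __name__ == "__main__":
    for problem in missing_build_prereqs():
        print("missing:", problem)
```

Passing `env` and `py_version` explicitly keeps the check easy to test; calling it with no arguments inspects the current interpreter and environment.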

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (cuda -> triton -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
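
A scoped override such as `ck.use_backend(...)` is commonly built on a context variable that is set on entry and restored on exit. The sketch below shows that pattern in isolation. The names `_forced_backend`, `use_backend`, and `current_backend` are hypothetical stand-ins, and nothing here is a claim about how comfy_kitchen implements the feature internally.

```python
import contextvars
from contextlib import contextmanager

# None means "let the registry choose"; hypothetical module-level state.
_forced_backend = contextvars.ContextVar("forced_backend", default=None)


@contextmanager
def use_backend(name):
    """Force `name` inside the with-block, then restore the previous choice."""
    token = _forced_backend.set(name)
    try:
        yield
    finally:
        _forced_backend.reset(token)  # runs even if the block raises


def current_backend():
    """Return the forced backend name, or None when selection is automatic."""
    return _forced_backend.get()


with use_backend("triton"):
    inner = current_backend()        # forced to "triton"
    with use_backend("eager"):
        nested = current_backend()   # innermost override wins
    restored = current_backend()     # back to "triton"
outer = current_backend()            # automatic selection again
```

Resetting with the token rather than assigning None is what makes nested overrides unwind correctly.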

Backend System

The library supports multiple backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (higher priority)
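
The selection rule described above can be sketched as a walk over a priority-ordered list that returns the first installed backend accepting the input device. This is a simplified illustration: the `Backend` class, the `PRIORITY` list, and `select_backend` are hypothetical names, and only the device constraint is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    devices: frozenset  # device types this backend accepts


# Highest priority first, following the order given in the text.
PRIORITY = [
    Backend("cuda", frozenset({"cuda"})),
    Backend("triton", frozenset({"cuda"})),
    Backend("eager", frozenset({"cpu", "cuda"})),
]


def select_backend(device, installed):
    """Return the first installed backend that supports `device`."""
    for backend in PRIORITY:
        if backend.name in installed and device in backend.devices:
            return backend.name
    raise LookupError(f"no installed backend supports device {device!r}")


everything = {"cuda", "triton", "eager"}
on_gpu = select_backend("cuda", everything)                 # highest-priority match
on_cpu = select_backend("cpu", everything)                  # only eager accepts CPU
no_cuda_ext = select_backend("cuda", {"triton", "eager"})   # e.g. pure Python wheel
```

The last call mirrors the pure Python wheel, where the compiled cuda backend is absent and the next backend in the list is used instead.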

Constraint System

Each backend declares constraints for its functions:

Constraint | Description
Device | Which device types are supported
Dtype | Allowed input/output dtypes per parameter
Shape | Shape requirements (e.g., 2D tensors, dimensions divisible by 16)
Compute Capability | Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4)

The registry validates inputs against these constraints before calling the backend, so it does not rely on try/except fallback patterns. If no backend can handle the inputs, a NoCapableBackendError is raised with details about why each backend was rejected.

# Debug logging to see backend selection
import logging
logging.basicConfig()  # install a stderr handler in case none is configured yet
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)
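
The validate-before-call approach described above can be sketched as follows. Each backend declares its constraints as data, the dispatcher collects the reasons a backend is rejected, and the error lists them when nothing qualifies. Only the error name `NoCapableBackendError` comes from the text; the `Constraints` class, the `BACKENDS` table, its example values, and `pick_backend` are assumptions made for illustration.

```python
from dataclasses import dataclass


class NoCapableBackendError(RuntimeError):
    """Stand-in named after the error described in the text."""


@dataclass(frozen=True)
class Constraints:
    devices: frozenset
    dtypes: frozenset
    min_compute_capability: tuple = (0, 0)
    dim_multiple: int = 1  # every dimension must be divisible by this

    def violations(self, device, dtype, shape, compute_capability=(0, 0)):
        """Return the reasons this backend cannot take the call (empty if it can)."""
        problems = []
        if device not in self.devices:
            problems.append(f"device {device} unsupported")
        if dtype not in self.dtypes:
            problems.append(f"dtype {dtype} unsupported")
        if device == "cuda" and compute_capability < self.min_compute_capability:
            problems.append(f"needs SM >= {self.min_compute_capability}")
        if any(dim % self.dim_multiple for dim in shape):
            problems.append(f"dimensions must be multiples of {self.dim_multiple}")
        return problems


# Insertion order doubles as priority order in this sketch; values are made up.
BACKENDS = {
    "cuda": Constraints(frozenset({"cuda"}), frozenset({"bfloat16", "float16"}), (10, 0), 16),
    "eager": Constraints(frozenset({"cpu", "cuda"}), frozenset({"float32", "bfloat16", "float16"})),
}


def pick_backend(device, dtype, shape, compute_capability=(0, 0)):
    """Validate up front and return the first backend with no violations."""
    rejected = {}
    for name, constraints in BACKENDS.items():
        problems = constraints.violations(device, dtype, shape, compute_capability)
        if not problems:
            return name
        rejected[name] = problems
    details = "; ".join(f"{name}: {', '.join(why)}" for name, why in rejected.items())
    raise NoCapableBackendError(f"no backend can handle this call ({details})")
```

Because every check runs before any kernel is launched, a rejected call costs only a few comparisons, and the error message can name each backend together with the constraint it failed.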

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends
