Fast Kernel Library for ComfyUI with multiple compute backends

Comfy Kitchen

Fast kernel library for diffusion model inference with multiple compute backends.

Backend Capabilities Matrix

Function eager cuda triton
quantize_per_tensor_fp8
dequantize_per_tensor_fp8
quantize_nvfp4
dequantize_nvfp4
scaled_mm_nvfp4
quantize_mxfp8
dequantize_mxfp8
scaled_mm_mxfp8
apply_rope
apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.
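The interception idea can be illustrated without the library. The sketch below is not comfy_kitchen's implementation: `ToyInt8Tensor` is a hypothetical tensor-like wrapper (not a true `torch.Tensor` subclass) that uses PyTorch's `__torch_function__` protocol to route `F.linear` to a specialized path and to dequantize for every other op. QuantizedTensor may hook in at a different level (for example `__torch_dispatch__`), and it uses FP8/FP4 formats rather than int8.

```python
import torch
import torch.nn.functional as F


class ToyInt8Tensor:
    """Hypothetical tensor-like wrapper: int8 values plus one per-tensor scale."""

    def __init__(self, data: torch.Tensor, scale: torch.Tensor):
        self.data = data    # int8 payload
        self.scale = scale  # 0-dim float32 scale

    @classmethod
    def from_float(cls, x: torch.Tensor) -> "ToyInt8Tensor":
        # Symmetric scale chosen so that max |x| maps to 127.
        scale = x.abs().max().float().clamp(min=1e-12) / 127.0
        data = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
        return cls(data, scale)

    def dequantize(self) -> torch.Tensor:
        return self.data.to(torch.float32) * self.scale

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is F.linear and all(isinstance(a, cls) for a in args[:2]):
            # Specialized path: multiply the raw integer values, then apply
            # both scales once. Accumulates in float32 here for portability.
            x, w = args[0], args[1]
            bias = args[2] if len(args) > 2 else kwargs.get("bias")
            out = (x.data.float() @ w.data.float().T) * (x.scale * w.scale)
            return out if bias is None else out + bias
        # Generic fallback: dequantize every wrapped argument and rerun the op.
        plain = [a.dequantize() if isinstance(a, cls) else a for a in args]
        return func(*plain, **kwargs)


torch.manual_seed(0)
x, w, b = torch.randn(4, 8), torch.randn(3, 8), torch.randn(3)
qx, qw = ToyInt8Tensor.from_float(x), ToyInt8Tensor.from_float(w)
y = F.linear(qx, qw, b)   # handled by the specialized branch
z = torch.add(qx, 1.0)    # handled by the dequantize fallback
```

The fallback branch is what makes interception transparent: an op without a specialized kernel still works, at the cost of materializing a float copy.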

Layout | Format | HW Requirement | Description
TensorCoreFP8Layout | FP8 E4M3 | SM ≥ 8.9 (Ada) | Per-tensor scaling, 1:1 element mapping
TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks
TensorCoreMXFP8Layout | MXFP8 E4M3 | SM ≥ 10.0 (Blackwell) | Block quantization with 32-element blocks, E8M0 scales

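import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize an activation tensor (TensorCoreFP8Layout needs SM ≥ 8.9)
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)

# Quantize a weight of shape (out_features, in_features) the same way
# (TensorCoreNVFP4Layout is imported the same way for Blackwell GPUs)
weight = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)
weight_qt = QuantizedTensor.from_float(weight, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()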

Installation

From PyPI

# Install default (Linux/Windows/macOS)
pip install comfy-kitchen

# Install with cuBLAS support for NVFP4 (Blackwell and newer)
pip install "comfy-kitchen[cublas]"

Package Variants

  • CUDA wheels: Linux x86_64 and aarch64, Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (a single abi3 wheel built against the Stable ABI covers 3.12 and later).

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require invoking setup.py directly; they are not available through pip install:

Option | Command | Description | Default
--no-cuda | python setup.py bdist_wheel --no-cuda | Build a CPU-only wheel (py3-none-any) | Off (builds with CUDA)
--cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows)
--debug-build | python setup.py build_ext --debug-build | Build in debug mode with symbols | Off (Release)
--lineinfo | python setup.py build_ext --lineinfo | Enable NVCC line info for profiling | Off

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel
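When building only for the GPUs in the current machine, the `--cuda-archs` value can be derived from the compute capability PyTorch reports. `cuda_archs_arg` is a hypothetical helper, not part of the build scripts. It emits plain numeric targets (`89`, `120`, ...) and does not reproduce suffixed defaults such as `90a` or `100f`, which select architecture- or family-specific targets, so for Hopper and Blackwell builds it may be better to keep the defaults.

```python
import torch


def cuda_archs_arg(capabilities) -> str:
    """Hypothetical helper: (major, minor) pairs -> a --cuda-archs value such as "80;89"."""
    archs = {f"{major}{minor}" for major, minor in capabilities}  # deduplicate GPUs
    return ";".join(sorted(archs, key=int))


if torch.cuda.is_available():
    caps = [torch.cuda.get_device_capability(i) for i in range(torch.cuda.device_count())]
    print(f'python setup.py build_ext --cuda-archs="{cuda_archs_arg(caps)}" bdist_wheel')
```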

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and the CUDA_HOME environment variable to be set
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)
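A quick way to check a machine against these requirements and the SM thresholds in the layout table. `supported_layouts` and `report` are hypothetical helpers; the thresholds are copied from the layout table, and `torch.version.cuda` is the CUDA version PyTorch was built with, which is not necessarily the runtime the comfy-kitchen wheel uses.

```python
import sys

import torch

# Minimum compute capability per layout, copied from the layout table.
LAYOUT_MIN_SM = {
    "TensorCoreFP8Layout": (8, 9),
    "TensorCoreNVFP4Layout": (10, 0),
    "TensorCoreMXFP8Layout": (10, 0),
}


def supported_layouts(capability):
    """Layouts whose SM requirement is met by a (major, minor) capability."""
    return [name for name, min_sm in LAYOUT_MIN_SM.items() if tuple(capability) >= min_sm]


def report():
    py_ok = sys.version_info >= (3, 10)
    print(f"Python {sys.version.split()[0]}: {'ok' if py_ok else 'needs >= 3.10'}")
    print(f"PyTorch {torch.__version__} (requirement: >= 2.5.0)")
    if not torch.cuda.is_available():
        print("No CUDA device: only the eager backend handles CPU tensors.")
        return
    cap = torch.cuda.get_device_capability()
    # CUDA version PyTorch was built with, not necessarily the wheel's runtime.
    print(f"torch.version.cuda = {torch.version.cuda}")
    print(f"GPU SM {cap[0]}.{cap[1]}: {supported_layouts(cap) or 'no Tensor Core layouts'}")


if __name__ == "__main__":
    report()
```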

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (cuda -> triton -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
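A context manager like `use_backend` is commonly built on `contextvars`, so the override is restored on exit (even after an exception) and stays local to the current thread or async task. The sketch below shows that pattern only: `_forced_backend` and `resolve_backend` are hypothetical, and the precedence it encodes (explicit `backend=` argument, then the active `with` block, then automatic selection) is an assumption rather than documented behavior.

```python
import contextlib
import contextvars

# Hypothetical override slot; comfy_kitchen's internals may be organized differently.
_forced_backend = contextvars.ContextVar("forced_backend", default=None)


@contextlib.contextmanager
def use_backend(name: str):
    token = _forced_backend.set(name)
    try:
        yield
    finally:
        _forced_backend.reset(token)  # restore the previous value, even on error


def resolve_backend(explicit=None, auto="cuda"):
    # Assumed precedence: explicit argument > active use_backend block > automatic choice.
    return explicit or _forced_backend.get() or auto
```

Using `reset(token)` instead of setting the value back to `None` is what makes nested blocks restore the outer override correctly.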

Backend System

The library supports three backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry checks each backend's constraints in priority order (cuda → triton → eager) and selects the first backend that can handle the inputs:

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (both rank above eager), provided their constraints are met

Constraint System

Each backend declares constraints for its functions:

Constraint | Description
Device | Which device types are supported
Dtype | Allowed input/output dtypes per parameter
Shape | Shape requirements (e.g., 2D tensors, dimensions divisible by 16)
Compute Capability | Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4)

The registry validates inputs against these constraints before calling a backend, so selection does not rely on try/except fallbacks. If no backend can handle the inputs, a NoCapableBackendError is raised with details.
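The validate-then-dispatch flow can be sketched in a few lines. Everything here is hypothetical: `Constraints`, `select_backend`, the limits in the example registry, and the local `NoCapableBackendError` stand-in (the library raises its own exception of that name). What the sketch shows is the shape of the logic: candidates are checked in priority order, the first one with no violations wins, and if none qualifies the collected reasons go into the error message.

```python
from dataclasses import dataclass, field
from typing import Optional

import torch


class NoCapableBackendError(RuntimeError):
    """Local stand-in for the library's exception of the same name."""


@dataclass
class Constraints:
    devices: set = field(default_factory=lambda: {"cpu", "cuda"})
    dtypes: Optional[set] = None        # None means any dtype
    divisible_by: int = 1               # every dimension must be a multiple of this
    min_sm: Optional[tuple] = None      # minimum (major, minor), checked on CUDA only

    def violations(self, x: torch.Tensor, sm=None) -> list:
        problems = []
        if x.device.type not in self.devices:
            problems.append(f"device {x.device.type} not in {sorted(self.devices)}")
        if self.dtypes is not None and x.dtype not in self.dtypes:
            problems.append(f"dtype {x.dtype} not allowed")
        if any(dim % self.divisible_by for dim in x.shape):
            problems.append(f"shape {tuple(x.shape)} not divisible by {self.divisible_by}")
        if self.min_sm and x.device.type == "cuda" and (sm is None or sm < self.min_sm):
            problems.append(f"needs SM >= {self.min_sm}, got {sm}")
        return problems


def select_backend(registry, x, sm=None, priority=("cuda", "triton", "eager")):
    reasons = {}
    for name in priority:
        if name in registry:
            problems = registry[name].violations(x, sm)
            if not problems:
                return name  # first backend in priority order that accepts x
            reasons[name] = problems
    raise NoCapableBackendError(f"no backend can handle the inputs: {reasons}")


# Example registry; these limits are illustrative, not the library's real ones.
registry = {
    "cuda": Constraints({"cuda"}, {torch.float32, torch.bfloat16}, 16, (8, 9)),
    "triton": Constraints({"cuda"}, {torch.float32, torch.bfloat16}),
    "eager": Constraints(),
}
```

With this registry a CPU tensor falls through to `eager`, matching the CPU fallback described earlier.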

# Debug logging to see backend selection
import logging
logging.basicConfig()  # attach a handler so log records are actually printed
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

comfy_kitchen-0.2.8-py3-none-any.whl (64.9 kB)

Tags: Python 3

comfy_kitchen-0.2.8-cp312-abi3-win_amd64.whl (678.0 kB)

Tags: CPython 3.12+, Windows x86-64

comfy_kitchen-0.2.8-cp312-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (794.6 kB)

Tags: CPython 3.12+, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64

comfy_kitchen-0.2.8-cp312-abi3-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (791.5 kB)

Tags: CPython 3.12+, manylinux: glibc 2.24+ ARM64, manylinux: glibc 2.28+ ARM64

comfy_kitchen-0.2.8-cp311-cp311-win_amd64.whl (677.8 kB)

Tags: CPython 3.11, Windows x86-64

comfy_kitchen-0.2.8-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (797.8 kB)

Tags: CPython 3.11, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64

comfy_kitchen-0.2.8-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (795.1 kB)

Tags: CPython 3.11, manylinux: glibc 2.24+ ARM64, manylinux: glibc 2.28+ ARM64

comfy_kitchen-0.2.8-cp310-cp310-win_amd64.whl (678.2 kB)

Tags: CPython 3.10, Windows x86-64

comfy_kitchen-0.2.8-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (798.0 kB)

Tags: CPython 3.10, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64

comfy_kitchen-0.2.8-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (795.4 kB)

Tags: CPython 3.10, manylinux: glibc 2.24+ ARM64, manylinux: glibc 2.28+ ARM64

File details

Details for the file comfy_kitchen-0.2.8-py3-none-any.whl.

File metadata

  • Download URL: comfy_kitchen-0.2.8-py3-none-any.whl
  • Size: 64.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for comfy_kitchen-0.2.8-py3-none-any.whl
Algorithm Hash digest
SHA256 0b322992cf1681ebe3d413ac7950f39e5cc425505c13cef8d08527ce82fe5e32
MD5 c1d740104fa234ff56ff990fc9b63a56
BLAKE2b-256 46b064c3e0656db822cb4f7dbc928d6a13cf2cad861cc178e6386114634e355b
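The SHA256 digests listed for each file can be checked locally with Python's standard `hashlib`. `sha256_of` and `verify_wheel` are illustrative helpers, and the commented example assumes the wheel was downloaded to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Stream the file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected_sha256: str) -> bool:
    return sha256_of(path) == expected_sha256.strip().lower()


# Example (assumes the wheel was downloaded to the current directory):
# verify_wheel("comfy_kitchen-0.2.8-py3-none-any.whl",
#              "0b322992cf1681ebe3d413ac7950f39e5cc425505c13cef8d08527ce82fe5e32")
```

pip can enforce the same check itself: a requirements-file entry with `--hash=sha256:<digest>` switches pip into hash-checking mode.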

File details

Details for the file comfy_kitchen-0.2.8-cp312-abi3-win_amd64.whl.

File hashes

Hashes for comfy_kitchen-0.2.8-cp312-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 20b94aa87e6d42181ad928560487a0e2a692dfcc9fce40d4e31f8fc57753865c
MD5 7ced1556046702ea217b03c0857c02ca
BLAKE2b-256 845c14cb27952313dca93e3e8a58223fd6297acd701df3d9727c007edbcb3559

File details

Details for the file comfy_kitchen-0.2.8-cp312-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for comfy_kitchen-0.2.8-cp312-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 570b2eb74c77ce9356b0140ec190253dcbedeb1e92bec5092444abef9cd24ebf
MD5 ab47cacb985812aa6e0f0e68f6adc88a
BLAKE2b-256 9aafe87ebae2b1bba35291663b0d6eea6748b340d7c21802224cfb37f3a48dda

File details

Details for the file comfy_kitchen-0.2.8-cp312-abi3-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for comfy_kitchen-0.2.8-cp312-abi3-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 0c2eb7d03921a70e21fd9bb43d557e09b20bec81479cb85c2b58e8cdfb2cacc9
MD5 283d6594cf73b50c71236b5e2619211e
BLAKE2b-256 94f11206bc530adef9193ba78964d687b6f0381a853abee04cf6fa483bd082a0

File details

Details for the file comfy_kitchen-0.2.8-cp311-cp311-win_amd64.whl.

File hashes

Hashes for comfy_kitchen-0.2.8-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 33bcc074a500dd68a3124231346581761845b0de76f99cfff0f37ae4f149d1f9
MD5 bf3a885440a6b54e45a4c4569284c01a
BLAKE2b-256 7feb4b39e87b775309e0f0de08d1afef73de971e7199d1064d099645a8cecab4

File details

Details for the file comfy_kitchen-0.2.8-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for comfy_kitchen-0.2.8-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 7c6e418e1a4410a9b566a8053882bd90964c289fac8e8702b1e9dcd3c5ea5778
MD5 0f28b537406a4caa22b4737565cceaee
BLAKE2b-256 911b2db8f3b7065a2bdbb6156054b081b63d376def732c05db94a6bc0b164638

File details

Details for the file comfy_kitchen-0.2.8-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for comfy_kitchen-0.2.8-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 b1249d098e72ba895758c7723881e5603150d7fad39a83b369a887dac2352a43
MD5 74313522663bcce6754dd9b48bf5da4e
BLAKE2b-256 0868e4eacd81eaee1306f70fffb003afb68ae9be2b24524acee0ef948df4669c

File details

Details for the file comfy_kitchen-0.2.8-cp310-cp310-win_amd64.whl.

File hashes

Hashes for comfy_kitchen-0.2.8-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 2569e2b64b9925722867c44c9a62a169e6c3cbdb533de0f2eaebca5afd83971b
MD5 0f6149271438892cadb428e69379bc1a
BLAKE2b-256 36f9a6e9441c231075bbbff916947fd498e1bcfec4be9a6401cb163fcdd8cdcd

File details

Details for the file comfy_kitchen-0.2.8-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for comfy_kitchen-0.2.8-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 997dd4c5b5b854164fb76909a9cbc629cb7c471084e466c85d343842a4160a97
MD5 cdd1a3607c1b16cc76016a727fde4d60
BLAKE2b-256 b8cc8a3802300591cc4f8925e3b3052f9d8a0e86371e03df8179db74aa3a0c17

File details

Details for the file comfy_kitchen-0.2.8-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for comfy_kitchen-0.2.8-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 a5c782f5eef84212e16f38d8c33e667f2119d8f8e13ee4eeac624ac60649b1d6
MD5 fd0547b9286706d69bbe9b0ebf5757aa
BLAKE2b-256 d7341b2a2c140ef7eb94ac0e88fc4705ea886f38718b1a280e5ceffd151e3d34
