Fast Kernel Library for ComfyUI with multiple compute backends

Comfy Kitchen

Fast kernel library for diffusion inference with multiple compute backends.

Backend Capabilities Matrix

Function eager cuda triton
quantize_per_tensor_fp8
dequantize_per_tensor_fp8
quantize_nvfp4
dequantize_nvfp4
scaled_mm_nvfp4
quantize_mxfp8
dequantize_mxfp8
scaled_mm_mxfp8
apply_rope
apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.

| Layout | Format | HW Requirement | Description |
|---|---|---|---|
| TensorCoreFP8Layout | FP8 E4M3 | SM ≥ 8.9 (Ada) | Per-tensor scaling, 1:1 element mapping |
| TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks |
| TensorCoreMXFP8Layout | MXFP8 E4M3 | SM ≥ 10.0 (Blackwell) | Block quantization with 32-element blocks, E8M0 scales |

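import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize a tensor
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)

# Quantize a weight the same way (shape is out_features x in_features)
weight = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)
weight_qt = QuantizedTensor.from_float(weight, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()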
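
The difference between per-tensor scaling and block quantization in the layout table can be illustrated without a GPU. The following is a minimal pure-Python sketch, not the library's kernels: `per_tensor_scale` and `block_scales` are hypothetical helper names, the scales are kept as ordinary floats rather than encoded in a narrow format, and rounding the E8M0-style scale up to a power of two is an assumption made so that scaled values stay within range.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3 value
FP4_E2M1_MAX = 6.0    # largest finite FP4 E2M1 value


def per_tensor_scale(values, fmt_max=FP8_E4M3_MAX):
    """One scale for the whole tensor: maps the largest magnitude to fmt_max."""
    amax = max(abs(v) for v in values)
    return amax / fmt_max if amax > 0 else 1.0


def block_scales(values, block_size, fmt_max, power_of_two=False):
    """One scale per contiguous block of `block_size` elements."""
    if len(values) % block_size != 0:
        raise ValueError("length must be a multiple of block_size")
    scales = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / fmt_max if amax > 0 else 1.0
        if power_of_two:
            # An exponent-only scale can only represent 2**k, so round up.
            scale = 2.0 ** math.ceil(math.log2(scale))
        scales.append(scale)
    return scales


data = [0.5] * 16 + [12.0] * 16 + [3.0] * 32

# Per-tensor: a single scale, dominated by the largest element (12.0).
print(per_tensor_scale(data))

# 16-element blocks (NVFP4-style): four scales, one per block.
print(block_scales(data, 16, FP4_E2M1_MAX))

# 32-element blocks with power-of-two scales (MXFP8-style): two scales.
print(block_scales(data, 32, FP8_E4M3_MAX, power_of_two=True))
```

Smaller blocks let quiet regions of a tensor keep their own, smaller scale, which is why the block layouts track local magnitude more closely than a single per-tensor scale.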

Installation

From PyPI

# Install default (Linux/Windows/macOS)
pip install comfy-kitchen

# Install with cuBLAS for NVFP4 (+Blackwell)
# Quotes keep shells such as zsh from expanding the brackets
pip install "comfy-kitchen[cublas]"

Package Variants

  • CUDA wheels: Linux x86_64 and Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using Stable ABI for 3.12+).

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require using setup.py directly (not pip install):

| Option | Command | Description | Default |
|---|---|---|---|
| --no-cuda | python setup.py bdist_wheel --no-cuda | Build CPU-only wheel (py3-none-any) | Enabled (build with CUDA) |
| --cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows) |
| --debug-build | python setup.py build_ext --debug-build | Build in debug mode with symbols | Disabled (Release) |
| --lineinfo | python setup.py build_ext --lineinfo | Enable NVCC line info for profiling | Disabled |

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and the CUDA_HOME environment variable to be set
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)
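
The version floors above can be checked before installing or building. This is a minimal sketch using only the standard library; `parse_version` and `check_requirements` are hypothetical helpers written for this example and are not part of comfy_kitchen. It covers only the Python, PyTorch, and CUDA_HOME items, since driver and toolkit versions need external tools to query.

```python
import os
import sys
from importlib import metadata


def parse_version(text):
    """Turn '2.5.1+cu124' into (2, 5, 1); suffixes such as 'a0' are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad so '2.5' compares equal to '2.5.0'
        parts.append(0)
    return tuple(parts[:3])


def check_requirements(python_version, torch_version, cuda_home,
                       building_from_source=False):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if tuple(python_version[:2]) < (3, 10):
        problems.append("Python >= 3.10 is required")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version) < (2, 5, 0):
        problems.append("PyTorch >= 2.5.0 is required")
    if building_from_source and not cuda_home:
        problems.append("CUDA_HOME must be set to build from source")
    return problems


# Read the installed PyTorch version without importing torch itself.
try:
    installed_torch = metadata.version("torch")
except metadata.PackageNotFoundError:
    installed_torch = None

print(check_requirements(sys.version_info, installed_torch,
                         os.environ.get("CUDA_HOME")))
```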

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (cuda -> triton -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
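
The `with ck.use_backend(...)` form implies a scoped override that is undone when the block exits. One common way to build such an override is a context variable that is set on entry and reset on exit. The sketch below shows that pattern in isolation; it makes no claim about how comfy_kitchen implements it, and `_forced_backend`, `use_backend`, and `current_backend` are hypothetical stand-ins defined only for this example.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical state: None means "select a backend automatically".
_forced_backend = contextvars.ContextVar("forced_backend", default=None)


@contextmanager
def use_backend(name):
    """Force `name` inside the with-block, then restore the previous setting."""
    token = _forced_backend.set(name)
    try:
        yield
    finally:
        # Runs even if the block raises, so the override never leaks out.
        _forced_backend.reset(token)


def current_backend():
    """Return the forced backend name, or 'auto' when nothing is forced."""
    return _forced_backend.get() or "auto"


print(current_backend())          # auto
with use_backend("triton"):
    print(current_backend())      # triton
    with use_backend("eager"):
        print(current_backend())  # eager
    print(current_backend())      # triton
print(current_backend())          # auto
```

Resetting with the token rather than assigning `None` is what makes nested overrides unwind to the enclosing setting instead of to the default.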

Backend System

The library supports multiple backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (higher priority)

Constraint System

Each backend declares constraints for its functions:

Constraint Description
Device Which device types are supported
Dtype Allowed input/output dtypes per parameter
Shape Shape requirements (e.g., 2D tensors, dimensions divisible by 16)
Compute Capability Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4)

The registry validates inputs against these constraints before calling the backend, so no try/except fallback patterns are needed. If no backend can handle the inputs, a NoCapableBackendError is raised with details.
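
The validate-then-dispatch idea described above can be sketched in plain Python. Everything here is illustrative: the `Constraints` class, the `REGISTRY` contents, and the specific dtype, shape, and compute-capability values are invented for the example, and `NoCapableBackendError` is a local stand-in for the library's error of the same name. The sketch takes device, dtype, and shape as simple values instead of tensors so that it runs without PyTorch.

```python
from dataclasses import dataclass


class NoCapableBackendError(RuntimeError):
    """Local stand-in for the error named in the text above."""


@dataclass(frozen=True)
class Constraints:
    devices: frozenset          # supported device types
    dtypes: frozenset           # allowed input dtypes
    last_dim_multiple: int = 1  # shape rule: last dim divisible by this
    min_sm: tuple = (0, 0)      # minimum compute capability (major, minor)

    def violations(self, device, dtype, shape, sm):
        """List every constraint the inputs fail; empty means capable."""
        problems = []
        if device not in self.devices:
            problems.append(f"device {device!r} unsupported")
        if dtype not in self.dtypes:
            problems.append(f"dtype {dtype!r} unsupported")
        if shape[-1] % self.last_dim_multiple != 0:
            problems.append(
                f"last dim {shape[-1]} not divisible by {self.last_dim_multiple}")
        if device == "cuda" and sm < self.min_sm:
            problems.append(f"SM {sm} below required {self.min_sm}")
        return problems


# Hypothetical registry for one function, listed in priority order.
FLOATS = frozenset({"bfloat16", "float16", "float32"})
REGISTRY = [
    ("cuda", Constraints(frozenset({"cuda"}),
                         frozenset({"bfloat16", "float16"}), 16, (8, 9))),
    ("triton", Constraints(frozenset({"cuda"}), FLOATS)),
    ("eager", Constraints(frozenset({"cuda", "cpu"}), FLOATS)),
]


def select_backend(device, dtype, shape, sm=(0, 0)):
    """Return the first backend whose constraints all pass."""
    rejected = {}
    for name, constraints in REGISTRY:
        problems = constraints.violations(device, dtype, shape, sm)
        if not problems:
            return name
        rejected[name] = problems
    raise NoCapableBackendError(f"no backend can handle the inputs: {rejected}")


print(select_backend("cuda", "bfloat16", (128, 256), sm=(9, 0)))  # cuda
print(select_backend("cuda", "bfloat16", (128, 256), sm=(8, 0)))  # triton
print(select_backend("cpu", "float32", (4, 4)))                   # eager
```

Because each rejection is recorded with its reasons, the error raised when nothing matches can say why every backend was skipped, which is the kind of detail a call-and-catch fallback would lose.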

# Debug logging to see backend selection
import logging
logging.basicConfig()  # attach a stderr handler if the application has not configured logging
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends
