Fast Kernel Library for ComfyUI with multiple compute backends

Comfy Kitchen

Fast kernel library for Diffusion inference with multiple compute backends.

Backend Capabilities Matrix

Function eager cuda triton
quantize_per_tensor_fp8
dequantize_per_tensor_fp8
quantize_nvfp4
dequantize_nvfp4
scaled_mm_nvfp4
quantize_mxfp8
dequantize_mxfp8
scaled_mm_mxfp8
apply_rope
apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.

Layout | Format | HW Requirement | Description
TensorCoreFP8Layout | FP8 E4M3 | SM ≥ 8.9 (Ada) | Per-tensor scaling, 1:1 element mapping
TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks
TensorCoreMXFP8Layout | MXFP8 E4M3 | SM ≥ 10.0 (Blackwell) | Block quantization with 32-element blocks, E8M0 scales

import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize an activation and a weight tensor (the weight shape is illustrative)
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
weight = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)
weight_qt = QuantizedTensor.from_float(weight, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()
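
To make "block quantization with 16-element blocks" concrete, here is a minimal pure-Python sketch of the idea. It is not the library's kernel: the function names are hypothetical, the scales are kept as ordinary floats (how the real formats encode scales is left out), and values are simply snapped to the nearest magnitude on the FP4 E2M1 grid.

```python
# Illustrative sketch only; names are hypothetical and not part of comfy_kitchen.

E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # non-negative E2M1 magnitudes
E2M1_MAX = 6.0
BLOCK_SIZE = 16


def quantize_blocks(values, block_size=BLOCK_SIZE):
    """Return (codes, scales): one float scale per block, values snapped to the E2M1 grid."""
    if len(values) % block_size != 0:
        raise ValueError("length must be a multiple of the block size")
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the largest magnitude in the block onto the largest grid value.
        scale = amax / E2M1_MAX if amax > 0 else 1.0
        scales.append(scale)
        for v in block:
            magnitude = min(E2M1_VALUES, key=lambda q: abs(abs(v) / scale - q))
            codes.append(magnitude if v >= 0 else -magnitude)
    return codes, scales


def dequantize_blocks(codes, scales, block_size=BLOCK_SIZE):
    """Multiply each code by the scale of the block it belongs to."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]
```

Because every block carries its own scale, a block of small values keeps its resolution even when another block in the same tensor contains large values. Per-tensor scaling, as in the FP8 layout, uses a single scale for all elements instead.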

Installation

From PyPI

# Install default (Linux/Windows/macOS)
pip install comfy-kitchen

# Install with cuBLAS support for NVFP4 (Blackwell and newer)
# The quotes keep shells such as zsh from expanding the square brackets
pip install "comfy-kitchen[cublas]"

Package Variants

  • CUDA wheels: Linux x86_64 and Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using Stable ABI for 3.12+).

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require using setup.py directly (not pip install):

Option | Command | Description | Default
--no-cuda | python setup.py bdist_wheel --no-cuda | Build CPU-only wheel (py3-none-any) | Enabled (build with CUDA)
--cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows)
--debug-build | python setup.py build_ext --debug-build | Build in debug mode with symbols | Disabled (Release)
--lineinfo | python setup.py build_ext --lineinfo | Enable NVCC line info for profiling | Disabled

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and CUDA_HOME environment variable
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)
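
The Python and PyTorch minimums above can be checked before installing. The sketch below uses only the standard library; the helper names `parse_version` and `check_requirements` are made up for this example and are not part of comfy_kitchen, and the version parsing is deliberately simple (it only reads the leading numeric parts).

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '2.5.1+cu121' into a tuple like (2, 5, 1)."""
    numbers = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad so that '2.5' compares equal to '2.5.0'.
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def check_requirements():
    """Report whether the running interpreter and installed torch meet the minimums."""
    report = {"python": sys.version_info[:2] >= (3, 10)}
    try:
        report["torch"] = parse_version(metadata.version("torch")) >= (2, 5, 0)
    except metadata.PackageNotFoundError:
        report["torch"] = False  # torch is not installed in this environment
    return report


if __name__ == "__main__":
    print(check_requirements())
```

This does not cover the CUDA runtime or driver requirements, which depend on the installed wheel and the system driver.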

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (cuda -> triton -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
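
The README does not describe how the scoped override is implemented. One common way to build such a context manager is with `contextvars`, sketched below under that assumption. The names `use_backend_sketch` and `resolve_backend` are hypothetical, and the precedence shown (explicit argument first, then the scoped override, then automatic selection) is also an assumption rather than documented behavior.

```python
import contextlib
import contextvars

# Holds the backend forced by the innermost active "with" block, if any.
_forced_backend = contextvars.ContextVar("forced_backend", default=None)


@contextlib.contextmanager
def use_backend_sketch(name):
    """Force a backend inside the with-block and restore the previous state on exit."""
    token = _forced_backend.set(name)
    try:
        yield
    finally:
        _forced_backend.reset(token)


def resolve_backend(explicit=None, automatic="cuda"):
    """Pick the explicit argument, else the scoped override, else the automatic choice."""
    return explicit or _forced_backend.get() or automatic
```

Resetting with the token, rather than setting the variable back to `None`, is what lets nested overrides unwind correctly.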

Backend System

The library supports multiple backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (higher priority)
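
The selection rule above can be pictured as a walk over an ordered list that stops at the first backend able to handle the input device. This is a simplified sketch with hypothetical names (`Backend`, `REGISTRY`, `select_backend`), not the library's registry; it models only the device constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    devices: frozenset  # device types this backend accepts


# Listed in priority order: cuda, then triton, then eager.
REGISTRY = (
    Backend("cuda", frozenset({"cuda"})),
    Backend("triton", frozenset({"cuda"})),
    Backend("eager", frozenset({"cpu", "cuda"})),
)


def select_backend(device, available=("cuda", "triton", "eager")):
    """Return the name of the first installed backend that supports the device."""
    for backend in REGISTRY:
        if backend.name in available and device in backend.devices:
            return backend.name
    raise LookupError(f"no backend supports device {device!r}")
```

With this model, a CPU input resolves to eager, and a CUDA input resolves to triton when the compiled cuda backend is not installed, for example with the pure Python wheel.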

Constraint System

Each backend declares constraints for its functions:

Constraint | Description
Device | Which device types are supported
Dtype | Allowed input/output dtypes per parameter
Shape | Shape requirements (e.g., 2D tensors, dimensions divisible by 16)
Compute Capability | Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4)

The registry validates inputs against these constraints before calling the backend, so there are no try/except fallback patterns. If no backend can handle the inputs, a NoCapableBackendError is raised with details.

# Debug logging to see backend selection
import logging
logging.basicConfig()  # attach a handler if the application has not configured logging yet
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)
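
The constraint checks described above amount to validating a call up front and collecting the reasons each backend was rejected, instead of calling a kernel and catching its failure. The sketch below illustrates that pattern only. Every name in it is hypothetical, `NoBackendError` is a local stand-in for the library's NoCapableBackendError, and the example constraint values are invented for the demonstration.

```python
from dataclasses import dataclass


class NoBackendError(RuntimeError):
    """Hypothetical stand-in for the library's NoCapableBackendError."""


@dataclass(frozen=True)
class Constraints:
    devices: frozenset
    dtypes: frozenset
    min_compute_capability: tuple = (0, 0)
    last_dim_multiple: int = 1


@dataclass(frozen=True)
class Call:
    device: str
    dtype: str
    shape: tuple
    compute_capability: tuple = (0, 0)


def violations(constraints, call):
    """List every constraint the call breaks; an empty list means the backend is capable."""
    problems = []
    if call.device not in constraints.devices:
        problems.append(f"device {call.device!r} not supported")
    if call.dtype not in constraints.dtypes:
        problems.append(f"dtype {call.dtype!r} not supported")
    if call.device == "cuda" and call.compute_capability < constraints.min_compute_capability:
        problems.append(f"compute capability {call.compute_capability} too low")
    if call.shape and call.shape[-1] % constraints.last_dim_multiple != 0:
        problems.append(f"last dimension not divisible by {constraints.last_dim_multiple}")
    return problems


def dispatch(backends, call):
    """Return the first capable backend; dict order is the priority order."""
    rejected = {}
    for name, constraints in backends.items():
        problems = violations(constraints, call)
        if not problems:
            return name
        rejected[name] = problems
    raise NoBackendError(f"no capable backend: {rejected}")


# Invented constraint values, for illustration only.
EXAMPLE_BACKENDS = {
    "cuda": Constraints(frozenset({"cuda"}), frozenset({"bfloat16", "float16"}), (8, 9), 16),
    "eager": Constraints(frozenset({"cpu", "cuda"}), frozenset({"bfloat16", "float16", "float32"})),
}
```

Collecting the rejection reasons per backend is what allows the final error to explain why nothing matched, rather than only reporting that dispatch failed.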

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends
