Fast Kernel Library for ComfyUI with multiple compute backends

Comfy Kitchen

Fast kernel library for Diffusion inference with multiple compute backends.

Backend Capabilities Matrix

Function eager cuda triton
quantize_per_tensor_fp8
dequantize_per_tensor_fp8
quantize_nvfp4
dequantize_nvfp4
scaled_mm_nvfp4
quantize_mxfp8
dequantize_mxfp8
scaled_mm_mxfp8
apply_rope
apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.

| Layout | Format | HW Requirement | Description |
| --- | --- | --- | --- |
| TensorCoreFP8Layout | FP8 E4M3 | SM ≥ 8.9 (Ada) | Per-tensor scaling, 1:1 element mapping |
| TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks |
| TensorCoreMXFP8Layout | MXFP8 E4M3 | SM ≥ 10.0 (Blackwell) | Block quantization with 32-element blocks, E8M0 scales |

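import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize an activation tensor
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)

# Quantize a weight matrix the same way (shape: out_features x in_features)
weight = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)
weight_qt = QuantizedTensor.from_float(weight, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()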
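
To make "block quantization with 32-element blocks, E8M0 scales" more concrete, here is a minimal pure-Python sketch of how one power-of-two scale per block could be chosen. It assumes the common Microscaling convention of deriving the scale from the largest magnitude in each block; the library's kernels may differ. The helper names `block_scale_exponents` and `scale_blocks` are hypothetical and not part of comfy_kitchen, and the final rounding to FP8 is omitted.

```python
import math

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value
E4M3_EMAX = 8     # exponent of that value (448 = 1.75 * 2**8)


def block_scale_exponents(values, block_size=32):
    """Return one power-of-two exponent per block (hypothetical helper)."""
    if len(values) % block_size != 0:
        raise ValueError("length must be divisible by block_size")
    exponents = []
    for start in range(0, len(values), block_size):
        amax = max(abs(v) for v in values[start:start + block_size])
        if amax == 0.0:
            exponents.append(-127)  # smallest exponent an E8M0 scale can hold
        else:
            e = math.floor(math.log2(amax)) - E4M3_EMAX
            exponents.append(max(-127, min(127, e)))  # clamp to the E8M0 range
    return exponents


def scale_blocks(values, exponents, block_size=32):
    """Divide each value by its block scale and clamp to the E4M3 range."""
    scaled = []
    for i, v in enumerate(values):
        scale = 2.0 ** exponents[i // block_size]
        scaled.append(max(-E4M3_MAX, min(E4M3_MAX, v / scale)))
    return scaled


values = [float(i) for i in range(64)]  # two blocks of 32 elements
exps = block_scale_exponents(values)
scaled = scale_blocks(values, exps)
```

Because every scale is a power of two, it can be stored as a bare 8-bit exponent, which is what the E8M0 name refers to. Passing `block_size=16` gives the block shape of the NVFP4 layout, although that format's scale encoding is not modeled here.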

Installation

From PyPI

# Install default (Linux/Windows/macOS)
pip install comfy-kitchen

# Install with cuBLAS for NVFP4 (+Blackwell); quotes keep shells such as zsh from expanding the brackets
pip install "comfy-kitchen[cublas]"

Package Variants

  • CUDA wheels: Linux x86_64 and Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using Stable ABI for 3.12+).

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require using setup.py directly (not pip install):

| Option | Command | Description | Default |
| --- | --- | --- | --- |
| --no-cuda | python setup.py bdist_wheel --no-cuda | Build CPU-only wheel (py3-none-any) | Enabled (build with CUDA) |
| --cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows) |
| --debug-build | python setup.py build_ext --debug-build | Build in debug mode with symbols | Disabled (Release) |
| --lineinfo | python setup.py build_ext --lineinfo | Enable NVCC line info for profiling | Disabled |

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and CUDA_HOME environment variable
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (cuda -> triton -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
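
For readers curious how a temporary override like `use_backend` can restore the previous setting on exit, here is a minimal sketch built on the standard library's `contextvars`. This is an assumption about one reasonable design, not the library's actual implementation; `_forced_backend` and `current_backend` are hypothetical names.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical state: None means "let the registry choose automatically".
_forced_backend = contextvars.ContextVar("forced_backend", default=None)


@contextmanager
def use_backend(name):
    """Force `name` inside the with-block, then restore the previous value."""
    token = _forced_backend.set(name)
    try:
        yield
    finally:
        # reset() undoes exactly this set(), so nested blocks unwind correctly
        _forced_backend.reset(token)


def current_backend():
    """Return the forced backend name, or None when selection is automatic."""
    return _forced_backend.get()


with use_backend("triton"):
    inside = current_backend()
    with use_backend("eager"):
        nested = current_backend()
    after_nested = current_backend()
outside = current_backend()
```

A context variable, unlike a plain module global, keeps the override local to the current thread or asyncio task, so concurrent callers do not see each other's forced backend.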

Backend System

The library supports multiple backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (higher priority)

Constraint System

Each backend declares constraints for its functions:

| Constraint | Description |
| --- | --- |
| Device | Which device types are supported |
| Dtype | Allowed input/output dtypes per parameter |
| Shape | Shape requirements (e.g., 2D tensors, dimensions divisible by 16) |
| Compute Capability | Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4) |

The registry validates inputs against these constraints before calling the backend, so it does not rely on try/except fallback patterns. If no backend can handle the inputs, a NoCapableBackendError is raised with details.

# Debug logging to see backend selection
import logging
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)
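
The validate-then-dispatch flow described in this section can be pictured with a small, self-contained sketch. Everything below is illustrative: the `Constraints` dataclass, the `BACKENDS` table and its values, and `select_backend` are hypothetical, and the `NoCapableBackendError` class is a local stand-in for the library's error of the same name. Dtypes and devices are plain strings so the sketch runs without PyTorch.

```python
from dataclasses import dataclass


class NoCapableBackendError(RuntimeError):
    """Local stand-in for the library's error of the same name."""


@dataclass(frozen=True)
class Constraints:
    devices: frozenset                      # supported device types
    dtypes: frozenset                       # allowed input dtypes
    min_compute_capability: tuple = (0, 0)  # (major, minor), checked on CUDA only
    last_dim_multiple: int = 1              # shape requirement on the last dim

    def failures(self, device, dtype, shape, capability):
        """Return the reasons these inputs are rejected (empty if accepted)."""
        reasons = []
        if device not in self.devices:
            reasons.append(f"device {device!r} not supported")
        if dtype not in self.dtypes:
            reasons.append(f"dtype {dtype!r} not supported")
        if shape[-1] % self.last_dim_multiple != 0:
            reasons.append(f"last dim not divisible by {self.last_dim_multiple}")
        if device == "cuda" and capability < self.min_compute_capability:
            reasons.append(f"needs SM >= {self.min_compute_capability}")
        return reasons


# Illustrative values only; the real per-function constraints are not shown here.
PRIORITY = ["cuda", "triton", "eager"]
HALF = frozenset({"bfloat16", "float16"})
BACKENDS = {
    "cuda": Constraints(frozenset({"cuda"}), HALF, (10, 0), 16),
    "triton": Constraints(frozenset({"cuda"}), HALF | {"float32"}, (8, 0)),
    "eager": Constraints(frozenset({"cpu", "cuda"}), HALF | {"float32"}),
}


def select_backend(device, dtype, shape, capability=(0, 0)):
    """Return the first backend in priority order whose constraints all pass."""
    rejected = {}
    for name in PRIORITY:
        reasons = BACKENDS[name].failures(device, dtype, shape, capability)
        if not reasons:
            return name
        rejected[name] = reasons
    raise NoCapableBackendError(f"no backend can handle the inputs: {rejected}")
```

With these made-up constraints, `select_backend("cpu", "float32", (4, 16))` returns `"eager"`, while a bfloat16 CUDA input on an SM 8.9 device skips `cuda` and lands on `"triton"`. Collecting the rejection reasons per backend is what lets the final error explain why nothing matched.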

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends
