Fast Kernel Library for ComfyUI with multiple compute backends

Comfy Kitchen

Fast kernel library for Diffusion inference with multiple compute backends.

Backend Capabilities Matrix

Function eager cuda triton
quantize_per_tensor_fp8
dequantize_per_tensor_fp8
quantize_nvfp4
dequantize_nvfp4
scaled_mm_nvfp4
quantize_mxfp8
dequantize_mxfp8
scaled_mm_mxfp8
apply_rope
apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.

| Layout | Format | HW Requirement | Description |
|---|---|---|---|
| TensorCoreFP8Layout | FP8 E4M3 | SM ≥ 8.9 (Ada) | Per-tensor scaling, 1:1 element mapping |
| TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks |
| TensorCoreMXFP8Layout | MXFP8 E4M3 | SM ≥ 10.0 (Blackwell) | Block quantization with 32-element blocks, E8M0 scales |

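import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize an activation tensor
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)

# Quantize a weight matrix the same way (shape: out_features x in_features)
weight = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)
weight_qt = QuantizedTensor.from_float(weight, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()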
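
To make the block-quantization layouts more concrete, here is a minimal pure-Python sketch of how one power-of-two scale per 32-element block could be chosen so that every scaled value fits the FP8 E4M3 range. The function name `block_scales` and the rounding rule (the smallest power of two that avoids overflow) are assumptions for illustration and are not taken from comfy-kitchen's kernels; 448 is the largest finite E4M3 magnitude.

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3
BLOCK = 32        # MXFP8 block size


def block_scales(values, block=BLOCK, fmt_max=E4M3_MAX):
    """Return one power-of-two scale per block of `block` consecutive values.

    Illustrative only: picks the smallest power of two such that
    max(|v|) / scale <= fmt_max for every block.
    """
    if len(values) % block != 0:
        raise ValueError("length must be a multiple of the block size")
    scales = []
    for start in range(0, len(values), block):
        amax = max(abs(v) for v in values[start:start + block])
        if amax == 0.0:
            scales.append(1.0)  # an all-zero block needs no scaling
            continue
        exponent = math.ceil(math.log2(amax / fmt_max))
        scales.append(2.0 ** exponent)
    return scales


# Two blocks with very different magnitudes get independent scales.
data = [0.5] * 32 + [1000.0] * 32
scales = block_scales(data)
print(scales)
```

Because each block carries its own scale, a few large outliers only reduce precision inside their own block rather than across the whole tensor, which is the main difference from per-tensor scaling.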

Installation

From PyPI

# Install default (Linux/Windows/macOS)
pip install comfy-kitchen

# Install with the cuBLAS extra for NVFP4 (Blackwell and newer)
pip install "comfy-kitchen[cublas]"

Package Variants

  • CUDA wheels: Linux x86_64, Linux aarch64, and Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using Stable ABI for 3.12+).
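
The variant a wheel belongs to can be read off its filename, whose last three dash-separated fields are the Python, ABI, and platform tags. The sketch below splits those fields; the helper names `wheel_tags` and `is_pure_python` are made up for this example and are not part of comfy-kitchen.

```python
def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def is_pure_python(filename):
    """A 'none-any' wheel ships no compiled extension, so no CUDA kernels."""
    _, abi_tag, platform_tag = wheel_tags(filename)
    return abi_tag == "none" and platform_tag == "any"


print(wheel_tags("comfy_kitchen-0.2.14-cp312-abi3-win_amd64.whl"))
print(is_pure_python("comfy_kitchen-0.2.14-py3-none-any.whl"))
```

The `abi3` tag marks a Stable ABI build, which is why a single `cp312-abi3` wheel covers Python 3.12 and later.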

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require using setup.py directly (not pip install):

| Option | Command | Description | Default |
|---|---|---|---|
| --no-cuda | python setup.py bdist_wheel --no-cuda | Build CPU-only wheel (py3-none-any) | Enabled (build with CUDA) |
| --cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows) |
| --debug-build | python setup.py build_ext --debug-build | Build in debug mode with symbols | Disabled (Release) |
| --lineinfo | python setup.py build_ext --lineinfo | Enable NVCC line info for profiling | Disabled |

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and CUDA_HOME environment variable
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)
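
The Python, PyTorch, and CUDA_HOME requirements can be checked up front with a short standard-library script. This is a rough sketch: `version_tuple` and `check_environment` are hypothetical helper names, and the check does not cover the driver, CUDA Toolkit, nanobind, or CMake versions.

```python
import os
import sys
from importlib import metadata


def version_tuple(text):
    """Parse the leading numeric fields of a version such as '2.5.1+cu124'."""
    fields = []
    for part in text.split("+")[0].split("."):
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        fields.append(int(digits))
    while len(fields) < 3:  # pad so '2.5' compares equal to '2.5.0'
        fields.append(0)
    return tuple(fields)


def check_environment(building_from_source=False):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python >= 3.10 is required")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if version_tuple(torch_version) < (2, 5, 0):
            problems.append("PyTorch >= 2.5.0 is required, found " + torch_version)
    if building_from_source and not os.environ.get("CUDA_HOME"):
        problems.append("CUDA_HOME must be set to build from source")
    return problems


for problem in check_environment():
    print("requirement not met:", problem)
```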

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (cuda -> triton -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
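
A temporary override like `ck.use_backend(...)` is commonly built on a context variable that is set on entry and restored on exit. The sketch below shows that pattern in isolation; `use_backend_sketch`, `current_backend`, and `_forced_backend` are hypothetical names, and nothing here is claimed to match comfy-kitchen's actual implementation.

```python
import contextlib
import contextvars

# Forced backend name for the current context; None means "choose automatically".
_forced_backend = contextvars.ContextVar("forced_backend", default=None)


@contextlib.contextmanager
def use_backend_sketch(name):
    """Force `name` inside the with-block, then restore the previous value."""
    token = _forced_backend.set(name)
    try:
        yield
    finally:
        _forced_backend.reset(token)


def current_backend(default="auto"):
    """Return the forced backend, or `default` when nothing is forced."""
    forced = _forced_backend.get()
    return forced if forced is not None else default


before = current_backend()
with use_backend_sketch("triton"):
    inside = current_backend()
    with use_backend_sketch("eager"):
        nested = current_backend()
    restored = current_backend()
after = current_backend()
print(before, inside, nested, restored, after)
```

Resetting through the token (rather than setting the variable back to None) is what makes nested overrides unwind correctly, even if the block raises.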

Backend System

The library supports multiple backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (higher priority)
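
The selection rule described here can be pictured as a walk over an ordered list that stops at the first backend accepting the input's device. The table and the `select_backend` function below are a toy model written for this explanation, assuming the cuda, triton, eager priority order and that only eager accepts CPU tensors; they are not the library's registry.

```python
# Hypothetical table in priority order: (backend name, device types it accepts).
BACKENDS = [
    ("cuda", {"cuda"}),
    ("triton", {"cuda"}),
    ("eager", {"cpu", "cuda"}),
]


def select_backend(device_type, available=("cuda", "triton", "eager")):
    """Return the first available backend that accepts `device_type`, or None."""
    for name, devices in BACKENDS:
        if name in available and device_type in devices:
            return name
    return None


print(select_backend("cpu"))                                  # CPU input
print(select_backend("cuda"))                                 # CUDA input
print(select_backend("cuda", available=("triton", "eager")))  # no compiled CUDA kernels
```

The `available` argument models the pure Python wheel, where the cuda backend is absent and a CUDA tensor would fall through to triton.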

Constraint System

Each backend declares constraints for its functions:

| Constraint | Description |
|---|---|
| Device | Which device types are supported |
| Dtype | Allowed input/output dtypes per parameter |
| Shape | Shape requirements (e.g., 2D tensors, dimensions divisible by 16) |
| Compute Capability | Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4) |

The registry validates inputs against these constraints before calling a backend, so dispatch does not rely on try/except fallback patterns. If no backend can handle the inputs, a NoCapableBackendError is raised with details of why each backend was rejected.
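
As a rough illustration of validating before calling, the sketch below checks each backend's declared constraints and collects a rejection reason per backend, raising only when every backend has been ruled out. The `Constraints` dataclass, the `dispatch` function, the `NoCapableBackend` exception, and the toy constraint values are all stand-ins invented for this example; the library's real classes and signatures may differ.

```python
from dataclasses import dataclass


class NoCapableBackend(Exception):
    """Stand-in for the library's NoCapableBackendError (hypothetical)."""


@dataclass(frozen=True)
class Constraints:
    devices: frozenset
    dtypes: frozenset
    min_compute_capability: tuple = (0, 0)
    dim_multiple: int = 1

    def rejection_reason(self, device, dtype, shape, compute_capability):
        """Return None if the input is acceptable, else a short reason string."""
        if device not in self.devices:
            return "device " + device + " not supported"
        if dtype not in self.dtypes:
            return "dtype " + dtype + " not supported"
        if device == "cuda" and compute_capability < self.min_compute_capability:
            return "compute capability too low"
        if any(dim % self.dim_multiple for dim in shape):
            return "dimensions must be divisible by " + str(self.dim_multiple)
        return None


def dispatch(backends, device, dtype, shape, compute_capability=(0, 0)):
    """Pick the first backend whose constraints accept the input."""
    reasons = {}
    for name, constraints in backends:
        reason = constraints.rejection_reason(device, dtype, shape, compute_capability)
        if reason is None:
            return name
        reasons[name] = reason
    details = "; ".join(name + ": " + why for name, why in reasons.items())
    raise NoCapableBackend("no backend can handle the inputs (" + details + ")")


# Toy constraint values, chosen only to exercise each check.
TOY_BACKENDS = [
    ("cuda", Constraints(frozenset({"cuda"}), frozenset({"bfloat16", "float16"}), (10, 0), 16)),
    ("eager", Constraints(frozenset({"cpu", "cuda"}), frozenset({"bfloat16", "float16", "float32"}))),
]

print(dispatch(TOY_BACKENDS, "cuda", "bfloat16", (128, 256), (10, 0)))
print(dispatch(TOY_BACKENDS, "cuda", "bfloat16", (128, 250), (10, 0)))
```

Because the checks run on metadata (device, dtype, shape, compute capability) rather than by attempting the kernel, a rejected backend never launches any work, and the error message can list every reason at once.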

# Debug logging to see backend selection
import logging
logging.basicConfig()  # attach a handler if none is configured, so DEBUG records are printed
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends

Download files

Download the file for your platform.

Source Distributions

No source distribution files available for this release.

Built Distributions

  • comfy_kitchen-0.2.14-py3-none-any.whl (105.4 kB): Python 3
  • comfy_kitchen-0.2.14-cp312-abi3-win_amd64.whl (3.6 MB): CPython 3.12+, Windows x86-64
  • comfy_kitchen-0.2.14-cp312-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (5.2 MB): CPython 3.12+, manylinux (glibc 2.24+ and 2.28+), x86-64
  • comfy_kitchen-0.2.14-cp312-abi3-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (5.2 MB): CPython 3.12+, manylinux (glibc 2.24+ and 2.28+), ARM64
  • comfy_kitchen-0.2.14-cp311-cp311-win_amd64.whl (3.6 MB): CPython 3.11, Windows x86-64
  • comfy_kitchen-0.2.14-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (5.3 MB): CPython 3.11, manylinux (glibc 2.24+ and 2.28+), x86-64
  • comfy_kitchen-0.2.14-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (5.2 MB): CPython 3.11, manylinux (glibc 2.24+ and 2.28+), ARM64
  • comfy_kitchen-0.2.14-cp310-cp310-win_amd64.whl (3.6 MB): CPython 3.10, Windows x86-64
  • comfy_kitchen-0.2.14-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (5.3 MB): CPython 3.10, manylinux (glibc 2.24+ and 2.28+), x86-64
  • comfy_kitchen-0.2.14-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (5.2 MB): CPython 3.10, manylinux (glibc 2.24+ and 2.28+), ARM64

File details

Details for the file comfy_kitchen-0.2.14-py3-none-any.whl.

File metadata

  • Download URL: comfy_kitchen-0.2.14-py3-none-any.whl
  • Size: 105.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for comfy_kitchen-0.2.14-py3-none-any.whl
Algorithm Hash digest
SHA256 53fc600197d4c153b702e589b1952f042fb000758622d20c70d05eda60a7a55a
MD5 9bc08e8bb6692e99fe10bb881ba94e7a
BLAKE2b-256 b287eadbeddc21a832c9e4aa54fbe1bf1c604cbd10e3d8aa5369e3983ee6e41c
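
A downloaded wheel can be checked against the SHA256 digest listed for it with Python's standard `hashlib` module. The helper names `sha256_of_file` and `verify_sha256` below are made up for this sketch, and the wheel path in the final comment assumes the file sits in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True when the file's SHA256 matches the published digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call (assumes the wheel was downloaded to the working directory):
# verify_sha256(
#     "comfy_kitchen-0.2.14-py3-none-any.whl",
#     "53fc600197d4c153b702e589b1952f042fb000758622d20c70d05eda60a7a55a",
# )
```

pip can perform the same check at install time when a requirements file pins digests with `--hash=sha256:...`.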

File details

Details for the file comfy_kitchen-0.2.14-cp312-abi3-win_amd64.whl.

File hashes

Hashes for comfy_kitchen-0.2.14-cp312-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 4d1373510cc1caf88a4f14f8edc96bdfb7525cfd3eea389911b708e8224201bb
MD5 b2eac54d1dc4cc5a247feef720991414
BLAKE2b-256 87099469f985cfbf731f7b319ea2cdb3a967df446a2a108fa4107d9c45c429fa

File details

Details for the file comfy_kitchen-0.2.14-cp312-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for comfy_kitchen-0.2.14-cp312-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 0814d8c47e91733df33492472e8dc5267ca0d44e6c3f3f8cb6dc9ed65eb8bfc3
MD5 4ce2d6d97248c2956bc385715a87f0c1
BLAKE2b-256 e8ffe18579372d96cd84c9f9f864fea6406f8f1a9cd31cdcec4946c378f3e9a4

File details

Details for the file comfy_kitchen-0.2.14-cp312-abi3-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for comfy_kitchen-0.2.14-cp312-abi3-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 28afcef2e277c8c6db4da69c25932c9b65fd93093b0867577cedfb435ffe0c51
MD5 43fd177bc2f1d32bda359d29e3feb358
BLAKE2b-256 8bd76de0d8aae4df0899eee2491a95e2f9b4040fbdea4925b7dfa426ffcab25e

File details

Details for the file comfy_kitchen-0.2.14-cp311-cp311-win_amd64.whl.

File hashes

Hashes for comfy_kitchen-0.2.14-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 c9e1572baabc2a8547b9453bd37dd0c101489edfacdd20f05f5b8f1c58be003a
MD5 5e8821a72dee2eb950129de6028e0679
BLAKE2b-256 23e39c8cf744acbda71f249649ebee608d64c042199fd1be3be1607ed6e443e2

File details

Details for the file comfy_kitchen-0.2.14-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for comfy_kitchen-0.2.14-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4281af9f217acddec9fc893eb94f0b58d6709e82e0ddb0105900c288cb77bba7
MD5 a4ba7046b909c3632dbaff0013dd2669
BLAKE2b-256 05552451887a4550065dda774ab4e3fa548844c4a1e91a9ab6a2c81583472976

File details

Details for the file comfy_kitchen-0.2.14-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for comfy_kitchen-0.2.14-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 48db33c62c4339c1c650db0c11588f2d89fc6d7103a25b22d2f2e87290634f9d
MD5 bfce45f88dd86ea7a9bbaba6e0bae889
BLAKE2b-256 ef8bf90af26a9ef034d9e45f30caf8e18e719b817a26b13219e7b6a05cd40cad

File details

Details for the file comfy_kitchen-0.2.14-cp310-cp310-win_amd64.whl.

File hashes

Hashes for comfy_kitchen-0.2.14-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 71b3d791c94ce7c8c00e3f21b1db428f7fd5ed915c5031a2a79471016947b054
MD5 ebd58b6f4760e8d55f6b8fa14fa281be
BLAKE2b-256 7eef1c7422f247863ba6219825e0fa09de1102eb61f2c670e0b4833b862cab89

File details

Details for the file comfy_kitchen-0.2.14-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for comfy_kitchen-0.2.14-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 7b5b40e229d3c00e23e806f7540b97eb441e45d13341ef7e660f646a5cc3433c
MD5 44f1204082fd0ef1337a1ff76ab6188a
BLAKE2b-256 b7fc8d7b11dc2dbc30d5fd0404f6a16f550cef73e1bcc8dc7616919ebe6aa110

File details

Details for the file comfy_kitchen-0.2.14-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for comfy_kitchen-0.2.14-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 cf87cda55ad076cd9ab8b18a8f6850d6f5ad4cdf7adc707aec9f02097173eeb1
MD5 bf81b01a82c8e2702adcc912edee33e5
BLAKE2b-256 e95e117a8ccc12a50ebcac3e5e440d4b6342f214d8193d53fb382ac6e89ed46a
