Comfy Kitchen

Fast kernel library for ComfyUI diffusion inference with multiple compute backends.

Backend Capabilities Matrix

Each function below has implementations in one or more of the eager, cuda, and triton backends:

  • quantize_per_tensor_fp8
  • dequantize_per_tensor_fp8
  • quantize_nvfp4
  • dequantize_nvfp4
  • scaled_mm_nvfp4
  • apply_rope
  • apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.
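The interception relies on PyTorch's `__torch_function__` protocol. The toy subclass below (an illustration of the mechanism, not the library's actual implementation) shows how a `torch.Tensor` subclass sees every op before it runs, which is the hook a quantized tensor can use to redirect ops to quantized kernels:

```python
import torch

class LoggingTensor(torch.Tensor):
    """Toy torch.Tensor subclass that observes every torch op,
    illustrating the dispatch hook a quantized tensor can use."""

    intercepted = []  # names of ops seen by the hook

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # A quantized subclass would look up a quantized kernel for
        # `func` here and call it instead of the default implementation.
        cls.intercepted.append(func.__name__)
        return super().__torch_function__(func, types, args, kwargs)

x = torch.ones(2, 3).as_subclass(LoggingTensor)
y = torch.add(x, 1)  # routed through __torch_function__ before executing
```

Falling through to `super().__torch_function__` runs the default implementation, so ops that have no quantized kernel still work and the result stays a subclass instance.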

Layout                  Format      HW Requirement          Description
TensorCoreFP8Layout     FP8 E4M3    SM ≥ 8.9 (Ada)          Per-tensor scaling, 1:1 element mapping
TensorCoreNVFP4Layout   NVFP4 E2M1  SM ≥ 10.0 (Blackwell)   Block quantization with 16-element blocks

import torch

from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize a tensor
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)

# Quantize a weight matrix the same way
w = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)
weight_qt = QuantizedTensor.from_float(w, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()

Installation

From PyPI

# Install default (Linux/Windows/MacOS)
pip install comfy-kitchen

# Install with CUBLAS for NVFP4 (+Blackwell)
pip install comfy-kitchen[cublas]

Package Variants

  • CUDA wheels: Linux x86_64 and Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using Stable ABI for 3.12+).

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require invoking setup.py directly (they are not available through pip install):

  • --no-cuda (e.g. python setup.py bdist_wheel --no-cuda): build a CPU-only wheel (py3-none-any). Default: off (wheels build with CUDA).
  • --cuda-archs=... (e.g. python setup.py build_ext --cuda-archs="80;89"): CUDA architectures to build for. Default: 75-virtual;80;89;90a;100f;120f on Linux, 75-virtual;80;89;120f on Windows.
  • --debug-build (e.g. python setup.py build_ext --debug-build): build in debug mode with symbols. Default: off (Release).
  • --lineinfo (e.g. python setup.py build_ext --lineinfo): enable NVCC line info for profiling. Default: off.

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and CUDA_HOME environment variable
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (triton -> cuda -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)

Backend System

The library supports multiple backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (higher priority)

Constraint System

Each backend declares constraints for its functions:

Constraint Description
Device Which device types are supported
Dtype Allowed input/output dtypes per parameter
Shape Shape requirements (e.g., 2D tensors, dimensions divisible by 16)
Compute Capability Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4)

The registry validates inputs against these constraints before calling the backend; there are no try/except fallback patterns. If no backend can handle the inputs, a NoCapableBackendError is raised with details.
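A minimal pure-Python sketch of this selection scheme (names, fields, and thresholds are illustrative, not the library's internals): each backend declares what it supports, and the first capable backend in priority order wins.

```python
from dataclasses import dataclass, field

class NoCapableBackendError(RuntimeError):
    """Raised when no registered backend passes its constraints."""

@dataclass
class Backend:
    name: str
    devices: set = field(default_factory=set)  # supported device types
    min_sm: float = 0.0                        # minimum compute capability

    def supports(self, device: str, sm: float) -> bool:
        return device in self.devices and sm >= self.min_sm

# Priority order: first capable backend wins (illustrative thresholds).
REGISTRY = [
    Backend("cuda", devices={"cuda"}, min_sm=8.9),
    Backend("triton", devices={"cuda"}, min_sm=8.0),
    Backend("eager", devices={"cuda", "cpu"}),
]

def select_backend(device: str, sm: float = 0.0) -> str:
    # Validate constraints up front instead of try/except fallbacks.
    for backend in REGISTRY:
        if backend.supports(device, sm):
            return backend.name
    raise NoCapableBackendError(f"no backend supports device={device!r}, sm={sm}")
```

With this structure, CPU inputs fall through to eager, while a capable GPU gets the highest-priority kernel whose constraints pass.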

# Debug logging to see backend selection
import logging
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends
