Fast Kernel Library for ComfyUI with multiple compute backends

Comfy Kitchen

Fast kernel library for Diffusion inference with multiple compute backends.

Backend Capabilities Matrix

Function eager cuda triton
quantize_per_tensor_fp8
dequantize_per_tensor_fp8
quantize_nvfp4
dequantize_nvfp4
scaled_mm_nvfp4
apply_rope
apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.

Layout                | Format     | HW Requirement        | Description
TensorCoreFP8Layout   | FP8 E4M3   | SM ≥ 8.9 (Ada)        | Per-tensor scaling, 1:1 element mapping
TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks
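
import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize an activation tensor and an example weight (the weight shape is illustrative)
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
weight = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)
weight_qt = QuantizedTensor.from_float(weight, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()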
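
To make the "block quantization with 16-element blocks" entry more concrete, here is a minimal pure-Python sketch of the idea. It is not the library's kernel. It assumes the usual E2M1 magnitudes (0, 0.5, 1, 1.5, 2, 3, 4, 6) and keeps each block scale as an ordinary Python float, whereas the real NVFP4 format stores scales in reduced precision. The function names are hypothetical.

```python
# Illustrative sketch only: per-block scaling onto the E2M1 value grid.
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # representable magnitudes
BLOCK = 16  # elements that share one scale


def quantize_block(block):
    """Return (scale, codes) so that scale * code approximates each input value."""
    amax = max(abs(v) for v in block)
    # Map the largest magnitude in the block onto 6.0, the largest E2M1 value.
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for v in block:
        # Round the scaled magnitude to the nearest representable value.
        mag = min(E2M1_VALUES, key=lambda q: abs(abs(v) / scale - q))
        codes.append(-mag if v < 0 else mag)
    return scale, codes


def quantize_blocks(values):
    """Split a flat list into 16-element blocks and quantize each one."""
    if len(values) % BLOCK:
        raise ValueError(f"length must be a multiple of {BLOCK}")
    return [quantize_block(values[i:i + BLOCK]) for i in range(0, len(values), BLOCK)]


def dequantize_blocks(blocks):
    """Reverse the mapping: multiply every code by its block scale."""
    return [scale * code for scale, codes in blocks for code in codes]
```

Because the scale is chosen per block, one large outlier only degrades the precision of the 16 values that share its block rather than the whole tensor.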

Installation

From PyPI

# Install default (Linux/Windows/macOS)
pip install comfy-kitchen

# Install with cuBLAS for NVFP4 (Blackwell or newer); quotes keep the shell from expanding the brackets
pip install "comfy-kitchen[cublas]"

Package Variants

  • CUDA wheels: Linux x86_64 and Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using Stable ABI for 3.12+).

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require using setup.py directly (not pip install):

Option           | Command                                        | Description                         | Default
--no-cuda        | python setup.py bdist_wheel --no-cuda          | Build CPU-only wheel (py3-none-any) | Enabled (build with CUDA)
--cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for     | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows)
--debug-build    | python setup.py build_ext --debug-build        | Build in debug mode with symbols    | Disabled (Release)
--lineinfo       | python setup.py build_ext --lineinfo           | Enable NVCC line info for profiling | Disabled

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and CUDA_HOME environment variable
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)
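
Before building from source, it can help to check the basics listed above. The following is a small sketch, not part of the package: the function names are hypothetical, and it only covers the Python version, the PyTorch version, and the CUDA_HOME variable.

```python
import os
import sys

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 5)


def parse_major_minor(version):
    """Turn a version such as '2.5.1+cu124' into (2, 5); the '+...' tag is ignored."""
    public = version.split("+", 1)[0]
    major, minor = public.split(".")[:2]
    return int(major), int(minor)


def check_source_build_env(python_version, torch_version, env):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python >= 3.10 is required")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_major_minor(torch_version) < MIN_TORCH:
        problems.append("PyTorch >= 2.5.0 is required")
    if not env.get("CUDA_HOME"):
        problems.append("CUDA_HOME is not set (needed to build the CUDA backend)")
    return problems


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        torch_version = version("torch")
    except PackageNotFoundError:
        torch_version = None
    for problem in check_source_build_env(sys.version_info, torch_version, os.environ):
        print("warning:", problem)
```

The check takes its inputs as arguments rather than reading globals, so it can be exercised without a GPU or an installed PyTorch.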

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (cuda -> triton -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
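
For readers curious how a temporary override like `use_backend` can work, here is a minimal sketch built on `contextvars`. It is an assumption about the general technique, not the library's actual implementation; `current_backend` and the precedence it applies (explicit argument, then active context, then automatic selection) are hypothetical.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Holds the backend forced by the innermost active `use_backend` block, if any.
_forced_backend = ContextVar("forced_backend", default=None)


@contextmanager
def use_backend(name):
    """Force `name` inside the with-block and restore the previous value afterwards."""
    token = _forced_backend.set(name)
    try:
        yield
    finally:
        # reset() also runs when the block raises, so overrides never leak out.
        _forced_backend.reset(token)


def current_backend(explicit=None):
    """Resolve the backend: explicit argument > active context > automatic selection."""
    return explicit or _forced_backend.get() or "auto"
```

Using a `ContextVar` rather than a module-level global keeps overrides isolated per thread and per asyncio task, and nested blocks unwind in the right order.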

Backend System

The library supports multiple backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (higher priority)

Constraint System

Each backend declares constraints for its functions:

Constraint         | Description
Device             | Which device types are supported
Dtype              | Allowed input/output dtypes per parameter
Shape              | Shape requirements (e.g., 2D tensors, dimensions divisible by 16)
Compute Capability | Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4)

The registry validates inputs against these constraints before calling a backend, so it does not rely on try/except fallback patterns. If no backend can handle the inputs, it raises a NoCapableBackendError with details.

# Debug logging to see backend selection
import logging
logging.basicConfig()  # attach a handler so DEBUG records are actually printed
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)
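
The selection logic described in this section can be pictured with a short pure-Python sketch. This is not the library's registry: the `Constraints` class, the `select_backend` function, and the compute-capability numbers in `BACKENDS` are illustrative assumptions, and only the Device and Compute Capability constraints are modeled.

```python
from dataclasses import dataclass


class NoCapableBackendError(RuntimeError):
    """Stand-in for the library's exception of the same name."""


@dataclass(frozen=True)
class Constraints:
    devices: frozenset                       # device types the backend accepts
    min_compute_capability: tuple = (0, 0)   # (major, minor), checked for CUDA only

    def failure(self, device, capability):
        """Return a reason string if the inputs are rejected, else None."""
        if device not in self.devices:
            return f"device {device!r} not supported"
        if device == "cuda" and capability < self.min_compute_capability:
            return f"needs SM >= {self.min_compute_capability}, got {capability}"
        return None


# Priority order: the first backend whose constraints pass is used.
# The capability thresholds here are made up for the example.
BACKENDS = [
    ("cuda", Constraints(frozenset({"cuda"}), (8, 9))),
    ("triton", Constraints(frozenset({"cuda"}), (8, 0))),
    ("eager", Constraints(frozenset({"cpu", "cuda"}))),
]


def select_backend(device, capability=(0, 0), backends=BACKENDS):
    """Validate up front and pick a backend without try/except fallbacks."""
    reasons = []
    for name, constraints in backends:
        reason = constraints.failure(device, capability)
        if reason is None:
            return name
        reasons.append(f"{name}: {reason}")
    raise NoCapableBackendError("; ".join(reasons))
```

In this sketch a CPU tensor resolves to eager, a CUDA tensor resolves to the first backend whose minimum architecture is met, and an unsupported device raises an error that lists why each backend was rejected.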

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends
