Fast Kernel Library for ComfyUI with multiple compute backends

Comfy Kitchen

Fast kernel library for Diffusion inference with multiple compute backends.

Backend Capabilities Matrix

Function eager cuda triton
quantize_per_tensor_fp8
dequantize_per_tensor_fp8
quantize_nvfp4
dequantize_nvfp4
scaled_mm_nvfp4
quantize_mxfp8
dequantize_mxfp8
scaled_mm_mxfp8
apply_rope
apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.

Layout | Format | HW Requirement | Description
TensorCoreFP8Layout | FP8 E4M3 | SM ≥ 8.9 (Ada) | Per-tensor scaling, 1:1 element mapping
TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks
TensorCoreMXFP8Layout | MXFP8 E4M3 | SM ≥ 10.0 (Blackwell) | Block quantization with 32-element blocks, E8M0 scales

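import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize a tensor
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)

# Quantize a weight the same way (illustrative shape: out_features x in_features)
weight = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)
weight_qt = QuantizedTensor.from_float(weight, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()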
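
To make "block quantization with 16-element blocks" more concrete, the following is a minimal pure-Python sketch of quantizing onto the E2M1 value grid with one scale per block. It only illustrates the general idea: the helper names (`quantize_block`, `quantize_blocks`, `dequantize_blocks`) are hypothetical, the scale is kept as a plain Python float rather than being encoded in a narrow format, and none of this is the library's kernel code.

```python
# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # block size taken from the layout table


def quantize_block(block):
    """Return (scale, codes) with every code on the signed E2M1 grid."""
    amax = max(abs(v) for v in block)
    # Map the largest magnitude in the block onto the largest grid value.
    scale = amax / E2M1_VALUES[-1] if amax > 0 else 1.0
    codes = []
    for v in block:
        target = abs(v) / scale
        nearest = min(E2M1_VALUES, key=lambda g: abs(g - target))
        codes.append(nearest if v >= 0 else -nearest)
    return scale, codes


def quantize_blocks(values, block_size=BLOCK_SIZE):
    """Split a flat list into blocks and quantize each one independently."""
    if len(values) % block_size != 0:
        raise ValueError("length must be divisible by the block size")
    return [
        quantize_block(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def dequantize_blocks(blocks):
    """Invert quantize_blocks up to rounding error."""
    return [scale * code for scale, codes in blocks for code in codes]


values = [0.1 * i for i in range(32)]  # two blocks of 16 elements
blocks = quantize_blocks(values)
restored = dequantize_blocks(blocks)
```

Because each block carries its own scale, a block of small values keeps finer resolution than it would under a single per-tensor scale.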

Installation

From PyPI

# Install default (Linux/Windows/macOS)
pip install comfy-kitchen

# Install with cuBLAS for NVFP4 (Blackwell and newer)
# Quoting keeps shells such as zsh from expanding the brackets
pip install "comfy-kitchen[cublas]"

Package Variants

  • CUDA wheels: Linux x86_64 and Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using Stable ABI for 3.12+).

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require using setup.py directly (not pip install):

Option | Command | Description | Default
--no-cuda | python setup.py bdist_wheel --no-cuda | Build CPU-only wheel (py3-none-any) | Enabled (build with CUDA)
--cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows)
--debug-build | python setup.py build_ext --debug-build | Build in debug mode with symbols | Disabled (Release)
--lineinfo | python setup.py build_ext --lineinfo | Enable NVCC line info for profiling | Disabled

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and CUDA_HOME environment variable
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)
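
The Python and PyTorch minimums above can be checked before installing. The following is a small standard-library sketch; `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical helpers written for this example and are not part of comfy-kitchen. The version parsing is deliberately simplified: it ignores local suffixes such as `+cu124` and treats pre-releases like the corresponding final release.

```python
import sys
from importlib import metadata


def version_tuple(version):
    """Turn '2.5.1+cu124' into (2, 5, 1); non-numeric suffixes are ignored."""
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings on their first three numeric components."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Collect readable problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python >= 3.10 is required")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch_version, "2.5.0"):
            problems.append(f"PyTorch >= 2.5.0 is required, found {torch_version}")
    return problems


print(check_environment() or "environment looks OK")
```

This does not check the CUDA runtime or driver version, which depend on the wheel variant in use.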

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (cuda -> triton -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
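
For readers curious how a scoped override like `ck.use_backend(...)` can work in general, here is a minimal sketch built on `contextvars`. It is not the library's implementation: `_forced_backend` and `current_backend` are hypothetical names, and the rule that an explicit `backend=` argument beats the scoped override is an assumption made for this example.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical module-level state; None means "select automatically".
_forced_backend = ContextVar("forced_backend", default=None)


@contextmanager
def use_backend(name):
    """Force `name` inside the with-block, then restore the previous choice."""
    token = _forced_backend.set(name)
    try:
        yield
    finally:
        _forced_backend.reset(token)


def current_backend(explicit=None):
    """Assumed precedence: explicit argument first, then the scoped override."""
    return explicit or _forced_backend.get()


with use_backend("triton"):
    inside = current_backend()
    with use_backend("eager"):
        nested = current_backend()
    after_nested = current_backend()
outside = current_backend()
```

Resetting through the token in a `finally` clause restores the previous value even when the body raises, which is what makes nesting safe.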

Backend System

The library supports multiple backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (higher priority)

Constraint System

Each backend declares constraints for its functions:

Constraint | Description
Device | Which device types are supported
Dtype | Allowed input/output dtypes per parameter
Shape | Shape requirements (e.g., 2D tensors, dimensions divisible by 16)
Compute Capability | Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4)

The registry validates inputs against these constraints before calling the backend, so there are no try/except fallback patterns. If no backend can handle the inputs, a NoCapableBackendError is raised with details.

# Debug logging to see backend selection
import logging
logging.basicConfig()  # attach a handler so DEBUG records are actually printed
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)
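
The validate-then-dispatch idea described in this section can be sketched in a few lines of plain Python. Everything here is illustrative: the `Constraints` class, the `select_backend` function, the `BACKENDS` list, and the constraint values are assumptions made for the example, and `NoCapableBackendError` is redefined locally as a stand-in for the library's error. Only device and compute-capability constraints are modelled.

```python
from dataclasses import dataclass


class NoCapableBackendError(RuntimeError):
    """Local stand-in for the error named above; the real class may differ."""


@dataclass(frozen=True)
class Constraints:
    devices: frozenset              # supported device types
    min_capability: tuple = (0, 0)  # minimum (major, minor) SM version

    def failure(self, device, capability):
        """Return a reason string if the inputs are rejected, else None."""
        if device not in self.devices:
            return f"device {device!r} not supported"
        if device == "cuda" and capability < self.min_capability:
            return f"needs SM >= {self.min_capability}, got {capability}"
        return None


# Checked in priority order: cuda, then triton, then eager.
# The constraint values are made up for illustration.
BACKENDS = [
    ("cuda", Constraints(frozenset({"cuda"}), (8, 9))),
    ("triton", Constraints(frozenset({"cuda"}), (8, 0))),
    ("eager", Constraints(frozenset({"cpu", "cuda"}))),
]


def select_backend(device, capability=(0, 0), backends=BACKENDS):
    """Pick the first backend whose constraints accept the inputs."""
    reasons = []
    for name, constraints in backends:
        reason = constraints.failure(device, capability)
        if reason is None:
            return name
        reasons.append(f"{name}: {reason}")
    # Nothing matched: report why each backend was rejected.
    raise NoCapableBackendError("; ".join(reasons))
```

Because every backend is checked up front, a failed selection can list the reason each candidate was rejected instead of surfacing whichever exception the last attempted kernel happened to raise.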

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

comfy_kitchen-0.2.10-py3-none-any.whl (89.1 kB)
Uploaded: Python 3

comfy_kitchen-0.2.10-cp312-abi3-win_amd64.whl (2.5 MB)
Uploaded: CPython 3.12+, Windows x86-64

comfy_kitchen-0.2.10-cp312-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.7 MB)
Uploaded: CPython 3.12+, manylinux glibc 2.24+ x86-64, manylinux glibc 2.28+ x86-64

comfy_kitchen-0.2.10-cp312-abi3-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (3.7 MB)
Uploaded: CPython 3.12+, manylinux glibc 2.24+ ARM64, manylinux glibc 2.28+ ARM64

comfy_kitchen-0.2.10-cp311-cp311-win_amd64.whl (2.5 MB)
Uploaded: CPython 3.11, Windows x86-64

comfy_kitchen-0.2.10-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.7 MB)
Uploaded: CPython 3.11, manylinux glibc 2.24+ x86-64, manylinux glibc 2.28+ x86-64

comfy_kitchen-0.2.10-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (3.7 MB)
Uploaded: CPython 3.11, manylinux glibc 2.24+ ARM64, manylinux glibc 2.28+ ARM64

comfy_kitchen-0.2.10-cp310-cp310-win_amd64.whl (2.5 MB)
Uploaded: CPython 3.10, Windows x86-64

comfy_kitchen-0.2.10-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (3.7 MB)
Uploaded: CPython 3.10, manylinux glibc 2.24+ x86-64, manylinux glibc 2.28+ x86-64

comfy_kitchen-0.2.10-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (3.7 MB)
Uploaded: CPython 3.10, manylinux glibc 2.24+ ARM64, manylinux glibc 2.28+ ARM64
