Fast Kernel Library for ComfyUI with multiple compute backends

Comfy Kitchen

Fast kernel library for diffusion inference with multiple compute backends.

Backend Capabilities Matrix

Function eager cuda triton
quantize_per_tensor_fp8
dequantize_per_tensor_fp8
quantize_nvfp4
dequantize_nvfp4
scaled_mm_nvfp4
quantize_mxfp8
dequantize_mxfp8
scaled_mm_mxfp8
apply_rope
apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.

Layout | Format | HW Requirement | Description
TensorCoreFP8Layout | FP8 E4M3 | SM ≥ 8.9 (Ada) | Per-tensor scaling, 1:1 element mapping
TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks
TensorCoreMXFP8Layout | MXFP8 E4M3 | SM ≥ 10.0 (Blackwell) | Block quantization with 32-element blocks, E8M0 scales

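import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize an activation tensor and a weight tensor (requires a CUDA device)
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
w = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)
weight_qt = QuantizedTensor.from_float(w, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()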
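
To make the "32-element blocks, E8M0 scales" entry in the layout table more concrete, here is a minimal pure-Python sketch of how power-of-two block scales could be chosen. It is an illustration only, not the library's kernel: the function names are hypothetical, the rounding rule (round the exponent up so that no value exceeds the FP8 E4M3 maximum of 448) is an assumption, and the actual FP8 rounding of the scaled values is left out.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3
BLOCK = 32            # MXFP8 block size from the layout table


def block_scales_pow2(values, block=BLOCK):
    """Return one power-of-two scale per block (hypothetical helper).

    Assumption: the exponent is rounded up so that every value in the
    block, divided by its scale, stays within the FP8 E4M3 range.
    """
    if len(values) % block != 0:
        raise ValueError(f"length {len(values)} is not a multiple of {block}")
    scales = []
    for start in range(0, len(values), block):
        amax = max(abs(v) for v in values[start:start + block])
        if amax == 0.0:
            scales.append(1.0)  # all-zero block: any scale works
            continue
        exponent = math.ceil(math.log2(amax / FP8_E4M3_MAX))
        scales.append(2.0 ** exponent)
    return scales


def apply_block_scales(values, scales, block=BLOCK):
    """Divide each value by the scale of the block it belongs to."""
    return [v / scales[i // block] for i, v in enumerate(values)]


data = [1.0] * 32 + [1000.0] * 32
scales = block_scales_pow2(data)
scaled = apply_block_scales(data, scales)
```

Because each block gets its own scale, a block of small values keeps its resolution even when a neighbouring block contains large outliers. A single per-tensor scale cannot do this.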

Installation

From PyPI

# Install default (Linux/Windows/macOS)
pip install comfy-kitchen

# Install with cuBLAS for NVFP4 (Blackwell and newer)
# Quote the extra so shells such as zsh do not expand the brackets
pip install "comfy-kitchen[cublas]"

Package Variants

  • CUDA wheels: Linux x86_64 and Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using Stable ABI for 3.12+).
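
The version rule above can be written as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and the tag strings simply follow the standard wheel naming convention (a per-version tag for 3.10 and 3.11, one Stable ABI tag for 3.12 and later).

```python
import sys


def expected_wheel_tag(version=None):
    """Return the CPython wheel tag expected for a (major, minor) version.

    Hypothetical helper; defaults to the running interpreter.
    """
    major, minor = (version or sys.version_info)[:2]
    if (major, minor) < (3, 10):
        raise RuntimeError(f"Python {major}.{minor} is older than the 3.10 minimum")
    if (major, minor) >= (3, 12):
        # One Stable ABI wheel serves 3.12 and every later 3.x release.
        return "cp312-abi3"
    return f"cp{major}{minor}-cp{major}{minor}"
```

On platforms without a CUDA wheel, pip would be expected to fall back to the pure Python wheel instead.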

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require using setup.py directly (not pip install):

Option | Command | Description | Default
--no-cuda | python setup.py bdist_wheel --no-cuda | Build CPU-only wheel (py3-none-any) | Enabled (build with CUDA)
--cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows)
--debug-build | python setup.py build_ext --debug-build | Build in debug mode with symbols | Disabled (Release)
--lineinfo | python setup.py build_ext --lineinfo | Enable NVCC line info for profiling | Disabled

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel
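
The default --cuda-archs value packs several conventions into one string. The sketch below splits such a string into its parts. It is not taken from the project's setup.py: the function name is hypothetical, and the reading of the tokens (a "-virtual" suffix meaning PTX only, as in CMake's CUDA_ARCHITECTURES, and trailing "a"/"f" letters marking architecture-specific and family-specific targets in NVCC's naming) is an assumption.

```python
def parse_cuda_archs(spec):
    """Split a semicolon-separated arch list into structured entries.

    Hypothetical helper for inspecting a --cuda-archs value.
    """
    entries = []
    for token in spec.split(";"):
        token = token.strip()
        if not token:
            continue
        name, _, kind = token.partition("-")   # "75-virtual" -> "75", "virtual"
        digits = name.rstrip("af")              # "90a" -> "90", "100f" -> "100"
        if not digits.isdigit():
            raise ValueError(f"unrecognised architecture token: {token!r}")
        entries.append({
            "sm": int(digits),
            "suffix": name[len(digits):],      # "", "a", or "f"
            "kind": kind or "default",
        })
    return entries


linux_default = parse_cuda_archs("75-virtual;80;89;90a;100f;120f")
```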

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and CUDA_HOME environment variable
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)
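
Before building from source, it can help to check the two prerequisites that are easy to verify from Python: the interpreter version and the CUDA_HOME variable. The helper below is a hypothetical pre-flight check, not part of the package. It does not verify the CUDA Toolkit, nanobind, or CMake versions.

```python
import os
import sys

MIN_PYTHON = (3, 10)


def check_source_build_prereqs(environ=None, python_version=None):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper; arguments default to the current process.
    """
    environ = os.environ if environ is None else environ
    if python_version is None:
        python_version = tuple(sys.version_info[:2])

    problems = []
    if python_version < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} is older than 3.10"
        )
    cuda_home = environ.get("CUDA_HOME")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
    elif not os.path.isdir(cuda_home):
        problems.append(f"CUDA_HOME points to a missing directory: {cuda_home}")
    return problems


if __name__ == "__main__":
    for problem in check_source_build_prereqs():
        print("warning:", problem)
```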

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (cuda -> triton -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
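
The three ways of choosing a backend shown above (automatic, a per-call backend= argument, and a temporary override) can be modelled with a context variable. The following is a sketch of that pattern only. The names use_backend_sketch and resolve_backend are hypothetical, and the rule that an explicit argument beats the context override is an assumption, not something the library documents here.

```python
import contextlib
import contextvars

# Holds the backend forced by the innermost active context manager, if any.
_forced_backend = contextvars.ContextVar("forced_backend", default=None)


@contextlib.contextmanager
def use_backend_sketch(name):
    """Temporarily force a backend; the previous choice is restored on exit."""
    token = _forced_backend.set(name)
    try:
        yield
    finally:
        _forced_backend.reset(token)


def resolve_backend(explicit=None, automatic="eager"):
    """Pick a backend name: explicit argument, then context, then automatic."""
    return explicit or _forced_backend.get() or automatic
```

A context variable, unlike a module-level global, keeps the override local to the current thread or asyncio task.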

Backend System

The library supports multiple backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (higher priority)
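
The selection rule described above can be sketched in a few lines. This is an illustration of priority-ordered dispatch, not the library's registry. The table of supported devices is an assumption that encodes only what the comments above state: eager is the one backend that accepts CPU tensors.

```python
# Priority order from the text: cuda, then triton, then eager.
PRIORITY = ("cuda", "triton", "eager")

# Assumed device support per backend (device types as plain strings).
SUPPORTED_DEVICES = {
    "cuda": {"cuda"},
    "triton": {"cuda"},
    "eager": {"cuda", "cpu"},
}


def pick_backend(device_type, available):
    """Return the first available backend that supports the device, else None."""
    for name in PRIORITY:
        if name in available and device_type in SUPPORTED_DEVICES[name]:
            return name
    return None
```

With a pure Python wheel, where only triton and eager are installed, the same call on a CUDA tensor would resolve to triton under these assumptions.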

Constraint System

Each backend declares constraints for its functions:

Constraint Description
Device Which device types are supported
Dtype Allowed input/output dtypes per parameter
Shape Shape requirements (e.g., 2D tensors, dimensions divisible by 16)
Compute Capability Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4)

The registry validates inputs against these constraints before calling a backend, so it does not rely on try/except fallback patterns. If no backend can handle the inputs, a NoCapableBackendError is raised with details.
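
The validate-first approach can be sketched as follows. Everything here is a stand-in written for illustration: the Constraint dataclass, the select function, and this NoCapableBackendError class are hypothetical and do not mirror the library's internal types. Devices and dtypes are plain strings so that the example runs without PyTorch.

```python
from dataclasses import dataclass


class NoCapableBackendError(RuntimeError):
    """Stand-in for the library's error of the same name."""


@dataclass(frozen=True)
class Constraint:
    devices: frozenset
    dtypes: frozenset
    min_compute_capability: tuple = (0, 0)
    last_dim_multiple: int = 1

    def violations(self, device, dtype, shape, compute_capability):
        """List every reason these inputs do not satisfy the constraint."""
        found = []
        if device not in self.devices:
            found.append(f"device {device!r} not supported")
        if dtype not in self.dtypes:
            found.append(f"dtype {dtype!r} not supported")
        if device == "cuda" and compute_capability < self.min_compute_capability:
            found.append(f"needs SM >= {self.min_compute_capability}")
        if shape[-1] % self.last_dim_multiple:
            found.append(f"last dim must be a multiple of {self.last_dim_multiple}")
        return found


def select(constraints, device, dtype, shape, compute_capability=(0, 0)):
    """Return the first backend (in dict order) whose constraint is satisfied."""
    reasons = {}
    for name, constraint in constraints.items():
        problems = constraint.violations(device, dtype, shape, compute_capability)
        if not problems:
            return name
        reasons[name] = problems
    raise NoCapableBackendError(f"no backend accepts the inputs: {reasons}")


table = {
    "cuda": Constraint(frozenset({"cuda"}), frozenset({"bfloat16"}), (10, 0), 16),
    "eager": Constraint(frozenset({"cuda", "cpu"}), frozenset({"bfloat16", "float32"})),
}
```

Because the check happens before any kernel runs, the error message can list the reason each backend was rejected.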

# Debug logging to see backend selection
import logging
logging.basicConfig()  # attach a handler so DEBUG records are actually printed
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends
