Fast Kernel Library for ComfyUI with multiple compute backends

Comfy Kitchen

Fast kernel library for Diffusion inference with multiple compute backends.

Backend Capabilities Matrix

Function eager cuda triton
quantize_per_tensor_fp8
dequantize_per_tensor_fp8
quantize_nvfp4
dequantize_nvfp4
scaled_mm_nvfp4
quantize_mxfp8
dequantize_mxfp8
scaled_mm_mxfp8
apply_rope
apply_rope1

Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.

Layout | Format | HW Requirement | Description
TensorCoreFP8Layout | FP8 E4M3 | SM ≥ 8.9 (Ada) | Per-tensor scaling, 1:1 element mapping
TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks
TensorCoreMXFP8Layout | MXFP8 E4M3 | SM ≥ 10.0 (Blackwell) | Block quantization with 32-element blocks, E8M0 scales
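
The hardware column above can be turned into a quick pre-flight check. The following is a minimal sketch in plain Python, not part of comfy_kitchen: `LAYOUT_MIN_SM` and `supported_layouts` are hypothetical names, and the `(major, minor)` tuple is assumed to come from something like `torch.cuda.get_device_capability()`.

```python
# Minimum compute capability per layout, copied from the table above.
# LAYOUT_MIN_SM and supported_layouts are illustrative names, not library API.
LAYOUT_MIN_SM = {
    "TensorCoreFP8Layout": (8, 9),     # Ada
    "TensorCoreNVFP4Layout": (10, 0),  # Blackwell
    "TensorCoreMXFP8Layout": (10, 0),  # Blackwell
}


def supported_layouts(capability):
    """Return the layouts whose minimum SM version is met by `capability`.

    `capability` is a (major, minor) tuple, for example the value returned by
    torch.cuda.get_device_capability().
    """
    return [name for name, min_sm in LAYOUT_MIN_SM.items() if capability >= min_sm]


print(supported_layouts((8, 0)))   # []
print(supported_layouts((8, 9)))   # ['TensorCoreFP8Layout']
print(supported_layouts((10, 0)))  # all three layouts
```

Comparing tuples keeps the check correct for minor versions, which a comparison on the major version alone would not.
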
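
import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize an activation tensor and a weight tensor
# (the 512x256 weight shape is an arbitrary example)
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
weight = torch.randn(512, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)
weight_qt = QuantizedTensor.from_float(weight, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()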
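
To make the difference between per-tensor scaling and block quantization concrete, here is a conceptual sketch in plain Python. It only computes scales and is not how comfy_kitchen's kernels are written: the helper names are hypothetical, and the sketch ignores how real NVFP4 and MXFP8 formats encode the scales themselves (for example, E8M0 scales are restricted to powers of two).

```python
# Conceptual sketch only: plain-Python scale computation, not comfy_kitchen code.
FP8_E4M3_MAX = 448.0  # largest finite value of FP8 E4M3 (e4m3fn variant)
FP4_E2M1_MAX = 6.0    # largest finite value of FP4 E2M1


def per_tensor_scale(values, fmt_max=FP8_E4M3_MAX):
    """One scale for the whole tensor: amax / fmt_max (1.0 for all-zero input)."""
    amax = max(abs(v) for v in values)
    return amax / fmt_max if amax > 0 else 1.0


def per_block_scales(values, block_size, fmt_max):
    """One scale per contiguous block of `block_size` elements."""
    if len(values) % block_size != 0:
        raise ValueError("length must be divisible by block_size")
    return [
        per_tensor_scale(values[i:i + block_size], fmt_max)
        for i in range(0, len(values), block_size)
    ]


# 32 values: a small-magnitude block followed by a large-magnitude block.
values = [0.5] * 16 + [12.0] * 16
print(per_tensor_scale(values))                    # a single scale set by the 12.0 entries
print(per_block_scales(values, 16, FP4_E2M1_MAX))  # two scales, one per 16-element block
```

With a single scale, the small block is forced to share the range chosen for the large one; with per-block scales, each block uses the full range of the target format.
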

Installation

From PyPI

# Install default (Linux/Windows/macOS)
pip install comfy-kitchen

# Install with cuBLAS for NVFP4 (Blackwell and newer)
# Quotes keep shells such as zsh from expanding the brackets
pip install "comfy-kitchen[cublas]"

Package Variants

  • CUDA wheels: Linux x86_64 and Windows x64
  • Pure Python wheel: Any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using Stable ABI for 3.12+).

From Source

# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# For faster rebuilds during development (skip build isolation)
pip install -e . --no-build-isolation -v

Build Options

These options require using setup.py directly (not pip install):

Option | Command | Description | Default
--no-cuda | python setup.py bdist_wheel --no-cuda | Build CPU-only wheel (py3-none-any) | Enabled (build with CUDA)
--cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows)
--debug-build | python setup.py build_ext --debug-build | Build in debug mode with symbols | Disabled (Release)
--lineinfo | python setup.py build_ext --lineinfo | Enable NVCC line info for profiling | Disabled

# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel

Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.5.0
  • CUDA Runtime (for CUDA wheels): ≥13.0
    • Pre-built wheels require NVIDIA Driver r580+
    • Building from source requires CUDA Toolkit ≥12.8 and CUDA_HOME environment variable
  • nanobind: ≥2.0.0 (for building from source)
  • CMake: ≥3.18 (for building from source)
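
The Python and PyTorch minimums listed above can be checked before installing. This is a minimal sketch using only the standard library; `parse_version` and `check_requirements` are hypothetical helpers written for this example, and the check does not cover the CUDA runtime, driver, nanobind, or CMake.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return leading numeric components padded to three, e.g. '2.5.1+cu121' -> (2, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    return parts + (0,) * max(0, 3 - len(parts))


def check_requirements():
    """Map each requirement to True (met), False (not met), or None (not installed)."""
    results = {"python>=3.10": sys.version_info >= (3, 10)}
    try:
        torch_version = parse_version(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        results["torch>=2.5.0"] = None
    else:
        results["torch>=2.5.0"] = torch_version >= (2, 5, 0)
    return results


for requirement, status in check_requirements().items():
    print(f"{requirement}: {status}")
```

Reading the installed version through `importlib.metadata` avoids importing torch itself, so the check stays fast and works even when the GPU stack is not set up yet.
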

Quick Start

import comfy_kitchen as ck
import torch

# Automatic backend selection (the highest-priority capable backend is used)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
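
A scoped override like the `with ck.use_backend(...)` block above is commonly built on a context variable. The sketch below shows that general pattern under stated assumptions; it is not comfy_kitchen's implementation, and `_forced_backend`, `current_backend`, and this local `use_backend` are hypothetical names defined only for the example.

```python
import contextlib
import contextvars

# Holds the temporarily forced backend name, or None when no override is active.
_forced_backend = contextvars.ContextVar("forced_backend", default=None)


@contextlib.contextmanager
def use_backend(name):
    """Force `name` inside the with-block and restore the previous value on exit."""
    token = _forced_backend.set(name)
    try:
        yield
    finally:
        _forced_backend.reset(token)


def current_backend(default="eager"):
    """Return the forced backend if one is active, otherwise `default`."""
    forced = _forced_backend.get()
    return forced if forced is not None else default


print(current_backend())          # eager
with use_backend("triton"):
    print(current_backend())      # triton
    with use_backend("cuda"):
        print(current_backend())  # cuda
    print(current_backend())      # triton
print(current_backend())          # eager
```

Resetting with the token in a `finally` clause restores the previous override even when the body raises, which is what makes nested blocks behave predictably.
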

Backend System

The library supports multiple backends:

  • eager: Pure PyTorch implementation
  • cuda: Custom CUDA C kernels (CUDA only)
  • triton: Triton JIT-compiled kernels

Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)

# On CPU tensors → falls back to eager (only backend supporting CPU)
# On CUDA tensors → uses cuda or triton (higher priority)

Constraint System

Each backend declares constraints for its functions:

Constraint | Description
Device | Which device types are supported
Dtype | Allowed input/output dtypes per parameter
Shape | Shape requirements (e.g., 2D tensors, dimensions divisible by 16)
Compute Capability | Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4)

The registry validates inputs against these constraints before calling the backend, so it does not rely on try/except fallback patterns. If no backend can handle the inputs, a NoCapableBackendError is raised with details of why each backend was rejected.

# Debug logging to see backend selection
import logging
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)
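
The selection logic described in this section (validate constraints up front, walk backends in priority order, raise if none qualifies) can be sketched in a few lines. This is an illustration of the idea, not the library's registry: `Constraints`, `BACKENDS`, and `select_backend` are hypothetical, the compute-capability numbers are made up for the example, and `NoCapableBackendError` is redefined locally as a stand-in for the error named above.

```python
from dataclasses import dataclass


class NoCapableBackendError(RuntimeError):
    """Local stand-in for the error the README names; defined for this sketch only."""


@dataclass(frozen=True)
class Constraints:
    devices: frozenset
    min_compute_capability: tuple = (0, 0)

    def failure(self, device, capability):
        """Return a reason string if the inputs are rejected, else None."""
        if device not in self.devices:
            return f"device {device!r} not supported"
        if device == "cuda" and capability < self.min_compute_capability:
            return f"needs SM >= {self.min_compute_capability}, got {capability}"
        return None


# Priority order follows the text: cuda, then triton, then eager.
# The capability thresholds here are illustrative values, not the library's.
BACKENDS = [
    ("cuda", Constraints(frozenset({"cuda"}), (8, 9))),
    ("triton", Constraints(frozenset({"cuda"}), (8, 0))),
    ("eager", Constraints(frozenset({"cpu", "cuda"}))),
]


def select_backend(device, capability=(0, 0)):
    """Return the first backend whose constraints accept the inputs."""
    reasons = []
    for name, constraints in BACKENDS:
        reason = constraints.failure(device, capability)
        if reason is None:
            return name
        reasons.append(f"{name}: {reason}")
    raise NoCapableBackendError("; ".join(reasons))


print(select_backend("cpu"))           # eager
print(select_backend("cuda", (9, 0)))  # cuda
print(select_backend("cuda", (8, 0)))  # triton
```

Because every rejection is recorded with its reason, the final error can explain why each backend was skipped instead of surfacing whichever exception a failed kernel launch happened to raise.
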

Testing

Run the test suite with pytest:

# Run all tests
pytest

# Run specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends
