# Comfy Kitchen

Fast kernel library for diffusion inference in ComfyUI, with multiple compute backends.
## Backend Capabilities Matrix

| Function | eager | cuda | triton |
|---|---|---|---|
| quantize_per_tensor_fp8 | ✓ | ✓ | ✓ |
| dequantize_per_tensor_fp8 | ✓ | ✓ | ✓ |
| quantize_nvfp4 | ✓ | ✓ | ✓ |
| dequantize_nvfp4 | ✓ | ✓ | |
| scaled_mm_nvfp4 | ✓ | ✓ | |
| apply_rope | ✓ | ✓ | ✓ |
| apply_rope1 | ✓ | ✓ | ✓ |
## Quantized Tensors

The library provides QuantizedTensor, a torch.Tensor subclass that transparently intercepts PyTorch operations and dispatches them to optimized quantized kernels when available.
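The interception mechanism is PyTorch's standard `__torch_function__` protocol. A minimal sketch of the idea, using a toy subclass (not the library's actual class) that merely records the ops it sees before running them normally; a quantized subclass would instead reroute supported ops to its own kernels at that point:

```python
import torch

class InterceptingTensor(torch.Tensor):
    # Records every torch op this tensor participates in, then runs it
    # normally via the default implementation.
    intercepted = []

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        cls.intercepted.append(func.__name__)
        return super().__torch_function__(func, types, args, kwargs or {})

x = torch.randn(4, 4).as_subclass(InterceptingTensor)
y = torch.nn.functional.relu(x)  # relu passes through __torch_function__ first
```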
| Layout | Format | HW Requirement | Description |
|---|---|---|---|
| TensorCoreFP8Layout | FP8 E4M3 | SM ≥ 8.9 (Ada) | Per-tensor scaling, 1:1 element mapping |
| TensorCoreNVFP4Layout | NVFP4 E2M1 | SM ≥ 10.0 (Blackwell) | Block quantization with 16-element blocks |
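The 16-element block scheme can be illustrated in plain PyTorch. This is a simplified sketch with invented helper names: it rounds values onto the E2M1 grid under a shared per-block scale, but keeps the scales in full precision and doesn't pack the 4-bit codes, which the real layout does:

```python
import torch

# The 8 non-negative magnitudes representable in E2M1 (2-bit exponent, 1-bit mantissa)
E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_blocks(x: torch.Tensor, block: int = 16):
    """Round each run of `block` elements to the E2M1 grid under one shared scale."""
    flat = x.reshape(-1, block)
    # Map each block's max magnitude onto 6.0, the largest E2M1 value
    scale = (flat.abs().amax(dim=1, keepdim=True) / 6.0).clamp(min=1e-12)
    scaled = flat / scale
    # Nearest-neighbour rounding onto the grid; sign handled separately
    idx = (scaled.abs().unsqueeze(-1) - E2M1_GRID).abs().argmin(dim=-1)
    return E2M1_GRID[idx] * scaled.sign(), scale

def dequantize_blocks(q: torch.Tensor, scale: torch.Tensor, shape):
    return (q * scale).reshape(shape)
```

Per-block scaling is what lets a 4-bit format track locally large and small regions of a tensor that a single per-tensor scale would flatten.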
```python
import torch
from comfy_kitchen.tensor import QuantizedTensor, TensorCoreFP8Layout, TensorCoreNVFP4Layout

# Quantize a tensor
x = torch.randn(128, 256, device="cuda", dtype=torch.bfloat16)
qt = QuantizedTensor.from_float(x, TensorCoreFP8Layout)

# Operations dispatch to optimized kernels automatically
# (weight_qt is a QuantizedTensor built the same way from a weight matrix)
output = torch.nn.functional.linear(qt, weight_qt)

# Dequantize back to float
dq = qt.dequantize()
```
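For reference, the per-tensor FP8 scheme amounts to scaling the whole tensor by its max magnitude and casting. A rough eager-mode equivalent, with hypothetical helper names (not the library's API; the actual kernels handle scale selection and edge cases on their own terms):

```python
import torch

F8_E4M3_MAX = 448.0  # largest finite value in FP8 E4M3

def fp8_quantize_sketch(x: torch.Tensor):
    """Per-tensor scaling: one scale maps the entire tensor into FP8 range."""
    scale = (x.abs().amax() / F8_E4M3_MAX).clamp(min=1e-12)
    q = (x / scale).clamp(-F8_E4M3_MAX, F8_E4M3_MAX).to(torch.float8_e4m3fn)
    return q, scale

def fp8_dequantize_sketch(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale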
## Installation

### From PyPI

```shell
# Install default (Linux/Windows/macOS)
pip install comfy-kitchen

# Install with cuBLAS support for NVFP4 (Blackwell+)
pip install "comfy-kitchen[cublas]"
```
### Package Variants

- CUDA wheels: Linux x86_64 and Windows x64
- Pure Python wheel: any platform, eager and triton backends only

Wheels are built for Python 3.10, 3.11, and 3.12+ (using the Stable ABI for 3.12+).
### From Source

```shell
# Standard installation with CUDA support
pip install .

# Development installation
pip install -e ".[dev]"

# Faster rebuilds during development (skips build isolation)
pip install -e . --no-build-isolation -v
```
### Build Options

These options require invoking setup.py directly (not pip install):

| Option | Command | Description | Default |
|---|---|---|---|
| --no-cuda | python setup.py bdist_wheel --no-cuda | Build CPU-only wheel (py3-none-any) | Off (builds with CUDA) |
| --cuda-archs=... | python setup.py build_ext --cuda-archs="80;89" | CUDA architectures to build for | 75-virtual;80;89;90a;100f;120f (Linux), 75-virtual;80;89;120f (Windows) |
| --debug-build | python setup.py build_ext --debug-build | Build in debug mode with symbols | Disabled (Release) |
| --lineinfo | python setup.py build_ext --lineinfo | Enable NVCC line info for profiling | Disabled |

```shell
# Build CPU-only wheel (pure Python, no CUDA required)
python setup.py bdist_wheel --no-cuda

# Build with custom CUDA architectures
python setup.py build_ext --cuda-archs="80;89" bdist_wheel

# Debug build with line info for profiling
python setup.py build_ext --debug-build --lineinfo bdist_wheel
```
## Requirements

- Python: ≥3.10
- PyTorch: ≥2.5.0
- CUDA Runtime (for CUDA wheels): ≥13.0
  - Pre-built wheels require NVIDIA Driver r580+
  - Building from source requires CUDA Toolkit ≥12.8 and the CUDA_HOME environment variable
- nanobind: ≥2.0.0 (for building from source)
- CMake: ≥3.18 (for building from source)
## Quick Start

```python
import comfy_kitchen as ck
import torch

# Automatic backend selection (triton -> cuda -> eager)
x = torch.randn(100, 100, device="cuda")
scale = torch.tensor([1.0], device="cuda")
result = ck.quantize_per_tensor_fp8(x, scale)

# Check which backends are available
print(ck.list_backends())

# Force a specific backend
result = ck.quantize_per_tensor_fp8(x, scale, backend="eager")

# Temporarily use a different backend
with ck.use_backend("triton"):
    result = ck.quantize_per_tensor_fp8(x, scale)
```
## Backend System
The library supports multiple backends:
- eager: Pure PyTorch implementation
- cuda: Custom CUDA C kernels (CUDA only)
- triton: Triton JIT-compiled kernels
### Automatic Backend Selection

When you call a function, the registry selects the best backend by checking constraints in priority order (cuda → triton → eager):

```python
# Backend is selected automatically based on input constraints
result = ck.quantize_per_tensor_fp8(x, scale)
# CPU tensors fall back to eager (the only backend supporting CPU)
# CUDA tensors use cuda or triton (higher priority)
```
### Constraint System
Each backend declares constraints for its functions:
| Constraint | Description |
|---|---|
| Device | Which device types are supported |
| Dtype | Allowed input/output dtypes per parameter |
| Shape | Shape requirements (e.g., 2D tensors, dimensions divisible by 16) |
| Compute Capability | Minimum GPU architecture (e.g., SM 8.0 for FP8, SM 10.0 for NVFP4) |
The registry validates inputs against these constraints before calling the backend—no try/except fallback patterns. If no backend can handle the inputs, a NoCapableBackendError is raised with details.
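Put together, the selection logic is roughly a priority-ordered scan over registered (constraint, implementation) pairs. A toy sketch of that idea with invented names (the actual registry and its constraint objects are more elaborate):

```python
class NoCapableBackendError(RuntimeError):
    """Raised when no registered backend passes its constraints."""

PRIORITY = ["cuda", "triton", "eager"]

# Toy registry: function name -> backend name -> (constraint, implementation)
REGISTRY = {
    "quantize_per_tensor_fp8": {
        "cuda": (lambda x: x["device"] == "cuda", lambda x: "cuda result"),
        "eager": (lambda x: True, lambda x: "eager result"),
    }
}

def dispatch(func_name, x, backend=None):
    order = [backend] if backend else PRIORITY
    for name in order:
        entry = REGISTRY.get(func_name, {}).get(name)
        if entry is not None:
            constraint, impl = entry
            if constraint(x):  # validate up front; no try/except fallback
                return impl(x)
    raise NoCapableBackendError(f"no backend can handle {func_name} for {x}")

# A CPU input fails cuda's device constraint and falls through to eager
print(dispatch("quantize_per_tensor_fp8", {"device": "cpu"}))
```

Validating constraints before the call is what makes failures explicit: an unsupported input raises immediately instead of surfacing as a kernel error mid-computation.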
```python
# Debug logging to see backend selection
import logging
logging.getLogger("comfy_kitchen.dispatch").setLevel(logging.DEBUG)
```
## Testing

Run the test suite with pytest:

```shell
# Run all tests
pytest

# Run a specific test file
pytest tests/test_backends.py

# Run with verbose output
pytest -v

# Run a specific test
pytest tests/test_backends.py::TestBackendSystem::test_list_backends
```