
Kernel Lens


A production-grade, multi-backend compiler for Triton kernels.

Python 3.9+ | License: MIT

Kernel Lens bridges the gap between PyTorch research and high-performance C++ production. It automatically traces PyTorch modules, intercepts custom Triton kernels, generates optimized C++ bindings, and compiles them into native ONNX Runtime and TensorRT plugins—with zero C++ boilerplate required.

📦 Installation

Install the core compiler (PyTorch & ONNX graph tracing):

pip install kernel-lens

Install with inference backends:

pip install kernel-lens[ort]   # For ONNX Runtime support
pip install kernel-lens[trt]   # For TensorRT support
pip install kernel-lens[all]   # For everything

🚀 Quickstart

Take any standard PyTorch nn.Module containing a @triton.jit kernel and compile it for production with a single call:

import torch
import kernel_lens as kl
from my_models import TritonMatmul  # Your custom PyTorch/Triton model

model = TritonMatmul().cuda()
A = torch.randn((128, 128), device='cuda')
B = torch.randn((128, 128), device='cuda')

# 1. Compile the model to native C++ backends
compiled_model = kl.compile(
    model, 
    (A, B), 
    name="my_fast_matmul", 
    backends=["onnx", "tensorrt"]
)

# 2. Execute native zero-copy inference!
trt_output = compiled_model.run((A, B), backend="tensorrt")
ort_output = compiled_model.run((A, B), backend="onnx")
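
For context, a model like TritonMatmul is just an ordinary nn.Module wrapping a @triton.jit kernel. A minimal sketch of such a module (the kernel below is illustrative, not part of kernel_lens, and assumes M, N, and K are multiples of BLOCK, as with the 128×128 inputs above):

import torch
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K, BLOCK: tl.constexpr):
    # Each program instance computes one BLOCK x BLOCK tile of C = A @ B.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK + tl.arange(0, BLOCK)
    offs_n = pid_n * BLOCK + tl.arange(0, BLOCK)
    acc = tl.zeros((BLOCK, BLOCK), dtype=tl.float32)
    for k in range(0, K, BLOCK):
        offs_k = k + tl.arange(0, BLOCK)
        a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :])
        b = tl.load(b_ptr + offs_k[:, None] * N + offs_n[None, :])
        acc += tl.dot(a, b)
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], acc)

class TritonMatmul(torch.nn.Module):
    def forward(self, a, b):
        M, K = a.shape
        _, N = b.shape
        c = torch.empty((M, N), device=a.device, dtype=torch.float32)
        grid = (triton.cdiv(M, 32), triton.cdiv(N, 32))
        matmul_kernel[grid](a, b, c, M, N, K, BLOCK=32)
        return c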

✨ Comprehensive Features

1. Zero C++ Boilerplate

Kernel Lens fully automates the generation of native C++ bindings. It reads your Triton kernel signatures and generates Ort::CustomOp and nvinfer1::IPluginV2 plugins for you. No bash scripts, no manual nvcc flags: just Python.
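
The signature extraction itself needs nothing exotic. A sketch of the idea (not Kernel Lens's actual internals), assuming Triton's JITFunction exposes the wrapped Python function as .fn, which holds in current Triton releases:

import inspect
import triton.language as tl

def split_params(jit_fn):
    # Assumption: @triton.jit returns a JITFunction that keeps the
    # original Python function on `.fn`.
    sig = inspect.signature(jit_fn.fn)
    runtime, constexpr = [], []
    for name, param in sig.parameters.items():
        # tl.constexpr parameters are baked into the compiled kernel and
        # never appear in the generated C++ plugin signature.
        (constexpr if param.annotation is tl.constexpr else runtime).append(name)
    return runtime, constexpr

# split_params(matmul_kernel)
# -> (['a_ptr', 'b_ptr', 'c_ptr', 'M', 'N', 'K'], ['BLOCK'])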

2. Dynamic Grid AST Parsing

Triton kernels are launched over dynamically computed grids (e.g., triton.cdiv(M, BLOCK_SIZE)). Kernel Lens captures the grid expression during PyTorch's symbolic tracing, sanitizes its AST (Abstract Syntax Tree), and translates it into plain C++ integer math for the GPU block scheduler.
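
As a sketch of what that translation looks like (illustrative only, not Kernel Lens's actual codegen), a grid expression can be parsed with Python's ast module and lowered to C++ ceil-division:

import ast

def grid_expr_to_cpp(src: str) -> str:
    # Lower a Python grid expression such as "triton.cdiv(M, BLOCK_SIZE)"
    # to equivalent C++ integer math.
    def emit(node):
        if isinstance(node, ast.Call):
            fname = node.func.attr if isinstance(node.func, ast.Attribute) else node.func.id
            if fname == "cdiv":  # cdiv(a, b) == ceil(a / b) in integer math
                a, b = (emit(arg) for arg in node.args)
                return f"(({a} + {b} - 1) / {b})"
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            return f"({emit(node.left)} * {emit(node.right)})"
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        raise ValueError(f"unsupported grid expression: {ast.dump(node)}")

    return emit(ast.parse(src, mode="eval").body)

print(grid_expr_to_cpp("triton.cdiv(M, BLOCK_SIZE)"))
# ((M + BLOCK_SIZE - 1) / BLOCK_SIZE)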

3. Deep Multi-Kernel Tracing

Your custom kernels don't need to live at the top level. Kernel Lens uses PyTorch's make_fx to flatten complex, nested nn.Module hierarchies: you can chain multiple Triton kernels across submodules, and Kernel Lens traces the entire computational graph.
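
The flattening behavior is standard make_fx; a toy demonstration with nothing Kernel-Lens-specific:

import torch
from torch.fx.experimental.proxy_tensor import make_fx

class Inner(torch.nn.Module):
    def forward(self, x):
        return x * 2

class Outer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.inner = Inner()

    def forward(self, x):
        return self.inner(x) + 1

# make_fx traces through the submodule boundary into one flat aten graph.
gm = make_fx(Outer())(torch.randn(4))
gm.graph.print_tabular()  # mul and add appear side by side, no call_module nodes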

4. Dynamic Multi-Output Support

Unlike compilers that assume a single output tensor, Kernel Lens dry-runs your network to count outputs and infer exact datatypes. It supports kernels that return multiple tensors of differing types (e.g., a float32 matrix and an int64 indexing array).
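
A minimal sketch of that dry-run (the helper name is ours for illustration, not part of the kernel_lens API):

import torch

def infer_outputs(model, example_inputs):
    # Run the model once on the example inputs and record, for each
    # output tensor, its dtype and shape.
    with torch.no_grad():
        out = model(*example_inputs)
    outs = out if isinstance(out, (tuple, list)) else (out,)
    return [(o.dtype, tuple(o.shape)) for o in outs]

# A kernel returning (values, indices) might yield, e.g.:
# [(torch.float32, (128, 128)), (torch.int64, (128,))]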

5. Zero-Copy TensorRT VRAM Mapping

Kernel Lens bypasses CPU bottlenecks. It hooks directly into PyTorch's CUDA memory allocator, formats the memory layouts safely (enforcing .contiguous() checks), and maps the VRAM pointers directly into TensorRT's execution context for instant, zero-overhead execution.
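
The underlying trick is standard in TensorRT's Python API (8.5+): hand PyTorch's device pointers straight to the execution context. A sketch, assuming a deserialized engine whose I/O tensors are named "A", "B", and "C":

import torch

def run_zero_copy(context, a, b, out):
    # Enforce the dense layout TensorRT expects; keep the (possibly
    # copied) tensors referenced so their VRAM stays alive.
    a, b, out = a.contiguous(), b.contiguous(), out.contiguous()
    for name, t in (("A", a), ("B", b), ("C", out)):
        # .data_ptr() is the raw CUDA address of the tensor's storage.
        context.set_tensor_address(name, t.data_ptr())
    # Launch on PyTorch's current CUDA stream -- no host round-trip.
    context.execute_async_v3(torch.cuda.current_stream().cuda_stream)
    return out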

6. Cold-Start Persistence

Don't waste time recompiling. Kernel Lens caches your compiled .so plugins and .engine files. You can load a highly optimized model directly from the cold cache in production:

# Instantly loads previously compiled C++ plugins
production_model = kl.load("my_fast_matmul") 
output = production_model.run((A, B), backend="tensorrt")

7. Fail-Fast Environment Diagnostics

Kernel Lens respects your time. Before initiating complex graph tracing, the internal diagnostic tool verifies your system environment (nvcc, g++, TensorRT headers, ONNX Runtime execution providers). If a dependency is missing, it fails instantly with actionable installation advice.
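
The shape of such a check is simple; a sketch in which the exact checks and messages are illustrative, not kernel_lens's own:

import shutil

def check_environment(need_tensorrt=True):
    for tool, hint in (("nvcc", "install the CUDA toolkit"),
                       ("g++", "install a C++ compiler, e.g. build-essential")):
        if shutil.which(tool) is None:
            raise RuntimeError(f"'{tool}' not found on PATH; {hint}.")
    try:
        import onnxruntime as ort
    except ImportError:
        raise RuntimeError("onnxruntime missing; pip install kernel-lens[ort]")
    if "CUDAExecutionProvider" not in ort.get_available_providers():
        raise RuntimeError("onnxruntime has no CUDA provider; "
                           "pip install onnxruntime-gpu")
    if need_tensorrt:
        try:
            import tensorrt  # noqa: F401
        except ImportError:
            raise RuntimeError("tensorrt missing; pip install kernel-lens[trt]")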

🛠️ Advanced Usage & Debugging

To diagnose silent failures, or to see exactly what C++ math is generated and executed on the GPU, Kernel Lens includes an aggressive native C++ debugging suite.

Enable it via environment variables before running your script:

KERNEL_LENS_DEBUG=1 python my_script.py

This injects printf tripwires directly into the compiled C++ shared libraries, outputting the calculated execution grids and exact VRAM memory addresses right before cuLaunchKernel fires.
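
Conceptually, the flag just gates the code generator. A toy sketch of env-var-gated printf injection (illustrative only, not the library's actual codegen):

import os

def emit_launch_prologue(grid_expr_cpp: str, first_arg: str) -> str:
    # When KERNEL_LENS_DEBUG=1, prepend a printf tripwire to the generated
    # C++ so the grid size and VRAM address are logged before launch.
    lines = []
    if os.environ.get("KERNEL_LENS_DEBUG") == "1":
        lines.append(f'printf("[kernel-lens] grid=%d ptr=%p\\n", '
                     f'{grid_expr_cpp}, (void*){first_arg});')
    return "\n".join(lines)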

