An evaluation framework for Triton kernels
Project description
Technical report 
Improved TritonBench evaluation framework
Dependency installation
- You may install the requirements with:

```shell
pip install -r requirements.txt
```
Installation
Please install by running the following command from the root folder:

```shell
pip install -e .
```
Running evaluation
Before running evaluations, you must run the setup step to record ground-truth performance data for your GPU:

```shell
geak-eval setup -ds tbg   # TritonBench-G
geak-eval setup -ds rocm  # ROCm
```
You can run evaluations in the following two ways:
- Command line:

```shell
geak-eval -f PATH_TO_FOLDER_OR_FILE -o NAME_OF_OUTPUT_FILE -ds tbg   # for TritonBench-G-v1
geak-eval -f PATH_TO_FOLDER_OR_FILE -o NAME_OF_OUTPUT_FILE -ds rocm  # for ROCm
```
- From a Python script: the following is a bare-minimum example; for a detailed example, please see geak-eval/run.py.

```python
from geak_eval.evaluators.interface import get_evaluators

evaluator = get_evaluators["tbg"]()   # for TritonBench-G eval
evaluator = get_evaluators["rocm"]()  # for ROCm eval

# run evaluations
call_status, exec_status, stdout, stderr = evaluator(
    generated_code, log_root=PATH_TO_LOG, file_name="kernel.py", atol=1e-5, rtol=1e-2
)
```
Issues with the existing TritonBench evaluation framework
The 1_exec_acc.py file in TritonBench did not accurately compare the outputs of two Triton files:
- Execution was done purely via subprocess calls for both the generated and ground-truth files.
- Seed consistency is violated.
- The outputs of the two Triton runs are compared via stdout string comparison, which is not always correct.
- Around 150 ground-truth files do not contain a print(result_gold) line, so the eval framework is essentially comparing two empty strings.
- Some ground-truth files (e.g. context_attn_bloom.py) do not even have a result_gold = test_*() line at the end, so the call-accuracy script 0_call_acc.py just blindly assumes the call was a success.
- 7 kernel files (as originally provided) run into memory access faults; we have fixed them.
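As an illustration of why stdout string comparison is fragile, here is a purely hypothetical, torch-free sketch (not from the repo): two results that are numerically equal within tolerance can still print differently.

```python
# Illustrative sketch: stdout string comparison can reject
# numerically-equivalent results that a tolerance check would accept.
a = 0.1 + 0.2   # floating-point rounding gives 0.30000000000000004
b = 0.3

print(str(a) == str(b))      # comparing printed output: False
print(abs(a - b) <= 1e-8)    # numeric comparison with tolerance: True
```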
We have fixed these issues as follows:
- Use torch.allclose to compare the two runs (ground truth and generated).
- Fix ground-truth files to include result_gold = test_*().
- Ensure a consistent seed across files.
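The tolerance rule torch.allclose applies is |a - b| <= atol + rtol * |b|, elementwise. A minimal pure-Python sketch of that comparison (the function name here is illustrative, not the framework's API):

```python
# Pure-Python sketch of the torch.allclose tolerance rule:
# |a - b| <= atol + rtol * |b|, checked elementwise.
def allclose(xs, ys, atol=1e-5, rtol=1e-2):
    return len(xs) == len(ys) and all(
        abs(x - y) <= atol + rtol * abs(y) for x, y in zip(xs, ys)
    )

print(allclose([1.0000, 2.0000], [1.0001, 1.9999]))  # True: within tolerance
print(allclose([1.0, 2.0], [1.0, 2.5]))              # False: |2.0 - 2.5| is too large
```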
We have also integrated performance measurement into the framework. The kernel evaluation flow is as follows:
- Check whether the kernel is callable: run the kernel's test function.
- If the kernel is callable, check whether it matches the ground truth by comparing the outputs of the generated kernel on known tests.
- If the generated kernel is correct, run the performance evaluation.
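The three-step flow above can be sketched as follows; evaluate and its toy inputs are hypothetical stand-ins, not the framework's actual API (which also returns stdout/stderr):

```python
import timeit

# Hypothetical sketch of the three-stage evaluation flow; names and the
# result dict are illustrative only.
def evaluate(kernel_fn, gold_fn, test_input, atol=1e-5, rtol=1e-2):
    # 1. Call accuracy: does the kernel's test run at all?
    try:
        out = kernel_fn(test_input)
    except Exception:
        return {"call": False, "exec": False, "perf": None}
    # 2. Execution accuracy: compare against ground truth with tolerance
    # (the framework uses torch.allclose; plain floats here for illustration).
    gold = gold_fn(test_input)
    if abs(out - gold) > atol + rtol * abs(gold):
        return {"call": True, "exec": False, "perf": None}
    # 3. Performance: measured only for correct kernels.
    perf = timeit.timeit(lambda: kernel_fn(test_input), number=100)
    return {"call": True, "exec": True, "perf": perf}

# usage with toy "kernels"
result = evaluate(lambda x: x * 2.0, lambda x: x + x, 3.0)
print(result["call"], result["exec"])
```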
Help/support/contribute
Please raise a GitHub issue or open a PR for any issues, help, or contributions!
You can contribute in the following ways:
- Add new kernels for evaluation:
  - Add the dataset of new kernels under geak-eval/data.
  - Add the path of this new dataset in geak_eval.constants.
  - Add an evaluator interface for this new dataset in geak_eval.evaluators.interface.
  - Add an evaluator to be run by the interface in geak_eval.evaluators. The evaluator is a script that runs only when invoked directly with python and does nothing when imported as a module. The evaluator (e.g. TB_correctness.py) is run by its interface (e.g. interface.TestAllCloseEvaluatorTBG).
- Add new metrics for evaluators to work with in geak_eval.metrics.
- Add new performance-eval metrics for your (or an existing) dataset under geak_eval.perf.
Updates
- [2025-07-16] Added autotune-compatible ROCm kernels and a naive softmax. Use the -tp argument with the path to this folder as below:

```shell
geak-eval eval -f PATH_TO_EVAL_FOLDER -o RESULT_NAME -ds rocm -tp geak-eval/data/ROCm/data/ROCm_v1_autotune
```

  The naive_softmax.py kernel from the ROCm blog has been added to this repo.
- Use the -c argument to run evaluations directly on Python Triton code file(s)/folders instead of JSON-based parsing.
Credits
We found the following repositories helpful:
Citation
If you find this work useful in your research or applications, please consider citing:
@misc{wang2025geakintroducingtritonkernel,
title={Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks},
author={Jianghui Wang and Vinay Joshi and Saptarshi Majumder and Xu Chao and Bin Ding and Ziqiong Liu and Pratik Prabhanjan Brahma and Dong Li and Zicheng Liu and Emad Barsoum},
year={2025},
eprint={2507.23194},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2507.23194},
}
File details
Details for the file geak_eval-0.1.5.tar.gz.
File metadata
- Download URL: geak_eval-0.1.5.tar.gz
- Upload date:
- Size: 499.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b57bc38b67ef0bb7796f3f5ac0dc4e9bca947c5e25a189e833d3179517468e1 |
| MD5 | a632c6e6052242fb0fe5fc044d58ba70 |
| BLAKE2b-256 | 7ea70df57fb957625fbd30cf0f22474f75967fbe98ea08837dd1d70aebaa2824 |
File details
Details for the file geak_eval-0.1.5-py3-none-any.whl.
File metadata
- Download URL: geak_eval-0.1.5-py3-none-any.whl
- Upload date:
- Size: 963.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6c105efd172ca0d982895a852de6f1a68b9f01660d86a52815d759e4cc994c7 |
| MD5 | 30d57b4be5071473792c31600e18c833 |
| BLAKE2b-256 | 9b690faa399a9e5e3e9bc814b5c00abefd5831610e95ae20ca5fc4b3f073ef51 |