
An evaluation framework for Triton kernels


Technical report: arXiv:2507.23194

Improved TritonBench evaluation framework

Dependency installation

  • Install the requirements with pip install -r requirements.txt

Installation

Install the package by running the following command from the root folder:

  • pip install -e .

Running evaluation

Before running evaluations, you must run the setup step to record ground truth performance data for your GPU:

  • geak-eval setup -ds tbg
  • geak-eval setup -ds rocm

You can run evaluations in the following two ways:

  1. Command line run:
    • geak-eval -f PATH_TO_FOLDER_OR_FILE -o NAME_OF_OUTPUT_FILE -ds tbg for TritonBench-G-v1
    • geak-eval -f PATH_TO_FOLDER_OR_FILE -o NAME_OF_OUTPUT_FILE -ds rocm for ROCm
  2. From a Python script: the following is a bare-minimum example; for a detailed example, please see geak-eval/run.py.
    • from geak_eval.evaluators.interface import get_evaluators
    • evaluator = get_evaluators["tbg"]() # for TritonBenchG eval
    • evaluator = get_evaluators["rocm"]() # for ROCm eval
    • call_status, exec_status, stdout, stderr = evaluator(generated_code, log_root=PATH_TO_LOG, file_name="kernel.py", atol=1e-5, rtol=1e-2) # run evaluations
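The `get_evaluators` lookup above behaves like a small name-to-evaluator registry. The following is a minimal, self-contained sketch of that pattern; the class, its return values, and all names here are hypothetical illustrations, not the actual geak-eval internals:

```python
# Sketch of a name-to-evaluator registry, similar in spirit to
# get_evaluators in geak_eval.evaluators.interface.
# All classes and names below are hypothetical, not the real implementation.

class TestAllCloseEvaluator:
    """Hypothetical evaluator comparing generated output to ground truth."""

    def __init__(self, dataset: str):
        self.dataset = dataset

    def __call__(self, generated_code, log_root=".", file_name="kernel.py",
                 atol=1e-5, rtol=1e-2):
        # A real evaluator would execute the kernel and compare outputs;
        # here we just return a dummy (call_status, exec_status, stdout, stderr).
        return True, True, "", ""

# Registry mapping dataset names to evaluator factories.
get_evaluators = {
    "tbg": lambda: TestAllCloseEvaluator("TritonBench-G-v1"),
    "rocm": lambda: TestAllCloseEvaluator("ROCm"),
}

evaluator = get_evaluators["tbg"]()
call_status, exec_status, stdout, stderr = evaluator("pass  # kernel code")
```

The registry indirection lets new datasets plug in by adding one entry, which matches how the tbg and rocm evaluators are selected above.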

Issues with existing TritonBench evaluation framework

  1. The 1_exec_acc.py file in TritonBench did not accurately compare the outputs of two Triton files.
  2. Execution was done purely via subprocess calls for both the generated and ground truth files.
  3. Seed consistency was violated.
  4. The outputs of the two Triton runs were compared using stdout string comparison, which is not always correct.
  5. Around 150 ground truth files do not include a print(result_gold) line, so the eval framework was essentially comparing two empty strings.
  6. Some ground truth files (e.g. context_attn_bloom.py) do not even have a result_gold = test_*() line at the end, so the call-accuracy run using 0_call_acc.py blindly assumes the call was successful.
  7. 7 kernel files (as originally provided) ran into memory access faults; we have fixed them.
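Point 4 above is easy to reproduce: two runs can be numerically equivalent yet print differently, so a stdout string comparison reports a spurious mismatch. A small self-contained illustration (no Triton involved):

```python
# Two numerically equivalent results that print differently:
# a stdout string comparison would flag these as a mismatch.
a = 0.1 + 0.2   # accumulates floating-point rounding error
b = 0.3

print(repr(a))  # '0.30000000000000004'
print(repr(b))  # '0.3'

assert str(a) != str(b)                     # stdout-style comparison "fails"
assert abs(a - b) <= 1e-5 + 1e-2 * abs(b)   # tolerance-based comparison passes
```

The same effect occurs at scale with printed tensors, where formatting, truncation, and rounding all make string equality an unreliable proxy for numerical equality.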

We have fixed these issues as follows:

  1. Use torch.allclose to compare two runs (ground truth and generated).
  2. Fix ground truth files to include result_gold = test_*().
  3. Ensure consistent seed across files.
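Fix 1 relies on the tolerance semantics of torch.allclose: an element-wise check |generated - reference| <= atol + rtol * |reference|. A pure-Python sketch of that check (the helper name is ours, and the defaults mirror the atol=1e-5, rtol=1e-2 used in the evaluator call above):

```python
# Element-wise tolerance check with torch.allclose-style semantics:
# |generated - reference| <= atol + rtol * |reference| for every element.
def all_close(generated, reference, atol=1e-5, rtol=1e-2):
    if len(generated) != len(reference):
        return False
    return all(abs(g - r) <= atol + rtol * abs(r)
               for g, r in zip(generated, reference))

ref = [1.0, 2.0, 3.0]
gen = [1.001, 2.002, 2.999]   # small numerical noise: accepted
bad = [1.0, 2.5, 3.0]         # a real mismatch: rejected

assert all_close(gen, ref)
assert not all_close(bad, ref)
```

Unlike string comparison of stdout, this tolerates benign floating-point noise while still catching genuine output mismatches.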

We have also integrated performance measurement into the framework. Kernel evaluation flow is as follows:

  1. Check whether the kernel is callable: run the kernel's test function.
  2. If the kernel is callable, check whether it matches the ground truth by comparing the generated kernel's outputs on known tests.
  3. If the generated kernel is correct, run the performance evaluation.
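The three-stage flow above (callable, then correct, then benchmarked) can be sketched as a short gating pipeline. All function names below are hypothetical stand-ins, not the geak-eval API:

```python
def evaluate_kernel(run_test, compare_outputs, benchmark):
    """Gated evaluation: each stage runs only if the previous one passed.

    run_test() -> (ok, output)        # 1. is the kernel callable?
    compare_outputs(output) -> bool   # 2. does it match ground truth?
    benchmark() -> float              # 3. performance, only if correct
    """
    callable_ok, output = run_test()
    if not callable_ok:
        return {"call": False, "correct": False, "perf": None}
    correct = compare_outputs(output)
    perf = benchmark() if correct else None
    return {"call": True, "correct": correct, "perf": perf}

# Toy usage with stand-in stages:
result = evaluate_kernel(
    run_test=lambda: (True, [1.0, 2.0]),
    compare_outputs=lambda out: out == [1.0, 2.0],
    benchmark=lambda: 0.42,  # e.g. milliseconds
)
```

Gating the stages avoids benchmarking kernels that crash or produce wrong answers, so performance numbers are only ever reported for correct kernels.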

Help/support/contribute:

Please raise a GitHub issue or PR for any issues, help, or contributions!

You can contribute in the following ways:

  1. Add new kernels for evaluations:
    • Add the dataset of new kernels under geak-eval/data.
    • Add the path of this new dataset in geak-eval.constants.
    • Add an evaluator interface for this new dataset in geak-eval.evaluators.interface.
    • Add an evaluator to be run by the interface in geak-eval.evaluators. The evaluator is a script that only executes when invoked directly with python and does nothing when imported as a module. The evaluator (e.g. TB_correctness.py) is run by its interface (e.g. interface.TestAllCloseEvaluatorTBG).
  2. You can add new metrics for evaluators to work with in geak-eval.metrics.
  3. You can add new performance eval metrics for your (or existing) dataset under geak-eval.perf.
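The evaluator-file convention described in point 1 above (runs as a script, inert on import) is the standard Python `__main__` guard. A minimal sketch, with an illustrative function name rather than the real evaluator API:

```python
# Sketch of the evaluator-file convention: the module does nothing on
# import and only executes when run directly as a script.
# The function name and return value are illustrative, not the real API.

def run_correctness_check():
    # A real evaluator (e.g. TB_correctness.py) would execute the generated
    # kernel and its ground truth, then compare outputs with torch.allclose.
    return "PASS"

if __name__ == "__main__":
    # Reached via `python my_evaluator.py`, never on `import my_evaluator`.
    print(run_correctness_check())
```

This lets the interface class import the evaluator's helpers without triggering a full evaluation run as a side effect.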

Updates

  • [2025-07-16] Added autotune-compatible ROCm kernels and a naive softmax kernel; use the -tp argument with the path to this folder as below:
    • geak-eval eval -f PATH_TO_EVAL_FOLDER -o RESULT_NAME -ds rocm -tp geak-eval/data/ROCm/data/ROCm_v1_autotune
    • The naive_softmax.py kernel from the ROCm blog has been added to this repo.
    • Use the -c argument to run evaluations directly on Python Triton code file(s)/folders instead of JSON-based parsing.

Credits:

We found the following repos helpful:

  1. TritonBench
  2. ROCm AITER
  3. ROCm Triton

Citation

If you find this work useful in your research or applications, please consider citing:

@misc{wang2025geakintroducingtritonkernel,
      title={Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks}, 
      author={Jianghui Wang and Vinay Joshi and Saptarshi Majumder and Xu Chao and Bin Ding and Ziqiong Liu and Pratik Prabhanjan Brahma and Dong Li and Zicheng Liu and Emad Barsoum},
      year={2025},
      eprint={2507.23194},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.23194}, 
}

