NVIDIA SOL ExecBench - GPU kernel evaluation framework

SOL ExecBench

Speed-Of-Light (SOL) ExecBench is a rigorous GPU kernel evaluation framework for benchmarking AI-generated kernel solutions written in any of the DSLs that NVIDIA hardware supports.

Kernels are:

  • Checked for various forms of reward hacking
  • Tested against a reference solution for numerical correctness
  • Timed under reproducible conditions
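
As a rough illustration of the numerical-correctness step, the sketch below compares a candidate output with a reference output element by element under absolute and relative tolerances. The function name within_tolerance and the atol/rtol parameters are hypothetical; the comparison rule SOL ExecBench actually applies is governed by the tolerance thresholds in each workload and is not reproduced here.

```python
import math


def within_tolerance(candidate, reference, atol=1e-3, rtol=1e-3):
    """Return True if every candidate value is close to its reference value.

    Uses the common test |c - r| <= atol + rtol * |r|.
    This is an illustrative rule, not the framework's own check.
    """
    if len(candidate) != len(reference):
        return False
    for c, r in zip(candidate, reference):
        # In this sketch, non-finite values on either side count as a mismatch.
        if not (math.isfinite(c) and math.isfinite(r)):
            return False
        if abs(c - r) > atol + rtol * abs(r):
            return False
    return True
```

A kernel that returns NaN or a tensor of the wrong length fails this check outright, which is one simple way a shortcut solution can be caught.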

Leaderboard submissions are ranked by SOL-Score, a metric that grades custom kernel performance against the theoretical roofline of an NVIDIA B200 GPU (obtained analytically with SOLAR).
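
To make the roofline idea concrete, here is a minimal sketch of a generic roofline bound: a kernel can run no faster than the slower of its compute limit and its memory-traffic limit. The function names and all numbers are hypothetical placeholders. They are not B200 specifications, and this is not the SOL-Score formula or SOLAR's analysis, neither of which is described in this document.

```python
def roofline_time_s(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Lower bound on runtime: the slower of the compute and memory limits."""
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / peak_bytes_per_s
    return max(compute_time, memory_time)


def fraction_of_roofline(measured_time_s, bound_time_s):
    """Ratio of the theoretical bound to the measured time (1.0 = at the bound)."""
    return bound_time_s / measured_time_s


# Placeholder hardware peaks and workload sizes, chosen only for illustration.
bound = roofline_time_s(
    flops=2e9, bytes_moved=1e9,
    peak_flops_per_s=1e12, peak_bytes_per_s=1e11,
)
print(f"bound: {bound:.4f} s, fraction: {fraction_of_roofline(0.02, bound):.2f}")
```

In this example the memory limit dominates, so the kernel is memory-bound and extra arithmetic throughput would not lower the bound.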

Supported kernel languages: PyTorch, Triton, CUTLASS, cuDNN, CuTe DSL, cuTile, CUDA C++.

Prerequisites

Setup

1. Download benchmark data (one-time)

./scripts/download_data.sh

This downloads the SOL-ExecBench and FlashInfer Trace datasets into data/.

2. Build and launch the Docker container

./scripts/run_docker.sh --build

This builds the image and drops you into an interactive shell inside the container. The repo's src/, tests/, and downloaded data are mounted automatically.

Evaluating a Solution

Inside the container, use the sol-execbench CLI:

# Evaluate using a problem directory (contains definition.json + workload.jsonl)
sol-execbench <problem_dir> --solution solution.json

# Or specify files explicitly
sol-execbench --definition def.json --workload wkl.jsonl --solution sol.json

Example

# From the host — build, launch, and evaluate in one command:
./scripts/run_docker.sh --build -- \
  sol-execbench examples/cute_dsl/jamba_attn_proj \
    --solution examples/cute_dsl/jamba_attn_proj/solution_cute_dsl.json

# Or from inside the container:
sol-execbench examples/cute_dsl/jamba_attn_proj \
  --solution examples/cute_dsl/jamba_attn_proj/solution_cute_dsl.json

CLI Options

Flag                Description
--compile-timeout   Compilation timeout in seconds (default: 120)
--timeout           Evaluation timeout in seconds (default: 600)
-o, --output        Write JSONL traces to file
--json              Print traces as JSON to stdout
--lock-clocks       Lock GPU clocks for stable benchmarks
--keep-staging      Preserve staging directory after run
-v, --verbose       Show subprocess output
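
For scripted runs, the options above can be assembled programmatically. The helper below is a hypothetical sketch (build_command is not part of SOL ExecBench). It uses only the flags listed in the table and assumes that --lock-clocks and --verbose are plain switches that take no value.

```python
def build_command(problem_dir, solution, *, compile_timeout=None, timeout=None,
                  output=None, lock_clocks=False, verbose=False):
    """Build an argv list for the sol-execbench CLI from the documented flags."""
    cmd = ["sol-execbench", str(problem_dir), "--solution", str(solution)]
    if compile_timeout is not None:
        cmd += ["--compile-timeout", str(compile_timeout)]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if output is not None:
        cmd += ["--output", str(output)]
    # Assumption: these two flags are boolean switches with no argument.
    if lock_clocks:
        cmd.append("--lock-clocks")
    if verbose:
        cmd.append("--verbose")
    return cmd


print(build_command(
    "examples/cute_dsl/jamba_attn_proj",
    "examples/cute_dsl/jamba_attn_proj/solution_cute_dsl.json",
    timeout=900, lock_clocks=True, output="traces.jsonl",
))
```

Inside the container, the resulting list can be passed to subprocess.run(cmd, check=True).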

Running a Dataset

Use scripts/run_dataset.py to evaluate an entire dataset (or a single problem) in batch. Unless --solution-name is specified, it runs each definition's reference implementation as the solution. Results are saved to ./out/{subset} by default.

# Run all problems in the benchmark.
# Auto builds solution.json from a single code file
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark --solution-name solution.py

# Run specific categories with multiple solution code files
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark --category L1 L2 --solution-name solution.json

# Run a single problem
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark/L1/my_problem

# Limit number of problems and workloads
uv run scripts/run_dataset.py data/SOL-ExecBench/benchmark --limit 5 --max-workloads 3 -o ./results

Results (traces and a summary JSON) are written to out/run_dataset/ by default (override with -o). Problems that already passed are skipped on subsequent runs unless --rerun is specified.

Problem Format

A problem directory contains:

  • definition.json — Kernel specification: function signature, tensor shapes, dtypes, reference implementation.
  • workload.jsonl — One JSON object per line, each defining input shapes, values, and tolerance thresholds.

A solution is a separate JSON file referencing source files with the kernel implementation.
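
The layout above can be read with the standard library alone. The sketch below loads definition.json as one JSON document and workload.jsonl as one JSON object per line. load_problem is a hypothetical helper, not part of the package, and it makes no assumptions about the field names inside either file (see the schema docs for those).

```python
import json
from pathlib import Path


def load_problem(problem_dir):
    """Return (definition, workloads) parsed from a problem directory."""
    problem_dir = Path(problem_dir)
    definition = json.loads((problem_dir / "definition.json").read_text())

    workloads = []
    with open(problem_dir / "workload.jsonl") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                workloads.append(json.loads(line))
    return definition, workloads
```

For example, definition, workloads = load_problem("examples/cute_dsl/jamba_attn_proj") followed by len(workloads) gives the number of workloads a solution will be evaluated on.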

See the full schema docs:

  • Definition — Kernel specification (function signature, tensor shapes, dtypes, reference code)
  • Workload — Concrete input configurations and tolerance thresholds
  • Solution — Source files and build specs for a kernel implementation
  • Trace — Evaluation output (correctness and performance results)

Citation

@misc{lin2026solexecbench,
      title={SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits}, 
      author={Edward Lin, Sahil Modi, Siva Kumar Sastry Hari, Qijing Huang, Zhifan Ye, Nestor Qin, Fengzhe Zhou, Yuan Zhang, Jingquan Wang, Sana Damani, Dheeraj Peri, Ouye Xie, Aditya Kane, Moshe Maor, Michael Behar, Triston Cao, Rishabh Mehta, Vartika Singh, Vikram Sharma Mailthody, Terry Chen, Zihao Ye, Hanfeng Chen, Tianqi Chen, Vinod Grover, Wei Chen, Wei Liu, Eric Chung, Luis Ceze, Roger Bringmann, Cyril Zeller, Michael Lightstone, Christos Kozyrakis, Humphrey Shi},
      year={2026},
      eprint={2603.19173},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.19173}, 
}

License

Apache-2.0. See LICENSE. Contributions require DCO sign-off — see CONTRIBUTING.md.
