GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
GSO (Global Software Optimization) is a benchmark for evaluating language models' capabilities in developing high-performance software. We present 100+ challenging optimization tasks across 10 codebases spanning diverse domains and programming languages. Each task provides a codebase and a performance test as a precise specification; agents must optimize the codebase, and their patches are measured against expert developer commits.
📰 News
- [Dec 23, 2025]: Released evaluation logs and transcripts with Docent support: gso-bench/gso-experiments.
- [Nov 3, 2025]: Released GSO's HackDetector, which catches reward hacking by models: GSO Blog.
- [May 30, 2025]: 🤗 GSO dataset is now available on HuggingFace! Access it at gso-bench/gso.
- [May 30, 2025]: Prebuilt docker images for GSO tasks are now available on Docker Hub.
- [May 30, 2025]: Initial release of the GSO benchmark: gso-bench.github.io
👋 Overview
GSO evaluates language models on software performance optimization. In each task:
- The agent receives a codebase with a specific performance bottleneck
- A performance test serves as a precise specification
- The agent must generate a patch that improves runtime efficiency
- Success is measured against expert developer optimizations
To load the GSO dataset, run the following code:
from datasets import load_dataset
gso = load_dataset('gso-bench/gso', split='test')
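Once the split is loaded, a quick look at its size and columns helps before writing any task-specific code. The sketch below is illustrative only: `describe_split` and `main` are hypothetical helpers, not part of GSO, and no column names are assumed since the schema is not listed here.

```python
def describe_split(split):
    """Return the row count and column names of a loaded dataset split.

    Works with any object exposing `__len__` and `column_names`,
    which a HuggingFace `datasets.Dataset` does.
    """
    return {"num_rows": len(split), "columns": list(split.column_names)}


def main():
    # Requires the `datasets` package and network access.
    from datasets import load_dataset

    gso = load_dataset("gso-bench/gso", split="test")
    print(describe_split(gso))
```

Calling `main()` downloads the test split and prints the summary; the column list it reports is the place to start when mapping task fields to your agent's inputs.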
🚀 Setup
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
git clone --recursive https://github.com/gso-bench/gso.git
cd gso && uv venv && source .venv/bin/activate
uv sync
(Optional) Set up your HuggingFace token:
export HF_TOKEN="huggingface_token"
💽 Usage
Evaluation Harness
- Building Docker images for GSO tasks:
docker login
uv run src/gso/harness/prepare_images.py \
--push_to_registry True \
--dockerhub_username <dockerhub_username> \
--dockerhub_repo <dockerhub_repo>
- Running Evaluations:
uv run src/gso/harness/opt_at_k.py \
--prediction_paths <prediction_path> \
--timeout 3600 \
--run_id <run_id> \
--k 1 \
--model <modelname>
For detailed instructions and options, see the Harness documentation.
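When evaluations are launched from a script, it can be convenient to assemble the command programmatically. The following is a minimal sketch that only rebuilds the `opt_at_k.py` invocation shown above; `build_eval_command` and the example argument values (`preds.jsonl`, `demo-run`, `my-model`) are hypothetical, and only the flags listed above are used.

```python
import shlex


def build_eval_command(prediction_path, run_id, model, k=1, timeout=3600):
    """Assemble the opt_at_k.py invocation as an argv list.

    Only the flags shown in this README are used; the defaults mirror
    the example values (k=1, timeout=3600).
    """
    return [
        "uv", "run", "src/gso/harness/opt_at_k.py",
        "--prediction_paths", str(prediction_path),
        "--timeout", str(timeout),
        "--run_id", str(run_id),
        "--k", str(k),
        "--model", str(model),
    ]


# Placeholder values for illustration only.
cmd = build_eval_command("preds.jsonl", "demo-run", "my-model")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would run the same command as the shell form, without any quoting concerns for paths that contain spaces.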
GSO Collection Framework
The collection framework enables you to create your own GSO tasks through a four-step pipeline:
- Commit Extraction & Filtering: Extract performance-related commits using LLMs
- API Identification: Identify affected high-level APIs for each commit
- Performance Test Generation: Generate tests for API-Commit pairs
- Test Execution: Execute tests to identify performance improvements
For detailed instructions and usage, see the Collection Framework documentation.
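The test-execution step compares runtimes before and after a commit. The sketch below illustrates that idea in isolation and is not the framework's implementation: `best_runtime`, `speedup`, and the two toy workloads are hypothetical stand-ins for a generated performance test run against two versions of a codebase.

```python
import time


def best_runtime(fn, repeats=5):
    """Return the fastest wall-clock time of `fn` over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(before, after, repeats=5):
    """Ratio of the best runtime of `before` to the best runtime of `after`."""
    # Guard against a zero reading on coarse timers.
    return best_runtime(before, repeats) / max(best_runtime(after, repeats), 1e-9)


N = 200_000


def workload_before():
    # Stand-in for the pre-commit code path: explicit summation.
    return sum(i for i in range(N))


def workload_after():
    # Stand-in for the optimized code path: closed-form sum.
    return N * (N - 1) // 2


# A performance test should also confirm that both versions agree.
assert workload_before() == workload_after()
ratio = speedup(workload_before, workload_after)
print(f"speedup: {ratio:.1f}x")
```

Taking the minimum over several repeats reduces noise from other processes; a real pipeline would likely also need warm-up runs and isolated environments, which this sketch omits.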
⬇️ Artifacts
| Datasets | Tools | Dockers |
|---|---|---|
| 💿 GSO | 🔧 Evaluation Harness | 🐳 Docker Hub |
| | 🔧 Collection Framework | |
💫 Contributions
We welcome contributions from the broader NLP, Machine Learning, and Software Engineering research communities! Please open a pull request or issue and fill in the corresponding template.
✍️ Citation & license
Released under the MIT license; see the LICENSE file.
File details
Details for the file gsobench-0.1.5.tar.gz.
File metadata
- Download URL: gsobench-0.1.5.tar.gz
- Upload date:
- Size: 235.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b4b7742ad24ea6e64c57040f6feaa0205c1094d2c823392e40842b3adba10c8a |
| MD5 | df99849180f35066064cbdb18950fb9b |
| BLAKE2b-256 | 2688045f701b5692b625bc3c669335ff9fc3b2b57a28195d29f7020239e5487f |
Provenance
The following attestation bundles were made for gsobench-0.1.5.tar.gz:
- Publisher: publish.yml on gso-bench/gso
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gsobench-0.1.5.tar.gz
- Subject digest: b4b7742ad24ea6e64c57040f6feaa0205c1094d2c823392e40842b3adba10c8a
- Sigstore transparency entry: 942421181
- Permalink: gso-bench/gso@85ec9eaec02895d81fc1cefbbef3c07a26a68526
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/gso-bench
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@85ec9eaec02895d81fc1cefbbef3c07a26a68526
- Trigger Event: release
File details
Details for the file gsobench-0.1.5-py3-none-any.whl.
File metadata
- Download URL: gsobench-0.1.5-py3-none-any.whl
- Upload date:
- Size: 137.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53bda8f3ee6e9f8e758be1bcd24acfa8a115199c6b5384ba9e7481d7c3e09332 |
| MD5 | ee0eaa704a2e38107ed22b3d3cadd342 |
| BLAKE2b-256 | 1779e8b5b8016ea87b139bd02e1d283e4a3f8b2d9c6073a147bad8a21f90705c |
Provenance
The following attestation bundles were made for gsobench-0.1.5-py3-none-any.whl:
- Publisher: publish.yml on gso-bench/gso
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gsobench-0.1.5-py3-none-any.whl
- Subject digest: 53bda8f3ee6e9f8e758be1bcd24acfa8a115199c6b5384ba9e7481d7c3e09332
- Sigstore transparency entry: 942421186
- Permalink: gso-bench/gso@85ec9eaec02895d81fc1cefbbef3c07a26a68526
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/gso-bench
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@85ec9eaec02895d81fc1cefbbef3c07a26a68526
- Trigger Event: release