Submit predictions to the SWE-bench API and manage your runs
SWE-bench CLI
A command-line interface for interacting with the SWE-bench API. Use this tool to submit predictions, manage runs, and retrieve evaluation reports.
Read the full documentation here. For submission guidelines, see here.
Installation
pip install sb-cli
Authentication
Before using the CLI, you'll need to get an API key:
- Generate an API key:
sb-cli gen-api-key your.email@example.com
- Set your API key as an environment variable, and store it somewhere safe:
export SWEBENCH_API_KEY=your_api_key
# or add export SWEBENCH_API_KEY=your_api_key to your .*rc file
- You'll receive an email with a verification code. Verify your API key:
sb-cli verify-api-key YOUR_VERIFICATION_CODE
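If you drive the CLI from a script, it can help to fail early when the key is missing. The following is a minimal sketch, not part of sb-cli itself; the helper name `get_api_key` is hypothetical, and only the `SWEBENCH_API_KEY` variable name comes from the steps above.

```python
import os


def get_api_key(env=None):
    """Return the API key from the environment, or raise if it is unset.

    `get_api_key` is a hypothetical helper for illustration; sb-cli does
    not necessarily expose anything like it.
    """
    env = os.environ if env is None else env
    # Treat a missing variable and an empty/whitespace-only one the same way.
    key = env.get("SWEBENCH_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SWEBENCH_API_KEY is not set; export it before running sb-cli"
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to test without touching the real environment.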
Subsets and Splits
SWE-bench has different subsets and splits available:
Subsets
- swe-bench-m: The main dataset
- swe-bench_lite: A smaller subset for testing and development
- swe-bench_verified: 500 verified problems from SWE-bench
Splits
- dev: Development/validation split
- test: Test split (currently only available for swe-bench_lite and swe-bench_verified)
You'll need to specify both a subset and split for most commands.
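Note that the subset names mix hyphens and underscores (`swe-bench-m` versus `swe-bench_lite`), which is easy to mistype. As a rough sketch, a script could check the pair against the names listed above before calling the CLI. The function `check_target` is hypothetical, and it deliberately does not enforce which splits are available for which subset, since that may change.

```python
# Subset and split names as listed in this README.
SUBSETS = ("swe-bench-m", "swe-bench_lite", "swe-bench_verified")
SPLITS = ("dev", "test")


def check_target(subset, split):
    """Return (subset, split) if both names are recognized, else raise.

    Hypothetical helper: it only catches misspelled names and does not
    know which subset/split combinations the API currently accepts.
    """
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    return subset, split
```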
Usage
Submit Predictions
Submit your model's predictions to SWE-bench:
sb-cli submit swe-bench-m test \
--predictions_path predictions.json \
--run_id my_run_id
Options:
- --run_id: ID of the run to submit predictions for (optional, defaults to the name of the parent directory of the predictions file)
- --instance_ids: Comma-separated list of specific instance IDs to submit (optional)
- --output_dir: Directory to save report files (default: sb-cli-reports)
- --overwrite: Overwrite existing report (default: 0)
- --gen_report: Generate a report after evaluation is complete (default: 1)
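When submitting from a script, the command line can be assembled as an argument list. The sketch below uses only flags documented above; the function `build_submit_command` is a hypothetical helper, and it leaves out `--overwrite` and `--gen_report` because the exact way their values are passed is not shown here.

```python
def build_submit_command(subset, split, predictions_path,
                         run_id=None, instance_ids=None, output_dir=None):
    """Build the argument list for `sb-cli submit` (hypothetical helper)."""
    cmd = ["sb-cli", "submit", subset, split,
           "--predictions_path", str(predictions_path)]
    if run_id is not None:
        cmd += ["--run_id", run_id]
    if instance_ids:
        # The CLI expects a single comma-separated value.
        cmd += ["--instance_ids", ",".join(instance_ids)]
    if output_dir is not None:
        cmd += ["--output_dir", str(output_dir)]
    return cmd


# To actually run it (requires sb-cli to be installed and an API key set):
#   import subprocess
#   subprocess.run(build_submit_command("swe-bench-m", "test",
#                                       "predictions.json", "my_run_id"),
#                  check=True)
```

Building a list rather than a shell string avoids quoting problems when paths or run IDs contain spaces.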
Get Report
Retrieve evaluation results for a specific run:
sb-cli get-report swe-bench-m dev my_run_id -o ./reports
List Runs
View all your existing run IDs for a specific subset and split:
sb-cli list-runs swe-bench-m dev
Predictions File Format
Your predictions file should be a JSON file in one of these formats:
{
"instance_id_1": {
"model_patch": "...",
"model_name_or_path": "..."
},
"instance_id_2": {
"model_patch": "...",
"model_name_or_path": "..."
}
}
Or as a list:
[
{
"instance_id": "instance_id_1",
"model_patch": "...",
"model_name_or_path": "..."
},
{
"instance_id": "instance_id_2",
"model_patch": "...",
"model_name_or_path": "..."
}
]
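The two formats carry the same information, so a list can be converted to the mapping form before writing the file. The following is a small sketch under the assumption that every prediction needs both `model_patch` and `model_name_or_path`; the function name `to_mapping` is hypothetical and not part of sb-cli.

```python
import json

# Assumed required fields, taken from the examples above.
REQUIRED_KEYS = ("model_patch", "model_name_or_path")


def to_mapping(predictions):
    """Normalize predictions to the {instance_id: {...}} form and check fields."""
    if isinstance(predictions, dict):
        mapping = dict(predictions)
    else:
        mapping = {}
        for entry in predictions:
            instance_id = entry["instance_id"]
            if instance_id in mapping:
                raise ValueError(f"duplicate instance_id: {instance_id}")
            # Keep every field except the ID, which becomes the key.
            mapping[instance_id] = {
                k: v for k, v in entry.items() if k != "instance_id"
            }
    for instance_id, pred in mapping.items():
        missing = [k for k in REQUIRED_KEYS if k not in pred]
        if missing:
            raise ValueError(f"{instance_id} is missing {missing}")
    return mapping


def write_predictions(predictions, path):
    """Write normalized predictions as a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(to_mapping(predictions), f, indent=2)
```

For example, `write_predictions(my_list, "predictions.json")` produces a file in the first format shown above.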
Submitting to the Multimodal Leaderboard
To submit your system to the SWE-bench Multimodal leaderboard:
- Submit your predictions for the swe-bench-m/test split using the CLI
- Fork the experiments repository
- Add your submission files under experiments/multimodal/YOUR_MODEL_NAME/
- Create a PR with your submission
See the detailed guide in our submission documentation.
Note: Check your test split quota using sb-cli quota swe-bench-m test before submitting.
File details
Details for the file sb_cli-0.1.4.tar.gz.
File metadata
- Download URL: sb_cli-0.1.4.tar.gz
- Upload date:
- Size: 590.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 66e3c81793ae8f7e8500ada79ed2504b7078353a3aca95221e7a4e695ce59c46 |
| MD5 | 32cfe609eb4bd864d6714441b72910bb |
| BLAKE2b-256 | d06083e1c47260d2a2cc498bb0850d2eca6ce7bae2db100fba53726dfc0ddd7e |
File details
Details for the file sb_cli-0.1.4-py3-none-any.whl.
File metadata
- Download URL: sb_cli-0.1.4-py3-none-any.whl
- Upload date:
- Size: 13.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c29996dde3c3eb61af67f69177c9aa701046b82f7e58efc59a5a28f93e85a209 |
| MD5 | e7d22b325813ccb58212ca57514a573c |
| BLAKE2b-256 | a0fa7487e7198f084c573e795f928cb5f663a3e0ec2d49d3de351f90bac4eb9e |