AWS EKS/Fargate execution backend for Harbor benchmarks
harbor-aws
AWS EKS/Fargate execution backend for Harbor benchmarks.
- Low infrastructure overhead: Create and destroy the AWS infrastructure with one command each.
- High-concurrency execution: Run Harbor benchmarks at max concurrency on AWS.
- Pay-on-demand execution: Ensure cost scales with benchmark demand.
Install
uv sync --extra cdk
Quick Start
# Deploy infrastructure (one-time, ~15 min)
uv run python -m harbor_aws deploy
# Run benchmarks
uv run harbor run -c job-config.yaml \
  -d terminal-bench@2.0 \
  -a terminus-2 \
  -m bedrock/converse/moonshotai.kimi-k2.5 \
  -n 89
# Clean up
uv run python -m harbor_aws stop # delete pods, keep infra
uv run python -m harbor_aws destroy # delete everything
Prerequisites: AWS account with admin access. Docker Hub login (docker login) recommended to avoid anonymous pull rate limits.
Scaling
Image pulls are capped at 50 concurrent operations by default to avoid Docker Hub rate limiting. For higher sustained concurrency, configure Amazon ECR pull-through cache for Docker Hub images.
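The cap described above is a standard bounded-concurrency pattern. The following is a minimal sketch of that pattern using asyncio.Semaphore, not the harbor-aws implementation: the names pull_image and pull_all are hypothetical, and the sleep call stands in for a real image pull.

```python
import asyncio

MAX_CONCURRENT_PULLS = 50  # mirrors the default cap stated in this README


async def pull_image(image: str, gate: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical pull: waits for a free slot, then simulates the work."""
    async with gate:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.001)  # placeholder for the real pull
        stats["active"] -= 1
    return image


async def pull_all(images, limit: int = MAX_CONCURRENT_PULLS):
    """Start every pull at once, but let at most `limit` run concurrently."""
    gate = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    pulled = await asyncio.gather(*(pull_image(i, gate, stats) for i in images))
    return list(pulled), stats["peak"]
```

All tasks are scheduled immediately, so callers do not need to batch their requests; the semaphore alone keeps the number of in-flight pulls at or below the limit.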
ECR pull-through cache setup
During deploy, you are prompted for Docker Hub credentials. If you provide them, deploy automatically creates the Secrets Manager secret and the ECR pull-through cache rule.
To set it up manually instead:
aws secretsmanager create-secret \
  --name ecr-pullthroughcache/docker-hub \
  --secret-string '{"username":"YOUR_DOCKERHUB_USER","accessToken":"YOUR_ACCESS_TOKEN"}' \
  --region us-east-1
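Hand-writing the JSON for --secret-string is easy to get wrong when a token contains quotes or backslashes. As a small sketch, the payload can be generated instead; the function name docker_hub_secret_string is hypothetical, while the two keys are the ones used in the command above.

```python
import json


def docker_hub_secret_string(username: str, access_token: str) -> str:
    """Build the JSON payload for --secret-string with correct escaping."""
    payload = {"username": username, "accessToken": access_token}
    # Compact separators reproduce the layout used in the command above.
    return json.dumps(payload, separators=(",", ":"))
```

The returned string can be passed to the CLI as the value of --secret-string.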
Then enable in job config:
environment:
  import_path: "harbor_aws.adapter:AWSEnvironment"
  kwargs:
    stack_name: harbor-aws
    region: us-east-1
    ecr_cache: true
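With a pull-through cache, a Docker Hub image is pulled through an ECR URI of the form account.dkr.ecr.region.amazonaws.com/prefix/repository, and official images sit under library/. The sketch below illustrates that mapping only. It is not how harbor-aws rewrites images internally: the function name to_ecr_cache_uri and the docker-hub repository prefix are assumptions, and the input is assumed to be a Docker Hub reference.

```python
DOCKER_HUB_REGISTRIES = ("docker.io/", "index.docker.io/", "registry-1.docker.io/")


def to_ecr_cache_uri(image: str, account_id: str,
                     region: str = "us-east-1",
                     prefix: str = "docker-hub") -> str:
    """Map a Docker Hub image reference to an ECR pull-through cache URI.

    `prefix` is an assumed cache-rule repository prefix; use whatever
    prefix your cache rule was created with.
    """
    # Drop an explicit Docker Hub registry host if one is present.
    for registry in DOCKER_HUB_REGISTRIES:
        if image.startswith(registry):
            image = image[len(registry):]
            break
    # Official images (no namespace) are stored under library/.
    repository = image.split(":", 1)[0].split("@", 1)[0]
    if "/" not in repository:
        image = f"library/{image}"
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{prefix}/{image}"
```

For example, ubuntu:22.04 maps to a URI ending in docker-hub/library/ubuntu:22.04, while a namespaced image such as someorg/task-env:latest keeps its namespace.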
Validation
Benchmarks reproduced from the Kimi K2.5 technical report using Kimi K2.5 on Amazon Bedrock with terminus-2.
| Benchmark | Official | harbor-aws |
|---|---|---|
| SWE-bench Verified | 76.8% | 71.5% |
| Terminal-Bench 2.0 | 50.8% | 43.8% |
| GPQA-Diamond | 87.6% | 79.8% |
| LiveCodeBench v6 | 85.0% | 88.6% |
Score gaps are expected: the official results used Kimi's internal agent for some benchmarks, while harbor-aws uses terminus-2 throughout.
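To make the gaps concrete, the differences can be computed directly from the table. This is plain arithmetic on the reported figures; the names RESULTS and score_gaps are illustrative only.

```python
# (official, harbor-aws) scores in percent, copied from the table above.
RESULTS = {
    "SWE-bench Verified": (76.8, 71.5),
    "Terminal-Bench 2.0": (50.8, 43.8),
    "GPQA-Diamond": (87.6, 79.8),
    "LiveCodeBench v6": (85.0, 88.6),
}


def score_gaps(results: dict) -> dict:
    """Return harbor-aws minus official, in percentage points."""
    return {
        name: round(ours - official, 1)
        for name, (official, ours) in results.items()
    }
```

Three benchmarks come out 5.3 to 7.8 points below the official figures, and LiveCodeBench v6 comes out 3.6 points above.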
Documentation
- System Architecture & Design Principles — architecture overview, tradeoffs, and design rationale
Development
uv sync --extra dev --extra cdk
uv run ruff check src/
uv run mypy src/
License
Apache License 2.0 — see LICENSE.
File details
Details for the file harbor_aws-0.2.1.tar.gz.
File metadata
- Download URL: harbor_aws-0.2.1.tar.gz
- Upload date:
- Size: 1.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f3ea916648345b5dd9dffb27a85b1739542a6c6b32dc016b6d15d461b586d4d |
| MD5 | 02f8fb1e02e5cc8a4f75ce9c96f77f5c |
| BLAKE2b-256 | ab505ede3d2de648719e028c8bb0da23fddedbac8a16c963a0cd083c7b159f77 |
File details
Details for the file harbor_aws-0.2.1-py3-none-any.whl.
File metadata
- Download URL: harbor_aws-0.2.1-py3-none-any.whl
- Upload date:
- Size: 33.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11ef639e539e21d3d0cb3576413c43255704d175b20471516070e4b28bcf8910 |
| MD5 | a03efee84eaf2a38411b6fee2c88fbb8 |
| BLAKE2b-256 | 8cfbfd8bf06fe740d6ae8cf908a844f3bf8944bd3d99627c8f60ba0b15a40148 |
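A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the function names sha256_of and matches_published are hypothetical helpers, not part of harbor-aws.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Call matches_published with the path to the downloaded harbor_aws-0.2.1.tar.gz or wheel and the corresponding SHA256 value from the tables; a False result means the file differs from the one that was published.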