harbor-aws

AWS EKS/Fargate execution backend for Harbor benchmarks.

  • Low infrastructure overhead: Create and destroy the AWS infrastructure with a single command each.
  • High-concurrency execution: Run Harbor benchmarks at max concurrency on AWS.
  • Pay-on-demand execution: Cost scales with benchmark demand.

Install

uv sync --extra cdk

Quick Start

# Deploy infrastructure (one-time, ~15 min)
uv run python -m harbor_aws deploy

# Run benchmarks
uv run harbor run -c job-config.yaml \
  -d terminal-bench@2.0 \
  -a terminus-2 \
  -m bedrock/converse/moonshotai.kimi-k2.5 \
  -n 89

# Clean up
uv run python -m harbor_aws stop      # delete pods, keep infra
uv run python -m harbor_aws destroy   # delete everything

Prerequisites: an AWS account with admin access. Logging in to Docker Hub (docker login) is recommended to avoid anonymous pull rate limits.

Scaling

Image pulls are capped at 50 concurrent operations by default to avoid Docker Hub rate limiting. For higher sustained concurrency, configure Amazon ECR pull-through cache for Docker Hub images.
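The cap described above is a standard bounded-concurrency pattern. The sketch below illustrates it with an asyncio semaphore. It is not harbor-aws's implementation, which is not shown in this README: pull_image, pull_all, and run_demo are hypothetical names, and the "pull" is a short sleep rather than a real registry call.

```python
import asyncio

MAX_CONCURRENT_PULLS = 50  # the default cap quoted above


async def pull_image(image: str, gate: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical stand-in for an image pull; performs no network I/O."""
    async with gate:  # at most `limit` coroutines are inside this block at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.001)  # placeholder for the real pull
        stats["active"] -= 1
    return image


async def pull_all(images: list[str], limit: int = MAX_CONCURRENT_PULLS) -> dict:
    """Start every pull at once and let the semaphore enforce the cap."""
    gate = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    done = await asyncio.gather(*(pull_image(i, gate, stats) for i in images))
    stats["done"] = len(done)
    return stats


def run_demo(n: int = 200, limit: int = MAX_CONCURRENT_PULLS) -> dict:
    """Run n simulated pulls and report how many were ever in flight together."""
    return asyncio.run(pull_all([f"task-image-{k}" for k in range(n)], limit))
```

Calling run_demo(200) schedules 200 simulated pulls, and the recorded peak never exceeds the limit. Raising the limit only helps if the upstream registry tolerates it, which is the motivation for the pull-through cache.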

ECR pull-through cache setup

During deploy, you are prompted for Docker Hub credentials. If you provide them, the deploy automatically creates the Secrets Manager secret and the ECR pull-through cache rule.

To set it up manually instead:

aws secretsmanager create-secret \
  --name ecr-pullthroughcache/docker-hub \
  --secret-string '{"username":"YOUR_DOCKERHUB_USER","accessToken":"YOUR_ACCESS_TOKEN"}' \
  --region us-east-1
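Hand-written JSON inside shell quotes is easy to get wrong, especially if the token contains quote characters. As a minimal sketch, the payload for the command above could be generated in Python instead. build_dockerhub_secret is a hypothetical helper, not part of harbor-aws. The ecr-pullthroughcache/ name prefix and the username and accessToken keys follow the ECR pull-through cache requirements for Docker Hub credentials.

```python
import json

# ECR only accepts pull-through cache secrets whose name starts with this prefix.
SECRET_PREFIX = "ecr-pullthroughcache/"


def build_dockerhub_secret(
    username: str,
    access_token: str,
    name: str = "ecr-pullthroughcache/docker-hub",
) -> tuple[str, str]:
    """Return (secret_name, secret_string) for `aws secretsmanager create-secret`.

    Hypothetical helper: it validates the inputs and serializes the payload,
    but does not call AWS.
    """
    if not name.startswith(SECRET_PREFIX):
        raise ValueError(f"secret name must start with {SECRET_PREFIX!r}")
    if not username or not access_token:
        raise ValueError("username and access_token must be non-empty")
    # json.dumps escapes quotes and backslashes that would break inline JSON.
    payload = json.dumps({"username": username, "accessToken": access_token})
    return name, payload
```

The returned string can be passed as the --secret-string value, for example through a file or an environment variable so that the token does not end up in shell history.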

Then enable the cache in the job config:

environment:
  import_path: "harbor_aws.adapter:AWSEnvironment"
  kwargs:
    stack_name: harbor-aws
    region: us-east-1
    ecr_cache: true
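With a pull-through cache rule in place, Docker Hub images are pulled through a private ECR URI instead of docker.io. The sketch below shows the general shape of that rewrite. It assumes a cache rule whose repository prefix is docker-hub (the prefix harbor-aws actually configures is not stated here), handles Docker Hub references only, and to_ecr_cache_uri is a hypothetical name rather than a harbor-aws function.

```python
DOCKER_HUB_HOSTS = ("docker.io/", "index.docker.io/", "registry-1.docker.io/")


def to_ecr_cache_uri(
    image: str,
    account_id: str,
    region: str = "us-east-1",
    prefix: str = "docker-hub",  # assumed cache-rule prefix, not confirmed by the README
) -> str:
    """Rewrite a Docker Hub image reference to an ECR pull-through cache URI."""
    # Drop an explicit Docker Hub registry host if one is present.
    for host in DOCKER_HUB_HOSTS:
        if image.startswith(host):
            image = image[len(host):]
            break
    # Official images ("ubuntu:22.04") live under the implicit "library/" namespace,
    # which must be spelled out when pulling through ECR.
    repo = image.split("@")[0].split(":")[0]
    if "/" not in repo:
        image = f"library/{image}"
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{prefix}/{image}"
```

For example, with the placeholder account ID 123456789012, ubuntu:22.04 maps to 123456789012.dkr.ecr.us-east-1.amazonaws.com/docker-hub/library/ubuntu:22.04.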

Validation

Benchmarks from the Kimi K2.5 technical report, reproduced with Kimi K2.5 on Amazon Bedrock using the terminus-2 agent.

Benchmark            Official   harbor-aws
SWE-bench Verified   76.8%      71.5%
Terminal-Bench 2.0   50.8%      43.8%
GPQA-Diamond         87.6%      79.8%
LiveCodeBench v6     85.0%      88.6%

Score gaps are expected: the official results used Kimi's internal agent for some benchmarks, while we use terminus-2 throughout.
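To make the size and direction of each gap explicit, the differences can be computed in percentage points from the table. This is plain arithmetic on the reported numbers. SCORES and score_gaps are names introduced here for illustration only.

```python
# (official, harbor-aws) scores in percent, copied from the validation table.
SCORES = {
    "SWE-bench Verified": (76.8, 71.5),
    "Terminal-Bench 2.0": (50.8, 43.8),
    "GPQA-Diamond": (87.6, 79.8),
    "LiveCodeBench v6": (85.0, 88.6),
}


def score_gaps(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """harbor-aws minus official, in percentage points (negative means lower)."""
    return {
        name: round(ours - official, 1)
        for name, (official, ours) in scores.items()
    }


if __name__ == "__main__":
    for name, gap in score_gaps(SCORES).items():
        print(f"{name:<20} {gap:+.1f} pp")
```

The harbor-aws runs come out 5.3, 7.0, and 7.8 points below the official numbers on the first three benchmarks and 3.6 points above on LiveCodeBench v6.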

Development

uv sync --extra dev --extra cdk
uv run ruff check src/
uv run mypy src/

License

Apache License 2.0 — see LICENSE.
