Forge Harbor task directories from any evaluation benchmark

HarborForge

HarborForge provides the abstract contracts (DataMapper, DatasetHandler) for turning raw benchmark datasets into Harbor-compatible task directories, enabling large-scale parallel agent evaluation.

How it works

Raw benchmark data
      ↓  DataMapper.map()
Harbor task directories
      ↓  harbor jobs start
Agent runs in isolated Docker container
      ↓
Verifier scores the output → reward written to Harbor

Each task directory contains an instruction.md (shown to the agent), a Dockerfile (the agent's environment), and a test.sh verifier that writes a float reward to /logs/verifier/reward.txt.
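To make the layout concrete, here is a minimal sketch that writes one such directory by hand. It assumes the file locations named in the handler contract (instruction.md, environment/Dockerfile, tests/test.sh). The helpers `write_task_dir` and `list_task_files` are hypothetical and not part of HarborForge, and the real generator may emit additional files.

```python
import tempfile
from pathlib import Path


def write_task_dir(root: Path, instruction: str, dockerfile: str, test_sh: str) -> Path:
    """Write one task directory (hypothetical helper, not the HarborForge writer)."""
    # Assumed layout: instruction.md, environment/Dockerfile, tests/test.sh.
    files = {
        root / "instruction.md": instruction,
        root / "environment" / "Dockerfile": dockerfile,
        root / "tests" / "test.sh": test_sh,
    }
    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    # Mark the verifier script as executable.
    (root / "tests" / "test.sh").chmod(0o755)
    return root


def list_task_files(root: Path) -> list[str]:
    """Return the files under root as sorted POSIX-style relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task_root = write_task_dir(
            Path(tmp) / "my_dataset" / "0",
            instruction="Solve this: 2 + 2",
            dockerfile="FROM python:3.12-slim\nWORKDIR /app\n",
            test_sh="#!/bin/bash\necho 0 > /logs/verifier/reward.txt\n",
        )
        for name in list_task_files(task_root):
            print(name)
```

Inspecting a directory like this before starting a job is a cheap way to confirm that the instruction text does not contain the expected answer.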

Installation

pip install harborforge

Or with uv:

uv add harborforge

Usage

Implement a DatasetHandler for each dataset type in your benchmark, then a DataMapper whose iter_tasks() yields the tasks to generate:

from pathlib import Path

from harborforge import DataMapper, DatasetHandler

class MyHandler(DatasetHandler):
    dataset_name = "my_dataset"

    def instruction(self, task_data):
        return f"Solve this: {task_data['problem']}"

    def dockerfile(self, task_data):
        return "FROM python:3.12-slim\nWORKDIR /app\n"

    def test_sh(self, task_data):
        answer = task_data["answer"]
        # The redirect is attached to the whole if-block so both outcomes land in reward.txt.
        return f"""#!/bin/bash
mkdir -p /logs/verifier
actual=$(cat /output/answer.txt 2>/dev/null)
if [ "$actual" = "{answer}" ]; then echo 1; else echo 0; fi > /logs/verifier/reward.txt
"""

class MyMapper(DataMapper):
    def iter_tasks(self):
        # load_my_benchmark() is a placeholder for your own loader; it is not part of harborforge.
        for i, task in enumerate(load_my_benchmark()):
            yield f"my_dataset/{i}", f"my_dataset/{i}", MyHandler(), task

# Generate Harbor task directories
MyMapper().run(output_dir=Path(".data/tasks"), registry_path=Path("registry.json"))
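Because test_sh() builds a shell script from task data, an answer containing quotes, spaces, or `$` could change what the script does. One way to guard against that is to quote the value with the standard library's `shlex.quote`. The sketch below is an illustration under that assumption. `render_test_sh` is a hypothetical helper, not a HarborForge function, while the paths are the ones used in the example above.

```python
import shlex


def render_test_sh(answer: str) -> str:
    """Build a verifier script that compares /output/answer.txt to the expected answer."""
    # shlex.quote makes the answer safe to embed as a single shell word.
    expected = shlex.quote(str(answer))
    # The redirect follows `fi`, so both branches write to the reward file.
    return f"""#!/bin/bash
mkdir -p /logs/verifier
actual=$(cat /output/answer.txt 2>/dev/null)
if [ "$actual" = {expected} ]; then
    echo 1
else
    echo 0
fi > /logs/verifier/reward.txt
"""


if __name__ == "__main__":
    # An answer with an apostrophe still yields a well-formed script.
    print(render_test_sh("it's 42"))
```

A handler's test_sh(task_data) could then return `render_test_sh(task_data["answer"])`.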

Handler contract

| Method | Required | Purpose |
|---|---|---|
| instruction(task_data) | required | Content for instruction.md; must not leak the answer |
| test_sh(task_data) | required | Content for tests/test.sh; must write a float reward to /logs/verifier/reward.txt |
| dockerfile(task_data) | required | Content for environment/Dockerfile |
| setup() | optional | Download or prepare data for this dataset |
| data_files(task_data) | optional | Local files to COPY into the image build context |
| artifacts() | optional | Container paths to capture after the trial |
| verifier_env_keys() | optional | Env var keys to forward to the SEPARATE verifier |
| verifier_dockerfile(task_data) | optional | A non-None return value triggers SEPARATE verifier mode |
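The optional hooks might be combined as in the sketch below. Only the method names come from the contract above. The return types (lists of strings, an optional Dockerfile string), the `needs_grader` field, and the `GRADER_API_KEY` variable are assumptions made for illustration. The class is a stand-in so the snippet runs without harborforge installed; in practice these methods would sit on your DatasetHandler subclass.

```python
from typing import Any, Optional


class SeparateVerifierHooks:
    """Stand-in for the optional part of a handler (hypothetical, for illustration)."""

    def artifacts(self) -> list[str]:
        # Container paths to capture after the trial (return type assumed).
        return ["/output/answer.txt"]

    def verifier_env_keys(self) -> list[str]:
        # Names of environment variables to forward to the verifier.
        # GRADER_API_KEY is a made-up example key.
        return ["GRADER_API_KEY"]

    def verifier_dockerfile(self, task_data: dict[str, Any]) -> Optional[str]:
        # Returning None keeps the default mode; any other value requests
        # a separate verifier image. "needs_grader" is a hypothetical field.
        if not task_data.get("needs_grader"):
            return None
        return "FROM python:3.12-slim\nWORKDIR /verifier\n"


if __name__ == "__main__":
    hooks = SeparateVerifierHooks()
    print(hooks.verifier_dockerfile({"needs_grader": False}))
    print(hooks.verifier_dockerfile({"needs_grader": True}))
```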

Reference implementation

g9 — maps DSGym benchmarks to Harbor using HarborForge.
