
Forge Harbor task directories from any evaluation benchmark


HarborForge


HarborForge provides the abstract contracts (DataMapper, DatasetHandler) for turning raw benchmark datasets into Harbor-compatible task directories, enabling large-scale parallel agent evaluation.

How it works

Raw benchmark data
      ↓  DataMapper.map()
Harbor task directories
      ↓  harbor jobs start
Agent runs in isolated Docker container
      ↓
Verifier scores the output → reward written to Harbor

Each task directory contains instruction.md (shown to the agent), environment/Dockerfile (the agent's environment), and tests/test.sh, a verifier that writes a float reward to /logs/verifier/reward.txt.
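For a concrete picture of that layout, here is a minimal standard-library sketch that writes one task directory by hand. The relative paths follow the handler contract table below; the helper name write_task_dir is hypothetical and not part of HarborForge, and any additional files the real DataMapper.run() may emit are not shown.

```python
import stat
import tempfile
from pathlib import Path


def write_task_dir(root: Path, instruction: str, dockerfile: str, test_sh: str) -> Path:
    """Hypothetical helper: lay out one task directory as the contract table describes."""
    (root / "environment").mkdir(parents=True, exist_ok=True)
    (root / "tests").mkdir(parents=True, exist_ok=True)

    # Shown to the agent.
    (root / "instruction.md").write_text(instruction)
    # Defines the agent's container environment.
    (root / "environment" / "Dockerfile").write_text(dockerfile)
    # Verifier script; add the owner-execute bit so it can be run directly.
    test_path = root / "tests" / "test.sh"
    test_path.write_text(test_sh)
    test_path.chmod(test_path.stat().st_mode | stat.S_IXUSR)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task = write_task_dir(
            Path(tmp) / "my_dataset" / "0",
            "Solve this: 2 + 2",
            "FROM python:3.12-slim\nWORKDIR /app\n",
            "#!/bin/bash\nmkdir -p /logs/verifier\necho 1 > /logs/verifier/reward.txt\n",
        )
        # List every file and directory created, relative to the task root.
        for p in sorted(task.rglob("*")):
            print(p.relative_to(task))
```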

Installation

pip install harborforge

Or with uv:

uv add harborforge

Usage

Subclass DatasetHandler for each dataset type in your benchmark, then subclass DataMapper to iterate over the tasks:

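import shlex
from pathlib import Path

from harborforge import DataMapper, DatasetHandler


class MyHandler(DatasetHandler):
    dataset_name = "my_dataset"

    def instruction(self, task_data):
        # Shown to the agent: state the problem and where to put the answer, never the answer itself.
        return (
            f"Solve this: {task_data['problem']}\n\n"
            "Write your final answer to /output/answer.txt."
        )

    def dockerfile(self, task_data):
        return "FROM python:3.12-slim\nWORKDIR /app\n"

    def test_sh(self, task_data):
        # Quote the expected answer so spaces or quotes in it cannot break the script.
        answer = shlex.quote(str(task_data["answer"]))
        # Redirect the whole if/else so reward.txt is written on both branches.
        return f"""#!/bin/bash
mkdir -p /logs/verifier
actual=$(cat /output/answer.txt 2>/dev/null)
if [ "$actual" = {answer} ]; then echo 1; else echo 0; fi > /logs/verifier/reward.txt
"""


class MyMapper(DataMapper):
    def iter_tasks(self):
        # load_my_benchmark() is a placeholder for your own data loader.
        for i, task in enumerate(load_my_benchmark()):
            yield f"my_dataset/{i}", f"my_dataset/{i}", MyHandler(), task


# Generate Harbor task directories
MyMapper().run(output_dir=Path(".data/tasks"), registry_path=Path("registry.json"))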
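A verifier bug, such as a redirect that covers only one branch, can mis-score trials without raising any error, so it can be worth exercising the test.sh logic locally before generating many tasks. The sketch below assumes bash is installed; render_test_sh and score are hypothetical helpers that reproduce the MyHandler.test_sh logic with the answer and reward locations passed in (instead of the hard-coded /output and /logs paths), so the script can run in a temporary directory outside Docker.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def render_test_sh(answer: str, answer_file: str, reward_dir: str) -> str:
    """Hypothetical copy of MyHandler.test_sh with the file locations passed in."""
    q_dir = shlex.quote(reward_dir)
    return f"""#!/bin/bash
mkdir -p {q_dir}
actual=$(cat {shlex.quote(answer_file)} 2>/dev/null)
if [ "$actual" = {shlex.quote(answer)} ]; then echo 1; else echo 0; fi > {q_dir}/reward.txt
"""


def score(answer: str, agent_output: Optional[str]) -> float:
    """Run the rendered verifier against a fake agent output; None means no answer file."""
    with tempfile.TemporaryDirectory() as tmp:
        answer_file = Path(tmp) / "answer.txt"
        reward_dir = Path(tmp) / "verifier"
        if agent_output is not None:
            answer_file.write_text(agent_output)
        script = render_test_sh(answer, str(answer_file), str(reward_dir))
        subprocess.run(["bash", "-c", script], check=True)
        # The contract asks for a float reward, so parse it as one.
        return float((reward_dir / "reward.txt").read_text())


if __name__ == "__main__":
    print(score("42", "42\n"), score("42", "41"), score("42", None))
```

Wrapping the test in a single if/else keeps the redirect attached to both outcomes; in the one-liner `[ ... ] && echo 1 || echo 0 > file`, the redirect binds only to the final `echo 0`, so a correct answer prints 1 to stdout and never reaches reward.txt.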

Handler contract

| Method | Required | Purpose |
|---|---|---|
| instruction(task_data) | yes | Content for instruction.md; must not leak the answer |
| test_sh(task_data) | yes | Content for tests/test.sh; must write a float reward to /logs/verifier/reward.txt |
| dockerfile(task_data) | yes | Content for environment/Dockerfile |
| setup() | optional | Download or prepare data for this dataset |
| data_files(task_data) | optional | Local files to COPY into the image build context |
| artifacts() | optional | Container paths to capture after the trial |
| verifier_env_keys() | optional | Environment variable keys to forward to the SEPARATE verifier |
| verifier_dockerfile(task_data) | optional | Returning a non-None value enables SEPARATE verifier mode |

Reference implementation

g9 — maps DSGym benchmarks to Harbor using HarborForge.

Download files


Source Distribution

harborforge-1.0.3.tar.gz (23.8 kB)

Uploaded Source

Built Distribution


harborforge-1.0.3-py3-none-any.whl (10.5 kB)

Uploaded Python 3

File details

Details for the file harborforge-1.0.3.tar.gz.

File metadata

  • Download URL: harborforge-1.0.3.tar.gz
  • Size: 23.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.23

File hashes

Hashes for harborforge-1.0.3.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9175b52afd865108a4b57f952cfdd7120f7fd2819ba3565c9742d698a43fede |
| MD5 | 5e846c1e45f05b90a981cc848846da4d |
| BLAKE2b-256 | 07341563d5ec4786c7fa3972f696aa53602715007aab46b33f1a5b2d9c1b523a |


File details

Details for the file harborforge-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: harborforge-1.0.3-py3-none-any.whl
  • Size: 10.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.23

File hashes

Hashes for harborforge-1.0.3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | a42d5d7d751b588497b98d2816814bad1ee1127eee29f02993e14f361e8da4f3 |
| MD5 | 7e726471fc94377669da7cff379855c0 |
| BLAKE2b-256 | 3f540fe86610e11640ea4981608f8b490f1fcfe056f368c1fcdc91b7a28ffbc0 |
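To check a downloaded archive against the published SHA256 digests, a short standard-library script is enough. This is a minimal sketch: the digest values are copied from the file details above, while the helper names sha256_of and verify are illustrative.

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests published in the file details above.
EXPECTED_SHA256 = {
    "harborforge-1.0.3.tar.gz": "d9175b52afd865108a4b57f952cfdd7120f7fd2819ba3565c9742d698a43fede",
    "harborforge-1.0.3-py3-none-any.whl": "a42d5d7d751b588497b98d2816814bad1ee1127eee29f02993e14f361e8da4f3",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare case-insensitively, since hex digests are sometimes published in upper case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Usage: python check_hashes.py harborforge-1.0.3.tar.gz ...
    for name in sys.argv[1:]:
        p = Path(name)
        expected = EXPECTED_SHA256.get(p.name)
        if expected is None:
            print(f"{p.name}: no published digest")
        else:
            print(f"{p.name}: {'OK' if verify(p, expected) else 'MISMATCH'}")
```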

