HarborForge

Forge Harbor task directories from any evaluation benchmark.

HarborForge provides the abstract contracts (DataMapper, DatasetHandler) for turning raw benchmark datasets into Harbor-compatible task directories, enabling large-scale parallel agent evaluation.

How it works

Raw benchmark data
      ↓  DataMapper.map()
Harbor task directories
      ↓  harbor jobs start
Agent runs in isolated Docker container
      ↓
Verifier scores the output → reward written to Harbor

Each task directory contains an instruction.md (shown to the agent), a Dockerfile (the agent's environment), and a test.sh verifier that writes a float reward to /logs/verifier/reward.txt.
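To make the layout concrete, here is a minimal sketch that writes one such directory by hand with the standard library only. The file locations (instruction.md at the top level, environment/Dockerfile, tests/test.sh) follow the handler contract table in this README. The helper name write_task_dir and the sample contents are illustrative assumptions, not part of HarborForge's API.

```python
from pathlib import Path
import tempfile


def write_task_dir(root: Path, instruction: str, dockerfile: str, test_sh: str) -> Path:
    """Write the three files of a Harbor-style task directory (hypothetical helper)."""
    files = {
        root / "instruction.md": instruction,            # shown to the agent
        root / "environment" / "Dockerfile": dockerfile,  # the agent's environment
        root / "tests" / "test.sh": test_sh,              # the verifier
    }
    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    # Mark the verifier script as executable.
    (root / "tests" / "test.sh").chmod(0o755)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task = write_task_dir(
            Path(tmp) / "my_dataset" / "0",
            instruction="Solve this: 2 + 2\n",
            dockerfile="FROM python:3.12-slim\nWORKDIR /app\n",
            test_sh="#!/bin/bash\nmkdir -p /logs/verifier\necho 0 > /logs/verifier/reward.txt\n",
        )
        # List the generated files relative to the task root.
        for p in sorted(task.rglob("*")):
            if p.is_file():
                print(p.relative_to(task))
```

In practice DataMapper generates these directories for you; the sketch is only meant to show what ends up on disk.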

Installation

pip install harborforge

Or with uv:

uv add harborforge

Usage

Implement a DatasetHandler for each dataset type in your benchmark, then a DataMapper that iterates over its tasks:

from pathlib import Path

from harborforge import DataMapper, DatasetHandler

class MyHandler(DatasetHandler):
    dataset_name = "my_dataset"

    def instruction(self, task_data):
        return f"Solve this: {task_data['problem']}"

    def dockerfile(self, task_data):
        return "FROM python:3.12-slim\nWORKDIR /app\n"

    def test_sh(self, task_data):
        answer = task_data["answer"]
        return f"""#!/bin/bash
mkdir -p /logs/verifier
actual=$(cat /output/answer.txt 2>/dev/null)
# Redirect the whole if/else so both outcomes reach reward.txt
if [ "$actual" = "{answer}" ]; then echo 1; else echo 0; fi > /logs/verifier/reward.txt
"""

class MyMapper(DataMapper):
    def iter_tasks(self):
        # load_my_benchmark() is a placeholder for your own loader;
        # it should yield one dict per task.
        for i, task in enumerate(load_my_benchmark()):
            yield f"my_dataset/{i}", f"my_dataset/{i}", MyHandler(), task

# Generate Harbor task directories
MyMapper().run(output_dir=Path(".data/tasks"), registry_path=Path("registry.json"))
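The example above leaves load_my_benchmark undefined. Below is a minimal sketch of one possible loader, assuming the benchmark is stored as a JSON Lines file whose records carry the problem and answer fields that MyHandler reads. The file name and format are assumptions for illustration; HarborForge does not prescribe them.

```python
import json
from pathlib import Path
from typing import Iterator


def load_my_benchmark(path: Path = Path("benchmark.jsonl")) -> Iterator[dict]:
    """Yield one task dict per non-empty line of a JSON Lines file (hypothetical format)."""
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            task = json.loads(line)
            # Fail early if a record lacks a field the handler reads.
            missing = {"problem", "answer"} - set(task)
            if missing:
                raise ValueError(f"{path}:{line_no} is missing {sorted(missing)}")
            yield task
```

Validating records while loading keeps a malformed entry from surfacing later as a KeyError inside a handler method.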

Handler contract

| Method | Required | Purpose |
|---|---|---|
| instruction(task_data) | required | Content for instruction.md; must not leak the answer |
| test_sh(task_data) | required | Content for tests/test.sh; must write a float reward to /logs/verifier/reward.txt |
| dockerfile(task_data) | required | Content for environment/Dockerfile |
| setup() | optional | Download/prepare data for this dataset |
| data_files(task_data) | optional | Local files to COPY into the image build context |
| artifacts() | optional | Container paths to capture after trial |
| verifier_env_keys() | optional | Env var keys to forward to the SEPARATE verifier |
| verifier_dockerfile(task_data) | optional | A non-None return value triggers SEPARATE verifier mode |
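The test_sh requirement can be illustrated with a small script generator. This is a sketch under stated assumptions, not HarborForge's own code: the helper name exact_match_test_sh is hypothetical, and the /output/answer.txt location is borrowed from the usage example. It passes the expected answer through shlex.quote so that answers containing quotes, dollar signs, or spaces cannot break the script, and it writes a reward on both branches.

```python
import shlex


def exact_match_test_sh(
    answer: str,
    answer_path: str = "/output/answer.txt",         # assumed agent output location
    reward_path: str = "/logs/verifier/reward.txt",  # path required by the contract
) -> str:
    """Return test.sh content that scores an exact string match (hypothetical helper)."""
    expected = shlex.quote(answer)   # safe to embed as a single shell word
    out = shlex.quote(answer_path)
    reward = shlex.quote(reward_path)
    return f"""#!/bin/bash
mkdir -p "$(dirname {reward})"
actual=$(cat {out} 2>/dev/null)
if [ "$actual" = {expected} ]; then
    echo 1.0 > {reward}
else
    echo 0.0 > {reward}
fi
"""


if __name__ == "__main__":
    print(exact_match_test_sh("it's $5"))
```

A handler's test_sh(task_data) could return exact_match_test_sh(task_data["answer"]). Note that a missing answer file yields an empty string, so an empty expected answer would score as a match.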

Reference implementation

g9 — maps DSGym benchmarks to Harbor using HarborForge.
