Forge Harbor task directories from any evaluation benchmark

HarborForge

HarborForge provides the abstract contracts (DataMapper, DatasetHandler) for turning raw benchmark datasets into Harbor-compatible task directories, enabling large-scale parallel agent evaluation.

How it works

Raw benchmark data
      ↓  DataMapper.map()
Harbor task directories
      ↓  harbor jobs start
Agent runs in isolated Docker container
      ↓
Verifier scores the output → reward written to Harbor

Each task directory contains an instruction.md (shown to the agent), an environment/Dockerfile (the agent's environment), and a tests/test.sh verifier that writes a float reward to /logs/verifier/reward.txt.
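
To make the layout concrete, the sketch below writes one such task directory by hand using only the standard library. It does not use HarborForge itself: the helper name write_task_dir and the sample contents are hypothetical, and the environment/ and tests/ subdirectory names are taken from the handler contract table in this README.

```python
import tempfile
from pathlib import Path


def write_task_dir(root: Path, instruction: str, dockerfile: str, test_sh: str) -> Path:
    """Write the three files a Harbor task directory is described as containing."""
    (root / "environment").mkdir(parents=True, exist_ok=True)
    (root / "tests").mkdir(parents=True, exist_ok=True)
    (root / "instruction.md").write_text(instruction)
    (root / "environment" / "Dockerfile").write_text(dockerfile)
    script = root / "tests" / "test.sh"
    script.write_text(test_sh)
    script.chmod(0o755)  # assumption: marking the verifier executable is harmless
    return root


# Hypothetical sample task; the contents are placeholders.
task_root = write_task_dir(
    Path(tempfile.mkdtemp()) / "my_dataset" / "0",
    instruction="Solve this: 2 + 2",
    dockerfile="FROM python:3.12-slim\nWORKDIR /app\n",
    test_sh="#!/bin/bash\nmkdir -p /logs/verifier\necho 0.0 > /logs/verifier/reward.txt\n",
)
created = sorted(
    p.relative_to(task_root).as_posix() for p in task_root.rglob("*") if p.is_file()
)
print(created)
```

In practice DataMapper generates these directories for you; the sketch only shows what ends up on disk.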

Installation

pip install harborforge

Or with uv:

uv add harborforge

Usage

Subclass DatasetHandler once for each dataset type in your benchmark, then subclass DataMapper and implement iter_tasks() to yield the tasks:

from pathlib import Path

from harborforge import DataMapper, DatasetHandler


class MyHandler(DatasetHandler):
    dataset_name = "my_dataset"

    def instruction(self, task_data):
        # Shown to the agent; must not reveal the answer.
        return f"Solve this: {task_data['problem']}"

    def dockerfile(self, task_data):
        return "FROM python:3.12-slim\nWORKDIR /app\n"

    def test_sh(self, task_data):
        answer = task_data["answer"]
        # Redirect the whole if/else so both outcomes reach the reward file.
        return f"""#!/bin/bash
mkdir -p /logs/verifier
actual=$(cat /output/answer.txt 2>/dev/null)
if [ "$actual" = "{answer}" ]; then echo 1; else echo 0; fi > /logs/verifier/reward.txt
"""


class MyMapper(DataMapper):
    def iter_tasks(self):
        # load_my_benchmark() is a placeholder for your own loader.
        for i, task in enumerate(load_my_benchmark()):
            yield f"my_dataset/{i}", f"my_dataset/{i}", MyHandler(), task


# Generate Harbor task directories
MyMapper().run(output_dir=Path(".data/tasks"), registry_path=Path("registry.json"))
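
Because test_sh interpolates the answer directly into a shell script, an answer containing quotes, spaces, or `$` can break the comparison. One way to harden this is to quote every interpolated value with the standard library's shlex.quote. The sketch below is an assumption, not something HarborForge requires, and the function name build_test_sh and its parameters are hypothetical. It also places the redirect after `fi` so that both outcomes are written to the reward file.

```python
import shlex


def build_test_sh(answer: str,
                  answer_file: str = "/output/answer.txt",
                  reward_file: str = "/logs/verifier/reward.txt") -> str:
    """Return a bash verifier that writes 1.0 or 0.0 to reward_file."""
    # shlex.quote makes each value safe to embed as a single shell word.
    expected = shlex.quote(str(answer))
    answer_q = shlex.quote(answer_file)
    reward_q = shlex.quote(reward_file)
    return f"""#!/bin/bash
mkdir -p "$(dirname {reward_q})"
actual=$(cat {answer_q} 2>/dev/null)
# The redirect follows `fi`, so it applies to both branches.
if [ "$actual" = {expected} ]; then echo 1.0; else echo 0.0; fi > {reward_q}
"""


# Hypothetical answer containing a single quote.
script = build_test_sh("it's 42")
print(script)
```

A handler's test_sh(task_data) could return build_test_sh(task_data["answer"]) instead of formatting the answer inline.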

Handler contract

| Method | Required | Purpose |
| --- | --- | --- |
| instruction(task_data) | required | Content for instruction.md; must not leak the answer |
| test_sh(task_data) | required | Content for tests/test.sh; must write a float reward to /logs/verifier/reward.txt |
| dockerfile(task_data) | required | Content for environment/Dockerfile |
| setup() | optional | Download/prepare data for this dataset |
| data_files(task_data) | optional | Local files to COPY into the image build context |
| artifacts() | optional | Container paths to capture after trial |
| verifier_env_keys() | optional | Env var keys to forward to the SEPARATE verifier |
| verifier_dockerfile(task_data) | optional | Non-None triggers SEPARATE verifier mode |
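
The required part of this contract can be spot-checked on a sample task before generating a large batch. The sketch below is duck-typed and does not import HarborForge. The names check_handler and ToyHandler, the "answer" key, and the substring-based leakage heuristic are hypothetical illustrations of the table's requirements, not part of the library.

```python
REWARD_PATH = "/logs/verifier/reward.txt"
REQUIRED = ("instruction", "test_sh", "dockerfile")


def check_handler(handler, task_data, answer_key="answer") -> list[str]:
    """Return a list of contract problems found for one sample task."""
    problems = []
    outputs = {}
    for name in REQUIRED:
        method = getattr(handler, name, None)
        if not callable(method):
            problems.append(f"missing required method: {name}")
            continue
        content = method(task_data)
        if not isinstance(content, str) or not content.strip():
            problems.append(f"{name}() returned no content")
        else:
            outputs[name] = content
    # The verifier script is expected to write its reward to a fixed path.
    if "test_sh" in outputs and REWARD_PATH not in outputs["test_sh"]:
        problems.append(f"test_sh() never mentions {REWARD_PATH}")
    # Naive leakage heuristic: short answers can produce false positives.
    answer = str(task_data.get(answer_key, ""))
    if answer and answer in outputs.get("instruction", ""):
        problems.append("instruction() appears to contain the answer")
    return problems


class ToyHandler:
    """Hypothetical handler with a deliberately wrong reward path."""

    def instruction(self, task_data):
        return f"Solve this: {task_data['problem']}"

    def dockerfile(self, task_data):
        return "FROM python:3.12-slim\nWORKDIR /app\n"

    def test_sh(self, task_data):
        return "#!/bin/bash\necho 0.0 > /tmp/reward.txt\n"


found = check_handler(ToyHandler(), {"problem": "2 + 2", "answer": "4"})
print(found)
```

An empty list means the sample passed these particular checks; it does not guarantee the task will run correctly under Harbor.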

Reference implementation

g9 — maps DSGym benchmarks to Harbor using HarborForge.
