Forge Harbor task directories from any evaluation benchmark

HarborForge

HarborForge provides the abstract contracts (DataMapper, DatasetHandler) for turning raw benchmark datasets into Harbor-compatible task directories, enabling large-scale parallel agent evaluation.

How it works

Raw benchmark data
      ↓  DataMapper.map()
Harbor task directories
      ↓  harbor jobs start
Agent runs in isolated Docker container
      ↓
Verifier scores the output → reward written to Harbor

Each task directory contains an instruction.md (shown to the agent), an environment/Dockerfile (the agent's environment), and a tests/test.sh verifier that writes a float reward to /logs/verifier/reward.txt.
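To make the layout concrete, the sketch below writes one task directory by hand. It is an illustration only, not HarborForge's own writer: the helper name write_task_dir is hypothetical, and the environment/ and tests/ subdirectories are taken from the paths listed in the Handler contract table. HarborForge may write additional files that are not shown here.

```python
import tempfile
from pathlib import Path


def write_task_dir(root: Path, instruction: str, dockerfile: str, test_sh: str) -> Path:
    """Write one task directory by hand (hypothetical helper, not the library's writer)."""
    (root / "environment").mkdir(parents=True, exist_ok=True)
    (root / "tests").mkdir(parents=True, exist_ok=True)

    # instruction.md is what the agent sees.
    (root / "instruction.md").write_text(instruction, encoding="utf-8")
    # The Dockerfile defines the agent's environment.
    (root / "environment" / "Dockerfile").write_text(dockerfile, encoding="utf-8")
    # The verifier script must end up writing a float to /logs/verifier/reward.txt.
    script = root / "tests" / "test.sh"
    script.write_text(test_sh, encoding="utf-8")
    script.chmod(0o755)
    return root


demo_root = write_task_dir(
    Path(tempfile.mkdtemp()) / "my_dataset" / "0",
    instruction="Solve this: 2 + 2\n",
    dockerfile="FROM python:3.12-slim\nWORKDIR /app\n",
    test_sh="#!/bin/bash\nmkdir -p /logs/verifier\necho 0 > /logs/verifier/reward.txt\n",
)

written = sorted(
    p.relative_to(demo_root).as_posix() for p in demo_root.rglob("*") if p.is_file()
)
print(written)  # ['environment/Dockerfile', 'instruction.md', 'tests/test.sh']
```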

Installation

pip install harborforge

Or with uv:

uv add harborforge

Usage

Implement a DatasetHandler subclass for each dataset type in your benchmark, then a DataMapper subclass whose iter_tasks() yields the tasks:

from pathlib import Path

from harborforge import DataMapper, DatasetHandler

class MyHandler(DatasetHandler):
    dataset_name = "my_dataset"

    def instruction(self, task_data):
        return f"Solve this: {task_data['problem']}"

    def dockerfile(self, task_data):
        return "FROM python:3.12-slim\nWORKDIR /app\n"

    def test_sh(self, task_data):
        answer = task_data["answer"]
        return f"""#!/bin/bash
mkdir -p /logs/verifier
actual=$(cat /output/answer.txt 2>/dev/null)
# The redirect covers both branches, so a reward is always written.
if [ "$actual" = "{answer}" ]; then echo 1; else echo 0; fi > /logs/verifier/reward.txt
"""

class MyMapper(DataMapper):
    def iter_tasks(self):
        for i, task in enumerate(load_my_benchmark()):
            yield f"my_dataset/{i}", f"my_dataset/{i}", MyHandler(), task

# Generate Harbor task directories
MyMapper().run(output_dir=Path(".data/tasks"), registry_path=Path("registry.json"))
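The usage example calls load_my_benchmark(), which you supply yourself. As a minimal sketch, assuming the benchmark is stored as a JSON Lines file whose records carry "problem" and "answer" fields (the file name and format are assumptions for illustration, not something HarborForge prescribes), a loader could look like this:

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def load_my_benchmark(path: Path = Path("benchmark.jsonl")) -> Iterator[dict]:
    """Yield one task dict per non-empty JSON Lines record (hypothetical loader)."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines between records
                yield json.loads(line)


# Small demonstration with a throwaway file.
demo_path = Path(tempfile.mkdtemp()) / "benchmark.jsonl"
demo_path.write_text(
    '{"problem": "2 + 2", "answer": "4"}\n'
    "\n"
    '{"problem": "3 * 3", "answer": "9"}\n',
    encoding="utf-8",
)

tasks = list(load_my_benchmark(demo_path))
print(len(tasks), tasks[0]["problem"])  # 2 2 + 2
```

Yielding records lazily keeps memory use flat for large benchmarks, which suits a mapper that iterates over tasks one at a time.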

Handler contract

| Method | Required | Purpose |
| --- | --- | --- |
| instruction(task_data) | required | Content for instruction.md; must not leak the answer |
| test_sh(task_data) | required | Content for tests/test.sh; must write a float reward to /logs/verifier/reward.txt |
| dockerfile(task_data) | required | Content for environment/Dockerfile |
| setup() | optional | Download/prepare data for this dataset |
| data_files(task_data) | optional | Local files to COPY into the image build context |
| artifacts() | optional | Container paths to capture after trial |
| verifier_env_keys() | optional | Env var keys to forward to the SEPARATE verifier |
| verifier_dockerfile(task_data) | optional | Returning a non-None value triggers SEPARATE verifier mode |
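One way to read the contract is as an abstract base class with three abstract methods and no-op defaults for the rest. The sketch below is a stand-in written for illustration, not HarborForge's source: the class name HandlerSketch, the return types, the default values, and the uses_separate_verifier helper are all assumptions. Only the method names and the "non-None triggers SEPARATE verifier mode" rule come from the contract above.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Optional


class HandlerSketch(ABC):
    """Illustrative stand-in for DatasetHandler (assumed shape, not the real class)."""

    dataset_name: str = ""

    # Methods every handler must provide.
    @abstractmethod
    def instruction(self, task_data: dict) -> str: ...

    @abstractmethod
    def test_sh(self, task_data: dict) -> str: ...

    @abstractmethod
    def dockerfile(self, task_data: dict) -> str: ...

    # Optional hooks; the defaults here are assumed, not confirmed.
    def setup(self) -> None:
        return None

    def data_files(self, task_data: dict) -> List[Path]:
        return []

    def artifacts(self) -> List[str]:
        return []

    def verifier_env_keys(self) -> List[str]:
        return []

    def verifier_dockerfile(self, task_data: dict) -> Optional[str]:
        return None


def uses_separate_verifier(handler: HandlerSketch, task_data: dict) -> bool:
    """Hypothetical helper: a non-None verifier Dockerfile selects SEPARATE mode."""
    return handler.verifier_dockerfile(task_data) is not None


class Minimal(HandlerSketch):
    dataset_name = "demo"

    def instruction(self, task_data: dict) -> str:
        return f"Solve this: {task_data['problem']}"

    def test_sh(self, task_data: dict) -> str:
        return "#!/bin/bash\nmkdir -p /logs/verifier\necho 0 > /logs/verifier/reward.txt\n"

    def dockerfile(self, task_data: dict) -> str:
        return "FROM python:3.12-slim\nWORKDIR /app\n"


class WithVerifier(Minimal):
    def verifier_dockerfile(self, task_data: dict) -> Optional[str]:
        return "FROM python:3.12-slim\n"

    def verifier_env_keys(self) -> List[str]:
        return ["GRADER_API_KEY"]  # hypothetical environment variable name


task = {"problem": "2 + 2", "answer": "4"}
print(uses_separate_verifier(Minimal(), task))       # False
print(uses_separate_verifier(WithVerifier(), task))  # True
```

With this shape, forgetting one of the three required methods fails at instantiation time with a TypeError rather than later during task generation.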

Reference implementation

g9 — maps DSGym benchmarks to Harbor using HarborForge.
