Forge Harbor task directories from any evaluation benchmark
HarborForge
HarborForge provides the abstract contracts (DataMapper, DatasetHandler) for turning raw benchmark datasets into Harbor-compatible task directories, enabling large-scale parallel agent evaluation.
How it works
Raw benchmark data
↓ DataMapper.map()
Harbor task directories
↓ harbor jobs start
Agent runs in isolated Docker container
↓
Verifier scores the output → reward written to Harbor
Each task directory contains an instruction.md (shown to the agent), an environment/Dockerfile (the agent's environment), and a tests/test.sh verifier that writes a float reward to /logs/verifier/reward.txt.
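To make the layout concrete, here is a minimal sketch that writes one such directory by hand. It is not HarborForge's own implementation: `write_task_dir` is a hypothetical helper, the `environment/` and `tests/` subdirectory names are taken from the handler contract table, and a real Harbor task directory may hold further files.

```python
import tempfile
from pathlib import Path


def write_task_dir(root: Path, instruction: str, dockerfile: str, test_sh: str) -> Path:
    """Write the three files described above (hypothetical helper, not HarborForge API)."""
    (root / "environment").mkdir(parents=True, exist_ok=True)
    (root / "tests").mkdir(parents=True, exist_ok=True)
    (root / "instruction.md").write_text(instruction)
    (root / "environment" / "Dockerfile").write_text(dockerfile)
    script = root / "tests" / "test.sh"
    script.write_text(test_sh)
    script.chmod(0o755)  # the verifier script should be executable
    return root


task_dir = write_task_dir(
    Path(tempfile.mkdtemp()) / "my_dataset" / "0",
    instruction="Solve this: 2 + 2\n",
    dockerfile="FROM python:3.12-slim\nWORKDIR /app\n",
    test_sh="#!/bin/bash\nmkdir -p /logs/verifier\necho 0 > /logs/verifier/reward.txt\n",
)

# List the generated files relative to the task directory.
files = sorted(p.relative_to(task_dir).as_posix() for p in task_dir.rglob("*") if p.is_file())
print(files)
```

In normal use DataMapper generates these directories for you; the sketch only illustrates what ends up on disk.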
Installation
pip install harborforge
Or with uv:
uv add harborforge
Usage
Implement a DatasetHandler for each dataset type in your benchmark, then subclass DataMapper to iterate over the tasks:
from pathlib import Path

from harborforge import DataMapper, DatasetHandler


class MyHandler(DatasetHandler):
    dataset_name = "my_dataset"

    def instruction(self, task_data):
        # Shown to the agent; must not reveal the answer.
        return f"Solve this: {task_data['problem']}"

    def dockerfile(self, task_data):
        return "FROM python:3.12-slim\nWORKDIR /app\n"

    def test_sh(self, task_data):
        answer = task_data["answer"]
        # Redirect the whole if/else so both outcomes land in reward.txt.
        return f"""#!/bin/bash
mkdir -p /logs/verifier
actual=$(cat /output/answer.txt 2>/dev/null)
if [ "$actual" = "{answer}" ]; then echo 1; else echo 0; fi > /logs/verifier/reward.txt
"""


class MyMapper(DataMapper):
    def iter_tasks(self):
        # load_my_benchmark() is a placeholder for your own dataset loader.
        for i, task in enumerate(load_my_benchmark()):
            yield f"my_dataset/{i}", f"my_dataset/{i}", MyHandler(), task


# Generate Harbor task directories
MyMapper().run(output_dir=Path(".data/tasks"), registry_path=Path("registry.json"))
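The example interpolates the answer directly into the shell script, which can break if an answer contains quotes, `$`, or other shell metacharacters. One possible way to guard against that is to quote the value with the standard library's `shlex.quote` before embedding it. The sketch below assumes a plain string answer; `render_test_sh` is a hypothetical helper name, not part of HarborForge.

```python
import shlex


def render_test_sh(answer: str) -> str:
    """Build a verifier script comparing /output/answer.txt to `answer` (hypothetical helper)."""
    expected = shlex.quote(answer)  # becomes a safe shell literal, single-quoted when needed
    return (
        "#!/bin/bash\n"
        "mkdir -p /logs/verifier\n"
        "actual=$(cat /output/answer.txt 2>/dev/null)\n"
        # The redirect applies to the whole if/else, so 1 and 0 both reach reward.txt.
        f'if [ "$actual" = {expected} ]; then echo 1; else echo 0; fi'
        " > /logs/verifier/reward.txt\n"
    )


script = render_test_sh('say "hi" $USER')
print(script)
```

A handler's test_sh(task_data) could return the result of such a helper, keeping the quoting logic in one place.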
Handler contract
| Method | Required | Purpose |
|---|---|---|
| instruction(task_data) | ✅ | Content for instruction.md; no answer leakage |
| test_sh(task_data) | ✅ | Content for tests/test.sh; must write float reward to /logs/verifier/reward.txt |
| dockerfile(task_data) | ✅ | Content for environment/Dockerfile |
| setup() | optional | Download/prepare data for this dataset |
| data_files(task_data) | optional | Local files to COPY into the image build context |
| artifacts() | optional | Container paths to capture after trial |
| verifier_env_keys() | optional | Env var keys to forward to the SEPARATE verifier |
| verifier_dockerfile(task_data) | optional | Non-None triggers SEPARATE verifier mode |
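Before generating thousands of task directories, it can help to check a handler against the required rows of this contract. The following is a rough sketch of such a check, not something HarborForge ships: `contract_problems` and the `answer_key` parameter are hypothetical, the handlers are plain stand-in classes so the snippet runs without the package, and the leak test is a crude substring match that can misfire when the answer legitimately appears in the problem text.

```python
REQUIRED_METHODS = ("instruction", "test_sh", "dockerfile")
REWARD_PATH = "/logs/verifier/reward.txt"


def contract_problems(handler, task_data: dict, answer_key: str = "answer") -> list[str]:
    """Return human-readable contract violations for one handler and one task."""
    problems = [
        f"missing required method: {name}"
        for name in REQUIRED_METHODS
        if not callable(getattr(handler, name, None))
    ]
    if problems:
        return problems  # the remaining checks need all three methods
    answer = str(task_data.get(answer_key, ""))
    if answer and answer in handler.instruction(task_data):
        problems.append("instruction leaks the answer")
    if REWARD_PATH not in handler.test_sh(task_data):
        problems.append(f"test_sh never mentions {REWARD_PATH}")
    return problems


class GoodHandler:  # stand-in; a real handler would subclass DatasetHandler
    def instruction(self, task_data):
        return f"Solve this: {task_data['problem']}"

    def dockerfile(self, task_data):
        return "FROM python:3.12-slim\n"

    def test_sh(self, task_data):
        return f"#!/bin/bash\necho 1 > {REWARD_PATH}\n"


class LeakyHandler(GoodHandler):
    def instruction(self, task_data):
        return f"The answer is {task_data['answer']}"

    def test_sh(self, task_data):
        return "#!/bin/bash\necho 1\n"


task = {"problem": "2 + 2", "answer": "4"}
good = contract_problems(GoodHandler(), task)
leaky = contract_problems(LeakyHandler(), task)
empty = contract_problems(object(), task)
print(good, leaky, empty, sep="\n")
```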
Reference implementation