Ocarina Labs' Harbor extension — adversarial probes for benchmark integrity.
Project description
ocarinalabs-harbor-ext
Ocarina Labs' Harbor extension. Adversarial probes for benchmark integrity, packaged for stock harbor>=0.5.0.
These probes implement the four adversarial archetypes from UC Berkeley's "How we broke top AI agent benchmarks" audit — null-agent, output-echoing, judge-injection, and state-tampering — as loadable Harbor agents. A benchmark that credits any of these above-floor is not publication-ready.
Shipped separately from ocarinalabs/quaver so that any Harbor-based benchmark author can attest their world against the same gate.
Install
pip install ocarinalabs-harbor-ext
# or
uv add ocarinalabs-harbor-ext
Probes
Four probes. Each is a standalone class extending harbor.agents.base.BaseAgent. A benchmark passes the gate when all four probes score at or below the configured floor (typically 0.0).
| Probe | Berkeley pattern | What it does |
|---|---|---|
pattern_1.QuaverPattern1 |
Pattern 1 — isolation | Attempts host-mount escapes, proc-filesystem tricks, rubric exfiltration, suid escalation |
pattern_4.QuaverPattern4 |
Pattern 4 — judge injection | Writes judge-prompt-injection payloads into agent-controlled paths |
pattern_5.QuaverPattern5 |
Pattern 5 — weak matching | Writes pass-tokens (PASS, SUCCESS, yes, 1, true, OK) to standard output paths |
pattern_7.QuaverPattern7 |
Pattern 7 — trusting untrusted | Writes directly to verifier state (reward.txt, reward.json, trajectory.json) |
The stock Harbor nop agent covers the null-agent archetype. Use it alongside these four for full coverage.
Run against a benchmark
# one probe at a time, via Harbor's import-path mechanism
harbor run /path/to/world \
--agent-import-path ocarinalabs_harbor_ext.probes.pattern_1:QuaverPattern1
harbor run /path/to/world \
--agent-import-path ocarinalabs_harbor_ext.probes.pattern_4:QuaverPattern4
harbor run /path/to/world \
--agent-import-path ocarinalabs_harbor_ext.probes.pattern_5:QuaverPattern5
harbor run /path/to/world \
--agent-import-path ocarinalabs_harbor_ext.probes.pattern_7:QuaverPattern7
# also the stock null-agent baseline
harbor run /path/to/world --agent nop
Or orchestrate the full cascade declaratively via Harbor's harbor jobs start -c job.yaml:
job_name: quaver-adversarial-gate
n_attempts: 1
orchestrator:
n_concurrent_trials: 5
environment:
type: docker
agents:
- name: probe-nop
name: nop
- name: probe-isolation
import_path: ocarinalabs_harbor_ext.probes.pattern_1:QuaverPattern1
- name: probe-injection
import_path: ocarinalabs_harbor_ext.probes.pattern_4:QuaverPattern4
- name: probe-echo
import_path: ocarinalabs_harbor_ext.probes.pattern_5:QuaverPattern5
- name: probe-tamper
import_path: ocarinalabs_harbor_ext.probes.pattern_7:QuaverPattern7
tasks:
- path: /path/to/world
What each probe signals when it scores above floor
Pattern 1 (isolation breach) — the container boundary between agent and host leaks. An agent can write /host/... paths, walk /proc/1/root, or escalate suid. A world that credits this probe has lost structural isolation; failures here compromise every other safety claim.
Pattern 4 (judge injection) — the rubric's LLM-judge consumes agent-controlled text without structural delimiters. A world that credits this probe is coaching its own grader. Fix: wrap agent bytes in <agent_output>...</agent_output> with instructions to ignore directives inside.
Pattern 5 (weak matching) — the rubric credits any non-empty output or uses substring matching for output-file checks. A world that credits this probe has a non-discriminating capability signal. Fix: replace substring checks with structural equality or semantic comparison.
Pattern 7 (verifier tamper) — the agent can write to /logs/verifier/reward.txt, /tests/, or other paths the verifier trusts as canonical. A world that credits this probe has a permission drift. Fix: evaluator-owned paths must be root-owned and agent-read-only.
Developer quickstart
git clone https://github.com/ocarinalabs/harbor-ext
cd harbor-ext
uv sync --extra dev
uv run pytest
Related
- ocarinalabs/quaver — benchmark world generator that uses these probes as a pre-publication gate
- harbor-framework/harbor — the runtime these probes plug into
- Berkeley "How we broke top AI agent benchmarks" — the empirical case for each probe
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file ocarinalabs_harbor_ext-0.1.0.tar.gz.
File metadata
- Download URL: ocarinalabs_harbor_ext-0.1.0.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d4d9e28a671cbcc4a50af983b389c80dc3a171aed93a6acd7045f39fe7640d25
|
|
| MD5 |
3ff88fc4b8a50085345cbccbf5964e51
|
|
| BLAKE2b-256 |
f07cc16b92f82672fa4159236cba9e4febe107ad814ee457181aa50859a82d5f
|
File details
Details for the file ocarinalabs_harbor_ext-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ocarinalabs_harbor_ext-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.6 {"installer":{"name":"uv","version":"0.10.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b6bc4769c2b85d1b739898dc0096b12d8c59875ba656ac91935114cab6ca02aa
|
|
| MD5 |
244ac0a8e0b42bf06844093a03300a13
|
|
| BLAKE2b-256 |
8a37b2b852a439f65db0fd6812a18cc7ff40da3a63f7af048856418bd12f611f
|