aicraft
Run coding agents (Claude Code, Codex, Aider, OpenHands, …) against prompts in container sandboxes. A thin Python library + CLI on top of the Harbor framework, focused on ad-hoc single-prompt runs rather than full benchmark evaluation.
Status: alpha. API and CLI may shift between 0.1.x releases. Pin minor versions if you depend on it.
Why this exists
Harbor itself is built around full benchmark lifecycles: scaffolding a task directory, running an agent over a dataset, scoring with a verifier, publishing trajectories. That's heavy when all you want is "run agent X on prompt Y, give me the answer."
aicraft flips the abstraction:
- Inline prompt or `--prompt-file` instead of a hand-built task directory
- Verifier disabled by default
- Auto-synthesizes the minimal Harbor task structure under the hood
- ATIF-aware final-text extraction (works across any agent that emits ATIF)
- Mount allowlist enforcement (refuses to mount paths outside configured roots)
- Provider presets (`--provider openrouter` etc.) for routing OpenAI-protocol agents through aggregator gateways
- Workarounds for current Harbor 0.4 quirks (Codex `OPENAI_BASE_URL` propagation)
It's the opinionated, ad-hoc-friendly subset of Harbor, in the same way `gh` replaces a `git push origin <branch> && curl github.com/...` chain.
Install
```shell
pip install aicraft
```
aicraft requires Python 3.12+ and a working Docker daemon (or Podman with the Docker compatibility socket; set `DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock`). Harbor itself spawns each agent in a Docker container.
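For the rootless Podman setup, the compatibility socket can be enabled roughly like this (a sketch assuming a systemd-managed rootless Podman; unit and socket names may vary by distribution):

```shell
# Enable Podman's Docker-compatible API socket for the current user,
# then point Docker clients (including Harbor) at it.
systemctl --user enable --now podman.socket
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock
```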
Usage
Library
```python
import asyncio
from pathlib import Path

from aicraft import AgentRunner, AgentConfig, MountSpec

async def main():
    runner = AgentRunner()
    result = await runner.run(AgentConfig(
        prompt="Summarize the structure of /workspace/code in one paragraph.",
        agent="claude-code",
        mounts=[MountSpec(host=Path("/data/repo"), container=Path("/workspace/code"))],
        timeout_s=600,
    ))
    print(result.status)           # "completed" | "timeout" | "error"
    print(result.final_text)       # agent's textual reply (extracted from ATIF)
    print(result.trajectory_path)  # full trajectory on disk

asyncio.run(main())
```
CLI
```shell
# Inline prompt
aicraft run -a claude-code "Find dead code in this repo"

# Prompt from a file
aicraft run -a codex -M gpt-5 -f ./prompt.md

# With a mounted code directory
AICRAFT_MOUNT_ROOTS=/data/repos aicraft run -a claude-code \
  "Refactor the parser for readability" \
  --mount /data/repos/foo:/workspace/code:ro

# Specify model and a higher timeout, write structured result to file
aicraft run -a claude-code "..." \
  --model claude-sonnet-4-7 \
  --timeout 1800 \
  --output ./result.json

# Route an OpenAI-protocol agent through OpenRouter for open-weight models
OPENROUTER_API_KEY=sk-or-... aicraft run \
  -a codex -M deepseek/deepseek-chat --provider openrouter \
  "Describe the bug in /workspace/buggy.py and fix it" \
  --mount /tmp/scratch:/workspace:rw

# See installed agents
aicraft list-agents

# Locate a previously-run trajectory on disk
aicraft trajectory aicraft-3a2b1c0d9e8f
```
Mount syntax: `host:container[:ro|rw]`. The default is `ro`; pass `:rw` explicitly when the agent needs to write back.
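The mount grammar is simple enough to parse by hand. A minimal sketch of the documented `host:container[:ro|rw]` syntax (illustrative only, not aicraft's actual parser, and assuming POSIX paths without colons):

```python
def parse_mount(spec: str) -> tuple[str, str, str]:
    """Parse host:container[:ro|rw]; mode defaults to ro."""
    parts = spec.split(":")
    if len(parts) == 2:
        return parts[0], parts[1], "ro"          # default is read-only
    if len(parts) == 3 and parts[2] in ("ro", "rw"):
        return parts[0], parts[1], parts[2]
    raise ValueError(f"invalid mount spec: {spec!r}")

print(parse_mount("/data/repos/foo:/workspace/code"))
# → ('/data/repos/foo', '/workspace/code', 'ro')
```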
Output
aicraft run prints a JSON document to stdout for pipeline use:
```json
{
  "status": "completed",
  "final_text": "def reverse_string(s: str) -> str:\n    return s[::-1]",
  "trajectory_path": "/path/to/trajectories/aicraft-7dad38f46662",
  "duration_s": 65.1,
  "trial_id": "aicraft-7dad38f46662",
  "error": null
}
```
…and a human-readable banner to stderr so the agent's reply is easy to spot amid debug logs:
```
════════════════════════════════════════════════════════════════════════
AGENT OUTPUT [OK] trial=aicraft-7dad38f46662 65.1s
════════════════════════════════════════════════════════════════════════
def reverse_string(s: str) -> str:
    return s[::-1]
════════════════════════════════════════════════════════════════════════
```
Exit codes: 0 on completed, 1 on timeout / error, 2 on a pre-flight config issue (mount not allowed, missing required model, unknown provider).
The trajectory directory contains everything Harbor captured: the ATIF trajectory.json, the agent's raw session logs, and any verifier artifacts. For an aicraft run there's no verifier output, but the ATIF trajectory has the full tool-call history.
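Because the JSON goes to stdout and the banner to stderr, downstream tooling can consume the result directly. A sketch using the example payload above:

```python
import json

# The JSON document aicraft prints to stdout (example payload from above).
payload = """{
  "status": "completed",
  "final_text": "def reverse_string(s: str) -> str:\\n    return s[::-1]",
  "trajectory_path": "/path/to/trajectories/aicraft-7dad38f46662",
  "duration_s": 65.1,
  "trial_id": "aicraft-7dad38f46662",
  "error": null
}"""

result = json.loads(payload)
if result["status"] == "completed":
    print(result["final_text"])
else:
    raise SystemExit(result["error"] or result["status"])
```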
Configuration
| Variable | Purpose | Default |
|---|---|---|
| `AICRAFT_MOUNT_ROOTS` | Colon-separated absolute host paths permitted as mount sources. Empty means no mounts allowed. | empty |
| `AICRAFT_TRAJECTORY_DIR` | Where captured trajectories are stored. Override per-call with `--trajectory-dir`. | `./trajectories/` (cwd) |
| `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `ANTHROPIC_API_KEY`, `ANTHROPIC_BASE_URL`, `GEMINI_API_KEY`, `GOOGLE_API_KEY`, `CLAUDE_CODE_OAUTH_TOKEN` | Standard provider env vars consumed by Harbor agents. | — |
| `OPENROUTER_API_KEY`, `FIREWORKS_API_KEY`, `TOGETHER_API_KEY`, `GROQ_API_KEY` | Provider-specific keys consumed by `--provider <name>`. | — |
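The allowlist semantics of `AICRAFT_MOUNT_ROOTS` can be sketched as follows (illustrative only; `mount_allowed` is a hypothetical helper, not aicraft's API):

```python
from pathlib import Path

def mount_allowed(host_path: str, roots_env: str) -> bool:
    """Check a mount source against colon-separated allowlist roots.
    An empty allowlist means no mounts are permitted."""
    roots = [Path(r) for r in roots_env.split(":") if r]
    p = Path(host_path).resolve()
    return any(p.is_relative_to(root) for root in roots)

print(mount_allowed("/data/repos/foo", "/data/repos"))  # True
print(mount_allowed("/etc/passwd", "/data/repos"))      # False
print(mount_allowed("/data/repos/foo", ""))             # False: empty allowlist
```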
Provider presets
`--provider <name>` rewrites `OPENAI_BASE_URL` and `OPENAI_API_KEY` from a friendly name plus a provider-specific env var, so OpenAI-protocol agents (`codex`, `copilot-cli`) can route through aggregator gateways without manual env juggling. Anthropic-protocol agents (`claude-code`) read `ANTHROPIC_*` directly and need separate setup.
| Preset | Base URL | Key env var |
|---|---|---|
| `openrouter` | `https://openrouter.ai/api/v1` | `OPENROUTER_API_KEY` |
| `fireworks` | `https://api.fireworks.ai/inference/v1` | `FIREWORKS_API_KEY` |
| `together` | `https://api.together.xyz/v1` | `TOGETHER_API_KEY` |
| `groq` | `https://api.groq.com/openai/v1` | `GROQ_API_KEY` |
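The rewrite a preset performs can be sketched as (illustrative; `apply_provider` is a hypothetical helper, with URLs and key names taken from the table above):

```python
# Preset name → (OpenAI-protocol base URL, provider key env var).
PRESETS = {
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
    "fireworks": ("https://api.fireworks.ai/inference/v1", "FIREWORKS_API_KEY"),
    "together": ("https://api.together.xyz/v1", "TOGETHER_API_KEY"),
    "groq": ("https://api.groq.com/openai/v1", "GROQ_API_KEY"),
}

def apply_provider(name: str, env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with OPENAI_BASE_URL/OPENAI_API_KEY rewritten."""
    base_url, key_var = PRESETS[name]
    key = env.get(key_var)
    if not key:
        raise KeyError(f"{key_var} is not set")
    return {**env, "OPENAI_BASE_URL": base_url, "OPENAI_API_KEY": key}

env = apply_provider("openrouter", {"OPENROUTER_API_KEY": "sk-or-..."})
print(env["OPENAI_BASE_URL"])  # https://openrouter.ai/api/v1
```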
Limitations
See GOTCHAS.md for the full list with reproduction steps and workarounds. Highlights:
- First run per agent is slow: Harbor 0.4 re-installs the agent CLI in a fresh container on every trial (~45–55 s for claude-code/codex). The image build is cached, but the container is fresh each time.
- Rootless Podman + claude-code: the agent runs fine, but trajectory ingestion fails while reading session JSONL files (claude-code writes them at `0600`, and rootless Podman's userns mapping makes them unreadable from the host). Use Docker for claude-code; other agents (codex, aider, nop) are unaffected on Podman.
- `final_text` for non-ATIF agents may be empty; the trajectory directory still has the full record.
- Codex requires an explicit `--model`: Harbor's codex agent has no default. aicraft pre-validates this in under a second.
- `--provider` is OpenAI-protocol only: claude-code (Anthropic protocol) needs `ANTHROPIC_BASE_URL` + `ANTHROPIC_API_KEY` set manually for gateways like OpenRouter.
Related issues filed upstream
- harbor#1514 — Defer litellm/aiohttp imports to avoid ~25s startup overhead
License
MIT — see LICENSE.
File details
Details for the file aicraft-0.1.0.tar.gz.
File metadata
- Download URL: aicraft-0.1.0.tar.gz
- Upload date:
- Size: 22.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `737abbfac75052324c273e567f4cc7277cfb797e56510dbf0ea9cbf8170e14fd` |
| MD5 | `f28410ce1b9f6322a37e87fa10f788b1` |
| BLAKE2b-256 | `582886fbcd6dffb1ddf94f8105b0ee89d3d0d872e60a2ac0030d8da242573ba6` |
Provenance
The following attestation bundles were made for aicraft-0.1.0.tar.gz:

Publisher: `publish.yml` on kashifpk/aicraft

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aicraft-0.1.0.tar.gz
- Subject digest: 737abbfac75052324c273e567f4cc7277cfb797e56510dbf0ea9cbf8170e14fd
- Sigstore transparency entry: 1383637351
- Permalink: kashifpk/aicraft@9bade43a3b763363738ef3e214af163bdffabb3f
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/kashifpk
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9bade43a3b763363738ef3e214af163bdffabb3f
- Trigger Event: push
File details
Details for the file aicraft-0.1.0-py3-none-any.whl.
File metadata
- Download URL: aicraft-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `00192c8b4481f12c2a67f2d41f09fdc2cf1789a89206b49361e950b7ad61cfcc` |
| MD5 | `0905c8a658f015f3d384e3812b540a68` |
| BLAKE2b-256 | `b7d16cabcd6f08dc14fbab3a667c516f5bd2ffa043d32fdc264860b5340cbf3e` |
Provenance
The following attestation bundles were made for aicraft-0.1.0-py3-none-any.whl:

Publisher: `publish.yml` on kashifpk/aicraft

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aicraft-0.1.0-py3-none-any.whl
- Subject digest: 00192c8b4481f12c2a67f2d41f09fdc2cf1789a89206b49361e950b7ad61cfcc
- Sigstore transparency entry: 1383637399
- Permalink: kashifpk/aicraft@9bade43a3b763363738ef3e214af163bdffabb3f
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/kashifpk
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9bade43a3b763363738ef3e214af163bdffabb3f
- Trigger Event: push