Opinionated, sandboxed Python coding agent.
agent6
A sandboxed coding agent for Anthropic Claude and OpenAI-compatible backends (OpenAI, OpenRouter, Ollama, vLLM, LM Studio, llama.cpp). Linux-only.
- LLM calls are wrapped in deterministic Python workflows with a fixed tool surface.
- Every child process runs inside agent6-jail (Rust + Linux user namespaces + Landlock + seccomp).
- The workflow's own git operations refuse push, --force, and history rewrite; the sandbox additionally re-binds .git/, agent6.toml, and .agent6/ read-only inside the jail so worker scripts cannot corrupt them.
- No web UI, no plugin system, no telemetry, no auto-update.
Threat model
The worker model is treated as adversarial. It must not be able to:
- write outside the project's working directory
- read files outside the project (plus any sibling read-only paths)
- reach the network except the host:port of each [providers.*] block (when sandbox.network = "provider_only")
- corrupt the project's git history or its own configuration / run state from inside the sandbox
- leave background processes running after the run ends
Enforcement is layered, and the threat model bullets above map to layers as follows:
- git_ops.py refusals (push, --force, reset --hard, branch -D, history rewrite) constrain only the workflow's own git calls from the agent process. They do not police what the worker does. See src/agent6/git_ops.py.
- Tool surface (src/agent6/tools/schema.py). The LLM cannot directly invoke shell or write_file. It has apply_edit, read_file, list_dir, grep, run_verify_command (operator-fixed argv), and, only when sandbox.run_commands is "yes" or "ask", run_command(argv). The first five tools cannot spawn an LLM-chosen subprocess.
- agent6-jail wraps every child command (verify, run_command, curator): it creates fresh user/mount/pid/ipc/uts/net namespaces, pivots into a minimal rootfs, applies Landlock and a seccomp filter, drops capabilities, and sets NO_NEW_PRIVS. In the strict profile the network namespace is empty when sandbox.network != "allow": git push, curl, pip install, and DNS all fail with no route, even from an ad-hoc script that the worker writes and runs via run_command.
- sandbox.protect_git + sandbox.protect_agent6 (default true) make .git/, agent6.toml, and .agent6/ read-only from the child's view. In strict they are re-bound RO on top of the workspace mount. In hardened (no mount namespace) the launcher switches its Landlock setup from "RW on cwd" to "R on cwd + RW on each top-level entry except the protect set". The end result is the same for paths that exist when the jail starts, at the cost of denying writes to new top-level entries created at the cwd root after launch (anything inside an existing top-level dir like src/ still gets the full recursive RW rule). This closes the "worker writes a shell script that does rm -rf .git or git reset --hard origin/main" loophole; the workflow's own commits go through git_ops.py from outside the jail and are unaffected.
- Landlock on the agent process itself further restricts what the agent's Python code can read/write outside the jail.
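As a rough illustration of the first layer, a refusal check over a git argv could look like the sketch below. The function name is_refused and the exact sets of subcommands and flags are assumptions made for this example; the real rules live in src/agent6/git_ops.py and may differ.

```python
# Hypothetical sketch, not the actual git_ops.py implementation.
REFUSED_SUBCOMMANDS = {"push", "rebase", "filter-branch"}  # assumed set
REFUSED_FLAGS = {"--force", "--force-with-lease"}          # assumed set


def is_refused(argv: list[str]) -> bool:
    """Return True if a git argv (without the leading 'git') must be refused."""
    if not argv:
        return False
    sub, rest = argv[0], argv[1:]
    if sub in REFUSED_SUBCOMMANDS:
        return True
    if any(flag in REFUSED_FLAGS for flag in rest):
        return True
    if sub == "reset" and "--hard" in rest:
        return True
    if sub == "branch" and "-D" in rest:
        return True
    return False
```

A check like this only guards calls that pass through it, which is why the text above stresses that it does not police the worker.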
What is NOT protected:
- If you set sandbox.run_commands = "yes" (or "ask" and approve) and sandbox.network = "allow", the worker can talk to anywhere on the public internet from inside the sandbox. Don't do this.
- The protected paths are read-only inside the jail, but the worker can still create new files anywhere else under the cwd. A worker that wants to corrupt your project can still write garbage source code; that's what the reviewer + verify_command are for, not the sandbox.
See SECURITY.md for the per-layer breakdown and ARCHITECTURE.md for state machines of each workflow.
Architecture
             ┌───────────────────────────────────┐
             │            agent6 CLI             │
             └───────────────┬───────────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
┌───────▼───────┐  ┌─────────▼─────────┐   ┌──────▼──────┐
│  workflows/   │  │      agents/      │   │   graph/    │
│  implement    │  │  planner, worker  │   │   curator   │
│  plan_mode    │  │  critic, reviewer │   │  (subproc,  │
│  review       │  │  code_review,     │   │   own jail) │
└───────┬───────┘  │  summarizer,      │   └─────────────┘
        │          │  alignment        │
        │          └───────┬───────────┘
        │                  │
┌───────▼──────────────────▼────────────┐
│            tools/dispatch             │
│  read_file list_dir grep apply_edit   │
│  run_verify_command [run_command]     │
└───────────────────┬───────────────────┘
                    │
          ┌─────────▼─────────┐
          │  sandbox/jail.py  │
          └─────────┬─────────┘
                    │ JSON policy
          ┌─────────▼─────────┐
          │    agent6-jail    │  (Rust; userns + landlock + seccomp)
          └───────────────────┘
Dependency direction is enforced by tach:
cli → workflows → agents → tools → sandbox. Workflows never import each
other; agents never import workflows or the CLI.
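The layering rule can be modelled as a small predicate over import edges. This is only a sketch of the rule as stated: tach itself is configured declaratively, the helper name import_allowed is hypothetical, and the reading that a layer may import any layer further down the chain (not just the next one) is an assumption.

```python
# Layer order taken from the text: cli -> workflows -> agents -> tools -> sandbox.
LAYERS = ["cli", "workflows", "agents", "tools", "sandbox"]


def import_allowed(importer: str, imported: str) -> bool:
    """Check one import edge between modules named like 'workflows.implement'.

    Hypothetical helper: an import is allowed only if it points to a layer
    further down the chain, or stays inside the same layer (with the extra
    rule that separate workflows never import each other).
    """
    src_layer, dst_layer = importer.split(".")[0], imported.split(".")[0]
    src_rank, dst_rank = LAYERS.index(src_layer), LAYERS.index(dst_layer)
    if dst_rank > src_rank:
        return True
    if dst_rank < src_rank:
        return False
    if src_layer == "workflows":
        # Same workflow package is fine; sibling workflows are not.
        return importer.split(".")[:2] == imported.split(".")[:2]
    return True
```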
Requirements
- Linux. macOS and Windows are not supported and never will be — the sandbox uses Linux-only kernel APIs (Landlock, seccomp-bpf, user/mount namespaces, pivot_root).
- Linux kernel ≥ 6.7 for full Landlock TCP-connect rules. Older kernels fall back to filesystem-only Landlock with a loud warning.
- kernel.unprivileged_userns_clone = 1 (default on Ubuntu, Debian, and most cloud images). Required for the strict sandbox profile; without it the agent falls back to hardened or refuses, per config.
- Python ≥ 3.12.
- Anthropic and/or OpenAI-compatible API key.
If installing from source, you also need:
- A Rust toolchain on PATH (cargo, rustc). The hatch build hook invokes cargo build to compile agent6-jail. It does not install Rust for you: if cargo is not on PATH, the hook skips with a message and the resulting install has no jail binary (agent6 check-sandbox will tell you).
- Released PyPI wheels bundle a prebuilt agent6-jail and have no Rust toolchain requirement.
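The kernel requirement can be checked before installing. The sketch below compares the running kernel's release string against 6.7; the helper names are made up for this example, and a version comparison is only a heuristic (distribution kernels may backport or disable features), so agent6 check-sandbox remains the authoritative check.

```python
import platform
import re


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_landlock_tcp_rules(release: str) -> bool:
    """True if the release meets the >= 6.7 requirement stated above."""
    return parse_kernel_release(release) >= (6, 7)


if __name__ == "__main__":
    release = platform.release()
    mode = "full Landlock" if has_landlock_tcp_rules(release) else "filesystem-only Landlock"
    print(f"{release}: {mode}")
```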
Install
From source (development):
git clone https://github.com/elesiuta/agent6
cd agent6
uv sync --extra tui
uv run agent6 --help
uv sync runs hatch_build.py, which:
- invokes cargo build --release --locked --manifest-path jail/Cargo.toml
- copies the resulting agent6-jail into src/agent6/sandbox/_bin/agent6-jail (gitignored)
The [tui] extra pulls in textual for the live dashboard. If you don't want
the TUI, run plain uv sync (without --extra tui) instead.
From PyPI:
uv tool install agent6
PyPI wheels ship with the jail binary inside. To override the bundled
binary (custom build, alternate path), set
AGENT6_JAIL_BIN=/path/to/agent6-jail.
Shell tab-completion
agent6 supports tab-completion for all subcommands and flags via argcomplete. To enable it for your current shell, source the completion script once:
# Bash (add to ~/.bashrc to persist).
eval "$(register-python-argcomplete agent6)"
# Zsh (add to ~/.zshrc; needs `autoload -U compinit && compinit` first).
eval "$(register-python-argcomplete agent6)"
# Fish (one-time write).
register-python-argcomplete --shell fish agent6 > ~/.config/fish/completions/agent6.fish
Or, for system-wide completion in all shells, run
activate-global-python-argcomplete once (see argcomplete docs).
Quick start
export ANTHROPIC_API_KEY=sk-ant-...
# Scaffold agent6.toml + AGENTS.md and add .agent6/ to .gitignore.
agent6 init
# Sanity checks.
agent6 check-config
agent6 check-sandbox
# If `check-config` reports missing fields after an upgrade, walk through
# the additions interactively (each one sourced from the starter
# template, with a [y/N] prompt before any write):
agent6 check-config --fix
# Plan-only: cheap pre-flight, no code changes.
agent6 plan new "add a --json output mode to the CLI"
# Inspect a previously persisted plan (defaults to most recent).
agent6 plan show
# Apply free-form feedback to the last plan, producing a new run.
agent6 plan revise "split step 2 into smaller commits"
# Hand-edit the plan JSON in $EDITOR, producing a new run.
agent6 plan edit
# Offline Q&A: if the critic raises clarifying questions, write
# them to a file, fill in the 'answer' fields, then re-run.
agent6 plan new --questions-file q.json "add a --json output mode"
$EDITOR q.json
agent6 plan new --run-id <same-id> --answers-file q.json "add a --json output mode"
# Full implement workflow. --yes auto-confirms the plan.
agent6 run --yes "add a --json output mode to the CLI"
# Resume an interrupted run (refuses if the worktree diverged).
agent6 resume <run-id>
# Read-only code review of a diff. Never touches the worktree.
agent6 review --base origin/main --head HEAD
Other commands:
- agent6 watch [<run-id>]: attach the live TUI dashboard to an existing run (defaults to the most recent). Same view that agent6 run auto-launches in a TTY. Attach and detach freely.
- agent6 memory: manage persistent agent memory under .agent6/memory/.
- agent6 history: search transcripts and run data under .agent6/runs/. Subcommands:
  - agent6 history search <query>: ripgrep-backed text search.
  - agent6 history graph [<run-id>]: render the persisted task graph for a run as a DFS-ordered tree (defaults to the most recent run).
- agent6 --help: full subcommand list.
Configuration
Every field in agent6.toml is required. No implicit defaults. Start
from agent6.example.toml.
Highlights:
[sandbox]
profile = "auto" # auto | strict | hardened
network = "provider_only" # no | provider_only | allow
run_commands = "ask" # yes | no | ask
[git]
require_clean_worktree = true
auto_stash = false
branch_per_run = true
commit_strategy = "per_step" # per_step | squash | stage | none
allow_push = false # cannot be true; ignored if set
allow_force = false
allow_history_rewrite = false
[workflow]
default = "implement"
verify_command = ["uv", "run", "pytest", "-x"]
[budget]
max_input_tokens = 2000000
max_output_tokens = 200000
Sandbox profiles:
- strict: user/mount/pid/ipc/uts/net namespaces + pivot_root into a minimal rootfs + Landlock + seccomp + capset(0) + rlimits + NO_NEW_PRIVS. Requires unprivileged user namespaces.
- hardened: no namespaces, but still Landlock + seccomp + capset(0) + rlimits + NO_NEW_PRIVS. Works inside default-seccomp Docker.
- auto: pick strict if the kernel allows, else hardened. Logs the chosen profile on every run.
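The profile rules can be summarised as a small decision function. This is a sketch under assumptions: choose_profile is a hypothetical name, and treating an explicit strict request without user namespaces as an error is one reading of "falls back to hardened or refuses, per config".

```python
def choose_profile(requested: str, userns_available: bool) -> str:
    """Resolve the configured profile to the one actually used.

    'auto' picks strict when unprivileged user namespaces are available and
    hardened otherwise; an explicit 'strict' without them is refused (assumed).
    """
    if requested == "hardened":
        return "hardened"
    if requested == "strict":
        if not userns_available:
            raise RuntimeError("strict profile requires unprivileged user namespaces")
        return "strict"
    if requested == "auto":
        return "strict" if userns_available else "hardened"
    raise ValueError(f"unknown sandbox profile: {requested!r}")
```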
Network modes are enforced by the jail's net-namespace setup (in
strict) or by Landlock's TCP-connect rules (kernel ≥ 6.7) on the agent
process.
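For provider_only, the set of permitted endpoints follows from each provider's base_url. The sketch below shows one way to derive host:port pairs with the standard library; the function name and the example URLs in the usage note are illustrative assumptions, not values taken from agent6.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def provider_endpoints(base_urls: list[str]) -> set[tuple[str, int]]:
    """Derive the (host, port) pairs that provider_only mode would need to allow."""
    endpoints = set()
    for url in base_urls:
        parts = urlsplit(url)
        if parts.hostname is None or parts.scheme not in DEFAULT_PORTS:
            raise ValueError(f"unsupported provider base_url: {url!r}")
        # An explicit port wins; otherwise use the scheme's default.
        endpoints.add((parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme]))
    return endpoints
```

For example, a hosted HTTPS provider and a local server on a custom port would yield one entry on port 443 and one on the custom port.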
Providers and sub-agents
Declare any number of providers as [providers.<name>] blocks. Each
block sets kind = "anthropic" or kind = "openai" and has its own
base_url and api_key_env. OpenAI, OpenRouter, Ollama, vLLM, llama.cpp,
LM Studio etc. coexist under whatever names you pick.
Each sub-agent has a fixed system prompt, a pydantic-typed output schema, and only the tools its workflow gives it. Routing per role:
| Sub-agent | Routed by | Purpose |
|---|---|---|
| planner | [models.planner] | Decomposes the refined task into ordered steps. |
| worker | [models.worker] | Executes one plan step using the tool surface. |
| critic | [models.critic] | Raises open questions; aborts on real ambiguity. |
| reviewer | [models.reviewer] | Reviews each step's diff; approves or asks for fixes. |
| code_review | [models.reviewer] | Powers agent6 review: read-only review of an arbitrary diff. |
| summarizer | [models.summarizer] | Compresses long verify output / file context to fit budgets. |
| alignment | [models.worker] | Guards agent6 resume: refuses if the worktree diverged. |
| planner_revise | [models.planner] | Revises a plan in response to a critic or reviewer objection. |
Mixing vendors across roles (e.g. Anthropic planner + OpenRouter worker
+ Anthropic reviewer) helps catch shared-failure blind spots.
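Because several sub-agents share a [models.*] section, changing one section affects more than one role. The mapping below is transcribed from the routing table; the helper roles_sharing_section is a hypothetical convenience added for illustration.

```python
# Role -> config section, transcribed from the routing table above.
ROUTING = {
    "planner": "models.planner",
    "worker": "models.worker",
    "critic": "models.critic",
    "reviewer": "models.reviewer",
    "code_review": "models.reviewer",
    "summarizer": "models.summarizer",
    "alignment": "models.worker",
    "planner_revise": "models.planner",
}


def roles_sharing_section(section: str) -> list[str]:
    """List every sub-agent routed through the given [models.*] section."""
    return sorted(role for role, target in ROUTING.items() if target == section)
```

For instance, pointing [models.worker] at a different vendor also changes the model behind the alignment guard.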
Workflows
- implement: plan → critic → worker (loop) → reviewer per step. Verify runs after every step; failures retry within the step budget. State is persisted to .agent6/runs/<run-id>/.
- plan_mode: produce a frozen plan only. Useful as a cheap "is the plan reasonable?" pre-flight.
- review: drives the code_review sub-agent on a working-tree, branch-vs-base, or <base>..<head> diff. Always read-only.
Tool surface given to the LLM
Fixed and audited in src/agent6/tools/schema.py:
- read_file(path, start_line?, end_line?)
- list_dir(path)
- grep(pattern, path?, glob?)
- apply_edit(path, edits: list[{kind: "replace"|"create", old_string?, new_string}])
- run_verify_command(): runs workflow.verify_command only
- run_command(argv): only if sandbox.run_commands ∈ {"yes", "ask"}
There is no write_file, no shell, no web_fetch. Adding a tool
requires a security review note in the commit message.
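The gating of run_command on sandbox.run_commands can be expressed in a few lines. This is a sketch of the rule as described, with a hypothetical function name; the audited definition is the one in src/agent6/tools/schema.py.

```python
# The five tools that are always exposed, per the list above.
BASE_TOOLS = ["read_file", "list_dir", "grep", "apply_edit", "run_verify_command"]


def exposed_tools(run_commands: str) -> list[str]:
    """Tool names offered to the LLM for a given sandbox.run_commands value."""
    if run_commands not in {"yes", "no", "ask"}:
        raise ValueError(f"invalid sandbox.run_commands: {run_commands!r}")
    tools = list(BASE_TOOLS)
    if run_commands in {"yes", "ask"}:
        tools.append("run_command")
    return tools
```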
Cost accounting
Every run prints a per-model token + cost summary at the end:
Token + cost summary:
claude-opus-4-5: in=18054 out=425 cache_r=0 cache_c=0 calls=1 $0.3027
claude-sonnet-4-5: in=8884 out=1171 cache_r=0 cache_c=0 calls=4 $0.0442
TOTAL: in=26938/2000000 out=1596/200000 cost~$0.3469
Pricing lives in src/agent6/budget.py and is
updated by hand from the Anthropic and OpenAI public pages. Budgets in
agent6.toml hard-stop the run; a stopped run is resumable.
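The arithmetic behind the summary can be reproduced with a short sketch. The per-million-token prices below are an assumption chosen for illustration (they happen to reproduce the sample figures); they are not read from src/agent6/budget.py, and cache reads and writes are ignored because both are zero in the sample.

```python
# ASSUMED prices in USD per million tokens (input, output); not taken from budget.py.
PRICES = {
    "claude-opus-4-5": (15.0, 75.0),
    "claude-sonnet-4-5": (3.0, 15.0),
}
# Token counts (input, output) copied from the sample summary above.
USAGE = {
    "claude-opus-4-5": (18054, 425),
    "claude-sonnet-4-5": (8884, 1171),
}


def model_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost in USD for one model, ignoring cache reads and cache writes."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def totals(usage: dict[str, tuple[int, int]]) -> tuple[int, int, float]:
    """Sum input tokens, output tokens, and cost across all models."""
    total_in = sum(t_in for t_in, _ in usage.values())
    total_out = sum(t_out for _, t_out in usage.values())
    cost = sum(model_cost(m, t_in, t_out) for m, (t_in, t_out) in usage.items())
    return total_in, total_out, cost
```

A budget check then reduces to comparing the first two totals against max_input_tokens and max_output_tokens.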
Live event log + TUI
Every run writes a structured JSONL event stream to
.agent6/runs/<run-id>/logs.jsonl. The vocabulary is small and stable:
| Event | Emitted by | Notable fields |
|---|---|---|
| run.start | implement workflow | user_task |
| plan.ready | implement workflow | summary, steps[] |
| step.start / step.end | implement workflow | index, title, status, commit_sha |
| step.diff | implement workflow | index, commit_sha, patch (truncated) |
| tool.call / .result | tool dispatcher | name, args (preview), ok, summary |
| verify.start / .end | tool dispatcher | cmd, exit_code, duration_s, *_tail |
| role.call / .result | provider wrapper | role, model, tokens_in, tokens_out |
| budget.update | provider wrapper | totals + caps for input/output tokens |
| approval.prompt / .answer | tool dispatcher | id, prompt, approved, source |
| run.end | implement workflow | all_passed |
The shape is the data contract for any external viewer. The fold from event stream to UI state lives in src/agent6/ui/state.py as a pure function and is intended to be ported 1:1 to TypeScript.
When agent6 is installed with the tui extra and stdout is a TTY,
agent6 run spawns a separate process running
python -m agent6.ui --watch <run-dir> that tails logs.jsonl and
renders a plan tree, budget bar, tool table, log tail, and the latest
step diff. The TUI is read-only on the log; the only thing it writes is
<run-dir>/approvals/<id>.answer when the user clicks Allow/Deny on a
run_command approval modal. Killing the TUI does not affect the
workflow — it falls back to a plain stdin prompt. Pass --no-tui to
disable, or attach later with agent6 watch.
Persistence
Each run writes to .agent6/runs/<run-id>/:
- graph.jsonl: append-only journal of every mutation to the task graph.
- graph.dot: current task graph (regenerated atomically on topology change).
- nodes/*.md: one markdown file per node; rewritten atomically.
- logs.jsonl: per-event log (planner output, tool calls, costs).
- transcripts/: full provider request/response pairs for replay.
A separate agent6-curator subprocess owns all writes to this directory
and runs under its own jail policy allowing writes only to .agent6/.
The main agent process talks to it over a Unix domain socket; no other
process has authority to mutate the graph.
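The two write patterns named above, an append-only journal and atomic rewrites, are commonly implemented as in the sketch below. This shows the general technique (write to a temporary file in the same directory, then rename over the target); it is not the curator's actual code, and the function names and record fields are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def append_journal(path: Path, record: dict) -> None:
    """Append one JSON object as a single line (append-only journal)."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def atomic_write(path: Path, text: str) -> None:
    """Replace a file's contents so readers never observe a partial write."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # atomic rename on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file if anything failed
        raise
```

The temporary file is created next to the target so that the final rename never crosses a filesystem boundary.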
Benchmark
See bench/results.md for an 8-task synthetic
benchmark (bug fix, CLI flag, refactor, type annotations, deprecation
fix, subcommand, logging, extract-method). After fourteen iteration
cycles all eight tasks now PASS three runs in a row at ~$0.45 per
run. The iteration history in bench/results.md documents what broke
at each step and the workflow / prompt fix that landed for it. Re-run
with:
bash bench/run_bench.sh
cat /tmp/agent6-bench/*/result.json
A head-to-head against claude-code on the same task set is tracked
separately under bench/comparison/.
Repository layout
src/agent6/
  cli.py              argparse entry points
  config.py           pydantic-strict config (all fields required)
  budget.py           per-model pricing + per-run accounting
  events.py           structured run-event log
  git_ops.py          pure-function git wrappers; refuses push/force/rewrite
  memory.py           persistent agent memory (read/write/delete)
  detect.py           kernel + container capability detection
  init.py             `agent6 init` scaffolding
  agents/             typed sub-agent prompts and pydantic IO schemas
  workflows/          deterministic Python orchestrators (implement, plan, review)
  tools/              dispatcher + schemas for the fixed LLM tool surface
  providers/          Anthropic + OpenAI HTTP clients (httpx, no SDK)
  sandbox/            jail.py (Python wrapper) + landlock.py (process-side)
  graph/              curator subprocess + UDS IPC + on-disk graph store
  ui/                 pure event-fold + stdlib JSONL tailer + optional textual TUI
jail/                 Rust crate for the agent6-jail launcher
tests/
  unit/               ~150 unit tests
  integration/        crash-resume, curator IPC, plan-mode, alignment
  sandbox/            live jail smoke tests + landlock probes
  security/           prompt-injection corpus tests
bench/                synthetic benchmark harness + results
.github/workflows/    build, ci, pypi (Trusted Publishing)
Roadmap
Open work, in priority order:
- Harder bench tasks: multi-file refactors, async bugs, type-error cascades, adversarial / prompt-injected AGENTS.md. The existing 8-task suite passes consistently; the next iteration needs tasks that exercise more of the failure surface.
- Network-egress test corpus: adversarial tests that try to exfiltrate via DNS, ICMP, IPv6, and unix sockets and assert they are all blocked under network = "provider_only".
- agent6 init --template: write a starter AGENTS.md for common stacks (Python lib, Node app, …) instead of just a stub.
- agent6 check-config --fix: implemented on dev-0.0.3. Interactive repair that compares the user's TOML against the starter template, prints would-be additions, and inserts them after a [y/N] prompt.
- Headless run mode: non-interactive --no-confirm-anything for CI; today --yes auto-confirms the plan only, not every run_commands prompt.
- agent6 review against a remote PR: fetch + review without checkout.
Contributing
Read AGENTS.md before sending a PR. The repo's
verify_command is the single source of truth for "is this PR
landable":
uv run ruff check && uv run ruff format --check && \
uv run pyright && uv run tach check && uv run pytest
Security-sensitive changes (anything in sandbox/, tools/, git_ops,
providers/, graph/curator) must include a security review note in
the commit message.