Franky - a lean personal coding agent that builds in a hardened container and opens a PR.
Franky
A lean personal coding agent. Hand it a GitHub issue or a sentence, and it runs a coding agent inside a fresh, hardened Docker container that clones the repo, implements the change, and opens a pull request for you to review.
franky build https://github.com/you/repo/issues/42
franky build jira FOO-123 --repo you/repo
franky build "add a --json flag to the export command" --repo you/repo
franky build "fix the flaky retry test" --repo you/repo --engine claude
# Got review comments or red CI on a Franky PR? Iterate on it with follow-up commits.
franky iterate https://github.com/you/repo/pull/42
The agent is autonomous inside the container. The safety gate is four layers: a hardened container, a default-deny egress allowlist (the container reaches only your provider + GitHub + registries, via a creds-blind proxy), a fail-closed trusted-repo allowlist, and the fact that Franky opens a PR rather than merging - a human still reviews every change.
Why
Most coding-agent wrappers either lock you into one vendor or run the agent straight on your machine with your real credentials and shell. Franky does neither: the engine is pluggable, and the agent only ever runs inside a throwaway container with a narrowly scoped token.
Engines
Franky is vendor-neutral. The engine that runs inside the container is pluggable; all three engines ship in the same image.
| Engine | CLI | Auth | Notes |
|---|---|---|---|
| pi (default) | @earendil-works/pi-coding-agent | BYOK provider key | MIT, 15+ providers (OpenRouter, Anthropic, OpenAI, Ollama, ...) |
| claude | @anthropic-ai/claude-code | CLAUDE_CODE_OAUTH_TOKEN | Most capable; uses your Claude subscription |
| codex | @openai/codex | CODEX_API_KEY or OPENAI_API_KEY | OpenAI Codex headless (codex exec); API-key auth only |
Select with --engine pi|claude|codex, or set FRANKY_ENGINE. Resolution order:
--engine flag > FRANKY_ENGINE > default pi.
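The resolution order can be sketched in a few lines of Python. This is an illustration only, not Franky's actual implementation: the function name `resolve_engine` and the `ValueError` on an unknown engine are assumptions; the engine names, the `FRANKY_ENGINE` variable, and the precedence come from the text above.

```python
import os
from typing import Mapping, Optional

ENGINES = ("pi", "claude", "codex")
DEFAULT_ENGINE = "pi"


def resolve_engine(flag: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the engine: --engine flag, then FRANKY_ENGINE, then the default."""
    env = os.environ if env is None else env
    choice = flag or env.get("FRANKY_ENGINE") or DEFAULT_ENGINE
    if choice not in ENGINES:
        # Hypothetical error handling; the real CLI reports a config error.
        raise ValueError(f"unknown engine {choice!r}; expected one of {ENGINES}")
    return choice
```

Passing the environment as a parameter keeps the precedence rule easy to check without touching the real process environment.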
Install
Franky is published to PyPI as franky-agent (the installed command is franky):
uv tool install franky-agent
# or pipx:
pipx install franky-agent
# or:
pip install franky-agent
On first run the CLI pulls the version-pinned, public GHCR images
(ghcr.io/vietlabs-work/franky:X.Y.Z and ghcr.io/vietlabs-work/franky-proxy:X.Y.Z),
so all you need is Docker - no registry login. (Point FRANKY_GHCR_REPO at a different
namespace if you host the images elsewhere.)
To move to a newer release later, run franky update - it detects how you
installed (uv tool / pipx / pip) and reinstalls the latest version from PyPI via the same
manager. franky update --force reinstalls even when already current. (A dev
checkout updates via git; franky update is a no-op there.)
For local development, skip GHCR and point at local builds:
docker build -t franky .
docker build -t franky-proxy proxy/
export FRANKY_IMAGE=franky
export FRANKY_PROXY_IMAGE=franky-proxy
Franky is agent-agnostic to develop, not just to run: AGENTS.md is the canonical
agent guide (build/test commands, architecture, the load-bearing invariants, how to
add an engine), so Codex, Cursor, pi, or Claude Code all start with the same context.
CLAUDE.md is a symlink to it.
Quickstart
- Install Docker. Images are pulled automatically from GHCR on first run (see Install above). For local dev only, build them manually (see Install above).
- Install Franky:
    python3 -m venv .venv && .venv/bin/pip install -e .
- Configure credentials with the interactive wizard:
    franky config init
  This writes ~/.franky/config (mode 0600) and walks you through engine selection, FRANKY_ALLOWED_REPOS, GH_TOKEN, and engine creds. You can also set individual keys later:
    franky config set FRANKY_ALLOWED_REPOS
    franky config set GH_TOKEN   # secret - entered at a hidden prompt
    franky config list           # view the file (secrets masked)
    franky config path           # show where the file lives
  To inject your own skills / instructions / knowledge into the container, set up an operator profile (see docs/profiles.md):
    franky profile init    # interactive wizard -> ~/.franky/profile.toml
    franky profile check   # dry-run: what would inject + secret scan
    franky profile show    # view the profile + expanded file list
    franky profile path    # show where the profile lives
  At minimum you need:
  - FRANKY_ALLOWED_REPOS - the trusted-repo allowlist (see below).
  - GH_TOKEN - scoped to contents + pull_requests on those repos.
  - the selected engine's creds (a provider key for pi, CLAUDE_CODE_OAUTH_TOKEN for claude, or CODEX_API_KEY / OPENAI_API_KEY for codex).
  - for JIRA tasks: JIRA_BASE_URL, JIRA_EMAIL, JIRA_API_TOKEN (host-side only, never forwarded into the container).
- Run:
franky build <gh-issue-url | jira KEY | "prose" | -> [--repo owner/repo] [--engine pi|claude|codex] [--plan-first] [--json] [-q] [-y]
Each run writes a redacted log to tasks/<timestamp>.log and prints the PR URL.
Pass - as the task to read the prose task from stdin. For scripting / agent callers,
see Machine / scripting interface (--json, exit codes).
--plan-first adds an opt-in approval gate for sensitive targets: Franky runs a
read-only planning pass, prints the plan, and waits for explicit confirmation
before it builds or opens a PR. Decline (or run non-interactively) and nothing is
written. The default stays autonomous - the sandbox plus PR review is the gate.
franky build also does a quick (~1s, cached) check for a newer release and
prints a one-line hint if one exists - it never blocks the build. Silence it with
FRANKY_NO_UPDATE_CHECK=1, or set FRANKY_AUTO_UPDATE=1 to auto-install the new
release for your next run. (Both are host-CLI only; neither reaches the container.)
Iterating on a PR
Franky is no longer one-shot. When a PR it opened gets review comments or a red CI check, point it back at the PR and it responds with additive follow-up commits on the same branch:
franky iterate https://github.com/you/repo/pull/42 [--engine pi|claude|codex]
It runs the same hardened, egress-controlled container as franky build, but instead
of starting fresh it checks out the PR's existing branch, reads the review comments and
failing checks with gh (in-container, already allowlisted), addresses them, runs the
tests green, and pushes. It never force-pushes, never rewrites history, never opens a new
PR, and never merges - a human still reviews every change. The PR URL carries the repo, so
there is no --repo flag, and the repo allowlist gates it exactly like build.
Unlike build (which prints the new PR URL to stdout), iterate opens no new PR - on a
clean run it writes only an economics summary and a labeled completion line to stderr, and
nothing to stdout. Review the existing PR for the new commits. The redacted transcript still
lands in tasks/<timestamp>.log.
iterate is intended for Franky's own PRs. As a guardrail it is instructed to confirm
the PR's head branch is a franky/* branch in the same repo (not a fork) before touching
anything, and to stop otherwise. This is a prompt-level guard in the same register as the
"never merge" rule (the agent is autonomous); the hard bounds remain the repo allowlist, the
egress cage, and PR-not-merge. See the Security section.
Planning a big task
One franky run = one focused PR. A run is meant to produce a single, reviewable pull
request, not a sprawling multi-concern changeset. If a task is too big for one PR, split it
first with franky plan:
franky plan "rework auth + add SSO + migrate the user table" --repo you/repo [--json]
franky plan https://github.com/you/repo/issues/42 --json
franky plan - --repo you/repo   # read the prose task from stdin
plan runs one read-only container pass that inspects the repo/issue, decides whether the
task fits one PR or needs splitting, and emits a decomposition. It builds nothing - no
branch, no commits, no PR. It accepts the same task forms as build (issue URL / JIRA key /
prose / - stdin), gates on the same repo allowlist, and threads --engine, --profile, and
--max-duration the same way. The caller orchestrates what to do with the sub-tasks (e.g. run
franky build per sub-task). build --help carries a static pointer to this command.
Under --json, plan emits a distinct envelope (NOT the build/iterate result_schema):
{ "fits_one_pr": false,
"subtasks": [
{"title": "split out SSO", "summary": "add the SSO provider hooks", "suggested_repo": "you/repo"},
{"title": "migrate user table", "summary": "the schema change + backfill", "suggested_repo": "you/repo"}
],
"rationale": "three independent concerns; each is its own reviewable PR",
"engine": "pi", "repo": "you/repo", "exit_code": 0 }
Errors share the same {"error":{...}} envelope and exit-code taxonomy as build/iterate
(if the agent never produces a parseable plan, that is exit 7, kind: no_plan).
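Since the caller decides what to do with the sub-tasks, a script might turn the plan envelope into one `franky build` command per sub-task. The sketch below is one possible approach, not part of Franky: `build_commands` is a hypothetical helper, and joining `title` and `summary` into a prose task is an assumption about how you would want to phrase each sub-task. The field names and flags are the ones shown above.

```python
import json
import shlex
from typing import List


def build_commands(plan_json: str, default_repo: str) -> List[str]:
    """Turn a `franky plan --json` envelope into `franky build` command lines."""
    plan = json.loads(plan_json)
    if "error" in plan:
        # Failures use the shared {"error": {...}} envelope.
        raise RuntimeError(plan["error"].get("message", "plan failed"))
    if plan.get("fits_one_pr"):
        # Nothing to split: the caller can run the original task directly.
        return []
    commands = []
    for sub in plan.get("subtasks", []):
        # Assumption: title + summary is a reasonable prose task for build.
        task = f"{sub['title']}: {sub['summary']}"
        repo = sub.get("suggested_repo") or plan.get("repo") or default_repo
        commands.append(
            shlex.join(["franky", "build", task, "--repo", repo, "--json"])
        )
    return commands
```

`shlex.join` quotes the prose task, so the resulting lines are safe to paste into a shell or hand to a job runner.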
Machine / scripting interface
franky build / iterate / plan are built to be driven by a script or an LLM/agent without
parsing prose. Six guarantees:
1. --json - one machine-readable object on stdout.
Success / agent result:
{ "status": "pr_opened|already_open|no_pr|agent_error|timeout|iterate_complete",
"pr_url": "https://github.com/you/repo/pull/42",
"branch": "franky/issue-42",
"reason": "PR opened",
"exit_code": 0,
"economics": {"tokens_in": 1200, "tokens_out": 340, "cost_usd": 0.0123, "duration_s": 47.5},
"log_path": "tasks/20260625-101500.log",
"engine": "pi",
"repo": "you/repo" }
Failure:
{ "error": {"code": 5, "kind": "auth_error", "message": "...", "hint": "..."} }
--json implies --quiet, so stdout carries exactly one JSON object and nothing else
(progress + the update hint are suppressed). The object is fully redacted - a secret value
never appears, even nested in a field. branch is the predicted branch name
(franky/<slug>) the host computed before the run; the agent may deviate, so treat it as a
hint, not a guarantee (iterate reports null).
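A caller can tell the two envelopes apart by checking for the top-level `error` key. The following is a minimal sketch of such a parser; the `Outcome` dataclass and `parse_result` are hypothetical names for illustration, while the JSON field names are the ones documented above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    ok: bool
    status: Optional[str]   # e.g. "pr_opened"; None for the error envelope
    kind: Optional[str]     # e.g. "auth_error"; None for the result envelope
    pr_url: Optional[str]
    exit_code: int
    message: str


def parse_result(stdout: str) -> Outcome:
    """Parse the single JSON object that `--json` writes to stdout."""
    obj = json.loads(stdout)
    if "error" in obj:
        err = obj["error"]
        return Outcome(False, None, err.get("kind"), None,
                       int(err["code"]), err.get("message", ""))
    code = int(obj["exit_code"])
    return Outcome(code == 0, obj.get("status"), None,
                   obj.get("pr_url"), code, obj.get("reason", ""))
```

Note that `branch` is deliberately left out of `Outcome`: it is a prediction, so a caller that needs the real branch should read it from the PR.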
2. Exit-code taxonomy (a SemVer contract). The process exit code always equals the
failure's code:
| code | meaning |
|---|---|
| 0 | success (PR opened / iterate pass complete) |
| 2 | usage/flag error; interactive input required in a non-TTY |
| 3 | config error (bad config file, allowlist unset/empty/malformed, bad engine) |
| 4 | allowlist / task rejection |
| 5 | auth/creds missing (GH_TOKEN, engine creds, JIRA creds, JIRA 401/403) |
| 6 | docker / image unavailable |
| 7 | agent ran but exited nonzero or produced no PR |
| 8 | network/timeout (JIRA reach/HTTP/parse) |
| 9 | run exceeded --max-duration (the container was aborted) |
Note build exits 7 (not 0) when the agent finishes cleanly but opens no PR, so success
is distinguishable from a no-PR outcome by exit code alone. The taxonomy is append-only -
new codes may be added, but existing values never change meaning.
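Because the taxonomy is append-only, a caller should tolerate codes it does not know yet. One way to mirror the table in a script is an `IntEnum` with a fallback; the member names below are made up for illustration, and only the numeric values come from the table.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    # Values mirror the table above; the names are hypothetical.
    SUCCESS = 0
    USAGE = 2
    CONFIG = 3
    REJECTED = 4
    AUTH = 5
    DOCKER = 6
    AGENT = 7
    NETWORK = 8
    TIMEOUT = 9


def describe(code: int) -> str:
    """Return a short label, tolerating codes added in later releases."""
    try:
        return ExitCode(code).name.lower()
    except ValueError:
        return "unknown"
```

For example, `describe(7)` yields `"agent"`, and a code that is not in the table yields `"unknown"` instead of raising.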
3. Never-hang. Every interactive prompt fails fast with exit 2 in a non-TTY instead of
blocking forever:
- --plan-first without a TTY needs --yes to auto-approve, else it exits 2 before running the planning pass.
- franky build - (and franky plan -) reads the task from stdin; on an interactive TTY (no piped input) it exits 2 rather than block waiting for a human to type the task - pipe the task in instead.
- franky config set <KEY> (no value) and franky config init exit 2 without a TTY rather than waiting on a prompt.
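The stdin case above comes down to an `isatty()` check before reading. The sketch below shows the general pattern, assuming a hypothetical helper `read_task_from_stdin`; the error wording and the choice to treat an empty task as a usage error are assumptions, not Franky's documented behavior.

```python
import io
import sys
from typing import TextIO


def read_task_from_stdin(stream: TextIO = sys.stdin) -> str:
    """Read a prose task from a pipe; exit 2 instead of blocking on a TTY."""
    if stream.isatty():
        # No piped input: fail fast rather than wait for someone to type.
        print("franky: task '-' needs piped input", file=sys.stderr)
        raise SystemExit(2)
    task = stream.read().strip()
    if not task:
        # Assumption: an empty task is treated as a usage error too.
        print("franky: empty task on stdin", file=sys.stderr)
        raise SystemExit(2)
    return task
```

Taking the stream as a parameter lets the check be exercised with an in-memory `io.StringIO`, which reports `isatty()` as false, just like a pipe.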
Other flags: -q/--quiet suppresses progress and the update hint (stdout stays exactly the
bare PR URL); -y/--yes auto-approves --plan-first. Without --json, errors print a single
franky: <message> line to stderr and use the same exit code.
4. Budget guardrail. --max-duration SECONDS (on build, iterate, and plan) aborts a
runaway run; the container is killed and the result is status: timeout / exit 9 (plan
raises the timeout error). The default budget is 1800s. (A token/cost cap is out of scope -
token usage is only known after the run.)
5. Idempotency (retry-safety). Before launching, build computes a deterministic branch
(franky/issue-42, franky/<jira-key>, or franky/<prose-slug>) and asks GitHub whether an
open Franky PR already uses it. If so it reports status: already_open with the existing
pr_url at exit 0 and does not open a duplicate - so an agent that retries the same task
converges instead of stacking PRs. The check is best-effort (any error just proceeds with the
build) and --force skips it.
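The retry-safety rests on the branch name being a pure function of the task. The sketch below illustrates that idea only. The exact slug rules are not documented here, so lowercasing the JIRA key, replacing runs of non-alphanumerics with `-`, and the 40-character cap are all assumptions, and `predicted_branch` is a hypothetical name.

```python
import re

ISSUE_URL = re.compile(r"https://github\.com/[^/]+/[^/]+/issues/(\d+)/?")
JIRA_KEY = re.compile(r"[A-Z][A-Z0-9]*-\d+")


def predicted_branch(task: str) -> str:
    """Map a task to a branch name deterministically (same task, same branch)."""
    task = task.strip()
    issue = ISSUE_URL.fullmatch(task)
    if issue:
        return f"franky/issue-{issue.group(1)}"
    if JIRA_KEY.fullmatch(task):
        # Assumption: the key is lowercased; the README only says franky/<jira-key>.
        return f"franky/{task.lower()}"
    # Assumption: a simple slug of the prose, capped at 40 characters.
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")[:40].rstrip("-")
    return f"franky/{slug}"
```

Because nothing in the function depends on time or randomness, retrying the same task yields the same branch, which is what lets the host look up an existing open PR before launching.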
6. franky schema. Prints one JSON object describing every command + its flags, the
result/error object shapes (including the distinct plan_result_schema), and the exit-code
table - machine introspection so an agent can discover the contract instead of parsing
--help.
Security
Read this before pointing Franky at anything.
Container hardening is load-bearing. Because the agent runs autonomously
(claude with --dangerously-skip-permissions, codex with
--dangerously-bypass-approvals-and-sandbox, pi with its default tools), the
OS-level isolation is what bounds it, not tool-permission prompts. Franky runs the
container with:
- --cap-drop=ALL, then adds back only CAP_SETUID / CAP_SETGID (needed by the rootless Docker daemon - see "Docker-in-Docker" below)
- --read-only root filesystem; writable paths only via --tmpfs (the clone, the agent's HOME, and the rootless Docker data root, each owned by the non-root uid)
- --pids-limit and --memory caps (with --memory-swap = --memory, no swap)
- a non-root user (uid 1001) baked into the image
- no Docker socket mount and no host bind mounts - the repo is cloned inside the container and the nested Docker daemon is rootless, so the agent never touches your filesystem or your host's Docker daemon
- only the selected engine's required env vars passed in; nothing else
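To make the list concrete, here is a rough sketch of how such a `docker run` argument vector could be assembled. It is not Franky's code: `hardened_run_args` is a hypothetical helper, the tmpfs paths and the `--pids-limit` value are placeholders, and the `8g` memory default is only borrowed from the "~8 GB" figure mentioned later in this section. The Docker flags themselves are standard.

```python
from typing import List, Sequence


def hardened_run_args(image: str,
                      env_names: Sequence[str],
                      tmpfs_paths: Sequence[str] = ("/work", "/home/franky"),
                      memory: str = "8g",
                      pids_limit: int = 4096) -> List[str]:
    """Assemble a `docker run` argv with the hardening flags listed above."""
    args = [
        "docker", "run", "--rm",
        "--cap-drop=ALL",
        "--cap-add=SETUID", "--cap-add=SETGID",  # needed by rootless Docker
        "--read-only",
        f"--pids-limit={pids_limit}",            # placeholder value
        f"--memory={memory}",
        f"--memory-swap={memory}",               # equal to --memory: no swap
        "--user", "1001",
    ]
    for path in tmpfs_paths:
        # Hypothetical paths; each tmpfs is owned by the non-root uid.
        args += ["--tmpfs", f"{path}:uid=1001,gid=1001"]
    for name in sorted(env_names):
        # `-e NAME` (no value) copies the variable from the caller's
        # environment, so secret values never appear on argv.
        args += ["-e", name]
    args.append(image)
    return args
```

There is intentionally no `-v` and no `/var/run/docker.sock` anywhere in the list, matching the "no host bind mounts, no Docker socket" rule.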
Docker-in-Docker (always on)
Many repos cannot run their test suite without Docker (compose-based integration
tests, testcontainers, a docker build step). So every Franky container runs its
own rootless Docker daemon - the agent can docker build, docker compose up
test infra, and run testcontainers entirely inside the sandbox. Nothing to enable;
it is always available.
This is rootless DinD (a daemon running as the non-root franky user inside its
own user namespace), not a mounted host Docker socket and not --privileged.
It needs a few specific, minimal relaxations of the locked profile, applied to every
task and verified on Docker Desktop for Mac:
- --security-opt=no-new-privileges is dropped (it blocks the setuid uid-map helpers rootless Docker needs to start),
- --security-opt=systempaths=unconfined (unmasks /proc so the nested runtime can mount it for inner containers - far narrower than --privileged / seccomp=unconfined),
- CAP_SETUID / CAP_SETGID added back on top of --cap-drop=ALL, and
- /dev/net/tun for the rootless network stack.
The blast radius stays bounded by everything else (rootless user namespace, read-only
root, the egress cage below, no host FS, repo allowlist, PR-not-merge). The nested
daemon's image pulls and docker build fetches go through the same egress proxy
(it inherits HTTP(S)_PROXY), and inner containers have no route to the internet
except that proxy - verified: an off-allowlist docker build FROM or RUN fetch is
refused by the proxy, and a nested container's direct egress has no route out.
Egress control
The big v0 hole - a prompt-injected agent exfiltrating the creds it carries -
is now closed by a default-deny egress allowlist. The task container runs on a
Docker --internal network with NO route to the internet; its only peer is a
Squid proxy enforcing a domain allowlist.
Docker --internal network (no internet route)
+-----------------------------------------------------------------+
| |
| [ task container ] --HTTP(S)_PROXY--> [ franky-proxy (Squid) ]-+--> allowlisted
| --dns 127.0.0.1 default-deny allowlist | hosts only
| (no creds on argv) (sees NO creds) |
+-----------------------------------------------------------------+
- Blind CONNECT, no creds at the proxy. Egress is HTTPS-only (port 443): Squid tunnels it with a blind CONNECT (no TLS termination), so it never sees the bytes - your Claude token or BYOK key tunnel through encrypted and are never visible to the proxy. Plain HTTP (port 80) is denied outright, so there is no cleartext, proxy-visible path even to an allowlisted host.
- DNS is killed in the task container (--dns 127.0.0.1), so a hostile agent cannot resolve or reach an off-allowlist host directly; only the proxy resolves.
- Fail-closed. Franky refuses to start the task unless the proxy is confirmed healthy, and the proxy refuses to start with an empty or malformed allowlist.
- The allowlist covers: your engine's provider host (e.g. api.anthropic.com, openrouter.ai, api.openai.com), GitHub (clone/push/PR), the npm + PyPI registries, and - because Docker-in-Docker is always on - a broad set of well-known container image registries (Docker Hub + CDN, GHCR, GCR/Artifact Registry, registry.k8s.io, Quay, ECR Public, MCR, GitLab, plus the CDNs they serve layer blobs from). Add extra hosts with FRANKY_EXTRA_ALLOWED_DOMAINS (comma-separated).
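The default-deny decision described in these bullets can be modeled in a few lines. This is an approximation for reasoning about the policy, not the proxy's configuration: enforcement is done by Squid, `egress_allowed` and `parse_extra_domains` are hypothetical helpers, and treating a leading dot as "this domain and its subdomains" follows the usual Squid `dstdomain` convention, which is assumed rather than stated above.

```python
from typing import Iterable, List


def parse_extra_domains(value: str) -> List[str]:
    """Split a comma-separated FRANKY_EXTRA_ALLOWED_DOMAINS value."""
    return [d.strip().lower() for d in value.split(",") if d.strip()]


def egress_allowed(host: str, port: int, allowlist: Iterable[str]) -> bool:
    """Default-deny: allow only HTTPS (443) to an allowlisted host."""
    if port != 443:
        return False  # plain HTTP and every other port are denied outright
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry.startswith("."):
            # Assumed convention: ".example.com" covers the domain
            # itself and any subdomain.
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False  # nothing matched, including an empty allowlist
```

An empty allowlist denies everything in this model; the real proxy goes further and refuses to start at all in that case.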
Residual risk. The allowlisted hosts are high-trust, but the agent can still reach GitHub, your model provider, the package registries, and the container registries above - so a determined injection could still smuggle data to one of those (e.g. a gist, an issue comment). Treat allowlisted destinations as trusted, not inert. Two consequences of always-on DinD specifically:
- Wider reachable set + a relaxed profile on every task (incl. non-Docker ones): the registry allowlist is broad (notably .cloudfront.net, a shared CDN), and the hardening relaxations above apply universally. This is a deliberate trade for "building/testing just works".
- The agent can move its own creds into nested containers (e.g. docker run -e GH_TOKEN ...). The egress allowlist still bounds where anything can go and PR-not-merge still bounds the damage, but the secret is no longer confined to a single process. There is also no per-inner-container resource limit and no cross-task concurrency cap - the outer --memory / --pids cap (~8 GB, tmpfs image storage is RAM) bounds one task's whole container tree.
v0 mitigations, still in force:
- Fail-closed trusted-repo allowlist. Franky refuses any repo not in FRANKY_ALLOWED_REPOS, and refuses everything if that var is unset. This limits injection to content you already trust.

  The allowlist supports per-segment glob patterns (case-insensitive):
  - my-org/my-repo - exact match
  - my-org/* - every repo in my-org
  - my-org/team-* - repos with a name prefix
  - * - every repo the GH_TOKEN can reach (its full scope - a conscious opt-in, not the default; use only if the token is already narrowly scoped)
- Scope your tokens narrowly. Give GH_TOKEN only contents + pull_requests on the target repos. Prefer a low-spend or separate API key for pi.
- PR, not merge. Franky only opens PRs. You review before anything lands. franky iterate follows the same rule: it only pushes additive commits to an existing PR's branch (never force-push, never merge, never a new PR), and the "act only on a franky/* branch in the same repo" check is prompt-level - so point iterate only at PRs Franky itself opened, in an allowlisted repo.
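The per-segment, case-insensitive glob matching of the repo allowlist can be sketched with the standard library's `fnmatch`. This is an illustration under stated assumptions, not Franky's implementation: `repo_allowed` is a hypothetical name, the comma-separated format for FRANKY_ALLOWED_REPOS is assumed, and `fnmatch` also accepts `?` and `[...]`, which the real matcher may not.

```python
from fnmatch import fnmatchcase
from typing import Optional


def repo_allowed(repo: str, allowlist: Optional[str]) -> bool:
    """Fail-closed check of owner/name against per-segment glob patterns."""
    # Assumption: patterns are comma-separated.
    patterns = [p.strip().lower() for p in (allowlist or "").split(",") if p.strip()]
    if not patterns:
        return False  # unset or empty allowlist refuses everything
    owner, sep, name = repo.lower().partition("/")
    if not sep or not owner or not name or "/" in name:
        return False  # not a well-formed owner/name
    for pattern in patterns:
        if pattern == "*":
            return True  # explicit opt-in to the token's full scope
        p_owner, p_sep, p_name = pattern.partition("/")
        if not p_sep:
            continue  # malformed pattern never matches
        # Match each segment on its own so "*" cannot span the "/".
        if fnmatchcase(owner, p_owner) and fnmatchcase(name, p_name):
            return True
    return False
```

Matching the owner and the name separately is what makes `my-org/*` mean "every repo in my-org" rather than "anything that starts with my-org/".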
GitHub Actions warning. Opening a PR can trigger workflows. A PR built from an attacker-influenced issue could run attacker-influenced workflow code with your repo's Actions secrets. Review workflow changes in the PR diff, and consider requiring approval for workflow runs on PRs.
Evals
Agent quality is probabilistic, so changes to the persona, prompt, model, or profile should be gated on a measured pass-rate, not a hunch. The eval harness runs a golden task set through the real Franky flow N times and reports pass-rate, plus a comparison mode that reports the delta between two configs (e.g. one engine vs another).
It is opt-in and out-of-band (like the manual egress check) - it needs real Docker +
creds + a throwaway sandbox repo, so it is not part of the fast hermetic unit suite. Point
evals/tasks.json at your sandbox repo and run:
make eval ARGS="-n 3 --engine pi --compare-engine codex"
See evals/README.md for setup, the task schema, and the success
checkers.
Status
v0.0.5. Real end-to-end runs need live engine credentials, supplied out-of-band by the operator. The pieces under test here are the container hardening, the egress allowlist + proxy orchestration, the secret redaction, the trusted-repo allowlist, and the engine abstraction.