Autonomous gated software sprints. Define your team in YAML: dev, bug-hunter, QA. Gate agent issues GO/CAUTION/REVERT. Auto-deploy on GO, auto-revert on REVERT. Self-learning wiki across cycles. Works with Claude Code, Aider, Codex, or any LLM CLI. Shall we play a game?
joshua-agent
"Shall we play a game?"
Demo
https://github.com/jorgevazquez-vagojo/joshua-agent/assets/demo_en.mp4
Autonomous gated software sprints. Define your team in YAML. Agents work in cycles: dev, bug-hunter, QA. The gate decides: GO deploys, REVERT rolls back. You come back to a log.
One day, teams will stop babysitting AI. Instead of prompting one agent at a time (copy, paste, check, repeat), they'll define a team in a YAML file and walk away. The agents run in cycles: execute tasks, review each other, deploy or roll back, extract lessons, sleep, repeat. You come back to a log of what happened and (hopefully) better output than yesterday. - @jorgevazquez, April 2026
Built for software delivery. Each cycle: work agents write and fix code, a gate agent reviews and issues a verdict. GO: deploy. CAUTION: continue carefully. REVERT: roll back via git. Lessons accumulate, a wiki builds itself, future prompts get sharper. Extensible to any domain (document review, compliance, financial analysis) once you understand the core loop.
Named after the AI in WarGames that learned the only winning move is to keep playing.
  Work Skills             Gate Skills
+--------------+          +----------+
| Dev          |          |          |
| Bug Hunter   |--------->| QA       |--> Deploy (or Revert)
| CFO          |          | Review   |
| Any Skill... |          +----------+
+--------------+               |
       ^                       |
       +---- next cycle -------+
How it works
joshua-agent has three core concepts:
- Skills: a skill is any professional role you can describe in a prompt: dev, qa, bug-hunter, security, cfo, legal-analyst, compliance, pm, tech-writer, or literally anything else. Built-in skills are just prompt templates. You can define your own with system_prompt: in YAML. If you can brief a human, you can brief an agent.
- Phases: agents are either work (execute tasks) or gate (review and judge). Work agents produce output. Gate agents read that output and return a verdict: GO (ship it), CAUTION (ship but flag), or REVERT (roll back). This separation exists because unsupervised AI output is dangerous. The gate is a circuit breaker.
- Cycles: agents don't run once. They cycle. Each cycle picks the next task (round-robin), runs all work agents, feeds the output to gate agents, acts on the verdict, extracts lessons, and sleeps. Then does it again. This is how real teams work: continuous improvement, not heroic one-off efforts.
The runner abstraction means joshua-agent doesn't care what LLM you use. Claude Code, OpenAI Codex, Aider, or any CLI tool. Swap it in the YAML and everything else stays the same.
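The cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in sprint.py: the Agent dataclass, run_cycle, the SEVERITY table, and the rule that the most severe gate verdict wins are all assumptions made for the example. The run callable stands in for whatever runner you configure.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical ordering: a higher number is a more severe verdict.
SEVERITY = {"GO": 0, "CAUTION": 1, "REVERT": 2}


@dataclass
class Agent:
    name: str
    phase: str  # "work" or "gate"
    tasks: List[str] = field(default_factory=list)


def run_cycle(cycle: int, agents: List[Agent],
              run: Callable[[Agent, str], str]) -> str:
    """Run one cycle (1-based) and return the combined gate verdict."""
    outputs = []
    for agent in agents:
        if agent.phase != "work" or not agent.tasks:
            continue
        # Round-robin: cycle 1 takes the first task, cycle 2 the second, ...
        task = agent.tasks[(cycle - 1) % len(agent.tasks)]
        outputs.append(run(agent, task))

    combined = "\n".join(outputs)
    verdict = "GO"
    for gate in (a for a in agents if a.phase == "gate"):
        reply = run(gate, combined).strip().upper()
        # Assumption: an unrecognised reply fails closed as REVERT.
        candidate = reply if reply in SEVERITY else "REVERT"
        if SEVERITY[candidate] > SEVERITY[verdict]:
            verdict = candidate
    return verdict
```

A caller would loop over run_cycle, deploy on GO, roll back on REVERT, and sleep for cycle_sleep seconds between iterations.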
Metrics & Evaluation
Each sprint cycle, joshua tracks:
- Cycle number: sequential counter since the sprint started
- Agent durations: wall-clock seconds each agent took to run
- Gate verdict: GO, CAUTION, or REVERT for the cycle
- Consecutive errors: how many cycles in a row ended in failure or error
- Gate findings: the raw text the gate agent returned, injected into the next cycle
Results are stored in the .joshua/ directory alongside your project:
.joshua/
├── checkpoint.json   Current cycle number, last verdict, error counts
├── lessons/          One file per cycle: raw lessons extracted from agent output
└── wiki/             Curated knowledge entries built from accumulated lessons
To measure progress across cycles, use the status command:
joshua status .joshua
This shows cycle history, verdict distribution, and per-agent timing. Compare cycle 1 vs cycle N to see whether the gate is issuing fewer REVERTs and agents are completing tasks faster.
To evolve agent prompts using accumulated lessons:
joshua evolve config.yaml
joshua evolve curates raw lessons into wiki entries and can rewrite agent prompts to incorporate what was learned.
Honest note: There is no public benchmark dataset for joshua-agent. What you can track concretely on your own project: GO/REVERT ratio over time, cycle-over-cycle agent duration, and gate finding patterns. Use joshua status to build your own baseline.
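One way to build that baseline is to keep a list of verdicts per cycle and compare the early half of a sprint with the late half. The sketch below is an assumption-laden example: verdict_share and revert_trend are hypothetical helper names, and the verdict list is supplied by hand because the schema of checkpoint.json is not documented here.

```python
from collections import Counter
from typing import Dict, Sequence

VERDICTS = ("GO", "CAUTION", "REVERT")


def verdict_share(verdicts: Sequence[str]) -> Dict[str, float]:
    """Fraction of cycles that ended in each verdict (0.0 when empty)."""
    counts = Counter(verdicts)
    total = len(verdicts)
    return {v: (counts[v] / total if total else 0.0) for v in VERDICTS}


def revert_trend(verdicts: Sequence[str]) -> float:
    """REVERT share in the later half minus the earlier half.

    A negative number means the gate is reverting less often over time.
    """
    mid = len(verdicts) // 2
    early = verdict_share(verdicts[:mid])["REVERT"]
    late = verdict_share(verdicts[mid:])["REVERT"]
    return late - early


# Verdicts copied by hand from six cycles of a sprint log (made-up data).
history = ["REVERT", "GO", "REVERT", "GO", "GO", "GO"]
shares = verdict_share(history)
trend = revert_trend(history)
```

With this made-up history, two of six cycles ended in REVERT, and both fell in the first half, so the trend is negative.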
Quick start
pip install joshua-agent
Example 1: Development sprint. Three agents write code, hunt bugs, and review. QA issues verdicts. Good code gets deployed.
# dev-sprint.yaml
project:
  name: my-app
  path: ~/my-app
  deploy: "npm run build && npm start"
agents:
  dev:
    skill: dev
    tasks:
      - "Review code quality and suggest improvements"
      - "Refactor for maintainability"
  bug-hunter:
    skill: bug-hunter
    tasks:
      - "Scan for uncaught exceptions and error handling gaps"
  qa:
    skill: qa
sprint:
  cycle_sleep: 300
Example 2: Executive sprint. No code. No deploy command. Agents analyze documents, audit costs, and check compliance. Same framework, different skills.
# executive.yaml
project:
  name: acme-corp
  path: ~/acme-corp-docs
agents:
  cfo:
    skill: cfo
    system_prompt: |
      You are {agent_name}, CFO for {project_name}.
      Analyze financial documents in {project_dir}.
    tasks:
      - "Audit vendor contracts expiring within 90 days"
      - "Analyze monthly burn rate from financial reports"
  compliance:
    skill: compliance
    phase: gate
    verdict_format: true
    system_prompt: |
      You are {agent_name}, Compliance Director.
      Review all analysis for regulatory compliance.
sprint:
  cycle_sleep: 600
  gate_blocking: true
joshua run dev-sprint.yaml # Software sprint
joshua run executive.yaml # Business analysis sprint
Agents work, gate reviews, act on verdict. Repeat. Any domain, any role.
What it looks like
============================================================
CYCLE 1 - 2026-04-05T03:14:00
============================================================
[cfo] (cfo) Task: Audit vendor contracts expiring within 90 days
[cfo] OK (189.3s, 3841 chars)
[compliance] (compliance) Reviewing cycle 1...
[compliance] OK (94.2s, 1102 chars)
VERDICT: GO
CYCLE 1 COMPLETE - verdict=GO
Sleeping 600s before next cycle...
Design choices
Skills, not roles. Every agent is a skill defined in YAML. Built-in skills (dev, qa, bug-hunter, security, perf, pm, tech-writer) are convenient starting points: just prompt templates with sensible defaults. But the real power is custom skills: a CFO that audits costs, a legal analyst that reviews contracts, a compliance officer that checks governance, a COO that maps operational bottlenecks. No deploy command needed. No code required. joshua-agent is not a coding tool that happens to support other things. It's a framework for autonomous professional work that happens to be good at code too.
Two phases: work and gate. Work agents do the job. Gate agents judge it. This is the single most important design decision in the framework. Without a gate, you're just running unsupervised AI and hoping for the best. The gate is a circuit breaker: REVERT means nothing ships. In production, we've seen gate agents catch issues that would have broken deployments, flag non-compliant analysis, and prevent cascading errors. The two-phase model also means you can scale work agents independently of review capacity.
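To act on a gate's output, the verdict has to be pulled out of free text. The sketch below assumes the gate ends its reply with a line such as "VERDICT: GO", as in the sample log above. parse_verdict is a hypothetical helper, and defaulting to REVERT when no verdict is found is a design choice for this example (fail closed), not documented joshua-agent behaviour.

```python
import re

# Matches a line like "VERDICT: GO" (case-insensitive, leading spaces allowed).
_VERDICT_RE = re.compile(
    r"^\s*VERDICT:\s*(GO|CAUTION|REVERT)\b",
    re.IGNORECASE | re.MULTILINE,
)


def parse_verdict(gate_output: str, default: str = "REVERT") -> str:
    """Return the last verdict line in the gate output, or the default."""
    matches = _VERDICT_RE.findall(gate_output)
    return matches[-1].upper() if matches else default
```

Taking the last match lets a gate reason out loud, mention other verdicts along the way, and still commit to a final one on its closing line.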
Continuous cycles, not one-shot. Most agent frameworks run once and stop. joshua-agent cycles. Each cycle picks the next task from a round-robin queue, so a dev agent with 10 tasks will work through all of them across 10 cycles. After each cycle, agents extract lessons from their output: what worked, what broke, what patterns to follow or avoid. These lessons accumulate and get injected into future prompts, so the agents get better over time. We've observed measurable improvement in output quality between cycle 1 and cycle 10 on the same project.
Self-learning via wiki (Karpa pattern). Raw agent output from every cycle gets saved. Periodically, the LLM curates this raw output into structured knowledge entries, a wiki that builds itself. Entries get deduplicated, lint-checked for contradictions, and fed back to agents as context. You never write the wiki. The LLM writes everything. You just steer; every answer compounds into institutional knowledge.
LLM-agnostic. joshua-agent talks to CLI tools, not APIs. Claude Code, OpenAI Codex, Aider, or any custom command that accepts a prompt and returns text. The runner is a one-method interface: run(prompt, cwd, system_prompt, timeout) -> RunResult. Swap it in YAML, everything else stays the same. This means you can use different models for different agents: Opus for the gate, Sonnet for work agents, a local model for experiments.
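A one-method runner of that shape might look like the sketch below. Only the run(prompt, cwd, system_prompt, timeout) signature comes from the text above; the RunResult fields, the CommandRunner class, and the choice to pass the prompt on stdin are assumptions for illustration. The project's own custom runner uses {prompt_file} and {cwd} placeholders instead.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class RunResult:
    # Hypothetical fields; the real RunResult may carry more or different data.
    ok: bool
    output: str


class LLMRunner(Protocol):
    def run(self, prompt: str, cwd: str, system_prompt: str,
            timeout: int) -> RunResult: ...


class CommandRunner:
    """Runs any CLI that reads a prompt on stdin and prints text to stdout."""

    def __init__(self, argv: List[str]) -> None:
        self.argv = argv

    def run(self, prompt: str, cwd: str, system_prompt: str,
            timeout: int) -> RunResult:
        try:
            proc = subprocess.run(
                self.argv,
                input=f"{system_prompt}\n\n{prompt}",
                cwd=cwd,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return RunResult(ok=False, output="timeout")
        return RunResult(ok=proc.returncode == 0, output=proc.stdout)
```

Because the sprint loop would only ever call run, swapping one CLI for another comes down to constructing the runner with a different argv.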
Gate blocking. When a gate says REVERT, you probably don't want work agents piling more changes on top. gate_blocking: true freezes work agents on the next cycle. Only agents marked run_when_blocked: true (like bug hunters and security scanners) will run. This prevents compounding failures: the bug hunter fixes what the gate flagged, the gate reviews the fix, and only then does normal work resume.
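The selection rule can be written as a small filter. This is a sketch under stated assumptions: AgentSpec and agents_for_cycle are hypothetical names, and gate agents are assumed to keep running while work is blocked so that they can review the fix.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentSpec:
    name: str
    phase: str = "work"            # "work" or "gate"
    run_when_blocked: bool = False


def agents_for_cycle(agents: List[AgentSpec], last_verdict: str,
                     gate_blocking: bool = True) -> List[AgentSpec]:
    """Pick the agents allowed to run, given the previous cycle's verdict."""
    blocked = gate_blocking and last_verdict == "REVERT"
    return [
        a for a in agents
        if a.phase == "gate" or not blocked or a.run_when_blocked
    ]


team = [
    AgentSpec("dev"),
    AgentSpec("bug-hunter", run_when_blocked=True),
    AgentSpec("qa", phase="gate"),
]
after_revert = [a.name for a in agents_for_cycle(team, "REVERT")]
after_go = [a.name for a in agents_for_cycle(team, "GO")]
```

After a REVERT only bug-hunter and qa are selected; after a GO the full team runs again.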
Cross-agent context. Gate findings from the previous cycle get injected into work agents' prompts via {gate_findings}. The QA agent tells the dev agent what's wrong. The dev agent fixes it next cycle. They talk through the framework: no manual copy-paste, no context loss between runs.
Resource-aware scheduling. Each LLM agent consumes significant memory. Running multiple sprints on the same machine can trigger OOM kills (we learned this the hard way). min_memory_gb checks available RAM before each agent run; if memory is low, joshua-agent waits instead of crashing. agent_stagger adds a fixed delay between agent executions to let the system breathe. Together, they let you safely run multiple sprints on a single server.
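The wait-instead-of-crash behaviour amounts to polling free memory until a threshold is met or a timeout expires. The sketch below is illustrative only: wait_for_memory and its parameters are hypothetical, and the memory probe is passed in as a callable so that the example does not depend on any particular way of reading RAM.

```python
import time
from typing import Callable


def wait_for_memory(min_gb: float,
                    available_gb: Callable[[], float],
                    timeout: float = 120.0,
                    poll: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Block until available_gb() >= min_gb; return False on timeout."""
    deadline = clock() + timeout
    while available_gb() < min_gb:
        if clock() >= deadline:
            return False  # caller decides whether to skip or abort the run
        sleep(poll)
    return True
```

In practice the probe could be something like `lambda: psutil.virtual_memory().available / 2**30` if the third-party psutil package is installed. The default timeout mirrors the memory_wait_timeout: 120 value shown in the config reference.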
Supported runners
| Runner | Command | Install |
|---|---|---|
| Claude Code | claude | npm i -g @anthropic-ai/claude-code |
| OpenAI Codex | codex | npm i -g @openai/codex |
| Aider | aider | pip install aider-chat |
| Custom | any CLI | command: "my-tool --input {prompt_file} --dir {cwd}" |
Full config reference
project:
  name: my-project
  path: ~/my-project                         # Any folder: code, docs, reports, data
  deploy: "bash deploy.sh"                   # Optional; omit for non-code sprints
  health_url: http://localhost:3000/health   # Optional
runner:
  type: claude                  # claude | codex | aider | custom
  timeout: 1800                 # Max seconds per agent run
  model: sonnet                 # Model override (optional)
agents:
  dev:
    name: lightman              # Custom name (optional)
    skill: dev                  # Built-in or custom skill
    max_changes: 5              # Max changes per cycle
    run_when_blocked: false     # Run even when gate is blocked
    tasks:
      - "Task 1"
      - "Task 2"                # Round-robin through list
  qa:
    skill: qa                   # Gate skills auto-detect verdict format
  cfo:
    skill: cfo
    system_prompt: |            # Any prompt you want
      You are {agent_name}, a CFO reviewing {project_name}.
      Analyze costs, licensing, and resource usage.
    tasks:
      - "Audit third-party dependency costs"
sprint:
  cycle_sleep: 300              # Seconds between cycles
  max_cycles: 0                 # 0 = infinite
  max_hours: 96                 # 0 = infinite
  digest_every: 12              # Summary report every N cycles
  retries: 2                    # Retry failed agent runs
  revert_sleep: 600             # Longer sleep after REVERT
  max_consecutive_errors: 5     # Stop after N errors in a row
  gate_blocking: true           # REVERT blocks work agents
  cross_agent_context: true     # Gate findings -> work agents
  health_check: true            # Check health_url each cycle
  recovery_deploy: "bash rollback.sh"
  git_strategy: snapshot        # none | snapshot
  agent_stagger: 30             # Seconds to wait between agent runs
  min_memory_gb: 4              # Wait for free RAM before each agent
preflight:
  min_disk_gb: 5                # Check disk before each cycle
  min_memory_gb: 4              # Check RAM before each cycle
  memory_wait_timeout: 120      # Seconds to wait if memory is low
  docker_cleanup: true          # Auto-clean Docker on low disk
notifications:
  type: telegram                # telegram | slack | webhook | none
  token: ${TELEGRAM_TOKEN}
  chat_id: ${TELEGRAM_CHAT_ID}
tracker:
  type: jira                    # jira | github | filesystem | none
  base_url: https://x.atlassian.net
  project_key: PROJ
memory:
  enabled: true
  state_dir: .joshua
Template variables for prompts: {agent_name}, {skill}, {project_name}, {project_dir}, {deploy_command}, {memory}, {wiki}, {gate_findings}, {max_changes}.
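Filling these placeholders is ordinary string templating. The sketch below shows one way it could work, using Python's str.format_map with a dict that returns an empty string for variables that have no value yet (for example {gate_findings} on the first cycle). render_prompt and _Blank are hypothetical names, and the project's own substitution logic may differ.

```python
class _Blank(dict):
    """Dict that yields an empty string for any missing template variable."""

    def __missing__(self, key: str) -> str:
        return ""


def render_prompt(template: str, **values: str) -> str:
    """Substitute {name}-style placeholders; unknown names render as ''."""
    return template.format_map(_Blank(values))


template = (
    "You are {agent_name}, CFO for {project_name}.\n"
    "Previous gate findings: {gate_findings}"
)
first_cycle = render_prompt(template, agent_name="cfo", project_name="acme-corp")
later_cycle = render_prompt(
    template,
    agent_name="cfo",
    project_name="acme-corp",
    gate_findings="Two contracts lack renewal dates.",
)
```

One caveat of this approach: literal braces in a prompt (a JSON snippet, say) would need to be doubled as {{ and }} to survive formatting.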
CLI
joshua run config.yaml # Run a sprint
joshua run config.yaml -n 10 # Max 10 cycles
joshua run config.yaml -H 96 # Max 96 hours
joshua run config.yaml --dry-run # Validate config only
joshua status .joshua # Status dashboard
joshua evolve config.yaml # Run evolution + wiki maintenance
Deploy safety: The deploy_command in your config runs as a shell command with your user's permissions. Use dry-run mode (joshua run config.yaml --dry-run) to validate config before running. Never use untrusted YAML configs.
Examples
See examples/ for ready-to-use configs:
Business & governance:
- executive-team.yaml - CFO + COO + Compliance Director
- legal-review.yaml - Legal Analyst + Risk Assessor + General Counsel
Software development:
- minimal.yaml - 3 agents, zero config
- full-team.yaml - Dev, Bug Hunter, Security, Perf, PM, QA
- wordpress.yaml - WordPress: WCAG, SEO, PHP audits
- nextjs.yaml - Next.js: TypeScript, React, API audits
- python-api.yaml - FastAPI/Django: testing, security, DB audits
Use Cases
Three ready-to-run packs for common scenarios:
Pack 1: Legacy Modernization
Agents: dev (modernize code), bug-hunter (find regressions), qa (gate review).
Each cycle improves one area of a legacy codebase. The gate blocks the next cycle if tests break or regressions appear, so changes accumulate safely. Example: examples/python-api.yaml.
Pack 2: Continuous Release Gate
Agents: dev (implement feature or fix), qa (quality gate with GO/CAUTION/REVERT).
Runs your CI-equivalent autonomously: auto-deploys on GO, reverts on REVERT, sleeps and repeats. Drop-in replacement for a human code reviewer on low-risk branches. Example: examples/minimal.yaml.
Pack 3: Document & Compliance Review
Agents: analyst (review documents), legal (compliance check), executive (summary + gate).
Multi-agent review cycle for contracts, policies, or technical specs. No deploy command needed; the gate verdict determines whether the document passes or requires revision. Example: examples/legal-review.yaml.
Architecture
joshua/
├── cli.py                 CLI entry point
├── config.py              YAML loader + ${ENV} interpolation
├── sprint.py              The loop (work -> gate -> deploy/revert -> learn -> sleep -> repeat)
├── agents.py              Skill definitions + prompt templates
├── runners/
│   ├── base.py            LLMRunner interface
│   ├── claude.py          Claude Code
│   ├── codex.py           OpenAI Codex
│   ├── aider.py           Aider
│   └── custom.py          Any CLI tool
├── memory/
│   ├── lessons.py         Extract lessons from each cycle
│   ├── wiki.py            Karpa pattern knowledge base
│   └── evolve.py          Daily evolution + lint
├── integrations/
│   ├── git.py             Snapshot, merge, revert
│   ├── notifications.py   Telegram, Slack, webhook
│   └── trackers.py        Jira, GitHub Issues, filesystem
└── utils/
    ├── health.py          HTTP health checks
    ├── preflight.py       Disk, memory, Docker cleanup
    └── status.py          Dashboard
Contributing
Areas where help is needed:
- Runners: Cursor, Windsurf, VS Code Copilot
- Trackers: Linear, Notion, Trello
- Notifiers: Discord, email, PagerDuty
- Skills: share your custom skill templates
License
MIT. See LICENSE.
Built by Jorge Vazquez. The only winning move is to keep playing.