AI multi-agent framework — five agents that read your codebase, write real code files, and run your test suite
NexusForge
An AI multi-agent framework that runs a full software development team on your machine. Five specialized agents — Planner, Developer, Reviewer, QA, and Memory — read your existing codebase, write real code files, run your actual test suite, and coordinate through an asyncio orchestrator. State is stored in plain YAML. No database. No server. No cloud dependency.
What Is NexusForge?
NexusForge turns a backlog of tasks into committed, reviewed, tested code by running five specialized AI agents in a loop. Unlike AI assistants that generate code snippets in a chat window, NexusForge agents read your existing codebase, write files directly to disk, and run your real test suite to validate their work.
| Agent | What it actually does |
|---|---|
| Planner | Scans the project file tree, reads README and key files, selects relevant existing source files, queries past lessons, then writes a concrete plan naming real files to create or modify |
| Developer | Reads the same project context, generates implementation + tests in a structured format, writes files to disk, saves a manifest of what changed |
| QA | Reads the written files; either runs test_command (your real test suite) or asks the LLM to validate code quality; pass/fail gates the review → done transition |
| Reviewer | Reads the actual written files from the manifest, reviews real code, writes .nexus/reviews/<task>.md with files reviewed listed |
| Memory | Logs every decision and reflection; surfaces past lessons to Planner before each planning cycle |
You create tasks (or let nexus plan do it from a requirements file). Agents do the work. You approve features and deliveries. State survives restarts and crashes because every write is atomic.
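The atomic-write guarantee mentioned above is commonly achieved by writing to a temporary file and renaming it over the target. The following is a minimal sketch of that general technique, not NexusForge's actual persistence code; the function name atomic_write_text is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write text so readers see either the old file or the new one, never a mix."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Keep the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # replaces the target in one step
    except BaseException:
        # On any failure, remove the temp file; the original file is untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the process is killed before os.replace runs, the previous tasks.yaml (or any other state file) is still intact, which is the property that makes restarts safe.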
How It Is Different
| Capability | NexusForge | AI coding assistants | Multi-agent platforms (AutoGen, CrewAI) |
|---|---|---|---|
| Writes real code to disk | Yes — files created/modified on disk | Chat only | No (orchestration only) |
| Reads existing codebase | Yes — scans tree, reads relevant files | Per-session only | No |
| Runs real test suite | Yes — configurable test_command | No | No |
| State persistence | YAML files — survive crash and restart | Session memory | Usually in-memory or DB |
| Human approval gates | Built-in (nexus approve) | Ad-hoc | Optional |
| Audit trail | Every transition, decision, reflection logged | None | Varies |
| Crash recovery | Kill-9 tested; atomic writes, no partial state | None | None |
| Provider agnostic | OpenAI, Anthropic, any Ollama local model | Usually one vendor | Varies |
| Fully offline | Yes, with Ollama | No | No |
| CLI-first | Full nexus CLI, scriptable | GUI / IDE plugin | Python API |
The core differentiator: NexusForge treats software development as a stateful, recoverable process with real file I/O — not a chat conversation. Code lands on disk, tests run against real files, and the entire history is auditable in YAML.
Requirements
- Python 3.11 or 3.12
- An API key for at least one LLM provider, or a running Ollama instance for fully local operation
Installation
Option A — pip (standard)
# Core package — add the provider you use
pip install nexusforge-ai # core only
pip install "nexusforge-ai[anthropic]" # + Anthropic Claude
pip install "nexusforge-ai[openai]" # + OpenAI GPT-4o and variants
pip install "nexusforge-ai[local]" # + Ollama (any local model)
pip install "nexusforge-ai[all]" # all three providers
Option B — pipx (isolated, nexus available globally)
pipx install "nexusforge-ai[anthropic]"
# nexus is now on your PATH in an isolated environment
Option C — uv (fastest, project-level)
uv add "nexusforge-ai[anthropic]"
# or globally:
uv tool install "nexusforge-ai[anthropic]"
From source (contributors)
git clone https://github.com/your-org/nexusforge
cd nexusforge
uv sync --all-extras
# Run via: uv run nexus <command>
Verify
nexus version
# nexusforge 0.1.0
Getting Started
All examples below assume nexus is on your PATH via pip install nexusforge-ai or pipx install nexusforge-ai. If you run NexusForge through uv (uv add or a source checkout), prefix every command with uv run.
1. Initialize inside your project directory
cd my-project # your existing or new project
nexus init
NexusForge creates .nexus/ alongside your existing code. It does not touch your project files during init.
nexus doctor
NexusForge Doctor
┌──────────────────────┬────────┬──────────────────────┐
│ Check │ Status │ Detail │
├──────────────────────┼────────┼──────────────────────┤
│ .nexus/ directory │ OK │ │
│ config.yaml │ OK │ phase=1, provider=.. │
│ API key │ OK │ ANTHROPIC_API_KEY set │
│ Provider probe │ OK │ claude-opus-4-7 │
└──────────────────────┴────────┴──────────────────────┘
2. Configure your test command
Edit .nexus/config.yaml to point at your real test suite:
# Set to your project's test command. Empty = LLM-simulated QA.
test_command: "uv run pytest -q" # Python/uv
# test_command: "npm test" # Node.js
# test_command: "cargo test" # Rust
# test_command: "" # no real tests yet
This is the most important config value. When set, QA runs the real tests after each Developer delivery and uses the exit code to gate the review → done transition.
3. Create your plan from requirements
cat > REQUIREMENTS.md << 'EOF'
# Auth System
Users need to log in and register. Passwords must be hashed.
Sessions expire after 30 minutes of inactivity.
# User Profile
Users can update their display name and avatar.
Profile changes must be audited.
EOF
nexus plan REQUIREMENTS.md
Output:
Decomposing REQUIREMENTS.md with claude-opus-4-7...
Plan: 2 features, 5 tasks
▶ Auth System
Login, registration and session management
• Implement JWT login and registration endpoints
✓ Returns 200 with token on valid credentials
✓ Rejects duplicate emails with 409
• Add password hashing with bcrypt
✓ Passwords never stored in plain text
• Implement 30-minute session expiry
✓ Sessions auto-extend on request; expire after 30 min idle
▶ User Profile
Profile editing with audit trail
• Add profile update endpoint with validation
✓ Display name and avatar URL validated
• Implement profile change audit log
✓ Every change recorded with timestamp and actor
Created 2 features and 5 tasks. Run nexus start to begin.
Preview without writing: nexus plan REQUIREMENTS.md --dry-run
4. Start the orchestrator
nexus start
# Orchestrator started (PID 12345)
Agents work concurrently — up to 3 tasks per agent simultaneously. Watch what happens in another terminal:
nexus logs --follow
[planner.log] Planner received task NF-1
[planner.log] Scanned 47 project files, reading 5 relevant files
[planner.log] LLM for NF-1: Create src/auth/login.py, src/auth/models.py, tests/test_login.py
[developer.log] Developer received task NF-1
[developer.log] Reading 8 context files from existing codebase
[developer.log] Developer wrote 3 file(s) for NF-1: src/auth/login.py, src/auth/models.py, tests/test_login.py
[qa.log] QA running 'uv run pytest -q' for NF-1
[qa.log] QA test run for NF-1: PASS (3 tests, 0 failures)
[reviewer.log] Reviewer reading 3 files for NF-1
[reviewer.log] Reviewer LLM for NF-1: approved
5. Review and approve
cat .nexus/reviews/NF-1.md
# Review: NF-1
**Verdict:** approved
**Files reviewed:**
- `src/auth/login.py`
- `src/auth/models.py`
- `tests/test_login.py`
**Comments:**
Login endpoint correctly validates credentials against hashed passwords.
Tests cover valid login, invalid password, and unknown user cases.
Session token expiry is tested with a mocked clock.
nexus approve feature NF-F1 --reason "All tests pass, code reviewed"
6. Query what the agents learned
nexus memory query --kind reflection --limit 5
How Code Generation Works
Developer output format
The Developer prompts the LLM with the project context and requires a structured response. Every file is output as:
## FILE: relative/path/to/file.py
```python
# complete file content here
```
NexusForge parses this format, validates each path against the project root (preventing path traversal), and writes files to disk. If an existing file is being modified, the LLM includes the complete updated file — partial patches are not used.
Manifest system
After writing files, the Developer saves a manifest to `.nexus/task_files/<task-id>.yaml`:
```yaml
task_id: NF-1
files:
- src/auth/login.py
- src/auth/models.py
- tests/test_login.py
```
QA and Reviewer read this manifest to know exactly which files to examine. Review documents list the reviewed files explicitly, creating a complete audit trail.
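The parsing and path-validation steps described in this section can be sketched as follows. This is an illustrative approximation under stated assumptions, not the code in codeops.py: the names FILE_BLOCK, parse_files, and resolve_inside are hypothetical, and the regular expression assumes each closing fence sits alone on its line.

```python
import re
from pathlib import Path

# "## FILE: <path>" followed by one fenced block; `{3} means a run of three backticks.
FILE_BLOCK = re.compile(
    r"^## FILE: (?P<path>[^\n]+)\n`{3}[^\n]*\n(?P<body>.*?)\n`{3}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def parse_files(llm_output: str) -> dict[str, str]:
    """Map each relative path in the response to its complete file content."""
    return {m["path"].strip(): m["body"] + "\n" for m in FILE_BLOCK.finditer(llm_output)}


def resolve_inside(project_root: Path, relative: str) -> Path:
    """Resolve a path and refuse anything that escapes the project root."""
    root = project_root.resolve()
    target = (root / relative).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"path escapes project root: {relative}")
    return target
```

Resolving before comparing is what catches both ../ sequences and absolute paths; a plain string-prefix check on the unresolved path would miss them.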
Project context selection
Before each LLM call, Planner and Developer:
- Scan the project tree (skipping .venv, .git, __pycache__, node_modules, build artifacts)
- Read key files: README.md, pyproject.toml, Cargo.toml, package.json, etc.
- Select the most relevant existing source files by matching keywords from the task title and description
- Include up to max_context_files (default: 8) of actual file content in the prompt
This means the LLM sees real code patterns, real naming conventions, and real project structure before generating anything.
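The selection steps above can be approximated with a simple keyword ranking. The sketch below is an assumption about how such matching might work: it scores files only by keywords found in their paths, and the names keywords and select_context_files are hypothetical. The actual heuristic may also inspect file contents.

```python
import re
from pathlib import Path

SKIP_DIRS = {".venv", ".git", "__pycache__", "node_modules"}  # directories named above


def keywords(text: str) -> set[str]:
    """Lowercase alphanumeric words of three or more characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) >= 3}


def select_context_files(root: Path, task_text: str, max_context_files: int = 8) -> list[Path]:
    """Rank files by how many task keywords appear in their relative path."""
    wanted = keywords(task_text)
    scored: list[tuple[int, str, Path]] = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        score = len(wanted & keywords(rel.as_posix()))
        if score:
            scored.append((score, rel.as_posix(), rel))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, then by name
    return [rel for _, _, rel in scored[:max_context_files]]
```

For a task titled "Add rate limiting to the /api/login endpoint", this ranking would surface files such as src/auth/login.py and tests/test_login.py while ignoring anything under .venv.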
For Existing Projects
Drop NexusForge into any codebase — it reads before it writes.
cd /path/to/existing-project
nexus init
Edit .nexus/config.yaml to match your project:
model_provider: anthropic
model_name: claude-opus-4-7
# Your actual test command
test_command: "uv run pytest -q"
# How many existing source files to include in agent prompts
max_context_files: 8
Then create a task:
nexus task create --title "Add rate limiting to the /api/login endpoint"
nexus start
The Planner will scan your project, find your existing auth code, read your README and project manifest, and produce a plan that references your actual file paths and follows your existing patterns. The Developer will read those same files and generate code that extends your existing implementation.
What NexusForge reads (never modifies during init):
- README.md, pyproject.toml, package.json, Cargo.toml, go.mod, Makefile
- Source files matching task keywords (up to max_context_files)
- The project file tree (all non-binary, non-venv files)
What NexusForge writes:
- Source and test files generated by the Developer (into your project directory)
- State files in .nexus/ (owned exclusively by NexusForge)
Configuration
All configuration lives in .nexus/config.yaml.
# Phase of development (1–6, controls feature gating)
phase: 1
# LLM provider: anthropic | openai | local | fake
model_provider: anthropic
# Model name passed to the provider API
model_name: claude-opus-4-7
# Maximum tokens the provider may return per call
max_context_tokens: 200000
# HTTP timeout for each LLM request (seconds)
request_timeout_seconds: 120
# Retry policy for rate limits and transient errors
retry_max_attempts: 3
retry_backoff_base: 2.0 # seconds before first retry
retry_backoff_max: 30.0 # cap on any single backoff delay
# Local provider (Ollama-compatible)
local_base_url: http://localhost:11434
# Test command run by QA after each Developer delivery.
# Empty string = LLM-simulated QA (for projects without a real test suite).
# Examples: "uv run pytest -q" | "npm test" | "cargo test" | "make test"
test_command: ""
# Maximum existing source files included in agent prompts.
# Higher = more context, more tokens. Range: 1–50.
max_context_files: 8
Provider options
Anthropic (default)
model_provider: anthropic
model_name: claude-opus-4-7 # or claude-sonnet-4-6
export ANTHROPIC_API_KEY=sk-ant-...
OpenAI
model_provider: openai
model_name: gpt-4o
export OPENAI_API_KEY=sk-...
Local (Ollama — fully offline)
model_provider: local
model_name: llama3.2
local_base_url: http://localhost:11434
ollama pull llama3.2 && ollama serve
OS keyring (optional, keys never written to disk)
python -c "import keyring; keyring.set_password('nexusforge', 'ANTHROPIC_API_KEY', 'sk-ant-...')"
Concurrency and Dependencies
Parallel agent execution
Each agent processes up to 3 messages simultaneously. Independent tasks flow through the full pipeline in parallel:
Tasks NF-1, NF-2, NF-3 (no depends_on):
Planner: [NF-1][NF-2][NF-3]
Developer: [NF-1][NF-2][NF-3] ← three tasks being implemented simultaneously
QA: [NF-1][NF-2][NF-3] ← three test runs in parallel
Reviewer: [NF-1][NF-2][NF-3]
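A per-agent limit like the one pictured above is typically enforced with a semaphore in asyncio. The following is a minimal sketch assuming a hypothetical run_agent helper and handler signature; it illustrates the pattern rather than the code in agents/base.py.

```python
import asyncio
from collections.abc import Awaitable, Callable

MAX_CONCURRENT = 3  # per-agent limit stated above


async def run_agent(task_ids: list[str], handle: Callable[[str], Awaitable[str]]) -> list[str]:
    """Process every task, with at most MAX_CONCURRENT handlers running at once."""
    slots = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(task_id: str) -> str:
        async with slots:  # waits here while three handlers are already active
            return await handle(task_id)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(t) for t in task_ids))
```

A fourth task queued on the same agent simply waits for a free slot, so throughput is bounded without rejecting work.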
Task dependencies
Tasks declare depends_on — IDs that must reach done before dispatch. Set automatically by nexus plan for sequential requirements; also settable in tasks.yaml directly.
- id: NF-1
title: Add database schema migration
depends_on: []
- id: NF-2
title: Implement user model using new schema
depends_on: [NF-1] # waits for NF-1 to finish
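The dispatch rule implied by depends_on can be expressed in a few lines. This sketch assumes tasks are plain dictionaries with id, state, and depends_on keys, as in the YAML above; the function name ready_tasks is hypothetical.

```python
def ready_tasks(tasks: list[dict]) -> list[str]:
    """IDs of defined tasks whose dependencies have all reached done."""
    state_by_id = {t["id"]: t["state"] for t in tasks}
    return [
        t["id"]
        for t in tasks
        if t["state"] == "defined"
        # An unknown dependency ID never equals "done", so the task stays waiting.
        and all(state_by_id.get(dep) == "done" for dep in t.get("depends_on", []))
    ]
```

In the example above, NF-2 is withheld until NF-1 reaches done, at which point the next call returns it.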
LLM error handling
All provider errors are caught per-task — the orchestrator never crashes.
| Error | Behaviour |
|---|---|
| Rate limit (429) | Exponential backoff with jitter, up to retry_max_attempts |
| Timeout | Same retry policy |
| Connection error | Same retry policy |
| Auth failure | Immediate → task blocked, no retries |
| All retries exhausted | Task → blocked with reason |
nexus blockers
# NF-3 — Implement password reset: LLM unavailable after retries: 503 Service Unavailable
nexus why NF-3
# Task NF-3 — Implement password reset
# State: blocked
# ✗ Blocked: LLM unavailable after retries: 503 Service Unavailable
Fix the provider issue, then edit .nexus/tasks.yaml to set the task's state back to defined and restart the orchestrator.
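The retry behaviour in the table above can be sketched as exponential backoff with full jitter. The exception classes and function names here are hypothetical stand-ins, and two details are assumptions: retry_max_attempts is treated as the total number of attempts, and retry_backoff_base as the upper bound of the first delay.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical stand-in for rate limits, timeouts and connection errors."""


class AuthError(Exception):
    """Hypothetical stand-in for an authentication failure (never retried)."""


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 30.0) -> float:
    """Full-jitter delay before retry number `attempt` (1-based)."""
    return random.uniform(0, min(cap, base * 2 ** (attempt - 1)))


def call_with_retries(call, max_attempts: int = 3, sleep=time.sleep):
    """Return (result, None) on success or (None, reason) when the task should block."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call(), None
        except AuthError as exc:
            return None, f"auth failure: {exc}"  # blocked immediately, no retries
        except RetryableError as exc:
            if attempt == max_attempts:
                return None, f"LLM unavailable after retries: {exc}"
            sleep(backoff_delay(attempt))
    return None, "no attempts made"  # only reached when max_attempts < 1
```

Returning a reason instead of raising mirrors the per-task error handling described above: the caller records the reason on the blocked task and the orchestrator keeps running.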
Full Command Reference
Planning
| Command | Description |
|---|---|
| nexus plan [FILE] [--dry-run] | Decompose requirements file into features + tasks |
Lifecycle
| Command | Description |
|---|---|
| nexus init [--force] [--yes] | Scaffold .nexus/ state directory |
| nexus start | Start orchestrator and all five agents |
| nexus stop | Graceful shutdown via SIGTERM |
| nexus status | Tasks by state and in-progress list |
| nexus version | Print version and exit |
| nexus doctor | Health check: config, keys, provider |
Tasks
| Command | Description |
|---|---|
| nexus task list [--state STATE] [--agent AGENT] | List tasks with optional filters |
| nexus task show <id> | Detail view |
| nexus task create --title TEXT [--feature ID] | Create task in defined state |
Features and Deliveries
| Command | Description |
|---|---|
| nexus feature list | Features with rollup state |
| nexus feature show <id> | Feature detail |
| nexus feature create --title TEXT [--description TEXT] | Create feature |
| nexus delivery list | Deliveries and state |
| nexus delivery create --title TEXT | Create delivery |
Reviews and Approvals
| Command | Description |
|---|---|
| nexus reviews [--pending] | List review files |
| nexus approve feature\|delivery <id> [--reason TEXT] | Record approval |
| nexus reject feature\|delivery <id> --reason TEXT | Record rejection |
| nexus blockers | Blocked tasks with reasons |
| nexus why <task-id> | Dependency chain and delay explanation |
Observability
| Command | Description |
|---|---|
| nexus logs [--agent NAME] [--tail N] [--follow] | Stream agent logs |
| nexus memory query [--tag T] [--substring S] [--kind K] [--limit N] | Query memory |
State Files
Everything is human-readable and editable. Restart the orchestrator after manual edits.
.nexus/
├── config.yaml Configuration (provider, model, test_command)
├── tasks.yaml Task list — state machine owner: orchestrator
├── features.yaml Feature groups with child task IDs
├── deliveries.yaml Delivery groups with child feature IDs
├── approvals.yaml Approval records (feature and delivery)
├── progress.yaml Append-only state transition log
├── memory.log Append-only agent decision log
├── reflection.log Append-only lessons-learned log
├── task_files/ File manifest per task (written by Developer)
│ └── NF-1.yaml Lists files written for task NF-1
├── logs/ Per-agent structured log files
│ ├── planner.log
│ ├── developer.log
│ ├── qa.log
│ ├── reviewer.log
│ └── memory.log
└── reviews/ Per-task reviewer write-ups
└── NF-1.md
Task lifecycle
defined → in_progress → review → done
↓ ↓
blocked blocked
done requires QA pass (real test suite exit 0, or LLM validation if no test_command). All transitions logged to progress.yaml.
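The task lifecycle can be read as a small transition table. The sketch below is an interpretation, not the orchestrator's code: it assumes the two blocked arrows leave in_progress and review, and the names ALLOWED and transition are hypothetical.

```python
# Assumed reading of the diagram: blocked is reachable from in_progress and review.
ALLOWED: dict[str, set[str]] = {
    "defined": {"in_progress"},
    "in_progress": {"review", "blocked"},
    "review": {"done", "blocked"},
    "blocked": set(),  # reset to defined is a manual edit of tasks.yaml
    "done": set(),
}


def transition(task: dict, new_state: str, qa_passed: bool = False) -> dict:
    """Return a copy of the task in its new state, or raise on an illegal move."""
    current = task["state"]
    if new_state not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {new_state}")
    if new_state == "done" and not qa_passed:
        raise ValueError("done requires a QA pass")
    return {**task, "state": new_state}
```

Returning a new dictionary rather than mutating in place keeps each step easy to append to a transition log such as progress.yaml.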
Feature lifecycle
planned → dev_complete → ready_for_test → completed
completed = all tasks done + nexus approve feature.
Delivery lifecycle
planned → in_progress → pending_approval → released
released = all features completed + nexus approve delivery.
Running Offline with a Local Model
ollama pull deepseek-coder # or llama3.2, codellama, mistral
ollama serve
# .nexus/config.yaml:
model_provider: local
model_name: deepseek-coder
local_base_url: http://localhost:11434
nexus start
Smaller models generally produce lower-quality code and reviews. A code-focused model such as deepseek-coder is a reasonable starting point for code generation tasks. The provider interface is identical, so no code changes are needed.
Project Layout
NexusForge/
├── src/nexusforge/
│ ├── cli.py Typer CLI entry point (nexus plan, start, approve, ...)
│ ├── orchestrator.py asyncio event loop + state machine + dependency dispatch
│ ├── codeops.py Code I/O: project scan, LLM output parsing, file writing, test runner
│ ├── agents/
│ │ ├── base.py AgentBase: concurrent dispatch, _complete_or_block, path guard
│ │ ├── planner.py Reads project context, plans tasks
│ │ ├── developer.py Writes real code files to disk
│ │ ├── reviewer.py Reads written files, reviews real code
│ │ ├── qa.py Runs test_command or LLM validation
│ │ └── memory.py Logs decisions and reflections, answers queries
│ ├── providers/ LLM backends: OpenAI, Anthropic, Local (Ollama), Fake
│ ├── persistence/ Atomic writes + advisory locking
│ └── models/ Pydantic v2 domain models (Task, Feature, Delivery, ...)
├── tests/
│ ├── unit/ Fast tests using tmp_path + FakeProvider + no real LLM
│ └── integration/ Subprocess-based crash recovery and contention tests
└── .nexus/ Runtime state (created by nexus init)
Development
uv run ruff format . # format
uv run ruff check . # lint
uv run mypy src --strict # type check
uv run pytest -q # tests (85% coverage gate)
uv run bandit -r src/ -q # security scan
uv run pip-audit # dependency audit
CI matrix: Linux / macOS / Windows × Python 3.11 / 3.12. See .github/workflows/ci.yml.
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Runtime error — config, provider, persistence |
| 2 | User error — not found, illegal transition, bad input |
| 130 | Interrupted by Ctrl+C |
License
NexusForge is licensed under the Business Source License 1.1 (BUSL-1.1).
Free use: Non-production use — personal projects, evaluation, research, and development — is permitted without restriction.
Commercial production use requires a commercial license. Contact [YOUR_EMAIL] for pricing.
Converts to open source: On 2030-05-12, this license automatically converts to the Apache License 2.0, a permissive open source license.
See LICENSE for the full terms. See mariadb.com/bsl11 for the BSL 1.1 specification.