
AI multi-agent framework — five agents that read your codebase, write real code files, and run your test suite


NexusForge


An AI multi-agent framework that runs a full software development team on your machine. Five specialized agents — Planner, Developer, Reviewer, QA, and Memory — read your existing codebase, write real code files, run your actual test suite, and coordinate through an asyncio orchestrator. State is stored in plain YAML. No database. No server. No cloud dependency.


What Is NexusForge?

NexusForge turns a backlog of tasks into committed, reviewed, tested code by running five specialized AI agents in a loop. Unlike AI assistants that generate code snippets in a chat window, NexusForge agents read your existing codebase, write files directly to disk, and run your real test suite to validate their work.

| Agent | What it actually does |
| --- | --- |
| Planner | Scans the project file tree, reads README and key files, selects relevant existing source files, queries past lessons, then writes a concrete plan naming real files to create or modify |
| Developer | Reads the same project context, generates implementation + tests in a structured format, writes files to disk, saves a manifest of what changed |
| QA | Reads the written files; either runs test_command (your real test suite) or asks the LLM to validate code quality; the pass/fail result gates the review transition |
| Reviewer | Reads the actual written files from the manifest, reviews real code, writes .nexus/reviews/<task>.md with files reviewed listed |
| Memory | Logs every decision and reflection; surfaces past lessons to Planner before each planning cycle |

You create tasks (or let nexus plan do it from a requirements file). Agents do the work. You approve features and deliveries. State survives restarts and crashes because every write is atomic.
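The atomic-write claim above can be illustrated with a minimal sketch of the usual write-to-temp-then-rename pattern. This is not NexusForge's actual persistence code; the function name `atomic_write_text` is hypothetical, and the sketch writes plain text rather than parsing YAML.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write text so readers see either the old file or the new one, never a mix."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file sits next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # swaps the new file in as a single step
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A process killed before `os.replace` leaves the previous file untouched; a process killed after it leaves the complete new file.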


How It Is Different

| Capability | NexusForge | AI coding assistants | Multi-agent platforms (AutoGen, CrewAI) |
| --- | --- | --- | --- |
| Writes real code to disk | Yes: files created/modified on disk | Chat only | No (orchestration only) |
| Reads existing codebase | Yes: scans tree, reads relevant files | Per-session only | No |
| Runs real test suite | Yes: configurable test_command | No | No |
| State persistence | YAML files that survive crash and restart | Session memory | Usually in-memory or DB |
| Human approval gates | Built-in (nexus approve) | Ad-hoc | Optional |
| Audit trail | Every transition, decision, reflection logged | None | Varies |
| Crash recovery | Kill-9 tested; atomic writes, no partial state | None | None |
| Provider agnostic | OpenAI, Anthropic, any Ollama local model | Usually one vendor | Varies |
| Fully offline | Yes, with Ollama | No | No |
| CLI-first | Full nexus CLI, scriptable | GUI / IDE plugin | Python API |

The core differentiator: NexusForge treats software development as a stateful, recoverable process with real file I/O — not a chat conversation. Code lands on disk, tests run against real files, and the entire history is auditable in YAML.


Requirements

  • Python 3.11 or 3.12
  • An API key for at least one LLM provider, or a running Ollama instance for fully local operation

Installation

Option A — pip (standard)

# Core package — add the provider you use
pip install nexusforge-ai                     # core only
pip install "nexusforge-ai[anthropic]"        # + Anthropic Claude
pip install "nexusforge-ai[openai]"           # + OpenAI GPT-4o and variants
pip install "nexusforge-ai[local]"            # + Ollama (any local model)
pip install "nexusforge-ai[all]"              # all three providers

Option B — pipx (isolated, nexus available globally)

pipx install "nexusforge-ai[anthropic]"
# nexus is now on your PATH in an isolated environment

Option C — uv (fastest, project-level)

uv add "nexusforge-ai[anthropic]"
# or globally:
uv tool install "nexusforge-ai[anthropic]"

From source (contributors)

git clone https://github.com/your-org/nexusforge
cd nexusforge
uv sync --all-extras
# Run via: uv run nexus <command>

Verify

nexus version
# nexusforge 0.1.0

Getting Started

All examples below assume nexus is on your PATH, for example after pip install nexusforge-ai or pipx install nexusforge-ai. If you installed from source with uv sync, prefix every command with uv run.

1. Initialize inside your project directory

cd my-project          # your existing or new project
nexus init

NexusForge creates .nexus/ alongside your existing code. It does not touch your project files during init.

nexus doctor
           NexusForge Doctor
┌──────────────────────┬────────┬───────────────────────┐
│ Check                │ Status │ Detail                │
├──────────────────────┼────────┼───────────────────────┤
│ .nexus/ directory    │ OK     │                       │
│ config.yaml          │ OK     │ phase=1, provider=..  │
│ API key              │ OK     │ ANTHROPIC_API_KEY set │
│ Provider probe       │ OK     │ claude-opus-4-7       │
└──────────────────────┴────────┴───────────────────────┘

2. Configure your test command

Edit .nexus/config.yaml to point at your real test suite:

# Set to your project's test command. Empty = LLM-simulated QA.
test_command: "uv run pytest -q"   # Python/uv
# test_command: "npm test"         # Node.js
# test_command: "cargo test"       # Rust
# test_command: ""                 # no real tests yet

This is the most important config value. When set, QA runs the real tests after each Developer delivery and uses the exit code to gate the review → done transition.
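As a rough illustration of exit-code gating, the sketch below runs a test command and reports pass or fail. It is an assumption about the general approach, not the project's QA implementation; `run_test_gate` and its `timeout` parameter are hypothetical names.

```python
import shlex
import subprocess


def run_test_gate(test_command: str, cwd: str = ".", timeout: float = 600.0) -> bool:
    """Return True when the configured test command exits with status 0."""
    if not test_command.strip():
        # An empty command means there is no real suite to run.
        raise ValueError("empty test_command: fall back to LLM-simulated QA")
    result = subprocess.run(
        shlex.split(test_command),  # split like a shell would, without invoking one
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

For example, `run_test_gate("uv run pytest -q")` would return False as soon as any test fails, because pytest exits non-zero in that case.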

3. Create your plan from requirements

cat > REQUIREMENTS.md << 'EOF'
# Auth System
Users need to log in and register. Passwords must be hashed.
Sessions expire after 30 minutes of inactivity.

# User Profile
Users can update their display name and avatar.
Profile changes must be audited.
EOF

nexus plan REQUIREMENTS.md

Output:

Decomposing REQUIREMENTS.md with claude-opus-4-7...

Plan: 2 features, 5 tasks

▶ Auth System
  Login, registration and session management
  • Implement JWT login and registration endpoints
    ✓ Returns 200 with token on valid credentials
    ✓ Rejects duplicate emails with 409
  • Add password hashing with bcrypt
    ✓ Passwords never stored in plain text
  • Implement 30-minute session expiry
    ✓ Sessions auto-extend on request; expire after 30 min idle

▶ User Profile
  Profile editing with audit trail
  • Add profile update endpoint with validation
    ✓ Display name and avatar URL validated
  • Implement profile change audit log
    ✓ Every change recorded with timestamp and actor

Created 2 features and 5 tasks. Run nexus start to begin.

Preview without writing: nexus plan REQUIREMENTS.md --dry-run

4. Start the orchestrator

nexus start
# Orchestrator started (PID 12345)

Agents work concurrently — up to 3 tasks per agent simultaneously. Watch what happens in another terminal:

nexus logs --follow
[planner.log]   Planner received task NF-1
[planner.log]   Scanned 47 project files, reading 5 relevant files
[planner.log]   LLM for NF-1: Create src/auth/login.py, src/auth/models.py, tests/test_login.py
[developer.log] Developer received task NF-1
[developer.log] Reading 8 context files from existing codebase
[developer.log] Developer wrote 3 file(s) for NF-1: src/auth/login.py, src/auth/models.py, tests/test_login.py
[qa.log]        QA running 'uv run pytest -q' for NF-1
[qa.log]        QA test run for NF-1: PASS (3 tests, 0 failures)
[reviewer.log]  Reviewer reading 3 files for NF-1
[reviewer.log]  Reviewer LLM for NF-1: approved

5. Review and approve

cat .nexus/reviews/NF-1.md
# Review: NF-1

**Verdict:** approved

**Files reviewed:**
- `src/auth/login.py`
- `src/auth/models.py`
- `tests/test_login.py`

**Comments:**
Login endpoint correctly validates credentials against hashed passwords.
Tests cover valid login, invalid password, and unknown user cases.
Session token expiry is tested with a mocked clock.

nexus approve feature NF-F1 --reason "All tests pass, code reviewed"

6. Query what the agents learned

nexus memory query --kind reflection --limit 5

How Code Generation Works

Developer output format

The Developer prompts the LLM with the project context and requires a structured response. Every file is output as:

## FILE: relative/path/to/file.py
```python
# complete file content here
```

NexusForge parses this format, validates each path against the project root (preventing path traversal), and writes files to disk. If an existing file is being modified, the LLM includes the complete updated file — partial patches are not used.
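A minimal sketch of how such output could be parsed and path-checked follows. The regular expression and the helper names `parse_files` and `safe_target` are illustrative assumptions, not the contents of codeops.py, and this simple pattern would not handle a file body that itself contains a fence line.

```python
import re
from pathlib import Path

# One "## FILE: <path>" header followed by a fenced block holding the full file body.
FILE_BLOCK = re.compile(
    r"^## FILE: (?P<path>[^\n]+)\n`{3}[^\n]*\n(?P<body>.*?)^`{3}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def parse_files(llm_output: str) -> dict[str, str]:
    """Map each relative path named in the output to its complete file content."""
    return {m["path"].strip(): m["body"] for m in FILE_BLOCK.finditer(llm_output)}


def safe_target(project_root: Path, relative: str) -> Path:
    """Resolve a path and refuse anything that lands outside the project root."""
    root = project_root.resolve()
    target = (root / relative).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"path escapes project root: {relative}")
    return target
```

Resolving before comparing means both `../` segments and absolute paths are rejected, since either one resolves to a location outside the root.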

Manifest system

After writing files, the Developer saves a manifest to .nexus/task_files/<task-id>.yaml:

task_id: NF-1
files:
  - src/auth/login.py
  - src/auth/models.py
  - tests/test_login.py

QA and Reviewer read this manifest to know exactly which files to examine. Review documents list the reviewed files explicitly, creating a complete audit trail.

Project context selection

Before each LLM call, Planner and Developer:

  1. Scan the project tree (skipping .venv, .git, __pycache__, node_modules, build artifacts)
  2. Read key files: README.md, pyproject.toml, Cargo.toml, package.json, etc.
  3. Select the most relevant existing source files by matching keywords from the task title and description
  4. Include up to max_context_files (default: 8) of actual file content in the prompt

This means the LLM sees real code patterns, real naming conventions, and real project structure before generating anything.
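The selection steps above can be sketched roughly as follows. The scoring rule (counting task keywords that appear in a file's relative path) is an assumption made for illustration; the source does not say whether matching uses paths, contents, or both. The names `scan_tree` and `select_context` and the `build` and `dist` entries in the skip list are hypothetical.

```python
import re
from pathlib import Path

# Directories to ignore; "build" and "dist" stand in for "build artifacts".
SKIP_DIRS = {".venv", ".git", "__pycache__", "node_modules", "build", "dist"}


def scan_tree(root: Path) -> list[Path]:
    """List project files, skipping anything under an ignored directory."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and not SKIP_DIRS.intersection(p.relative_to(root).parts)
    )


def select_context(root: Path, task_text: str, max_context_files: int = 8) -> list[Path]:
    """Rank files by how many task keywords appear in their relative path."""
    keywords = {w for w in re.findall(r"[a-z0-9]+", task_text.lower()) if len(w) > 2}
    scored = []
    for path in scan_tree(root):
        rel = path.relative_to(root).as_posix().lower()
        score = sum(1 for word in keywords if word in rel)
        if score:
            scored.append((score, rel, path))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, _, path in scored[:max_context_files]]
```

With a task titled "Add rate limiting to the /api/login endpoint", a file such as src/auth/login.py would be picked up through the keyword "login".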


For Existing Projects

Drop NexusForge into any codebase — it reads before it writes.

cd /path/to/existing-project
nexus init

Edit .nexus/config.yaml to match your project:

model_provider: anthropic
model_name: claude-opus-4-7

# Your actual test command
test_command: "uv run pytest -q"

# How many existing source files to include in agent prompts
max_context_files: 8

Then create a task:

nexus task create --title "Add rate limiting to the /api/login endpoint"
nexus start

The Planner will scan your project, find your existing auth code, read your README and project manifest, and produce a plan that references your actual file paths and follows your existing patterns. The Developer will read those same files and generate code that extends your existing implementation.

What NexusForge reads (never modifies during init):

  • README.md, pyproject.toml, package.json, Cargo.toml, go.mod, Makefile
  • Source files matching task keywords (up to max_context_files)
  • The project file tree (all non-binary, non-venv files)

What NexusForge writes:

  • Source and test files generated by the Developer (into your project directory)
  • State files in .nexus/ (owned exclusively by NexusForge)

Configuration

All configuration lives in .nexus/config.yaml.

# Phase of development (1–6, controls feature gating)
phase: 1

# LLM provider: anthropic | openai | local | fake
model_provider: anthropic

# Model name passed to the provider API
model_name: claude-opus-4-7

# Maximum context window, in tokens, for each LLM call
max_context_tokens: 200000

# HTTP timeout for each LLM request (seconds)
request_timeout_seconds: 120

# Retry policy for rate limits and transient errors
retry_max_attempts: 3
retry_backoff_base: 2.0      # seconds before first retry
retry_backoff_max: 30.0      # cap on any single backoff delay

# Local provider (Ollama-compatible)
local_base_url: http://localhost:11434

# Test command run by QA after each Developer delivery.
# Empty string = LLM-simulated QA (for projects without a real test suite).
# Examples: "uv run pytest -q" | "npm test" | "cargo test" | "make test"
test_command: ""

# Maximum existing source files included in agent prompts.
# Higher = more context, more tokens. Range: 1–50.
max_context_files: 8

Provider options

Anthropic (default)

model_provider: anthropic
model_name: claude-opus-4-7   # or claude-sonnet-4-6

export ANTHROPIC_API_KEY=sk-ant-...

OpenAI

model_provider: openai
model_name: gpt-4o

export OPENAI_API_KEY=sk-...

Local (Ollama — fully offline)

model_provider: local
model_name: llama3.2
local_base_url: http://localhost:11434

ollama pull llama3.2 && ollama serve

OS keyring (optional, keys never written to disk)

python -c "import keyring; keyring.set_password('nexusforge', 'ANTHROPIC_API_KEY', 'sk-ant-...')"

Concurrency and Dependencies

Parallel agent execution

Each agent processes up to 3 messages simultaneously. Independent tasks flow through the full pipeline in parallel:

Tasks NF-1, NF-2, NF-3 (no depends_on):

  Planner:   [NF-1][NF-2][NF-3]
  Developer: [NF-1][NF-2][NF-3]   ← files for three tasks being written simultaneously
  QA:        [NF-1][NF-2][NF-3]   ← three test runs in parallel
  Reviewer:  [NF-1][NF-2][NF-3]
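One common way to get a per-agent limit like this in asyncio is a semaphore. The sketch below is an assumption about the general technique rather than the orchestrator's code; `run_agent`, `handle`, and `MAX_CONCURRENT` are hypothetical names.

```python
import asyncio

MAX_CONCURRENT = 3  # the per-agent limit described above


async def run_agent(name: str, task_ids: list[str], handle) -> list[str]:
    """Process every task, with at most MAX_CONCURRENT handlers running at once."""
    limit = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(task_id: str) -> str:
        async with limit:  # waits here while three handlers are already active
            return await handle(name, task_id)

    # gather preserves input order even though the handlers overlap in time.
    return await asyncio.gather(*(guarded(t) for t in task_ids))
```

Here `handle` is any coroutine function taking the agent name and a task ID; a fourth task simply waits until one of the first three finishes.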

Task dependencies

Tasks declare depends_on — IDs that must reach done before dispatch. Set automatically by nexus plan for sequential requirements; also settable in tasks.yaml directly.

- id: NF-1
  title: Add database schema migration
  depends_on: []

- id: NF-2
  title: Implement user model using new schema
  depends_on: [NF-1]   # waits for NF-1 to finish
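The dispatch rule can be sketched in a few lines. This is a simplified illustration that treats tasks as plain dictionaries with the fields shown above plus a `state` field; `ready_tasks` is a hypothetical helper, not part of the NexusForge API.

```python
def ready_tasks(tasks: list[dict]) -> list[str]:
    """Return IDs of 'defined' tasks whose dependencies have all reached 'done'."""
    done = {t["id"] for t in tasks if t["state"] == "done"}
    return [
        t["id"]
        for t in tasks
        # An empty depends_on list is a subset of any set, so such tasks are always ready.
        if t["state"] == "defined" and set(t.get("depends_on", [])) <= done
    ]
```

With the two tasks above, only NF-1 is ready at first; NF-2 becomes ready once NF-1 is marked done.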

LLM error handling

All provider errors are caught per-task — the orchestrator never crashes.

| Error | Behaviour |
| --- | --- |
| Rate limit (429) | Exponential backoff with jitter, up to retry_max_attempts |
| Timeout | Same retry policy |
| Connection error | Same retry policy |
| Auth failure | Immediate → task blocked, no retries |
| All retries exhausted | Task → blocked with reason |

nexus blockers
# NF-3 — Implement password reset: LLM unavailable after retries: 503 Service Unavailable

nexus why NF-3
# Task NF-3 — Implement password reset
#   State:    blocked
#   ✗ Blocked: LLM unavailable after retries: 503 Service Unavailable

Fix the provider issue, then set the task's state back to defined in .nexus/tasks.yaml and restart the orchestrator.
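The retry policy for transient errors might look something like the sketch below. Several details are assumptions: the delay doubles per attempt, the jitter scales each delay by a random factor between 0.5 and 1.0, and retry_max_attempts is read as the total number of attempts. `TransientError`, `backoff_delay`, and `call_with_retries` are hypothetical names.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for rate-limit, timeout and connection errors."""


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 30.0) -> float:
    """Delay before the retry that follows failed attempt number `attempt` (1-based)."""
    ceiling = min(cap, base * 2 ** (attempt - 1))  # 2s, 4s, 8s, ... capped at 30s
    return ceiling * random.uniform(0.5, 1.0)  # jitter spreads out simultaneous retries


def call_with_retries(call, max_attempts: int = 3, sleep=time.sleep):
    """Run call(); retry transient failures, re-raising after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # the caller would mark the task blocked with this reason
            sleep(backoff_delay(attempt))
```

An auth failure would not be raised as `TransientError` in this sketch, so it propagates immediately without any retry, matching the table above.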


Full Command Reference

Planning

| Command | Description |
| --- | --- |
| nexus plan [FILE] [--dry-run] | Decompose requirements file into features + tasks |

Lifecycle

| Command | Description |
| --- | --- |
| nexus init [--force] [--yes] | Scaffold .nexus/ state directory |
| nexus start | Start orchestrator and all five agents |
| nexus stop | Graceful shutdown via SIGTERM |
| nexus status | Tasks by state and in-progress list |
| nexus version | Print version and exit |
| nexus doctor | Health check: config, keys, provider |

Tasks

| Command | Description |
| --- | --- |
| nexus task list [--state STATE] [--agent AGENT] | List tasks with optional filters |
| nexus task show <id> | Detail view |
| nexus task create --title TEXT [--feature ID] | Create task in defined state |

Features and Deliveries

| Command | Description |
| --- | --- |
| nexus feature list | Features with rollup state |
| nexus feature show <id> | Feature detail |
| nexus feature create --title TEXT [--description TEXT] | Create feature |
| nexus delivery list | Deliveries and state |
| nexus delivery create --title TEXT | Create delivery |

Reviews and Approvals

| Command | Description |
| --- | --- |
| nexus reviews [--pending] | List review files |
| nexus approve feature\|delivery <id> [--reason TEXT] | Record approval |
| nexus reject feature\|delivery <id> --reason TEXT | Record rejection |
| nexus blockers | Blocked tasks with reasons |
| nexus why <task-id> | Dependency chain and delay explanation |

Observability

| Command | Description |
| --- | --- |
| nexus logs [--agent NAME] [--tail N] [--follow] | Stream agent logs |
| nexus memory query [--tag T] [--substring S] [--kind K] [--limit N] | Query memory |

State Files

Everything is human-readable and editable. Restart the orchestrator after manual edits.

.nexus/
├── config.yaml          Configuration (provider, model, test_command)
├── tasks.yaml           Task list — state machine owner: orchestrator
├── features.yaml        Feature groups with child task IDs
├── deliveries.yaml      Delivery groups with child feature IDs
├── approvals.yaml       Approval records (feature and delivery)
├── progress.yaml        Append-only state transition log
├── memory.log           Append-only agent decision log
├── reflection.log       Append-only lessons-learned log
├── task_files/          File manifest per task (written by Developer)
│   └── NF-1.yaml        Lists files written for task NF-1
├── logs/                Per-agent structured log files
│   ├── planner.log
│   ├── developer.log
│   ├── qa.log
│   ├── reviewer.log
│   └── memory.log
└── reviews/             Per-task reviewer write-ups
    └── NF-1.md

Task lifecycle

defined → in_progress → review → done
    ↓           ↓
 blocked     blocked

done requires QA pass (real test suite exit 0, or LLM validation if no test_command). All transitions logged to progress.yaml.
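The lifecycle above maps naturally onto a small transition table. The following sketch is one way to encode it, assuming the arrows in the diagram are the only legal moves; `ALLOWED`, `IllegalTransition`, and `transition` are hypothetical names and the real orchestrator may differ.

```python
# Legal next states for each task state, read off the diagram above.
ALLOWED = {
    "defined": {"in_progress", "blocked"},
    "in_progress": {"review", "blocked"},
    "review": {"done"},
    "done": set(),
    "blocked": set(),  # reset by hand in tasks.yaml, not by a transition
}


class IllegalTransition(ValueError):
    """Hypothetical error type for a move the lifecycle does not allow."""


def transition(state: str, new_state: str, qa_passed: bool = False) -> str:
    """Return new_state if the move is legal, otherwise raise IllegalTransition."""
    if new_state not in ALLOWED.get(state, set()):
        raise IllegalTransition(f"{state} -> {new_state}")
    if new_state == "done" and not qa_passed:
        raise IllegalTransition("done requires a QA pass")
    return new_state
```

Keeping the table as data makes an illegal move such as defined → done fail loudly instead of silently corrupting state.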

Feature lifecycle

planned → dev_complete → ready_for_test → completed

completed = all tasks done + nexus approve feature.

Delivery lifecycle

planned → in_progress → pending_approval → released

released = all features completed + nexus approve delivery.
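The two rollup rules can be written as simple predicates. This is only a sketch of the stated conditions; the function names are hypothetical, and treating an empty feature or delivery as incomplete is an assumption.

```python
def feature_completed(task_states: list[str], approved: bool) -> bool:
    """A feature is completed when every child task is done and it has been approved."""
    return bool(task_states) and all(s == "done" for s in task_states) and approved


def delivery_released(feature_states: list[str], approved: bool) -> bool:
    """A delivery is released when every child feature is completed and it has been approved."""
    return bool(feature_states) and all(s == "completed" for s in feature_states) and approved
```

Both conditions must hold: finished work without an approval stays short of the final state, and an approval cannot complete unfinished work.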


Running Offline with a Local Model

ollama pull deepseek-coder   # or llama3.2, codellama, mistral
ollama serve

# .nexus/config.yaml:
model_provider: local
model_name: deepseek-coder
local_base_url: http://localhost:11434

nexus start

Smaller models generally produce lower-quality code and reviews. deepseek-coder is the recommended starting point for code generation tasks. The provider interface is identical across providers, so no code changes are needed.


Project Layout

NexusForge/
├── src/nexusforge/
│   ├── cli.py              Typer CLI entry point (nexus plan, start, approve, ...)
│   ├── orchestrator.py     asyncio event loop + state machine + dependency dispatch
│   ├── codeops.py          Code I/O: project scan, LLM output parsing, file writing, test runner
│   ├── agents/
│   │   ├── base.py         AgentBase: concurrent dispatch, _complete_or_block, path guard
│   │   ├── planner.py      Reads project context, plans tasks
│   │   ├── developer.py    Writes real code files to disk
│   │   ├── reviewer.py     Reads written files, reviews real code
│   │   ├── qa.py           Runs test_command or LLM validation
│   │   └── memory.py       Logs decisions and reflections, answers queries
│   ├── providers/          LLM backends: OpenAI, Anthropic, Local (Ollama), Fake
│   ├── persistence/        Atomic writes + advisory locking
│   └── models/             Pydantic v2 domain models (Task, Feature, Delivery, ...)
├── tests/
│   ├── unit/               Fast tests using tmp_path + FakeProvider + no real LLM
│   └── integration/        Subprocess-based crash recovery and contention tests
└── .nexus/                 Runtime state (created by nexus init)

Development

uv run ruff format .         # format
uv run ruff check .          # lint
uv run mypy src --strict     # type check
uv run pytest -q             # tests (85% coverage gate)
uv run bandit -r src/ -q     # security scan
uv run pip-audit             # dependency audit

CI matrix: Linux / macOS / Windows × Python 3.11 / 3.12. See .github/workflows/ci.yml.


Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Runtime error: config, provider, persistence |
| 2 | User error: not found, illegal transition, bad input |
| 130 | Interrupted by Ctrl+C |

License

NexusForge is licensed under the Business Source License 1.1 (BUSL-1.1).

Free use: Non-production use — personal projects, evaluation, research, and development — is permitted without restriction.

Commercial production use requires a commercial license. Contact [YOUR_EMAIL] for pricing.

Converts to open source: On 2030-05-12, this license automatically converts to Apache License 2.0, a permissive open source license.

See LICENSE for the full terms. See mariadb.com/bsl11 for the BSL 1.1 specification.
