MCP server for local-first AI architectural planning using local LLMs
fitz-forge
Architectural coding planning harness for local LLMs
The Problem • The Insight • Why fitz-forge? • Benchmarks • How It Works • Docs • GitHub
Task: "Add WebSocket support to the chat API" (given a real codebase with FastAPI routes, Pydantic schemas, and an existing REST chat endpoint.)

❌ Raw local LLM (no harness)
"Add a WebSocket endpoint. Use the websockets library. Create a new file for handlers. Add authentication middleware."
Generic advice. No file paths. No awareness of existing code. Hallucinated library choice. Would break the existing architecture.

🔨 fitz-forge (same model, same hardware)
Phase 1: Extend ChatRouter in api/routes/chat.py - Add ws_chat() using existing ChatEngine - Reuse AuthMiddleware.verify_token() - Test: pytest tests/api/test_chat_ws.py
→ Same model, same hardware. The difference is the harness: fitz-forge reads your codebase, reasons in stages, self-critiques,
and extracts structured output that a small model can actually produce reliably.
Where to start 🚀
[!IMPORTANT] Requires Ollama, LM Studio, or llama.cpp with a loaded model. Also needs fitz-sage for code retrieval.
pip install fitz-forge
fitz plan "Add OAuth2 authentication with Google and GitHub providers"
That's it. Your plan runs overnight on local hardware.
About
I built fitz-forge because the best AI coding tools are dangerously dependent on subsidized API pricing.
Claude Code costs $100/month today — heavily subsidized. When those subsidies shrink, the planning phase alone
(understanding a codebase, reasoning about architecture, producing a structured plan) could cost more than the subscription.
fitz-forge moves that expensive planning phase onto hardware you already own. No API costs. No data leaving your network.
And as local models improve, your plans improve for free.
No LangChain. No LlamaIndex. Every layer written from scratch, with code retrieval powered by fitz-sage.
~20k lines of Python. 970+ tests. Built by Yan Fitzner (LinkedIn, GitHub).
Why fitz-forge?
Cut your Opus bill — plan locally, implement with Sonnet 💸
Agentic planning is the most expensive part of the process, and it's where LLMs struggle the most.
fitz-forge produces a markdown artifact you hand to Sonnet for implementation. The expensive tokens never hit your API budget.
Dumb local models produce smart plans 🧠
The pipeline breaks the task into atomic decisions, resolves each against relevant files, then narrates the committed decisions into a plan. Suddenly a local model can produce plans that would overwhelm it in a single prompt.
Runs on whatever hardware you've got 🖥️
Consumer GPU? Models like Qwen3.6-35-a3b or Gemma4-26B-A4b do the whole pipeline. CPU-only box or tiny VRAM? Run a medium model at 10 tok/s overnight. Tokens-per-second stops mattering when you're sleeping.
Drops into Claude Code or Codex via CLI and MCP 🔌
Expose fitz-forge as an MCP server (fitz serve) and it becomes a tool inside Claude Code or any MCP-capable client. Same principle as the CLI: tell Claude to create a plan using fitz-forge, and it does the heavy lifting locally while you wait.
Any codebase, any language 🌐
Python, Go, Rust — the retrieval layer indexes by file structure and imports, and the grounding layer validates generated artifacts against whatever the codebase actually contains.
Queue a job. Go to sleep. Let it run overnight. 🌙
Every stage produces checkpoints. Power outage at minute 15 of a 20-minute run?
fitz retry <id> picks up from the last completed stage.
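A minimal sketch of how stage checkpointing of this kind can work. The stage names, file layout, and function names here are hypothetical illustrations, not fitz-forge's actual internals: each completed stage persists a JSON file, and a retry scans for the first stage with no checkpoint.

```python
import json
from pathlib import Path

# Hypothetical stage names for illustration only.
STAGES = ["context", "decompose", "resolve", "synthesize", "render"]

def load_completed(ckpt_dir: Path) -> list[str]:
    """Return names of stages whose checkpoint file exists on disk."""
    return [s for s in STAGES if (ckpt_dir / f"{s}.json").exists()]

def resume_point(ckpt_dir: Path) -> str:
    """First stage without a checkpoint -- where a retried job restarts."""
    done = set(load_completed(ckpt_dir))
    for stage in STAGES:
        if stage not in done:
            return stage
    return "complete"

def run_stage(ckpt_dir: Path, stage: str, result: dict) -> None:
    """Persist a stage result so an interrupted run can skip it on retry."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    (ckpt_dir / f"{stage}.json").write_text(json.dumps(result))
```

Because the checkpoint is just a file per stage, a crash between stages loses at most one stage's worth of work.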
Fully local execution possible 🏠
Ollama, LM Studio, or llama.cpp. No API keys required to start.
Benchmarks
TBD
How It Works
A 10-stage pipeline that decomposes architectural planning into small, focused LLM calls interleaved with deterministic AST work. Retrieval + implementation check feed a decision-based reasoning core (decompose → resolve → synthesize), then artifacts are generated, closure-checked, and grounded against the real codebase before the plan is written.
USER PROMPT
│
▼
┌─────────────────────────────────────────┐
│ 1. Agent Context Gathering [6-8 LLM] │ retrieval + compression
├─────────────────────────────────────────┤
│ 2. Implementation Check [1 LLM] │ already built?
├─────────────────────────────────────────┤
│ 3. Call Graph Extraction [0 · AST] │ deterministic
├─────────────────────────────────────────┤
│ 4. Decision Decomposition [2-4 LLM] │ adaptive best-of-N
├─────────────────────────────────────────┤
│ 5. Decision Resolution [10-15] │ 1 call per decision
├─────────────────────────────────────────┤
│ 6. Synthesis [~15 LLM] │ reasoning + 13 extractions
├─────────────────────────────────────────┤
│ 7. Artifact Generation [3-8 LLM] │ per-artifact + closure checks
├─────────────────────────────────────────┤
│ 8. Grounding Validation [0-5 LLM] │ AST + repair
├─────────────────────────────────────────┤
│ 9. Coherence Check [1 LLM] │ cross-stage sanity
├─────────────────────────────────────────┤
│ 10. Render + Write [0] │ markdown to disk
└─────────────────────────────────────────┘
│
▼
~/.fitz-forge/plans/plan_<id>.md
Total: ~40-60 LLM calls · ~7-9 min on RTX 5090
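The decompose → resolve → synthesize core (stages 4-6) can be sketched as follows. This is an illustrative shape, not fitz-forge's actual code: the function names, prompts, and `Decision` type are invented for the example. The point is that each LLM call handles one atomic decision, so prompts stay small enough for a local model.

```python
from dataclasses import dataclass
from typing import Callable

# A stand-in for any local-LLM call: prompt in, text out.
LLMCall = Callable[[str], str]

@dataclass
class Decision:
    question: str     # one atomic architectural question
    answer: str = ""  # filled in by resolution

def decompose(task: str, llm: LLMCall) -> list[Decision]:
    """Stage 4 (sketch): split the task into atomic decisions, one per line."""
    lines = llm(f"List the atomic design decisions for: {task}").splitlines()
    return [Decision(q.strip()) for q in lines if q.strip()]

def resolve(decisions: list[Decision], context: str, llm: LLMCall) -> None:
    """Stage 5 (sketch): one small LLM call per decision, grounded in code."""
    for d in decisions:
        d.answer = llm(f"Context:\n{context}\n\nDecide: {d.question}")

def synthesize(decisions: list[Decision], llm: LLMCall) -> str:
    """Stage 6 (sketch): narrate the committed decisions into a single plan."""
    committed = "\n".join(f"- {d.question}: {d.answer}" for d in decisions)
    return llm(f"Write a step-by-step plan from these decisions:\n{committed}")
```

Each resolution call sees only one question plus the retrieved context, which is why the per-decision call count (10-15 in stage 5) scales with task complexity rather than prompt size.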
| # | Stage | Docs |
|---|---|---|
| 1 | Agent Context Gathering | 01_agent-context-gathering.md |
| 2 | Implementation Check | 02_implementation-check.md |
| 3 | Call Graph Extraction | 03_call-graph-extraction.md |
| 4 | Decision Decomposition | 04_decision-decomposition.md |
| 5 | Decision Resolution | 05_decision-resolution.md |
| 6 | Synthesis | 06_synthesis.md |
| 7 | Artifact Generation | 07_artifact-generation.md |
| 8 | Grounding Validation | 08_grounding-validation.md |
| 9 | Coherence Check | 09_coherence-check.md |
| 10 | Render + Write | — |
[!NOTE] The pipeline decomposes a problem that would overwhelm a small model into many small LLM calls it can handle reliably. Each per-field JSON extraction is under 2000 chars — small enough for a 3B quantized model to produce valid output. Deterministic AST work (call graph, grounding check) carries the structural load so LLMs only do what LLMs are good at.
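The per-field extraction idea can be sketched like this (a hypothetical illustration; the field names, prompt wording, and retry policy are invented, not fitz-forge's actual schema). Asking for one tiny JSON object per field, with a retry on malformed output, is what keeps each generation small enough for a heavily quantized model:

```python
import json
from typing import Callable

LLMCall = Callable[[str], str]

# Hypothetical fields extracted one at a time: a small model emits one
# short JSON object per call instead of one large document it would mangle.
PLAN_FIELDS = {
    "summary": "one-sentence summary of the plan",
    "files_touched": "JSON list of file paths the plan modifies",
    "test_command": "shell command that verifies the change",
}

def extract_field(name: str, spec: str, notes: str, llm: LLMCall,
                  retries: int = 2) -> object:
    """Ask for a single field as JSON; retry on malformed output."""
    prompt = (f"From these notes, return ONLY a JSON object "
              f'{{"{name}": ...}} where {name} is {spec}.\n\n{notes}')
    for _ in range(retries + 1):
        try:
            return json.loads(llm(prompt))[name]
        except (json.JSONDecodeError, KeyError):
            continue
    raise ValueError(f"could not extract {name!r}")

def extract_plan(notes: str, llm: LLMCall) -> dict:
    return {f: extract_field(f, spec, notes, llm)
            for f, spec in PLAN_FIELDS.items()}
```

A failed parse costs one cheap retry of a sub-2000-char generation instead of regenerating an entire plan document.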
Full pipeline docs: docs/features/ — detailed docs covering every stage and infrastructure component.
📦 Quick Start
# Install
pip install fitz-forge
# Queue a job
fitz plan "Build a plugin system for data transformations"
# Start the background worker
fitz run
# Check on it
fitz status 1
# Read the plan
fitz get 1
Optional extras:
pip install "fitz-forge[api-review]" # Anthropic API review pass
pip install "fitz-forge[lm-studio]" # LM Studio provider (openai SDK)
pip install "fitz-forge[dev]" # pytest, build tools
Prerequisites: a local model server (Ollama, LM Studio, or llama.cpp) with a loaded model, plus fitz-sage for code retrieval.
📦 CLI Reference
fitz plan "description" # Queue a planning job
fitz run # Start background worker (Ctrl+C to stop)
fitz list # Show all jobs
fitz status <id> # Check progress
fitz get <id> # Print completed plan as markdown
fitz retry <id> # Re-queue failed/interrupted job
fitz confirm <id> # Approve optional API review
fitz cancel <id> # Skip API review, finalize plan
fitz serve # Start MCP server
Job lifecycle:
QUEUED -> RUNNING -> COMPLETE
-> AWAITING_REVIEW -> QUEUED (confirm) / COMPLETE (cancel)
-> FAILED / INTERRUPTED (both retryable)
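The lifecycle above is naturally a transition table. The sketch below encodes it as a plain dict with a guard function; the representation is an assumption for illustration, not fitz-forge's actual `JobStore` implementation:

```python
# Allowed transitions in the job state machine, mirroring the lifecycle
# diagram. "confirm" and "cancel" are the two exits from AWAITING_REVIEW.
TRANSITIONS = {
    "QUEUED": {"RUNNING"},
    "RUNNING": {"COMPLETE", "AWAITING_REVIEW", "FAILED", "INTERRUPTED"},
    "AWAITING_REVIEW": {"QUEUED", "COMPLETE"},  # confirm / cancel
    "FAILED": {"QUEUED"},                       # fitz retry
    "INTERRUPTED": {"QUEUED"},                  # fitz retry
    "COMPLETE": set(),                          # terminal
}

def transition(state: str, new_state: str) -> str:
    """Move a job to new_state, rejecting transitions the lifecycle forbids."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

Centralizing the table makes "retryable" a queryable property: a job can be retried exactly when `"QUEUED" in TRANSITIONS[state]` and the state is not the initial one.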
📦 MCP Server
Plug into Claude Code or Claude Desktop:
{
"mcpServers": {
"fitz-forge": {
"command": "fitz",
"args": ["serve"]
}
}
}
MCP Tools:
| Tool | Description |
|---|---|
| create_plan | Queue a new planning job |
| check_status | Check job progress |
| get_plan | Retrieve completed plan |
| list_plans | List all planning jobs |
| retry_job | Retry a failed job |
| confirm_review | Approve API review after seeing cost |
| cancel_review | Skip API review, finalize plan |
📦 Configuration
Auto-created on first run:
| Platform | Path |
|---|---|
| Windows | %LOCALAPPDATA%\fitz-forge\fitz-forge\config.yaml |
| macOS | ~/Library/Application Support/fitz-forge/config.yaml |
| Linux | ~/.config/fitz-forge/config.yaml |
Database (jobs.db) lives in the same directory.
# LLM provider: "ollama", "lm_studio", or "llama_cpp"
provider: lm_studio
lm_studio:
base_url: http://localhost:1234/v1
model: qwen3-coder-30b-a3b-instruct # single model for retrieval + reasoning
smart_model: null # null = use model for all tiers
fast_model: null # null = use model for all tiers
timeout: 600
context_length: 65536 # split reasoning auto-enables below 32768
ollama:
base_url: http://localhost:11434
model: qwen2.5-coder-next:80b-instruct
fallback_model: qwen2.5-coder-next:32b-instruct # OOM fallback (null to disable)
timeout: 300
memory_threshold: 80.0 # RAM % threshold to abort
llama_cpp:
server_path: /path/to/llama-server
models_dir: /path/to/models
port: 8012
fast_model:
path: model.gguf
context_size: 65536
gpu_layers: -1
flash_attention: true
cache_type_k: q8_0
cache_type_v: q8_0
agent:
enabled: true
max_file_bytes: 50000
max_seed_files: 50 # files available via inspect_files/read_file tools
source_dir: null # null = cwd at runtime
confidence:
default_threshold: 0.7
security_threshold: 0.9
anthropic:
api_key: null # null = API review disabled
model: claude-sonnet-4-5-20250929
output:
plans_dir: .fitz-forge/plans
verbosity: normal
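The `smart_model` / `fast_model` comments above describe a tier fallback: a null tier uses the base `model`. A minimal sketch of that resolution logic, assuming the config is loaded into a plain dict (the function name is invented for illustration):

```python
def pick_model(tier: str, cfg: dict) -> str:
    """Resolve the model name for a tier; null tiers fall back to `model`."""
    if tier in ("smart", "fast"):
        return cfg.get(f"{tier}_model") or cfg["model"]
    return cfg["model"]

# Example: fast calls use a dedicated model, smart calls fall back.
config = {
    "model": "qwen3-coder-30b-a3b-instruct",
    "smart_model": None,                 # null -> use `model`
    "fast_model": "qwen3-coder-7b",      # dedicated fast-tier model
}
```

This is why a single-model setup needs only `model`: both optional tiers resolve to it when left null.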
📦 Architecture → Full Architecture Guide
CLI (typer) --> tools/ --> SQLiteJobStore <-- BackgroundWorker --> PlanningPipeline
MCP (fastmcp) --> tools/ --> SQLiteJobStore
fitz_forge/
├── cli.py # Typer CLI (9 commands)
├── server.py # FastMCP server + lifecycle
├── __main__.py # python -m fitz_forge (MCP stdio)
├── tools/ # Service layer
├── models/ # JobStore ABC, SQLiteJobStore, JobRecord
├── background/ # BackgroundWorker, signal handling
├── llm/ # LLM clients (Ollama, LM Studio, llama.cpp), retry
├── planning/
│ ├── pipeline/stages/ # 3 stages (split or combined) + orchestrator + checkpoints
│ ├── agent/ # Code retrieval bridge to fitz-sage
│ ├── prompts/ # Externalized .txt prompt templates
│ └── confidence/ # Per-section confidence scoring
├── api_review/ # Anthropic review client + cost calculator
├── config/ # Pydantic schema + YAML loader
└── validation/ # Input sanitization
📦 Development
git clone https://github.com/yafitzdev/fitz-forge.git
cd fitz-forge
pip install -e ".[dev]" # editable install for development
pytest # 970+ tests
# Lint
ruff check fitz_forge/
ruff format --check fitz_forge/ tests/
See CONTRIBUTING.md for the full development guide and examples/ for usage examples.
Benchmark factory for A/B testing pipeline changes:
# Retrieval benchmarks (~12s/run)
python -m benchmarks.plan_factory retrieval --runs 10 --source-dir ../your-project
# Reasoning benchmarks with fixed retrieval
python -m benchmarks.plan_factory reasoning --runs 5 --source-dir ../your-project \
--context-file benchmarks/ideal_context.json --split --max-seeds 5
License
MIT
Links
- GitHub
- PyPI
- Changelog
- Architecture
- Feature Docs — 17 detailed docs covering every pipeline stage and infrastructure component
- Configuration Reference — every config field explained
- Troubleshooting — GPU issues, Windows quirks, pipeline debugging
- Contributing
- Examples
File details
Details for the file fitz_forge-0.6.2.tar.gz.
- Size: 243.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | d9bb6b06bf679aab7d21d95ef7bdc600e27cb5c8e907ec65cd77e469756229da |
| MD5 | bb458feef9074aeffffc59f24309f471 |
| BLAKE2b-256 | e0776206a77f6217405fe0d4d90f3e5b3e4b1c0a30d8cec03f0706a917786b6c |
Provenance
The following attestation bundle was made for fitz_forge-0.6.2.tar.gz:

Publisher: publish.yml on yafitzdev/fitz-forge
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fitz_forge-0.6.2.tar.gz
- Subject digest: d9bb6b06bf679aab7d21d95ef7bdc600e27cb5c8e907ec65cd77e469756229da
- Sigstore transparency entry: 1323257154
- Permalink: yafitzdev/fitz-forge@0ec4b836cbd4ce6dfca1a912bbf4c19533426245
- Branch / Tag: refs/tags/v0.6.2
- Owner: https://github.com/yafitzdev
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0ec4b836cbd4ce6dfca1a912bbf4c19533426245
- Trigger Event: release
File details
Details for the file fitz_forge-0.6.2-py3-none-any.whl.
- Size: 283.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | eb7823d43e0d0bbcc3a939a15908119654f13f896f38d0d064758983b764e0cc |
| MD5 | cb1e1c42a8a010e169bbdc17e53ce307 |
| BLAKE2b-256 | 49b1d2a3c2325b14cb491b5fed811780358d8b00a315519cedca1a4f5fff0099 |
Provenance
The following attestation bundle was made for fitz_forge-0.6.2-py3-none-any.whl:

Publisher: publish.yml on yafitzdev/fitz-forge
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fitz_forge-0.6.2-py3-none-any.whl
- Subject digest: eb7823d43e0d0bbcc3a939a15908119654f13f896f38d0d064758983b764e0cc
- Sigstore transparency entry: 1323257361
- Permalink: yafitzdev/fitz-forge@0ec4b836cbd4ce6dfca1a912bbf4c19533426245
- Branch / Tag: refs/tags/v0.6.2
- Owner: https://github.com/yafitzdev
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0ec4b836cbd4ce6dfca1a912bbf4c19533426245
- Trigger Event: release