MCP server for local-first AI architectural planning using local LLMs
fitz-graveyard
Overnight AI architectural planning on local hardware. Queue a job. Go to sleep. Wake up to a plan.
The Problem • The Insight • Why fitz-graveyard? • How It Works • GitHub
pip install fitz-graveyard
fitz-graveyard plan "Add OAuth2 authentication with Google and GitHub providers"
fitz-graveyard run # let it cook overnight
fitz-graveyard get 1 # full architectural plan in the morning
About 🧑‍🌾
Solo project by Yan Fitzner (LinkedIn, GitHub).
- ~7k lines of Python
- 400+ tests (402)
- Zero LangChain/LlamaIndex dependencies — built from scratch
The Problem
Claude Code costs $100/month to run semi-productively — and that's heavily subsidized. When subsidies shrink, prices go up. The single most expensive operation in agentic LLM coding is the planning phase: understanding a codebase, reasoning about architecture, producing a structured plan. Every token of that burns through your API budget.
What if the planning phase could run on local hardware instead? What if you could do it with a machine you already own?
The Insight 💡
Running LLMs locally means balancing three things: tokens per second, quantization quality, and model intelligence. A 70B model at high quant gives you excellent reasoning but crawls at 2-5 tok/s on consumer hardware. That feels unusable — until you realize planning doesn't need to be interactive.
Queue a job. Go to sleep. Let it run overnight.
Suddenly tok/s doesn't matter. You can run a large, intelligent model purely in RAM at 10 tok/s and that's fine.
10 tok/s × 3,600 s/hour × 8 hours = 288,000 tokens
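The same arithmetic as a small Python helper, useful for plugging in your own throughput and time window. The function name and parameters are illustrative and not part of fitz-graveyard.

```python
def overnight_token_budget(tokens_per_second: float, hours: float) -> int:
    """Total tokens generated at a steady rate over the given number of hours."""
    return int(tokens_per_second * 3600 * hours)


print(overnight_token_budget(10, 8))  # 288000
print(overnight_token_budget(2, 8))   # 57600
```

Even at the low end of the 2-5 tok/s range, an eight-hour window still yields tens of thousands of tokens.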
That's enough for a full architectural plan — reasoning, self-critique, structured extraction — from a model running on hardware you already own. No API costs. No data leaving your network.
And the best part: as local models improve, your plans improve for free.
Why fitz-graveyard?
Runs on modest hardware 🖥️
A 35B model at Q6 on a single GPU produces plans in ~15 minutes. A 70B model in RAM takes a few hours. You don't need a datacenter — you need patience and a machine that can stay on overnight.
Reads your codebase first 🔍
An agent builds a structural index of your codebase (classes, functions, imports), navigates it using keyword extraction to pick task-relevant files, summarizes them, and synthesizes a context document. An implementation check then verifies whether the task is already built before planning begins. Every planning stage sees your actual code, not a hallucinated version of it.
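A minimal sketch of what a structural index and keyword navigation could look like, assuming Python sources only. It uses the standard-library `ast` module and plain substring matching as stand-ins; `index_source` and `rank_files` are hypothetical names, and the agent's actual indexing and ranking logic may differ.

```python
import ast


def index_source(source: str) -> dict[str, list[str]]:
    """Collect class, function, and import names from one Python module."""
    tree = ast.parse(source)
    index: dict[str, list[str]] = {"classes": [], "functions": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            index["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            index["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            index["imports"].append(node.module or ".")
    return index


def rank_files(indexes: dict[str, dict[str, list[str]]], keywords: set[str]) -> list[str]:
    """Order file paths by how many indexed names contain a task keyword."""
    def score(path: str) -> int:
        names = [n.lower() for group in indexes[path].values() for n in group]
        return sum(any(k in n for k in keywords) for n in names)

    ranked = sorted(indexes, key=score, reverse=True)
    return [p for p in ranked if score(p) > 0]  # drop files with no match


files = {
    "auth.py": "import requests\nclass OAuthClient:\n    def login(self): ...\n",
    "utils.py": "def slugify(text): ...\n",
}
indexes = {path: index_source(src) for path, src in files.items()}
print(rank_files(indexes, {"oauth", "login"}))  # ['auth.py']
```

Only the files that survive this kind of filter would need to be summarized, which keeps the context document small enough for a local model.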
Per-field extraction that small models can handle 🧩
Each stage does 1 reasoning pass + 1 self-critique + N tiny JSON extractions (<2000 chars each). Even a 3B model can reliably produce structured output at this scale. Failed extractions get Pydantic defaults instead of crashing the stage — partial plan > no plan.
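The "defaults instead of crashing" behaviour can be sketched as follows. The project uses Pydantic models; to stay dependency-free this sketch substitutes standard-library dataclasses, and `ContextFields` and `extract_group` are hypothetical names rather than the project's own.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class ContextFields:
    """Hypothetical field group; defaults stand in for a failed extraction."""
    requirements: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)


def extract_group(raw_outputs: dict[str, str]) -> ContextFields:
    """Parse one small JSON payload per field; keep the default when parsing fails."""
    result = ContextFields()
    for f in fields(ContextFields):
        try:
            value = json.loads(raw_outputs.get(f.name, ""))
        except json.JSONDecodeError:
            continue  # malformed or missing output: the default stays in place
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            setattr(result, f.name, value)
    return result


raw = {
    "requirements": '["Support Google login", "Support GitHub login"]',
    "constraints": '["No new services", ',  # truncated model output
}
plan = extract_group(raw)
print(plan.requirements)  # ['Support Google login', 'Support GitHub login']
print(plan.constraints)   # []
```

Because each field is parsed independently, one bad extraction costs a single field and leaves the rest of the stage intact.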
Crash recovery built in 🔄
Jobs checkpoint to SQLite. Machine crashes mid-plan? `retry` picks up from the last checkpoint. Power goes out overnight? Resume in the morning.
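A rough sketch of stage-level checkpointing with the standard-library `sqlite3` module. The table layout, column names, and function names here are assumptions made for illustration; the project's real schema is not shown in this README.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the (hypothetical) checkpoint table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "job_id INTEGER, stage TEXT, output TEXT, PRIMARY KEY (job_id, stage))"
    )
    return conn


def save_checkpoint(conn: sqlite3.Connection, job_id: int, stage: str, output: dict) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (job_id, stage, json.dumps(output)),
        )


def run_job(conn: sqlite3.Connection, job_id: int, stages: list) -> dict:
    """Run (name, function) stages in order, skipping any that already have a checkpoint."""
    rows = conn.execute(
        "SELECT stage, output FROM checkpoints WHERE job_id = ?", (job_id,)
    )
    done = {stage: json.loads(output) for stage, output in rows}
    for name, stage_fn in stages:
        if name in done:
            continue  # finished before the crash, reuse the stored output
        done[name] = stage_fn(done)
        save_checkpoint(conn, job_id, name, done[name])
    return done
```

Calling `run_job` a second time with the same `job_id` after a failure re-runs only the stages that never wrote a checkpoint.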
Claude where it counts, local everywhere else 🎯
The local model does the heavy lifting — 95% of the tokens. But the pipeline knows what it's uncertain about. Per-section confidence scoring flags weak spots, and those sections can pause for an Anthropic API review pass before the plan finalizes. You get Claude-grade quality on the parts that matter, at a fraction of the token cost. Fully optional — off by default, zero API calls unless you opt in.
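One way to picture the flagging step, as a sketch under stated assumptions: section scores are on the 1-10 scale, they are scaled to 0-1, and a section is flagged when it falls below a threshold such as the `default_threshold: 0.7` from the configuration. How the pipeline actually maps scores to thresholds is not documented here, and `flag_weak_sections` is a hypothetical name.

```python
def flag_weak_sections(scores: dict[str, int], threshold: float = 0.7) -> list[str]:
    """Return sections whose 1-10 score, scaled to 0-1, falls below the threshold."""
    return [name for name, score in scores.items() if score / 10 < threshold]


scores = {"context": 9, "architecture": 6, "design": 8, "roadmap": 7, "risk": 5}
print(flag_weak_sections(scores))  # ['architecture', 'risk']
```

Only the flagged sections would be candidates for the optional API review, which is what keeps the paid token count low.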
Two interfaces, same engine 🔌
CLI for background job queues, MCP server for Claude Code / Claude Desktop integration. Both wrap the same `tools/` service layer and SQLite job store.
Other features at a glance 🃏
- [x] Two LLM providers. Ollama (with OOM fallback to smaller model) or LM Studio (OpenAI-compatible API).
- [x] Cross-stage coherence check. Post-pipeline pass verifies context → architecture → roadmap consistency.
- [x] Section-specific confidence scoring. Each section type (context, architecture, design, roadmap, risk) scored against its own criteria with 1-10 granularity.
- [x] Implementation detection. Surgical check prevents planning to build what already exists.
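The OOM fallback mentioned for the Ollama provider might look roughly like this. The `OutOfMemory` exception and the `generate` callable are placeholders invented for the sketch; it does not call the Ollama API and does not reflect how the project detects memory pressure.

```python
from typing import Callable, Optional


class OutOfMemory(RuntimeError):
    """Hypothetical error a client raises when the model does not fit in memory."""


def generate_with_fallback(
    generate: Callable[[str, str], str],
    prompt: str,
    model: str,
    fallback_model: Optional[str],
) -> str:
    """Try the primary model; on an out-of-memory error retry once with the fallback."""
    try:
        return generate(model, prompt)
    except OutOfMemory:
        if fallback_model is None:
            raise  # fallback disabled, surface the error
        return generate(fallback_model, prompt)
```

Passing `None` as `fallback_model` mirrors the "null to disable" option in the configuration file.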
How It Works
An agent pre-stage followed by 3 merged planning stages. Each stage uses per-field extraction: one reasoning prompt produces analysis, a self-critique pass catches scope inflation and hallucinated files, then small JSON extractions pull structured data from the reasoning.
[Agent] map file tree → build structural index → navigate by keywords → summarize → synthesize
|
v
[Check] implementation check — is this task already built?
|
v
[Stage 1] Context — requirements, constraints, assumptions (4 field groups)
[Stage 2] Architecture + Design — merged stage (6 field groups)
[Stage 3] Roadmap + Risk — merged stage (3 field groups)
|
v
[Post] coherence check → confidence scoring (section-specific criteria) → optional API review → render markdown
[!NOTE] The pipeline decomposes a problem that would overwhelm a small model into pieces it can handle reliably. Each JSON extraction is <2000 chars — small enough for a 3B quantized model to produce valid output.
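The shape of a single stage, one reasoning pass, one self-critique pass, then one small extraction per field group, can be sketched with a stubbed model call. The prompts, the `run_stage` name, and the `LLM` callable type are all assumptions; the project keeps its real prompts in external template files.

```python
from typing import Callable

LLM = Callable[[str], str]  # prompt in, text out (stand-in for a real client)


def run_stage(llm: LLM, task: str, field_groups: list[str]) -> dict[str, str]:
    """Run 1 reasoning call, 1 critique call, and 1 extraction call per field group."""
    reasoning = llm(f"Analyze this task:\n{task}")
    critiqued = llm(
        "Revise the analysis below. Flag scope inflation and files that do not exist:\n"
        f"{reasoning}"
    )
    extracted: dict[str, str] = {}
    for group in field_groups:
        # Each extraction asks for one narrow JSON object, never the whole plan.
        extracted[group] = llm(
            f"From the analysis below, return JSON for '{group}' only:\n{critiqued}"
        )
    return extracted
```

With N field groups the stage makes N + 2 model calls, and only the last N need to produce valid JSON.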
📦 Quick Start
# Install
pip install fitz-graveyard
# Queue a job
fitz-graveyard plan "Build a plugin system for data transformations"
# Start the background worker
fitz-graveyard run
# Check on it
fitz-graveyard status 1
# Read the plan
fitz-graveyard get 1
Optional extras:
pip install "fitz-graveyard[api-review]" # Anthropic API review pass
pip install "fitz-graveyard[lm-studio]" # LM Studio provider (openai SDK)
pip install "fitz-graveyard[dev]" # pytest, build tools
Prerequisites:
📦 CLI Reference
fitz-graveyard plan "description" # Queue a planning job
fitz-graveyard run # Start background worker (Ctrl+C to stop)
fitz-graveyard list # Show all jobs
fitz-graveyard status <id> # Check progress
fitz-graveyard get <id> # Print completed plan as markdown
fitz-graveyard retry <id> # Re-queue failed/interrupted job
fitz-graveyard confirm <id> # Approve optional API review
fitz-graveyard cancel <id> # Skip API review, finalize plan
fitz-graveyard serve # Start MCP server
Job lifecycle:
QUEUED → RUNNING → COMPLETE
                 → AWAITING_REVIEW → QUEUED (confirm) / COMPLETE (cancel)
                 → FAILED / INTERRUPTED (both retryable)
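The lifecycle reads naturally as a small state machine. The sketch below assumes that the review and failure branches both leave from RUNNING and that `retry` moves a job back to QUEUED; the enum and the `advance` helper are illustrative and not taken from the project's code.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    AWAITING_REVIEW = "awaiting_review"
    COMPLETE = "complete"
    FAILED = "failed"
    INTERRUPTED = "interrupted"


TRANSITIONS = {
    JobState.QUEUED: {JobState.RUNNING},
    JobState.RUNNING: {
        JobState.COMPLETE,
        JobState.AWAITING_REVIEW,
        JobState.FAILED,
        JobState.INTERRUPTED,
    },
    JobState.AWAITING_REVIEW: {JobState.QUEUED, JobState.COMPLETE},  # confirm / cancel
    JobState.FAILED: {JobState.QUEUED},       # retry
    JobState.INTERRUPTED: {JobState.QUEUED},  # retry
    JobState.COMPLETE: set(),                 # terminal
}


def advance(current: JobState, target: JobState) -> JobState:
    """Return the target state, or raise if the lifecycle does not allow the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Keeping the allowed moves in one table makes it easy to reject commands that do not apply, such as `confirm` on a job that is not awaiting review.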
📦 MCP Server
Plug into Claude Code or Claude Desktop:
{
  "mcpServers": {
    "fitz-graveyard": {
      "command": "fitz-graveyard",
      "args": ["serve"]
    }
  }
}
MCP Tools:
| Tool | Description |
|---|---|
| `create_plan` | Queue a new planning job |
| `check_status` | Check job progress |
| `get_plan` | Retrieve completed plan |
| `list_plans` | List all planning jobs |
| `retry_job` | Retry a failed job |
| `confirm_review` | Approve API review after seeing cost |
| `cancel_review` | Skip API review, finalize plan |
📦 Configuration
Auto-created on first run:
| Platform | Path |
|---|---|
| Windows | %LOCALAPPDATA%\fitz-graveyard\fitz-graveyard\config.yaml |
| macOS | ~/Library/Application Support/fitz-graveyard/config.yaml |
| Linux | ~/.config/fitz-graveyard/config.yaml |
Database (jobs.db) lives in the same directory.
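To locate the file from a script, the table above can be mirrored in a few lines. This is a sketch built only from the listed paths: the `config_path` function is hypothetical, it ignores overrides such as `XDG_CONFIG_HOME`, and the project itself may resolve the directory differently.

```python
from pathlib import PurePosixPath, PureWindowsPath


def config_path(platform: str, env: dict[str, str], home: str) -> str:
    """Return the config.yaml location for a platform name as reported by sys.platform."""
    app = "fitz-graveyard"
    if platform.startswith("win"):
        # Windows repeats the app name, matching the path in the table.
        return str(PureWindowsPath(env["LOCALAPPDATA"], app, app, "config.yaml"))
    if platform == "darwin":
        return str(PurePosixPath(home, "Library", "Application Support", app, "config.yaml"))
    return str(PurePosixPath(home, ".config", app, "config.yaml"))
```

For the current machine, call it as `config_path(sys.platform, dict(os.environ), os.path.expanduser("~"))` after importing `os` and `sys`.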
# LLM provider: "ollama" or "lm_studio"
provider: ollama
ollama:
  base_url: http://localhost:11434
  model: qwen2.5-coder-next:80b-instruct
  fallback_model: qwen2.5-coder-next:32b-instruct # OOM fallback (null to disable)
  timeout: 300
  memory_threshold: 80.0 # RAM % threshold to abort
lm_studio:
  base_url: http://localhost:1234/v1
  model: local-model
  timeout: 300
agent:
  enabled: true
  max_summary_files: 15
  source_dir: null # null = cwd at runtime
confidence:
  default_threshold: 0.7
  security_threshold: 0.9
anthropic:
  api_key: null # null = API review disabled
  model: claude-sonnet-4-5-20250929
output:
  plans_dir: .fitz-graveyard/plans
  verbosity: normal
📦 Architecture
CLI (typer) ──→ tools/ ──→ SQLiteJobStore ←── BackgroundWorker ──→ PlanningPipeline
MCP (fastmcp) ──→ tools/ ──→ SQLiteJobStore
fitz_graveyard/
├── cli.py # Typer CLI (9 commands)
├── server.py # FastMCP server + lifecycle
├── __main__.py # python -m fitz_graveyard (MCP stdio)
├── tools/ # Service layer
├── models/ # JobStore ABC, SQLiteJobStore, JobRecord
├── background/ # BackgroundWorker, signal handling
├── llm/ # LLM clients (Ollama, LM Studio), retry, memory monitor
├── planning/
│ ├── pipeline/stages/ # 3 merged stages + orchestrator + checkpoints
│ ├── agent/ # Multi-pass codebase context gatherer
│ ├── prompts/ # Externalized .txt prompt templates
│ └── confidence/ # Per-section confidence scoring
├── api_review/ # Anthropic review client + cost calculator
├── config/ # Pydantic schema + YAML loader
└── validation/ # Input sanitization
📦 Development
git clone https://github.com/yafitzdev/fitz-graveyard.git
cd fitz-graveyard
pip install -e ".[dev]" # editable install for development
pytest # 400 tests
License
MIT