MCP server for local-first AI architectural planning using local LLMs

fitz-graveyard

Overnight AI architectural planning on local hardware. Queue a job. Go to sleep. Wake up to a plan.

The Problem • The Insight • Why fitz-graveyard? • How It Works • GitHub

pip install fitz-graveyard

fitz-graveyard plan "Add OAuth2 authentication with Google and GitHub providers"
fitz-graveyard run   # let it cook overnight
fitz-graveyard get 1 # full architectural plan in the morning

About 🧑‍🌾

Solo project by Yan Fitzner (LinkedIn, GitHub).

~7k lines of Python
402 tests
Zero LangChain/LlamaIndex dependencies — built from scratch

The Problem

Claude Code costs $100/month to run semi-productively — and that's heavily subsidized. When subsidies shrink, prices go up. The single most expensive operation in agentic LLM coding is the planning phase: understanding a codebase, reasoning about architecture, producing a structured plan. Every token of that burns through your API budget.

What if the planning phase could run on local hardware instead? What if you could do it with a machine you already own?

The Insight 💡

Running LLMs locally means balancing three things: tokens per second, quantization quality, and model intelligence. A 70B model at high quant gives you excellent reasoning but crawls at 2-5 tok/s on consumer hardware. That feels unusable — until you realize planning doesn't need to be interactive.

Queue a job. Go to sleep. Let it run overnight.

Suddenly tok/s doesn't matter. You can run a large, intelligent model purely in RAM, and even at a modest 10 tok/s an 8-hour night adds up:

10 tok/s × 60 s/min × 60 min/h × 8 h = 288,000 tokens

That's enough for a full architectural plan — reasoning, self-critique, structured extraction — from a model running on hardware you already own. No API costs. No data leaving your network.
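Here is the same arithmetic as a throwaway Python helper, handy for plugging in your own hardware's speed. The 2 and 5 tok/s rows correspond to the 70B-on-consumer-hardware range quoted above; `overnight_token_budget` is just an illustrative name, not part of fitz-graveyard.

```python
def overnight_token_budget(tok_per_s: float, hours: float = 8.0) -> int:
    """Tokens a model can generate in `hours` at a steady `tok_per_s`."""
    seconds = hours * 60 * 60
    return int(tok_per_s * seconds)


for rate in (2, 5, 10):
    print(f"{rate:>2} tok/s x 8 h = {overnight_token_budget(rate):,} tokens")
# ->  2 tok/s x 8 h = 57,600 tokens
# ->  5 tok/s x 8 h = 144,000 tokens
# -> 10 tok/s x 8 h = 288,000 tokens
```

The budget scales linearly with speed, so even the slow end of that range still leaves tens of thousands of tokens per night.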

And the best part: as local models improve, your plans improve for free.

Why fitz-graveyard?

Runs on modest hardware 🖥️

A 35B model at Q6 on a single GPU produces plans in ~15 minutes. A 70B model in RAM takes a few hours. You don't need a datacenter — you need patience and a machine that can stay on overnight.

Reads your codebase first 🔍

An agent builds a structural index of your codebase (classes, functions, imports), navigates it using keyword extraction to pick task-relevant files, summarizes them, and synthesizes a context document. An implementation check then verifies whether the task is already built before planning begins. Every planning stage sees your actual code, not a hallucinated version of it.
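To make the first two agent steps concrete, here is a minimal sketch that builds a structural index with Python's standard `ast` module and ranks files by keyword overlap with the task. This is not fitz-graveyard's actual agent code: `build_index`, `extract_keywords`, `rank_files`, and the stopword list are hypothetical, the default `limit=15` merely mirrors `max_summary_files` from the config, and the real navigation logic may differ considerably.

```python
import ast
import re
from dataclasses import dataclass, field


@dataclass
class FileIndex:
    path: str
    classes: list[str] = field(default_factory=list)
    functions: list[str] = field(default_factory=list)
    imports: list[str] = field(default_factory=list)


def build_index(path: str, source: str) -> FileIndex:
    """Record the top-level classes, functions and imported modules of one file."""
    tree = ast.parse(source)
    entry = FileIndex(path)
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            entry.classes.append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entry.functions.append(node.name)
        elif isinstance(node, ast.Import):
            entry.imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            entry.imports.append(node.module)
    return entry


# Hypothetical stopword list for the naive keyword extractor below.
STOPWORDS = {"add", "with", "and", "the", "for", "a", "to", "of"}


def extract_keywords(task: str) -> set[str]:
    """Naive keyword extraction: lowercase alphanumeric words minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", task.lower()) if w not in STOPWORDS}


def rank_files(index: list[FileIndex], task: str, limit: int = 15) -> list[str]:
    """Rank files by how many task keywords appear in their path or symbol names."""
    keywords = extract_keywords(task)

    def score(entry: FileIndex) -> int:
        names = [entry.path, *entry.classes, *entry.functions, *entry.imports]
        haystack = " ".join(names).lower()
        return sum(1 for kw in keywords if kw in haystack)

    ranked = sorted(index, key=score, reverse=True)
    return [e.path for e in ranked if score(e) > 0][:limit]


# Usage: index two tiny in-memory "files" and pick the relevant one.
files = {
    "auth/oauth.py": "import httpx\nclass OAuthProvider:\n    pass\n",
    "utils/strings.py": "def slugify(s):\n    return s\n",
}
index = [build_index(p, src) for p, src in files.items()]
print(rank_files(index, "Add OAuth login to the auth module"))  # ['auth/oauth.py']
```

Indexing symbol names instead of full file contents keeps the navigation step cheap; only the few files that survive ranking need to be read and summarized by the model.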

Per-field extraction that small models can handle 🧩

Each stage does 1 reasoning pass + 1 self-critique + N tiny JSON extractions (<2000 chars each). Even a 3B model can reliably produce structured output at this scale. Failed extractions get Pydantic defaults instead of crashing the stage — partial plan > no plan.
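A minimal sketch of that fallback behaviour follows. It uses a standard-library dataclass in place of Pydantic so it runs without dependencies. `ContextFields`, `extract_field`, `extract_all`, and the stub `fake_llm` are hypothetical stand-ins for the project's real schema and LLM client.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Callable

MAX_EXTRACTION_CHARS = 2000  # each JSON extraction stays under ~2000 chars


@dataclass
class ContextFields:
    # Defaults play the role of Pydantic defaults: used when extraction fails.
    requirements: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)


def extract_field(call_llm: Callable[[str], str], reasoning: str, name: str):
    """Ask the model for one small JSON object holding a single field."""
    prompt = f'From the analysis below, return JSON {{"{name}": [...]}}.\n\n{reasoning}'
    raw = call_llm(prompt)
    if len(raw) > MAX_EXTRACTION_CHARS:
        raise ValueError(f"{name}: extraction too large")
    return json.loads(raw)[name]


def extract_all(call_llm, reasoning: str) -> tuple[ContextFields, list[str]]:
    """Run one extraction per field; a failed field keeps its default."""
    result = ContextFields()
    failed = []
    for f in fields(ContextFields):
        try:
            setattr(result, f.name, extract_field(call_llm, reasoning, f.name))
        except (ValueError, KeyError, TypeError):  # JSONDecodeError is a ValueError
            failed.append(f.name)  # partial plan > no plan
    return result, failed


def fake_llm(prompt: str) -> str:
    # Pretend the small model fumbles the "constraints" field.
    if '"constraints"' in prompt:
        return "not json"
    if '"requirements"' in prompt:
        return '{"requirements": ["Google login", "GitHub login"]}'
    return '{"assumptions": ["users have accounts"]}'


plan, failed = extract_all(fake_llm, "Stage 1 reasoning output")
print(plan.requirements, plan.constraints, failed)
# -> ['Google login', 'GitHub login'] [] ['constraints']
```

Because each call asks for a single field, one malformed answer costs only that field rather than the whole stage.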

Crash recovery built in 🔄

Jobs checkpoint to SQLite. Machine crashes mid-plan? `fitz-graveyard retry <id>` picks up from the last checkpoint. Power goes out overnight? Resume in the morning.
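Here is a minimal sketch of stage-level checkpointing with the standard `sqlite3` module. Each finished stage's output is stored, and a rerun skips stages that already have a row. The table layout, stage names, and function names are hypothetical, not the project's actual `jobs.db` schema.

```python
import json
import sqlite3
from typing import Callable

# Hypothetical stage names for the three merged planning stages.
STAGES = ["context", "architecture_design", "roadmap_risk"]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        " job_id INTEGER, stage TEXT, output TEXT,"
        " PRIMARY KEY (job_id, stage))"
    )
    return conn


def run_job(conn, job_id: int, run_stage: Callable[[str, dict], dict]) -> dict:
    """Run every stage, skipping any whose output is already checkpointed."""
    outputs: dict = {}
    for stage in STAGES:
        row = conn.execute(
            "SELECT output FROM checkpoints WHERE job_id = ? AND stage = ?",
            (job_id, stage),
        ).fetchone()
        if row is not None:
            outputs[stage] = json.loads(row[0])  # resume from checkpoint
            continue
        outputs[stage] = run_stage(stage, outputs)
        with conn:  # commit this checkpoint before starting the next stage
            conn.execute(
                "INSERT INTO checkpoints VALUES (?, ?, ?)",
                (job_id, stage, json.dumps(outputs[stage])),
            )
    return outputs


# Usage: the last stage crashes once, then the "retry" resumes from there.
calls = []


def flaky_stage(stage: str, previous: dict) -> dict:
    calls.append(stage)
    if stage == "roadmap_risk" and calls.count(stage) == 1:
        raise RuntimeError("power cut")  # simulate a crash mid-plan
    return {"stage": stage, "done": True}


conn = open_store()
try:
    run_job(conn, 1, flaky_stage)
except RuntimeError:
    pass
result = run_job(conn, 1, flaky_stage)
print(calls)  # ['context', 'architecture_design', 'roadmap_risk', 'roadmap_risk']
```

Only the failed stage runs twice; the two finished stages are read back from SQLite instead of being regenerated.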

Claude where it counts, local everywhere else 🎯

The local model does the heavy lifting — 95% of the tokens. But the pipeline knows what it's uncertain about. Per-section confidence scoring flags weak spots, and those sections can pause for an Anthropic API review pass before the plan finalizes. You get Claude-grade quality on the parts that matter, at a fraction of the token cost. Fully optional — off by default, zero API calls unless you opt in.
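One way the flagging could work, as a sketch: it assumes the 1-10 section scores mentioned in the feature list are divided by 10 and compared with the `default_threshold` and `security_threshold` values from the configuration file. The normalization, the choice of `risk` as the security-sensitive section, and `sections_needing_review` itself are all assumptions, not the project's code.

```python
DEFAULT_THRESHOLD = 0.7   # mirrors confidence.default_threshold
SECURITY_THRESHOLD = 0.9  # mirrors confidence.security_threshold


def sections_needing_review(
    scores: dict[str, int],
    security_sections: frozenset[str] = frozenset({"risk"}),  # hypothetical
) -> list[str]:
    """Return sections whose normalized score falls below their threshold."""
    flagged = []
    for section, score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{section}: score {score} outside 1-10")
        threshold = SECURITY_THRESHOLD if section in security_sections else DEFAULT_THRESHOLD
        if score / 10 < threshold:
            flagged.append(section)
    return flagged


scores = {"context": 8, "architecture": 6, "design": 7, "roadmap": 9, "risk": 8}
print(sections_needing_review(scores))  # ['architecture', 'risk']
```

Under these assumptions, only the flagged sections would be sent for API review, which is how the API token spend stays a small fraction of the total.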

Two interfaces, same engine 🔌

CLI for background job queues, MCP server for Claude Code / Claude Desktop integration. Both wrap the same tools/ service layer and SQLite job store.

Other features at a glance 🃏

[x] Two LLM providers. Ollama (with OOM fallback to a smaller model) or LM Studio (OpenAI-compatible API).

[x] Cross-stage coherence check. Post-pipeline pass verifies context → architecture → roadmap consistency.

[x] Section-specific confidence scoring. Each section type (context, architecture, design, roadmap, risk) scored against its own criteria with 1-10 granularity.

[x] Implementation detection. A surgical check stops the pipeline from planning features that already exist.
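The Ollama OOM fallback can be pictured like this. The sketch assumes the client raises an out-of-memory error for the primary model, retries once with `fallback_model`, and refuses to start when RAM usage is already above `memory_threshold`. `OutOfMemoryError`, `generate_with_fallback`, the injected `generate` callable, and the model names are hypothetical; this is not the project's client code.

```python
from typing import Callable, Optional


class OutOfMemoryError(RuntimeError):
    """Hypothetical error raised when the model cannot be loaded."""


def generate_with_fallback(
    generate: Callable[[str, str], str],
    prompt: str,
    model: str,
    fallback_model: Optional[str],
    ram_percent: float,
    memory_threshold: float = 80.0,
) -> tuple[str, str]:
    """Return (model_used, output), falling back once on OOM."""
    if ram_percent > memory_threshold:
        raise MemoryError(f"RAM at {ram_percent:.0f}% exceeds {memory_threshold:.0f}%")
    try:
        return model, generate(model, prompt)
    except OutOfMemoryError:
        if fallback_model is None:  # fallback_model: null disables the fallback
            raise
        return fallback_model, generate(fallback_model, prompt)


def fake_generate(model: str, prompt: str) -> str:
    # Pretend the big model does not fit in memory.
    if model == "primary-80b":
        raise OutOfMemoryError(model)
    return f"plan from {model}"


print(generate_with_fallback(fake_generate, "plan OAuth", "primary-80b", "fallback-32b", ram_percent=40.0))
# -> ('fallback-32b', 'plan from fallback-32b')
```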

How It Works

The pipeline is an agent pre-stage followed by 3 merged planning stages. Each stage uses per-field extraction: one reasoning prompt produces the analysis, a self-critique pass catches scope inflation and hallucinated files, then small JSON extractions pull structured data from the reasoning.

  [Agent]    map file tree → build structural index → navigate by keywords → summarize → synthesize
                 |
                 v
  [Check]    implementation check — is this task already built?
                 |
                 v
  [Stage 1]  Context — requirements, constraints, assumptions (4 field groups)
  [Stage 2]  Architecture + Design — merged stage (6 field groups)
  [Stage 3]  Roadmap + Risk — merged stage (3 field groups)
                 |
                 v
  [Post]     coherence check → confidence scoring (section-specific criteria) → optional API review → render markdown

[!NOTE] The pipeline decomposes a problem that would overwhelm a small model into pieces it can handle reliably. Each JSON extraction is <2000 chars — small enough for a 3B quantized model to produce valid output.
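The stage structure above can be sketched as a small orchestrator. It assumes one JSON extraction per field group (the group counts 4, 6 and 3 come from the diagram, but that mapping and the group names are assumptions) and that later stages see earlier stages' output. `Stage`, `run_stage`, and `run_pipeline` are hypothetical names, and the post-pipeline checks are left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    name: str
    field_groups: tuple[str, ...]


# Group counts follow the diagram; the group names are hypothetical.
STAGES = (
    Stage("context", ("requirements", "constraints", "assumptions", "scope")),
    Stage("architecture_design", ("components", "data_flow", "interfaces", "storage", "apis", "tradeoffs")),
    Stage("roadmap_risk", ("phases", "milestones", "risks")),
)


def run_stage(llm: Callable[[str], str], stage: Stage, context: str) -> dict[str, str]:
    reasoning = llm(f"[reason] {stage.name}\n{context}")
    critique = llm(f"[critique] check scope inflation and invented files\n{reasoning}")
    # One small extraction per field group, each reading reasoning + critique.
    return {g: llm(f"[extract] {g}\n{reasoning}\n{critique}") for g in stage.field_groups}


def run_pipeline(llm: Callable[[str], str], context: str) -> dict[str, dict[str, str]]:
    results = {}
    for stage in STAGES:
        results[stage.name] = run_stage(llm, stage, context)
        context += f"\n{stage.name}: {results[stage.name]}"  # feed later stages
    return results


# Usage: count LLM calls with a stub model.
calls = []


def counting_llm(prompt: str) -> str:
    calls.append(prompt.split("]")[0] + "]")  # keep only the "[kind]" tag
    return "ok"


run_pipeline(counting_llm, "codebase context document")
print(len(calls))  # (2+4) + (2+6) + (2+3) = 19
```

Under this assumption, the three planning stages cost 19 model calls in total, most of them tiny extractions.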

📦 Quick Start

# Install
pip install fitz-graveyard

# Queue a job
fitz-graveyard plan "Build a plugin system for data transformations"

# Start the background worker
fitz-graveyard run

# Check on it
fitz-graveyard status 1

# Read the plan
fitz-graveyard get 1

Optional extras:

pip install "fitz-graveyard[api-review]"    # Anthropic API review pass
pip install "fitz-graveyard[lm-studio]"    # LM Studio provider (openai SDK)
pip install "fitz-graveyard[dev]"          # pytest, build tools

Prerequisites:

Python 3.10+
Ollama installed and running, or LM Studio with a loaded model

📦 CLI Reference

fitz-graveyard plan "description"   # Queue a planning job
fitz-graveyard run                  # Start background worker (Ctrl+C to stop)
fitz-graveyard list                 # Show all jobs
fitz-graveyard status <id>          # Check progress
fitz-graveyard get <id>             # Print completed plan as markdown
fitz-graveyard retry <id>           # Re-queue failed/interrupted job
fitz-graveyard confirm <id>         # Approve optional API review
fitz-graveyard cancel <id>          # Skip API review, finalize plan
fitz-graveyard serve                # Start MCP server

Job lifecycle:

QUEUED → RUNNING → COMPLETE
                 → AWAITING_REVIEW → QUEUED (confirm) / COMPLETE (cancel)
                 → FAILED / INTERRUPTED (both retryable)
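The lifecycle above can be encoded as a transition table. This sketch is one reading of the diagram: for instance, it assumes `retry` moves FAILED and INTERRUPTED jobs back to QUEUED. The `JobState` enum and `transition` helper are hypothetical, not the project's `JobRecord` model.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    AWAITING_REVIEW = "awaiting_review"
    COMPLETE = "complete"
    FAILED = "failed"
    INTERRUPTED = "interrupted"


S = JobState
ALLOWED = {
    S.QUEUED: {S.RUNNING},
    S.RUNNING: {S.COMPLETE, S.AWAITING_REVIEW, S.FAILED, S.INTERRUPTED},
    S.AWAITING_REVIEW: {S.QUEUED, S.COMPLETE},  # confirm / cancel
    S.FAILED: {S.QUEUED},       # retry
    S.INTERRUPTED: {S.QUEUED},  # retry
    S.COMPLETE: set(),          # terminal
}


def transition(current: JobState, new: JobState) -> JobState:
    """Move to `new`, rejecting any transition the diagram does not allow."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


# A job that pauses for review, is confirmed, and then finishes.
state = S.QUEUED
for step in (S.RUNNING, S.AWAITING_REVIEW, S.QUEUED, S.RUNNING, S.COMPLETE):
    state = transition(state, step)
print(state.name)  # COMPLETE
```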

📦 MCP Server

Plug into Claude Code or Claude Desktop:

{
  "mcpServers": {
    "fitz-graveyard": {
      "command": "fitz-graveyard",
      "args": ["serve"]
    }
  }
}

MCP Tools:

Tool	Description
`create_plan`	Queue a new planning job
`check_status`	Check job progress
`get_plan`	Retrieve completed plan
`list_plans`	List all planning jobs
`retry_job`	Retry a failed job
`confirm_review`	Approve API review after seeing cost
`cancel_review`	Skip API review, finalize plan

📦 Configuration

Auto-created on first run:

Platform	Path
Windows	`%LOCALAPPDATA%\fitz-graveyard\fitz-graveyard\config.yaml`
macOS	`~/Library/Application Support/fitz-graveyard/config.yaml`
Linux	`~/.config/fitz-graveyard/config.yaml`

Database (jobs.db) lives in the same directory.

# LLM provider: "ollama" or "lm_studio"
provider: ollama

ollama:
  base_url: http://localhost:11434
  model: qwen2.5-coder-next:80b-instruct
  fallback_model: qwen2.5-coder-next:32b-instruct  # OOM fallback (null to disable)
  timeout: 300
  memory_threshold: 80.0  # RAM % threshold to abort

lm_studio:
  base_url: http://localhost:1234/v1
  model: local-model
  timeout: 300

agent:
  enabled: true
  max_summary_files: 15
  source_dir: null  # null = cwd at runtime

confidence:
  default_threshold: 0.7
  security_threshold: 0.9

anthropic:
  api_key: null  # null = API review disabled
  model: claude-sonnet-4-5-20250929

output:
  plans_dir: .fitz-graveyard/plans
  verbosity: normal
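The per-platform locations in the table can be reproduced with the standard library. This sketch mirrors the table rather than the project's own loader; `config_path` and `jobs_db_path` are hypothetical helpers, and they take the platform name, environment, and home directory as arguments so the logic can be checked on any OS.

```python
import os
import platform
from pathlib import PurePath, PurePosixPath, PureWindowsPath

APP = "fitz-graveyard"


def config_path(system: str, env: dict[str, str], home: str) -> PurePath:
    """Return the config.yaml path for a platform.system() value."""
    if system == "Windows":
        # The app name appears twice on Windows, as in the table above.
        return PureWindowsPath(env["LOCALAPPDATA"]) / APP / APP / "config.yaml"
    if system == "Darwin":
        return PurePosixPath(home) / "Library" / "Application Support" / APP / "config.yaml"
    # Linux and other Unix-likes
    return PurePosixPath(home) / ".config" / APP / "config.yaml"


def jobs_db_path(config: PurePath) -> PurePath:
    return config.parent / "jobs.db"  # the database lives beside config.yaml


here = config_path(platform.system(), dict(os.environ), os.path.expanduser("~"))
print(here, jobs_db_path(here))
```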

📦 Architecture

CLI (typer)   ──→ tools/ ──→ SQLiteJobStore ←── BackgroundWorker ──→ PlanningPipeline
MCP (fastmcp) ──→ tools/ ──→ SQLiteJobStore

fitz_graveyard/
├── cli.py                     # Typer CLI (9 commands)
├── server.py                  # FastMCP server + lifecycle
├── __main__.py                # python -m fitz_graveyard (MCP stdio)
├── tools/                     # Service layer
├── models/                    # JobStore ABC, SQLiteJobStore, JobRecord
├── background/                # BackgroundWorker, signal handling
├── llm/                       # LLM clients (Ollama, LM Studio), retry, memory monitor
├── planning/
│   ├── pipeline/stages/       # 3 merged stages + orchestrator + checkpoints
│   ├── agent/                 # Multi-pass codebase context gatherer
│   ├── prompts/               # Externalized .txt prompt templates
│   └── confidence/            # Per-section confidence scoring
├── api_review/                # Anthropic review client + cost calculator
├── config/                    # Pydantic schema + YAML loader
└── validation/                # Input sanitization
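The shared store in the diagram is what lets the CLI, the MCP server, and the background worker cooperate without knowing about each other: they only agree on an interface. Here is a minimal sketch of that pattern, with an in-memory store standing in for `SQLiteJobStore`. The method names and `JobRecord` fields are assumptions, not the project's actual API.

```python
import itertools
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRecord:
    job_id: int
    description: str
    state: str = "QUEUED"
    plan: Optional[str] = None


class JobStore(ABC):
    @abstractmethod
    def add(self, description: str) -> JobRecord: ...

    @abstractmethod
    def get(self, job_id: int) -> Optional[JobRecord]: ...

    @abstractmethod
    def next_queued(self) -> Optional[JobRecord]: ...


class InMemoryJobStore(JobStore):
    """Test double; a SQLite-backed store would implement the same methods."""

    def __init__(self) -> None:
        self._jobs: dict[int, JobRecord] = {}
        self._ids = itertools.count(1)

    def add(self, description: str) -> JobRecord:
        job = JobRecord(next(self._ids), description)
        self._jobs[job.job_id] = job
        return job

    def get(self, job_id: int) -> Optional[JobRecord]:
        return self._jobs.get(job_id)

    def next_queued(self) -> Optional[JobRecord]:
        return next((j for j in self._jobs.values() if j.state == "QUEUED"), None)


# The CLI/MCP side only enqueues; the worker side only consumes.
store: JobStore = InMemoryJobStore()
store.add("Build a plugin system for data transformations")
job = store.next_queued()
job.state, job.plan = "COMPLETE", "# Plan"
print(store.get(1).state)  # COMPLETE
```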

📦 Development

git clone https://github.com/yafitzdev/fitz-graveyard.git
cd fitz-graveyard
pip install -e ".[dev]"  # editable install for development
pytest  # runs the test suite (400+ tests)

License

MIT

Release history

0.4.0: Mar 15, 2026

0.3.0: Mar 11, 2026 (this version)

0.1.0: Feb 24, 2026

Download files

Source Distribution

fitz_graveyard-0.3.0.tar.gz (132.3 kB)

Uploaded Mar 11, 2026 Source

Built Distribution

fitz_graveyard-0.3.0-py3-none-any.whl (161.7 kB)

Uploaded Mar 11, 2026 Python 3

File details

Details for the file fitz_graveyard-0.3.0.tar.gz.

File metadata

Download URL: fitz_graveyard-0.3.0.tar.gz
Upload date: Mar 11, 2026
Size: 132.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for fitz_graveyard-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`893f9fd453409acc927c75cc32bac0e44e926ac7f3564a5a2a4c95e080757111`
MD5	`e37f293b77d8de2d9682a83318585322`
BLAKE2b-256	`62238773412f949666a3dd893b16814da79642ce0271b76a932593e83552564f`

File details

Details for the file fitz_graveyard-0.3.0-py3-none-any.whl.

File metadata

Download URL: fitz_graveyard-0.3.0-py3-none-any.whl
Upload date: Mar 11, 2026
Size: 161.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for fitz_graveyard-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a0e92e9c20b3f26ce850e149c304c3fb14c188064d2fb2df41339e26b5802366`
MD5	`f87b238e4d6c2fb941068520b4d823d7`
BLAKE2b-256	`4d12e0e9192c06e3aa54591306e156d6d4b952ebde76cb73748f8f0df127b2b4`
