Audit and trim OpenClaw bootstrap files that brush the ~12k char ceiling.
Project description
bootstrap-doctor
bootstrap file doctor for OpenClaw. Audits the bootstrap markdown files that get loaded into every session prefix, flags sections that should move out, and rewrites the originals with one-line breadcrumbs to the relocated content.
Why
OpenClaw bootstrap files (AGENTS.md, TOOLS.md, SOUL.md, USER.md, SAFETY_RULES.md, IDENTITY.md, HEARTBEAT.md, MEMORY.md) load into the session prefix on every turn. There is an empirical soft ceiling around 12,000 chars per file before content gets silently truncated mid-session, dropping bootstrap context with no error. Several files already brush that ceiling.
Triple that across workspace-claude, workspace-main, and workspace-researcher and the bloat surface multiplies. Today this is managed by hand: notice a file is too big, eyeball a section, copy it to memory/cards/, leave a breadcrumb. That gets skipped.
bootstrap-doctor automates the audit-and-relocate loop. Originals stay short, offloaded sections live in memory/cards/, breadcrumbs in the original point at the card.
What it does
Three subcommands, dry-run by default:
bootstrap-doctor status # read-only summary of every tracked file
bootstrap-doctor audit # heuristic shortlist plus LLM verdicts (keep / move / unsure)
bootstrap-doctor trim [--apply] # apply the audit plan: write cards, replace sections with breadcrumbs
status and audit are read-only. They can run even if cards_dir does not exist yet. trim defaults to dry-run; pass --apply to actually write.
Install
From PyPI:
pipx install bootstrap-doctor
From a local clone:
git clone https://github.com/escoffier-labs/bootstrap-doctor
cd bootstrap-doctor
pipx install -e .
Requires Python 3.11+. Runtime dep: requests (used by the gateway client).
The bootstrap size limits (soft / hard / ceiling) are owned by brigade via brigade.budgets. bootstrap-doctor ships a mirrored fallback so it runs standalone without brigade installed. To source the limits from brigade directly and stay in lockstep with the rest of the tooling, install the optional extra:
pipx install "bootstrap-doctor[brigade]"
Usage
status
$ bootstrap-doctor status
workspace: /home/you/.openclaw/workspace
AGENTS.md 11,805 chars 185 lines OVER soft (10,000)
TOOLS.md 11,589 chars 221 lines OVER soft (10,000)
SOUL.md 8,373 chars 124 lines ok
SAFETY_RULES.md 7,658 chars 118 lines ok
USER.md 7,229 chars 96 lines ok
IDENTITY.md 3,402 chars 52 lines ok
HEARTBEAT.md 2,109 chars 34 lines ok
MEMORY.md 15,720 chars 192 lines OVER hard (11,500)
Use --json for machine-readable output with soft_limit, hard_limit, and one row per tracked file.
audit
$ bootstrap-doctor audit
TOOLS.md
## Postiz API endpoints [move] -> postiz-api-endpoints.md
## Eero device culling [keep]
## Jellyfin tool patterns [move] -> jellyfin-tool-patterns.md
AGENTS.md
## codex-builder agent gotchas [unsure]
## ACPX-routed agent pinning [move] -> acpx-routed-agent-pinning.md
No writes. Verdicts cached by content hash so re-runs are cheap; pass --no-cache to force re-judgement.
trim
$ bootstrap-doctor trim
DRY-RUN. Would write:
memory/cards/postiz-api-endpoints.md (+412 lines)
memory/cards/jellyfin-tool-patterns.md (+58 lines)
Would modify:
TOOLS.md -73 lines (11,589 -> 8,402 chars)
Pass --apply to commit.
$ bootstrap-doctor trim --apply
unsure verdicts are never auto-applied. They show up in audit output for manual review.
Config
~/.config/bootstrap-doctor/config.toml:
workspace_dir = "~/.openclaw/workspace"
cards_dir = "~/.openclaw/workspace/memory/cards"
gateway_url = "http://localhost:11434"
gateway_model = "deepseek-v4-pro:cloud"
soft_limit = 10000
hard_limit = 11500
tracked_files = [
"AGENTS.md", "TOOLS.md", "SOUL.md", "USER.md",
"SAFETY_RULES.md", "IDENTITY.md", "HEARTBEAT.md", "MEMORY.md",
]
named_workspaces = ["workspace-claude", "workspace-main", "workspace-researcher"]
[heuristics]
min_section_chars = 400
stale_days = 60
Layering: built-in defaults, then config file, then env vars, then CLI flags.
Path-like values must be non-empty strings without control characters or leading/trailing whitespace. tracked_files and named_workspaces must be local names, not paths.
How it works
flowchart TB
subgraph INPUTS [" inputs "]
FILES["<b>Bootstrap files</b><br/>AGENTS.md · TOOLS.md · MEMORY.md · ..."]
CONFIG["<b>Config layering</b><br/>defaults · config.toml · env · flags"]
BUDGETS["<b>brigade.budgets</b><br/>soft / hard size ceilings"]
GIT["<b>Git history</b><br/>per-section last-touched mtime"]
end
BUDGETS -. canonical limits .-> CONFIG
subgraph PIPELINE [" four-stage pipeline "]
PARSE["<b>Section parser</b><br/>split by H2/H3 headings"]
HEUR["<b>Heuristic shortlist</b><br/>size · code blocks · staleness · duplication"]
JUDGE["<b>LLM judge</b><br/>keep / move / unsure"]
PLAN["<b>Trim plan</b><br/>cards + breadcrumbs, dry-run by default"]
end
FILES & GIT --> PARSE
CONFIG -. limits + tracked files .-> PARSE
PARSE ==> HEUR ==> JUDGE ==> PLAN
GATEWAY["<b>LLM gateway</b><br/>any OpenAI-compatible endpoint"]
CACHE["<b>Verdict cache</b><br/>SHA256 of section body · local-only"]
JUDGE <==>|structured prompt| GATEWAY
JUDGE <-. re-validated on read .-> CACHE
subgraph OUTPUTS [" trim --apply outputs "]
CARDS["<b>Memory cards</b><br/>memory/cards/<slug>.md"]
CRUMBS["<b>Rewritten originals</b><br/>one-line breadcrumbs to cards"]
REVIEW["<b>Manual review</b><br/>unsure verdicts, never auto-applied"]
end
PLAN ==>|move| CARDS
CARDS ==>|cards written first| CRUMBS
PLAN -->|unsure| REVIEW
GUARD["<b>Safety boundary</b><br/>clean git required · atomic writes · slug traversal guard"]
GUARD -. gates all writes .-> OUTPUTS
classDef source fill:#eff6ff,stroke:#2563eb,color:#1e3a8a;
classDef process fill:#ecfdf5,stroke:#059669,color:#064e3b;
classDef state fill:#fff7ed,stroke:#ea580c,color:#7c2d12;
classDef sink fill:#f8fafc,stroke:#64748b,color:#334155;
classDef guard fill:#fee2e2,stroke:#ef4444,color:#7f1d1d;
class FILES,CONFIG,BUDGETS,GIT source;
class PARSE,HEUR,JUDGE,PLAN process;
class GATEWAY,CACHE state;
class CARDS,CRUMBS,REVIEW sink;
class GUARD guard;
bootstrap-doctor runs a four-stage pipeline.
- Section parser. Splits each tracked
.mdby H2/H3 headings into(file, heading_path, body, char_count, last_touched_git_mtime)tuples. - Heuristic shortlist. Flags sections that look offload-worthy: body > 400 chars, contains a code block > 10 lines, no git touch in 60+ days, or duplicated across multiple tracked files.
- LLM judge. For each shortlisted section, POSTs to an OpenAI-compatible chat-completions endpoint (default
http://localhost:11434, Ollama) with a structured prompt asking whether the section is must-stay-loaded (active rules, identity, currently-relevant state) or reference-detail (historical, exemplar, one-off setup). Verdict is one ofkeep,move, orunsure. Token budget capped per run; verdicts cached by SHA256 of section body in~/.cache/bootstrap-doctor/verdicts.json. Any OpenAI-compatible endpoint works (Ollama, OpenAI, vLLM, etc.); setgateway_urlandgateway_modelin config. - Trim plan. For each
moveverdict, writes a card tomemory/cards/<slug>.mdwith the existing frontmatter convention, replaces the section in the original with a one-line breadcrumb pointing at the card.keepis a no-op.unsureis reported but never auto-applied.
Safety
- Dry-run by default.
--applyrequired for any write. - Atomic writes: temp file plus rename, so a torn write cannot leave a half-rewritten bootstrap file.
- Path-traversal guard on card slugs (must resolve inside
cards_dir). - Refuses to run if
git statusin the workspace is dirty, so any change is revertable. Ifcards_dirlives in a separate git repo, that repo must be clean too. Override with--force. - Verdict cache is local-only (
~/.cache/bootstrap-doctor/verdicts.json). Clear with--no-cache. - Card-write failures abort before bootstrap files are rewritten, so a failed run cannot leave breadcrumbs pointing at missing cards.
Development
python3 -m pip install -e ".[dev]"
pytest -q
python3 -m ruff check .
python3 -m mypy src/bootstrap_doctor
python3 -m build
pip-audit . --skip-editable
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file bootstrap_doctor-0.2.0.tar.gz.
File metadata
- Download URL: bootstrap_doctor-0.2.0.tar.gz
- Upload date:
- Size: 82.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d4e26ed20b70713029435d17cc7723fd37faf50a3e5e02487f2b95ce9e30cd3d
|
|
| MD5 |
2d28d42e4f05b03709810aa0c623985b
|
|
| BLAKE2b-256 |
9441577f5957c76d6dfa656e41a4659799e085c0fa307368b1718d1cac51d628
|
File details
Details for the file bootstrap_doctor-0.2.0-py3-none-any.whl.
File metadata
- Download URL: bootstrap_doctor-0.2.0-py3-none-any.whl
- Upload date:
- Size: 47.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bceb5a4cb09e7c403ab86958f7af90e49c306a3f24943dd1d3723ab5eddf4562
|
|
| MD5 |
8989376f36c4b013ca6f3b546a3dbccf
|
|
| BLAKE2b-256 |
2f8ce166ca3c1688b31fa77e793a7f48393392b7b31e6f43401739bca1bc88b6
|