Sexpr-native MCP server giving Claude Code and other MCP agents white-box workflow orchestration, durable corpus, auto-crystallization.
Project description
NeuroLisp
A white-box MCP server for Claude Code agent workflows. Survive a kill -9 mid-task. Read what your subagent did in plain text. Replay any step without rerunning the rest.
Why does this exist
Claude Code is powerful in one session. The 4 things that break in production are paired with what NeuroLisp does about each:
| The pain | NeuroLisp's answer |
|---|---|
| You crash 8 hours in and lose all state. | Every step's cursor, prompt, and result lives in sqlite. Reconnect and nl_workflow_replay. |
You cannot audit what your subagent did. LangGraph objects inspect to <Node 0x...>. |
Every step is sexpr text in workflow_runs.steps_sexpr. Readable with cat. |
| Your patterns repeat but the agent forgets each time. | After 3 consistent observations, the pattern auto-crystallizes into a reusable named skill. |
| You want to fix one step without rerunning everything. | nl_workflow_patch_step <id> <step> <new-prompt>, then replay from that step only. |
The file you executed is the file you patch. The same sexpr is workflow, plan, template, and macroexpansion result. That is homoiconicity, and the value is concrete: you can read, diff, edit, and replay without ever leaving plain text.
Why smaller models can do bigger work
CLAUDE.md and similar instruction files are an interpretive runtime. Every step, the model re-reads your rules, decides which skill to load, recalls past context, and improvises what to do next. That requires a very capable model. The cost grows with workflow complexity.
NeuroLisp's sexpr workflow is a deterministic runtime. Step order, tool whitelists, lexical scope, retry guards, pre-loaded skills, the briefing kit — all encoded in the sexpr before the model is invoked. The model only does the leaf work: draft this paragraph, classify this review, summarize these sources. The orchestration is the program, not the prompt.
CLAUDE.md interpretation NeuroLisp deterministic orchestration
───────────────────────── ────────────────────────────────────
Big model reads instructions Small model receives a fully-formed
Big model decides next step briefing for ONE leaf task
Big model loads context Workflow already loaded the context
Big model picks the tool Workflow already locked the tools
Big model improvises Sexpr executes deterministically
Cost scales with model + complexity Cost scales with leaf-task count only
Practical effect: a 5-step essay pipeline that needs a top-tier model to coordinate via instruction-file interpretation runs on a smaller, cheaper model under NeuroLisp because the coordination is in the sexpr. Same output quality, often two orders of magnitude cheaper per LLM call:
DeepSeek v4-flash $0.07 / 1M input tokens $0.27 / 1M output tokens
Claude Opus 4.x $15 / 1M input tokens $75 / 1M output tokens
~214× input ~278× output
Public list prices as of 2026-05. The 5-step essay benchmark in this repo lands at ~$0.003 per essay on DeepSeek v4-flash; the same workflow under a top-tier model would be in the $0.50 - $1.00 range.
This compounds. As you accumulate dozens of reusable pipelines and skills, you have a personal orchestration layer that any cheap model can drive. The intelligence migrates from the model into your workflow library, where it is inspectable, diff-able, and version-controlled.
In 30 seconds
Run a real workflow. One macro expands into 5 LLM steps: planner → 2 parallel researchers → 2 writers → reflect-revise editor.
(essay-atom-pipeline-scoped "GraphQL vs REST in 2026" "research-team")
;; → 4000+ word essay, ~$0.002-0.005 on DeepSeek v4-flash
Inspect what ran. Workflow + steps are sqlite rows, not opaque objects.
sqlite3 ~/.neurolisp_mcp.sqlite \
"SELECT id, status, cursor, length(steps_sexpr) FROM workflow_runs ORDER BY id DESC LIMIT 1"
# wf-9b2e... | complete | 5/5 | 842 chars of plain sexpr
Survive a crash. Pull the plug at step 4 of 5. Reconnect Claude Code and ask it to:
nl_workflow_replay(workflow_id="wf-9b2e...", from_step="draft-section-2")
# resumes at draft-section-2; outline + research-1 + research-2 reuse
# cached results from sqlite, cost: $0 for the first 3 steps
Hand-edit a step then replay.
nl_workflow_patch_step(workflow_id="wf-9b2e...",
step_name="reflect-revise",
new_prompt="Reflect using 7 quality dimensions, then revise.")
nl_workflow_replay(workflow_id="wf-9b2e...", from_step="reflect-revise")
No framework rerun. No LangGraph rebuild. Patch the sexpr and go.
How a complex workflow actually runs
Take the essay-atom-pipeline from the example above. It is intentionally complex enough to exercise every NeuroLisp mechanism: :auto T (server-side LLM), :tools deferred subagents (brain-side dispatch), a parallel group, a :retry-validate quality guard, a :sink length backstop, and :scope lexical-tool whitelisting through a NodeProfile. After the one-line macro call, here is what actually happens.
One user request, five MCP boundaries
USER
│ "Write me an essay on GraphQL vs REST."
▼
┌──────────────────┐
│ Claude Code │ main agent in user's terminal
│ (the brain) │ decides to use NeuroLisp
└────────┬─────────┘
│
│ nl_eval_sexpr(
│ '(essay-atom-pipeline-scoped
│ "GraphQL vs REST in 2026"
│ "research-team")')
▼
┌──────────────────┐
│ NeuroLisp MCP │ parses sexpr, expands macro,
│ server │ walks workflow groups, persists
│ (Python) │ state to ~/.neurolisp_mcp.sqlite
└─┬────────────┬───┘
│ │
┌────────────┘ └────────────┐
│ for :auto T steps for :tools │
│ server calls LLM server │
│ provider directly hands back │
▼ deferred ▼
┌─────────────────┐ token ┌────────────────┐
│ LLM provider │ │ Claude Code │
│ DeepSeek / │◀──────────────────────│ Agent tool │
│ OpenAI-compat / │ HTTP request │ dispatches a │
│ Anthropic │ includes briefing │ fresh subagent │
│ (urllib only) │ │ with the │
└────────┬────────┘ │ briefing kit │
│ text response └───────┬────────┘
▼ │ subagent
apply :sink │ runs WebSearch
apply :retry-validate │ + WebFetch
write to wf.results │ + reasoning
write to corpus row in sqlite ▼
│ result text
│ ┌─────────────────────────┘
│ │ nl_resolve_subagent(token, result)
▼ ▼
workflow advances cursor
next group fires
auto-chain runs all :auto steps in one server call
│
▼
workflow status = complete
final result in wf.results["reflect-revise"]
│
▼
┌──────────────────┐
│ USER reads │
│ the essay │
└──────────────────┘
Phase-by-phase trace
| # | Phase | Who acts | What actually happens |
|---|---|---|---|
| 1 | Macro expansion | Server | (essay-atom-pipeline-scoped ...) → 30-line (workflow (quote ...) (quote (5 steps))) AST in memory. No LLM yet. |
| 2 | Group 0: outline | Server + DeepSeek | :auto T planner step. Server builds briefing from essay-outline-architect skill + topic, HTTP-POSTs DeepSeek, gets 400-word outline, applies sink (none), writes wf.results["outline"] + corpus row auto-step:planner. |
| 3 | Group 1: research × 2 (parallel + deferred) | Brain → 2 subagents | :tools (WebSearch WebFetch) steps emit 2 deferred tokens. Brain receives parallel_steps payload, dispatches 2 Claude Code subagents via the Agent tool, each with its own briefing kit (Role / Task / Upstream Artifacts / SOP / Tools Available). Subagents call WebSearch + WebFetch independently. Brain receives 2 result strings, calls nl_resolve_subagent(token, result) twice. |
| 4 | Auto chain: groups 2-4 | Server + DeepSeek | After the 2nd resolve, server sees the next 3 groups (draft-section-1, draft-section-2, reflect-revise) are all :auto T. It runs them back-to-back in a single server-side loop (auto chain, v7.62), no brain round-trips. Each step's prompt references upstream step names which the env resolves to actual text. |
| 5 | Final guard | Server | reflect-revise has :sink (cond ((< (string-length result) 2000) (str "WARNING short essay..." result)) (T result)). If LLM truncates, the sink prepends a WARNING header before storing. wf.summaries["reflect-revise"] also stored for downstream brevity. |
| 6 | Return | Server → Brain → User | Server returns {complete: true, results: {6 step keys}}. Claude Code reads results["reflect-revise"] and shows the essay to the user. |
The boundary each layer enforces
- The brain (Claude Code) decides what to ask for and whom to dispatch (subagents). It never decides how a step is written — the workflow grammar already encodes that.
- The server (NeuroLisp) decides how each step executes (auto vs deferred), what context that step sees (briefing kit), and what state persists (sqlite). It never decides whom to dispatch — invariant 3.
- The LLM provider does one leaf task at a time. Briefing is fully assembled before the HTTP call, so the model is not asked to plan, only to produce.
- The subagent is a one-shot Claude Code instance: opens a fresh context, runs the tools listed in its briefing, returns one result string. It has no awareness of the broader workflow.
That separation is what lets a small, cheap leaf-model do the same end-to-end work that a single big-model session would otherwise need: every step is pre-decided in the sexpr, so the model is never asked to be smart about the plan.
What sqlite actually contains after one run
sqlite3 ~/.neurolisp_mcp.sqlite "
SELECT primitive, success, length(output) AS out_chars, cost
FROM corpus
ORDER BY row_id DESC LIMIT 6"
invoke-subagent:editor | 1 | 4149 | 0.00097 -- reflect-revise (:auto T)
invoke-subagent:writer | 1 | 2104 | 0.00031 -- draft-section-2 (:auto T)
invoke-subagent:writer | 1 | 2087 | 0.00029 -- draft-section-1 (:auto T)
invoke-subagent:general-purpose | 1 | 1856 | 0.0 -- research-2 (:tools, brain-side)
invoke-subagent:general-purpose | 1 | 1734 | 0.0 -- research-1 (:tools, brain-side)
invoke-subagent:planner | 1 | 412 | 0.00018 -- outline (:auto T)
Every step writes a row with primitive prefix invoke-subagent:<agent> regardless of execution path. :auto T rows carry the server-side LLM cost; :tools rows show 0.0 because the cost lives on the brain side (subagent dispatch). :pure T steps use a different prefix pure-step:<agent>.
sqlite3 ~/.neurolisp_mcp.sqlite "
SELECT id, status, cursor, length(steps_sexpr) AS plan_chars
FROM workflow_runs ORDER BY rowid DESC LIMIT 1"
wf-9b2e... | complete | 5/5 | 842
6 corpus rows are append-only audit trail; the workflow_runs row is the resumable snapshot. Both are plain SQL. Both are diff-able. Nothing about this run is opaque.
Get started
Install from source (works today):
git clone https://github.com/KevinBangbang/NeuroLisp.git
cd NeuroLisp
pip install -e .
python -m neurolisp.health
A PyPI release (pip install neurolisp) is staged for v0.35.0 and will be available shortly after the first public release tag.
Wire into Claude Code by editing ~/.claude.json:
{
"mcpServers": {
"neurolisp": {
"command": "python",
"args": ["-m", "mcp_server.server"]
}
}
}
Restart Claude Code, run /mcp. All 45 nl_* tools appear.
For real LLM steps, export an API key:
export DEEPSEEK_API_KEY=sk-...
# or OPENAI_API_KEY / ANTHROPIC_API_KEY
OpenAI-compatible endpoints (Groq, Together, Cerebras, local vLLM, etc.) are supported by swapping base_url. See docs/00_quickstart.md for the 10-minute walkthrough.
What you can do
- Resume any workflow at the exact step after a crash, kernel panic, or
kill -9. Step results live in sqlite, not RAM. - Audit every subagent invocation by
cat-ing a sexpr. The execution plan is the same file you can hand-edit. - Skip steps you have already paid for with
nl_workflow_replay(workflow_id, from_step="step-name"). Cached upstream results stay valid. - Patch one step then replay.
nl_workflow_patch_stepupdates the plan; the next replay picks up the change. - Crystallize a repeated pattern into a named skill after 3 consistent observations. Reuse it from any future workflow.
- Skip the heavyweight 5-step pipeline for low-stakes tasks with
stakes-route. Empirically 89% cheaper than the full pipeline (v0.34 routine vs full bench). - Swap LLM providers without changing the workflow. DeepSeek, Anthropic, and any OpenAI-compatible endpoint (Groq, Together, Cerebras, local vLLM) all on stdlib
urllib, no SDK lock-in. - Trust the test surface. 1992 passing tests + 4 skipped on every commit across Ubuntu and Windows on Python 3.10 / 3.11 / 3.12.
Documentation
| If you want to... | Read this |
|---|---|
| Try it in 10 minutes | docs/00_quickstart.md |
| Understand why each layer exists | docs/01_concepts/00_first_principles.md |
| Follow a build-up tutorial | docs/07_tutorial-book/ 17 chapters |
| Browse all 45 MCP tools | docs/02_reference/mcp-tools.md |
| Read the 8 invariants and anti-goals | NORTH_STAR.md |
| See empirical benchmarks | docs/BENCHMARKS.md |
| Run example scripts locally | examples/ 3 standalone demos |
| Browse version history | CHANGELOG.md |
Community
- GitHub Discussions for design questions, show-and-tell, RFCs
- GitHub Issues for bug reports and feature requests
- Security disclosures:
SECURITY.md
This is a 1-maintainer project. Realistic response time: a few days for bug reports with reproducers, longer for open-ended discussions.
Contributing
NeuroLisp is small by design. We review every PR with Occam's razor. New atoms, modules, or workflows must pass real-LLM end-to-end A/B validation before landing on main. See CONTRIBUTING.md for setup, conventions, and the in-scope / out-of-scope list.
License
Apache 2.0. Copyright Bangcheng Wang and NeuroLisp contributors.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file neurolisp-0.35.0.tar.gz.
File metadata
- Download URL: neurolisp-0.35.0.tar.gz
- Upload date:
- Size: 359.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
caa6135efa7f74b659eb1538b8f10b67ff191fd2300a9b19b765da0cd47478c3
|
|
| MD5 |
647cb096d2f347436a4eb3ea71d1e9b6
|
|
| BLAKE2b-256 |
8c42a10a42c735e4022909873d8cae6e75bd9fe6f4f895f3f4d14a9e78f61dcf
|
File details
Details for the file neurolisp-0.35.0-py3-none-any.whl.
File metadata
- Download URL: neurolisp-0.35.0-py3-none-any.whl
- Upload date:
- Size: 200.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c822645d66d07472cb04c137ca4939451afcb15de76a2cc444d52f250e1cf5e5
|
|
| MD5 |
e0574bcfd6d8c7857875b997cd14de82
|
|
| BLAKE2b-256 |
ac542f1c6a3e2097fc0897dbb4e40cda6a8680c43e406c8371a4a1c35a923f11
|