Skip to main content

Sexpr-native MCP server giving Claude Code and other MCP agents white-box workflow orchestration, durable corpus, auto-crystallization.

Project description

NeuroLisp

A white-box MCP server for Claude Code agent workflows. Survive a kill -9 mid-task. Read what your subagent did in plain text. Replay any step without rerunning the rest.

tests python license mcp


Why does this exist

Claude Code is powerful in one session. The 4 things that break in production are paired with what NeuroLisp does about each:

The pain NeuroLisp's answer
You crash 8 hours in and lose all state. Every step's cursor, prompt, and result lives in sqlite. Reconnect and nl_workflow_replay.
You cannot audit what your subagent did. LangGraph objects inspect to <Node 0x...>. Every step is sexpr text in workflow_runs.steps_sexpr. Readable with cat.
Your patterns repeat but the agent forgets each time. After 3 consistent observations, the pattern auto-crystallizes into a reusable named skill.
You want to fix one step without rerunning everything. nl_workflow_patch_step <id> <step> <new-prompt>, then replay from that step only.

The file you executed is the file you patch. The same sexpr is workflow, plan, template, and macroexpansion result. That is homoiconicity, and the value is concrete: you can read, diff, edit, and replay without ever leaving plain text.


Why smaller models can do bigger work

CLAUDE.md and similar instruction files are an interpretive runtime. Every step, the model re-reads your rules, decides which skill to load, recalls past context, and improvises what to do next. That requires a very capable model. The cost grows with workflow complexity.

NeuroLisp's sexpr workflow is a deterministic runtime. Step order, tool whitelists, lexical scope, retry guards, pre-loaded skills, the briefing kit — all encoded in the sexpr before the model is invoked. The model only does the leaf work: draft this paragraph, classify this review, summarize these sources. The orchestration is the program, not the prompt.

CLAUDE.md interpretation                NeuroLisp deterministic orchestration
─────────────────────────                ────────────────────────────────────
Big model reads instructions             Small model receives a fully-formed
Big model decides next step              briefing for ONE leaf task
Big model loads context                  Workflow already loaded the context
Big model picks the tool                 Workflow already locked the tools
Big model improvises                     Sexpr executes deterministically
Cost scales with model + complexity      Cost scales with leaf-task count only

Practical effect: a 5-step essay pipeline that needs a top-tier model to coordinate via instruction-file interpretation runs on a smaller, cheaper model under NeuroLisp because the coordination is in the sexpr. Same output quality, often two orders of magnitude cheaper per LLM call:

DeepSeek v4-flash    $0.07 / 1M input tokens     $0.27 / 1M output tokens
Claude Opus 4.x      $15   / 1M input tokens     $75   / 1M output tokens
                     ~214× input                  ~278× output

Public list prices as of 2026-05. The 5-step essay benchmark in this repo lands at ~$0.003 per essay on DeepSeek v4-flash; the same workflow under a top-tier model would be in the $0.50 - $1.00 range.

This compounds. As you accumulate dozens of reusable pipelines and skills, you have a personal orchestration layer that any cheap model can drive. The intelligence migrates from the model into your workflow library, where it is inspectable, diff-able, and version-controlled.


In 30 seconds

Run a real workflow. One macro expands into 5 LLM steps: planner → 2 parallel researchers → 2 writers → reflect-revise editor.

(essay-atom-pipeline-scoped "GraphQL vs REST in 2026" "research-team")
;; → 4000+ word essay, ~$0.002-0.005 on DeepSeek v4-flash

Inspect what ran. Workflow + steps are sqlite rows, not opaque objects.

sqlite3 ~/.neurolisp_mcp.sqlite \
  "SELECT id, status, cursor, length(steps_sexpr) FROM workflow_runs ORDER BY id DESC LIMIT 1"
# wf-9b2e... | complete | 5/5 | 842 chars of plain sexpr

Survive a crash. Pull the plug at step 4 of 5. Reconnect Claude Code and ask it to:

nl_workflow_replay(workflow_id="wf-9b2e...", from_step="draft-section-2")
# resumes at draft-section-2; outline + research-1 + research-2 reuse
# cached results from sqlite, cost: $0 for the first 3 steps

Hand-edit a step then replay.

nl_workflow_patch_step(workflow_id="wf-9b2e...",
                       step_name="reflect-revise",
                       new_prompt="Reflect using 7 quality dimensions, then revise.")
nl_workflow_replay(workflow_id="wf-9b2e...", from_step="reflect-revise")

No framework rerun. No LangGraph rebuild. Patch the sexpr and go.


How a complex workflow actually runs

Take the essay-atom-pipeline from the example above. It is intentionally complex enough to exercise every NeuroLisp mechanism: :auto T (server-side LLM), :tools deferred subagents (brain-side dispatch), a parallel group, a :retry-validate quality guard, a :sink length backstop, and :scope lexical-tool whitelisting through a NodeProfile. After the one-line macro call, here is what actually happens.

One user request, five MCP boundaries

                         USER
                           │  "Write me an essay on GraphQL vs REST."
                           ▼
                  ┌──────────────────┐
                  │   Claude Code    │  main agent in user's terminal
                  │   (the brain)    │  decides to use NeuroLisp
                  └────────┬─────────┘
                           │
                           │  nl_eval_sexpr(
                           │    '(essay-atom-pipeline-scoped
                           │      "GraphQL vs REST in 2026"
                           │      "research-team")')
                           ▼
                  ┌──────────────────┐
                  │  NeuroLisp MCP   │  parses sexpr, expands macro,
                  │     server       │  walks workflow groups, persists
                  │   (Python)       │  state to ~/.neurolisp_mcp.sqlite
                  └─┬────────────┬───┘
                    │            │
       ┌────────────┘            └────────────┐
       │ for :auto T steps        for :tools  │
       │ server calls LLM         server      │
       │ provider directly        hands back  │
       ▼                          deferred    ▼
┌─────────────────┐               token   ┌────────────────┐
│ LLM provider    │                       │ Claude Code    │
│ DeepSeek /      │◀──────────────────────│ Agent tool     │
│ OpenAI-compat / │   HTTP request        │ dispatches a   │
│ Anthropic       │   includes briefing   │ fresh subagent │
│   (urllib only) │                       │ with the       │
└────────┬────────┘                       │ briefing kit   │
         │ text response                  └───────┬────────┘
         ▼                                        │ subagent
   apply :sink                                    │ runs WebSearch
   apply :retry-validate                          │ + WebFetch
   write to wf.results                            │ + reasoning
   write to corpus row in sqlite                  ▼
         │                                  result text
         │              ┌─────────────────────────┘
         │              │ nl_resolve_subagent(token, result)
         ▼              ▼
              workflow advances cursor
              next group fires
              auto-chain runs all :auto steps in one server call
                       │
                       ▼
              workflow status = complete
              final result in wf.results["reflect-revise"]
                       │
                       ▼
            ┌──────────────────┐
            │   USER reads     │
            │   the essay      │
            └──────────────────┘

Phase-by-phase trace

# Phase Who acts What actually happens
1 Macro expansion Server (essay-atom-pipeline-scoped ...) → 30-line (workflow (quote ...) (quote (5 steps))) AST in memory. No LLM yet.
2 Group 0: outline Server + DeepSeek :auto T planner step. Server builds briefing from essay-outline-architect skill + topic, HTTP-POSTs DeepSeek, gets 400-word outline, applies sink (none), writes wf.results["outline"] + corpus row auto-step:planner.
3 Group 1: research × 2 (parallel + deferred) Brain → 2 subagents :tools (WebSearch WebFetch) steps emit 2 deferred tokens. Brain receives parallel_steps payload, dispatches 2 Claude Code subagents via the Agent tool, each with its own briefing kit (Role / Task / Upstream Artifacts / SOP / Tools Available). Subagents call WebSearch + WebFetch independently. Brain receives 2 result strings, calls nl_resolve_subagent(token, result) twice.
4 Auto chain: groups 2-4 Server + DeepSeek After the 2nd resolve, server sees the next 3 groups (draft-section-1, draft-section-2, reflect-revise) are all :auto T. It runs them back-to-back in a single server-side loop (auto chain, v7.62), no brain round-trips. Each step's prompt references upstream step names which the env resolves to actual text.
5 Final guard Server reflect-revise has :sink (cond ((< (string-length result) 2000) (str "WARNING short essay..." result)) (T result)). If LLM truncates, the sink prepends a WARNING header before storing. wf.summaries["reflect-revise"] also stored for downstream brevity.
6 Return Server → Brain → User Server returns {complete: true, results: {6 step keys}}. Claude Code reads results["reflect-revise"] and shows the essay to the user.

The boundary each layer enforces

  • The brain (Claude Code) decides what to ask for and whom to dispatch (subagents). It never decides how a step is written — the workflow grammar already encodes that.
  • The server (NeuroLisp) decides how each step executes (auto vs deferred), what context that step sees (briefing kit), and what state persists (sqlite). It never decides whom to dispatch — invariant 3.
  • The LLM provider does one leaf task at a time. Briefing is fully assembled before the HTTP call, so the model is not asked to plan, only to produce.
  • The subagent is a one-shot Claude Code instance: opens a fresh context, runs the tools listed in its briefing, returns one result string. It has no awareness of the broader workflow.

That separation is what lets a small, cheap leaf-model do the same end-to-end work that a single big-model session would otherwise need: every step is pre-decided in the sexpr, so the model is never asked to be smart about the plan.

What sqlite actually contains after one run

sqlite3 ~/.neurolisp_mcp.sqlite "
  SELECT primitive, success, length(output) AS out_chars, cost
  FROM corpus
  ORDER BY row_id DESC LIMIT 6"
invoke-subagent:editor            | 1 | 4149 | 0.00097     -- reflect-revise (:auto T)
invoke-subagent:writer            | 1 | 2104 | 0.00031     -- draft-section-2 (:auto T)
invoke-subagent:writer            | 1 | 2087 | 0.00029     -- draft-section-1 (:auto T)
invoke-subagent:general-purpose   | 1 | 1856 | 0.0         -- research-2 (:tools, brain-side)
invoke-subagent:general-purpose   | 1 | 1734 | 0.0         -- research-1 (:tools, brain-side)
invoke-subagent:planner           | 1 |  412 | 0.00018     -- outline (:auto T)

Every step writes a row with primitive prefix invoke-subagent:<agent> regardless of execution path. :auto T rows carry the server-side LLM cost; :tools rows show 0.0 because the cost lives on the brain side (subagent dispatch). :pure T steps use a different prefix pure-step:<agent>.

sqlite3 ~/.neurolisp_mcp.sqlite "
  SELECT id, status, cursor, length(steps_sexpr) AS plan_chars
  FROM workflow_runs ORDER BY rowid DESC LIMIT 1"
wf-9b2e... | complete | 5/5 | 842

6 corpus rows are append-only audit trail; the workflow_runs row is the resumable snapshot. Both are plain SQL. Both are diff-able. Nothing about this run is opaque.


Get started

Install from source (works today):

git clone https://github.com/KevinBangbang/NeuroLisp.git
cd NeuroLisp
pip install -e .
python -m neurolisp.health

A PyPI release (pip install neurolisp) is staged for v0.35.0 and will be available shortly after the first public release tag.

Wire into Claude Code by editing ~/.claude.json:

{
  "mcpServers": {
    "neurolisp": {
      "command": "python",
      "args": ["-m", "mcp_server.server"]
    }
  }
}

Restart Claude Code, run /mcp. All 45 nl_* tools appear.

For real LLM steps, export an API key:

export DEEPSEEK_API_KEY=sk-...
# or OPENAI_API_KEY / ANTHROPIC_API_KEY

OpenAI-compatible endpoints (Groq, Together, Cerebras, local vLLM, etc.) are supported by swapping base_url. See docs/00_quickstart.md for the 10-minute walkthrough.


What you can do

  • Resume any workflow at the exact step after a crash, kernel panic, or kill -9. Step results live in sqlite, not RAM.
  • Audit every subagent invocation by cat-ing a sexpr. The execution plan is the same file you can hand-edit.
  • Skip steps you have already paid for with nl_workflow_replay(workflow_id, from_step="step-name"). Cached upstream results stay valid.
  • Patch one step then replay. nl_workflow_patch_step updates the plan; the next replay picks up the change.
  • Crystallize a repeated pattern into a named skill after 3 consistent observations. Reuse it from any future workflow.
  • Skip the heavyweight 5-step pipeline for low-stakes tasks with stakes-route. Empirically 89% cheaper than the full pipeline (v0.34 routine vs full bench).
  • Swap LLM providers without changing the workflow. DeepSeek, Anthropic, and any OpenAI-compatible endpoint (Groq, Together, Cerebras, local vLLM) all on stdlib urllib, no SDK lock-in.
  • Trust the test surface. 1992 passing tests + 4 skipped on every commit across Ubuntu and Windows on Python 3.10 / 3.11 / 3.12.

Documentation

If you want to... Read this
Try it in 10 minutes docs/00_quickstart.md
Understand why each layer exists docs/01_concepts/00_first_principles.md
Follow a build-up tutorial docs/07_tutorial-book/ 17 chapters
Browse all 45 MCP tools docs/02_reference/mcp-tools.md
Read the 8 invariants and anti-goals NORTH_STAR.md
See empirical benchmarks docs/BENCHMARKS.md
Run example scripts locally examples/ 3 standalone demos
Browse version history CHANGELOG.md

Community

This is a 1-maintainer project. Realistic response time: a few days for bug reports with reproducers, longer for open-ended discussions.


Contributing

NeuroLisp is small by design. We review every PR with Occam's razor. New atoms, modules, or workflows must pass real-LLM end-to-end A/B validation before landing on main. See CONTRIBUTING.md for setup, conventions, and the in-scope / out-of-scope list.


License

Apache 2.0. Copyright Bangcheng Wang and NeuroLisp contributors.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

neurolisp-0.35.0.tar.gz (359.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

neurolisp-0.35.0-py3-none-any.whl (200.2 kB view details)

Uploaded Python 3

File details

Details for the file neurolisp-0.35.0.tar.gz.

File metadata

  • Download URL: neurolisp-0.35.0.tar.gz
  • Upload date:
  • Size: 359.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for neurolisp-0.35.0.tar.gz
Algorithm Hash digest
SHA256 caa6135efa7f74b659eb1538b8f10b67ff191fd2300a9b19b765da0cd47478c3
MD5 647cb096d2f347436a4eb3ea71d1e9b6
BLAKE2b-256 8c42a10a42c735e4022909873d8cae6e75bd9fe6f4f895f3f4d14a9e78f61dcf

See more details on using hashes here.

File details

Details for the file neurolisp-0.35.0-py3-none-any.whl.

File metadata

  • Download URL: neurolisp-0.35.0-py3-none-any.whl
  • Upload date:
  • Size: 200.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for neurolisp-0.35.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c822645d66d07472cb04c137ca4939451afcb15de76a2cc444d52f250e1cf5e5
MD5 e0574bcfd6d8c7857875b997cd14de82
BLAKE2b-256 ac542f1c6a3e2097fc0897dbb4e40cda6a8680c43e406c8371a4a1c35a923f11

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page