Sexpr-native MCP server giving Claude Code and other MCP agents white-box workflow orchestration, durable corpus, auto-crystallization.

These details have not been verified by PyPI

Project description

NeuroLisp

A white-box MCP server for Claude Code agent workflows. Survive a kill -9 mid-task. Read what your subagent did in plain text. Replay any step without rerunning the rest.

CLAUDE.md interpretive runtime vs NeuroLisp deterministic orchestration

Why does this exist

Claude Code is powerful in one session. The 4 things that break in production are paired with what NeuroLisp does about each:

The pain	NeuroLisp's answer
You crash 8 hours in and lose all state.	Every step's cursor, prompt, and result lives in sqlite. Reconnect and `nl_workflow_replay`.
You cannot audit what your subagent did. LangGraph objects inspect to `<Node 0x...>`.	Every step is sexpr text in `workflow_runs.steps_sexpr`. Readable with `cat`.
Your patterns repeat but the agent forgets each time.	After 3 consistent observations, the pattern auto-crystallizes into a reusable named skill.
You want to fix one step without rerunning everything.	`nl_workflow_patch_step <id> <step> <new-prompt>`, then replay from that step only.

The file you executed is the file you patch. The same sexpr is workflow, plan, template, and macroexpansion result. That is homoiconicity, and the value is concrete: you can read, diff, edit, and replay without ever leaving plain text.

Why smaller models can do bigger work

CLAUDE.md and similar instruction files are an interpretive runtime. Every step, the model re-reads your rules, decides which skill to load, recalls past context, and improvises what to do next. That requires a very capable model. The cost grows with workflow complexity.

NeuroLisp's sexpr workflow is a deterministic runtime. Step order, tool whitelists, lexical scope, retry guards, pre-loaded skills, the briefing kit — all encoded in the sexpr before the model is invoked. The model only does the leaf work: draft this paragraph, classify this review, summarize these sources. The orchestration is the program, not the prompt.

CLAUDE.md interpretation                NeuroLisp deterministic orchestration
─────────────────────────                ────────────────────────────────────
Big model reads instructions             Small model receives a fully-formed
Big model decides next step              briefing for ONE leaf task
Big model loads context                  Workflow already loaded the context
Big model picks the tool                 Workflow already locked the tools
Big model improvises                     Sexpr executes deterministically
Cost scales with model + complexity      Cost scales with leaf-task count only

Practical effect: a 5-step essay pipeline that needs a top-tier model to coordinate via instruction-file interpretation runs on a smaller, cheaper model under NeuroLisp because the coordination is in the sexpr. Same output quality, often two orders of magnitude cheaper per LLM call:

DeepSeek v4-flash    $0.07 / 1M input tokens     $0.27 / 1M output tokens
Claude Opus 4.x      $15   / 1M input tokens     $75   / 1M output tokens
                     ~214× input                  ~278× output

Public list prices as of 2026-05. The 5-step essay benchmark in this repo lands at ~$0.003 per essay on DeepSeek v4-flash; the same workflow under a top-tier model would be in the $0.50 - $1.00 range.

This compounds. As you accumulate dozens of reusable pipelines and skills, you have a personal orchestration layer that any cheap model can drive. The intelligence migrates from the model into your workflow library, where it is inspectable, diff-able, and version-controlled.

In 30 seconds

Run a real workflow. One macro expands into 5 LLM steps: planner → 2 parallel researchers → 2 writers → reflect-revise editor.

(essay-atom-pipeline-scoped "GraphQL vs REST in 2026" "research-team")
;; → 4000+ word essay, ~$0.002-0.005 on DeepSeek v4-flash

Inspect what ran. Workflow + steps are sqlite rows, not opaque objects.

sqlite3 ~/.neurolisp_mcp.sqlite \
  "SELECT id, status, cursor, length(steps_sexpr) FROM workflow_runs ORDER BY id DESC LIMIT 1"
# wf-9b2e... | complete | 5/5 | 842 chars of plain sexpr

Survive a crash. Pull the plug at step 4 of 5. Reconnect Claude Code and ask it to:

nl_workflow_replay(workflow_id="wf-9b2e...", from_step="draft-section-2")
# resumes at draft-section-2; outline + research-1 + research-2 reuse
# cached results from sqlite, cost: $0 for the first 3 steps

Hand-edit a step then replay.

nl_workflow_patch_step(workflow_id="wf-9b2e...",
                       step_name="reflect-revise",
                       new_prompt="Reflect using 7 quality dimensions, then revise.")
nl_workflow_replay(workflow_id="wf-9b2e...", from_step="reflect-revise")

No framework rerun. No LangGraph rebuild. Patch the sexpr and go.

How a complex workflow actually runs

Take the essay-atom-pipeline from the example above. It is intentionally complex enough to exercise every NeuroLisp mechanism: :auto T (server-side LLM), :tools deferred subagents (brain-side dispatch), a parallel group, a :retry-validate quality guard, a :sink length backstop, and :scope lexical-tool whitelisting through a NodeProfile. After the one-line macro call, here is what actually happens.

One user request, five MCP boundaries

                         USER
                           │  "Write me an essay on GraphQL vs REST."
                           ▼
                  ┌──────────────────┐
                  │   Claude Code    │  main agent in user's terminal
                  │   (the brain)    │  decides to use NeuroLisp
                  └────────┬─────────┘
                           │
                           │  nl_eval_sexpr(
                           │    '(essay-atom-pipeline-scoped
                           │      "GraphQL vs REST in 2026"
                           │      "research-team")')
                           ▼
                  ┌──────────────────┐
                  │  NeuroLisp MCP   │  parses sexpr, expands macro,
                  │     server       │  walks workflow groups, persists
                  │   (Python)       │  state to ~/.neurolisp_mcp.sqlite
                  └─┬────────────┬───┘
                    │            │
       ┌────────────┘            └────────────┐
       │ for :auto T steps        for :tools  │
       │ server calls LLM         server      │
       │ provider directly        hands back  │
       ▼                          deferred    ▼
┌─────────────────┐               token   ┌────────────────┐
│ LLM provider    │                       │ Claude Code    │
│ DeepSeek /      │◀──────────────────────│ Agent tool     │
│ OpenAI-compat / │   HTTP request        │ dispatches a   │
│ Anthropic       │   includes briefing   │ fresh subagent │
│   (urllib only) │                       │ with the       │
└────────┬────────┘                       │ briefing kit   │
         │ text response                  └───────┬────────┘
         ▼                                        │ subagent
   apply :sink                                    │ runs WebSearch
   apply :retry-validate                          │ + WebFetch
   write to wf.results                            │ + reasoning
   write to corpus row in sqlite                  ▼
         │                                  result text
         │              ┌─────────────────────────┘
         │              │ nl_resolve_subagent(token, result)
         ▼              ▼
              workflow advances cursor
              next group fires
              auto-chain runs all :auto steps in one server call
                       │
                       ▼
              workflow status = complete
              final result in wf.results["reflect-revise"]
                       │
                       ▼
            ┌──────────────────┐
            │   USER reads     │
            │   the essay      │
            └──────────────────┘

Phase-by-phase trace

#	Phase	Who acts	What actually happens
1	Macro expansion	Server	`(essay-atom-pipeline-scoped ...)` → 30-line `(workflow (quote ...) (quote (5 steps)))` AST in memory. No LLM yet.
2	Group 0: outline	Server + DeepSeek	`:auto T` planner step. Server builds briefing from `essay-outline-architect` skill + topic, HTTP-POSTs DeepSeek, gets 400-word outline, applies sink (none), writes `wf.results["outline"]` + corpus row `auto-step:planner`.
3	Group 1: research × 2 (parallel + deferred)	Brain → 2 subagents	`:tools (WebSearch WebFetch)` steps emit 2 deferred tokens. Brain receives `parallel_steps` payload, dispatches 2 Claude Code subagents via the `Agent` tool, each with its own briefing kit (Role / Task / Upstream Artifacts / SOP / Tools Available). Subagents call WebSearch + WebFetch independently. Brain receives 2 result strings, calls `nl_resolve_subagent(token, result)` twice.
4	Auto chain: groups 2-4	Server + DeepSeek	After the 2nd resolve, server sees the next 3 groups (draft-section-1, draft-section-2, reflect-revise) are all `:auto T`. It runs them back-to-back in a single server-side loop (`auto chain`, v7.62), no brain round-trips. Each step's prompt references upstream step names which the env resolves to actual text.
5	Final guard	Server	reflect-revise has `:sink (cond ((< (string-length result) 2000) (str "WARNING short essay..." result)) (T result))`. If LLM truncates, the sink prepends a WARNING header before storing. `wf.summaries["reflect-revise"]` also stored for downstream brevity.
6	Return	Server → Brain → User	Server returns `{complete: true, results: {6 step keys}}`. Claude Code reads `results["reflect-revise"]` and shows the essay to the user.

The boundary each layer enforces

The brain (Claude Code) decides what to ask for and whom to dispatch (subagents). It never decides how a step is written — the workflow grammar already encodes that.
The server (NeuroLisp) decides how each step executes (auto vs deferred), what context that step sees (briefing kit), and what state persists (sqlite). It never decides whom to dispatch — invariant 3.
The LLM provider does one leaf task at a time. Briefing is fully assembled before the HTTP call, so the model is not asked to plan, only to produce.
The subagent is a one-shot Claude Code instance: opens a fresh context, runs the tools listed in its briefing, returns one result string. It has no awareness of the broader workflow.

That separation is what lets a small, cheap leaf-model do the same end-to-end work that a single big-model session would otherwise need: every step is pre-decided in the sexpr, so the model is never asked to be smart about the plan.

What sqlite actually contains after one run

sqlite3 ~/.neurolisp_mcp.sqlite "
  SELECT primitive, success, length(output) AS out_chars, cost
  FROM corpus
  ORDER BY row_id DESC LIMIT 6"

invoke-subagent:editor            | 1 | 4149 | 0.00097     -- reflect-revise (:auto T)
invoke-subagent:writer            | 1 | 2104 | 0.00031     -- draft-section-2 (:auto T)
invoke-subagent:writer            | 1 | 2087 | 0.00029     -- draft-section-1 (:auto T)
invoke-subagent:general-purpose   | 1 | 1856 | 0.0         -- research-2 (:tools, brain-side)
invoke-subagent:general-purpose   | 1 | 1734 | 0.0         -- research-1 (:tools, brain-side)
invoke-subagent:planner           | 1 |  412 | 0.00018     -- outline (:auto T)

Every step writes a row with primitive prefix invoke-subagent:<agent> regardless of execution path. :auto T rows carry the server-side LLM cost; :tools rows show 0.0 because the cost lives on the brain side (subagent dispatch). :pure T steps use a different prefix pure-step:<agent>.

sqlite3 ~/.neurolisp_mcp.sqlite "
  SELECT id, status, cursor, length(steps_sexpr) AS plan_chars
  FROM workflow_runs ORDER BY rowid DESC LIMIT 1"

wf-9b2e... | complete | 5/5 | 842

6 corpus rows are append-only audit trail; the workflow_runs row is the resumable snapshot. Both are plain SQL. Both are diff-able. Nothing about this run is opaque.

Get started

pip install neurolisp
python -m neurolisp.health

Or install from source (for contributors):

git clone https://github.com/KevinBangbang/NeuroLisp.git
cd NeuroLisp
pip install -e ".[dev]"
python -m pytest -q

Wire into Claude Code by editing ~/.claude.json:

{
  "mcpServers": {
    "neurolisp": {
      "command": "python",
      "args": ["-m", "mcp_server.server"]
    }
  }
}

Restart Claude Code, run /mcp. All 57 nl_* tools appear.

For real LLM steps, export an API key:

export DEEPSEEK_API_KEY=sk-...
# or OPENAI_API_KEY / ANTHROPIC_API_KEY

OpenAI-compatible endpoints (Groq, Together, Cerebras, local vLLM, etc.) are supported by swapping base_url. See docs/00_quickstart.md for the 10-minute walkthrough.

What you can do

Resume any workflow at the exact step after a crash, kernel panic, or kill -9. Step results live in sqlite, not RAM.
Audit every subagent invocation by cat-ing a sexpr. The execution plan is the same file you can hand-edit.
Skip steps you have already paid for with nl_workflow_replay(workflow_id, from_step="step-name"). Cached upstream results stay valid.
Patch one step then replay. nl_workflow_patch_step updates the plan; the next replay picks up the change.
Crystallize a repeated pattern into a named skill after 3 consistent observations. Reuse it from any future workflow.
Skip the heavyweight 5-step pipeline for low-stakes tasks with stakes-route. Empirically 89% cheaper than the full pipeline (v0.34 routine vs full bench).
Swap LLM providers without changing the workflow. DeepSeek, Anthropic, and any OpenAI-compatible endpoint (Groq, Together, Cerebras, local vLLM) all on stdlib urllib, no SDK lock-in.
Audit the full workflow trail. v0.36 adds deterministic EvalSpec gates, HITL approvals, trace spans, artifact references, and offline checkpoint DAG visuals.
Trust the test surface. 2047 passing tests + 4 skipped on every commit across Ubuntu and Windows on Python 3.10 / 3.11 / 3.12.

Documentation

If you want to...	Read this
Try it in 10 minutes	`docs/00_quickstart.md`
Understand why each layer exists	`docs/01_concepts/00_first_principles.md`
Follow a build-up tutorial	`docs/07_tutorial-book/` 17 chapters
Browse all 57 MCP tools	`docs/02_reference/mcp-tools.md`
Read the 8 invariants and anti-goals	`NORTH_STAR.md`
See empirical benchmarks	`docs/BENCHMARKS.md`
Run example scripts locally	`examples/` 3 standalone demos
Browse version history	`CHANGELOG.md`

Community

GitHub Discussions for design questions, show-and-tell, RFCs
GitHub Issues for bug reports and feature requests
Security disclosures: SECURITY.md

This is a 1-maintainer project. Realistic response time: a few days for bug reports with reproducers, longer for open-ended discussions.

Contributing

NeuroLisp is small by design. We review every PR with Occam's razor. New atoms, modules, or workflows must pass real-LLM end-to-end A/B validation before landing on main. See CONTRIBUTING.md for setup, conventions, and the in-scope / out-of-scope list.

License

Apache 2.0. Copyright Bangcheng Wang and NeuroLisp contributors.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.37.0a1 pre-release

Jun 20, 2026

This version

0.36.0

Jun 20, 2026

0.35.0

May 21, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

neurolisp-0.36.0.tar.gz (390.6 kB view details)

Uploaded Jun 20, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

neurolisp-0.36.0-py3-none-any.whl (222.8 kB view details)

Uploaded Jun 20, 2026 Python 3

File details

Details for the file neurolisp-0.36.0.tar.gz.

File metadata

Download URL: neurolisp-0.36.0.tar.gz
Upload date: Jun 20, 2026
Size: 390.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for neurolisp-0.36.0.tar.gz
Algorithm	Hash digest
SHA256	`af55b906f06b7369df26cc04205b57853958da4b4384d9f0a025bee773de8cbc`
MD5	`ee828d798cf9e5312bcd76d0c8dceb1f`
BLAKE2b-256	`5f7c2e553eb1ec2690ad8770f0582a9e5e7435167e7518a30ecb4a37a8cee984`

See more details on using hashes here.

Provenance

The following attestation bundles were made for neurolisp-0.36.0.tar.gz:

Publisher: release.yml on KevinBangbang/NeuroLisp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: neurolisp-0.36.0.tar.gz
- Subject digest: af55b906f06b7369df26cc04205b57853958da4b4384d9f0a025bee773de8cbc
- Sigstore transparency entry: 1876554664
- Sigstore integration time: Jun 20, 2026
Source repository:
- Permalink: KevinBangbang/NeuroLisp@34810fa0066350bd389dabeb32e58637109254fb
- Branch / Tag: refs/tags/v0.36.0
- Owner: https://github.com/KevinBangbang
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@34810fa0066350bd389dabeb32e58637109254fb
- Trigger Event: push

File details

Details for the file neurolisp-0.36.0-py3-none-any.whl.

File metadata

Download URL: neurolisp-0.36.0-py3-none-any.whl
Upload date: Jun 20, 2026
Size: 222.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for neurolisp-0.36.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f75bc18a8013b7d0acc04d76a2a9f222bbbc92b152f531b34d6a711e2bbea4d0`
MD5	`4db7a1793faf497d402870248846e227`
BLAKE2b-256	`970297ffe2e4614ffd66a5e692484ebffd8c8112d81c112bac6bd5c48f99c256`

See more details on using hashes here.

Provenance

The following attestation bundles were made for neurolisp-0.36.0-py3-none-any.whl:

Publisher: release.yml on KevinBangbang/NeuroLisp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: neurolisp-0.36.0-py3-none-any.whl
- Subject digest: f75bc18a8013b7d0acc04d76a2a9f222bbbc92b152f531b34d6a711e2bbea4d0
- Sigstore transparency entry: 1876554769
- Sigstore integration time: Jun 20, 2026
Source repository:
- Permalink: KevinBangbang/NeuroLisp@34810fa0066350bd389dabeb32e58637109254fb
- Branch / Tag: refs/tags/v0.36.0
- Owner: https://github.com/KevinBangbang
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@34810fa0066350bd389dabeb32e58637109254fb
- Trigger Event: push

neurolisp 0.36.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

NeuroLisp

Why does this exist

Why smaller models can do bigger work

In 30 seconds

How a complex workflow actually runs

One user request, five MCP boundaries

Phase-by-phase trace

The boundary each layer enforces

What sqlite actually contains after one run

Get started

What you can do

Documentation

Community

Contributing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance