Surprisal

MCTS with Bayesian surprise for open-ended scientific discovery.

Surprisal is inspired by AllenAI's AutoDiscovery and the Surprisal-Guided Selection paper cited below. It explores a research domain by generating literature-grounded hypotheses, running bounded experiments in a sandbox with real tools and network access, and ranking branches by how much the evidence changes the model's beliefs.

Quick start

curl -fsSL https://raw.githubusercontent.com/jbarnes850/surprisal/main/install.sh | bash

uv run surprisal init \
  --domain "AI for scientific discovery" \
  --seed "LLM self-evaluation accuracy drops as task compositional depth increases"

uv run surprisal explore --budget 10 --concurrency 1
uv run surprisal status --tree
uv run surprisal export --top 5 --format md

The default backend (auto) runs experiments directly on your host with no Docker dependency. Progress streams through generator, runner, review, and belief phases.

If you switch to backend = "docker" for sandboxed execution, Surprisal will build surprisal-cpu:latest on first run and prompt for a claude setup-token if your CLI auth is subscription-backed.

Codex-based analysis and review stages run from per-experiment workspaces under /tmp/.../experiments/node_*, so the CLI invocation explicitly skips git-repo enforcement there.

What it does

Each expansion runs a per-node FSM:

  1. experiment_generator: Claude searches recent literature and proposes one hypothesis plus one executable plan.
  2. experiment_runner: a sandbox backend executes the plan with Python, Bash, local files, public network access, HuggingFace resources, and optional W&B logging.
  3. experiment_analyst: Codex or Claude reviews the execution for fidelity and validity.
  4. experiment_reviewer: Codex or Claude decides whether the evidence is usable.
  5. experiment_reviser: if needed, the plan is revised and retried within configured bounds.
  6. hypothesis_generator: Claude formalizes the post-experiment hypothesis record.
  7. belief_elicitation: Claude samples prior and posterior binary judgments and Surprisal computes Bayesian surprise.
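
The seven stages above form a small state machine; a minimal sketch of one way to encode the transitions, including the reviewer-rejection loop through the reviser. The enum names track the stage list, but the function shape and bookkeeping are illustrative, not the shipped `fsm_runner` implementation:

```python
from enum import Enum, auto

class Stage(Enum):
    GENERATOR = auto()   # 1. propose hypothesis + executable plan
    RUNNER = auto()      # 2. execute the plan in the sandbox
    ANALYST = auto()     # 3. review execution fidelity and validity
    REVIEWER = auto()    # 4. decide whether the evidence is usable
    REVISER = auto()     # 5. revise the plan within configured bounds
    HYPOTHESIS = auto()  # 6. formalize the post-experiment hypothesis
    BELIEF = auto()      # 7. sample beliefs, compute Bayesian surprise
    DONE = auto()
    FAILED = auto()

def next_stage(stage: Stage, *, accepted: bool = True,
               revisions_left: int = 1) -> Stage:
    """Advance the per-node FSM one step (illustrative)."""
    if stage is Stage.REVIEWER and not accepted:
        # Rejected evidence: revise if the budget allows, else give up.
        return Stage.REVISER if revisions_left > 0 else Stage.FAILED
    if stage is Stage.REVISER:
        return Stage.RUNNER  # retry the revised plan
    order = [Stage.GENERATOR, Stage.RUNNER, Stage.ANALYST,
             Stage.REVIEWER, Stage.HYPOTHESIS, Stage.BELIEF, Stage.DONE]
    return order[order.index(stage) + 1]
```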

The deterministic MCTS layer never calls LLMs directly. It only consumes node state and reward signals.
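
A minimal sketch of what that deterministic layer computes, assuming standard UCT with virtual loss plus progressive widening (the constants mirror the `mcts.*` config defaults; function names and signatures are illustrative):

```python
import math

def uct_score(parent_visits: int, child_visits: int, child_total_reward: float,
              c_explore: float = 1.414, virtual_loss: int = 0) -> float:
    """UCT value for one child. Virtual loss inflates the visit count so
    concurrent workers avoid piling onto the same branch."""
    n = child_visits + virtual_loss
    if n == 0:
        return math.inf  # always try unvisited children first
    exploit = child_total_reward / n
    explore = c_explore * math.sqrt(math.log(parent_visits) / n)
    return exploit + explore

def can_widen(num_children: int, node_visits: int,
              k: float = 1.0, alpha: float = 0.5) -> bool:
    """Progressive widening: allow a new child while
    |children| < k * visits**alpha."""
    return num_children < k * node_visits ** alpha
```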

Runtime model

  • Claude is required for research-facing roles: generator, hypothesis formalization, and belief elicitation.
  • If Codex is available, it handles analysis, review, and revision roles.
  • If Codex is not available, Claude handles all roles.
  • Agent sessions persist per branch in sessions.json: Claude research sessions, code-analysis sessions, and runner sessions are tracked separately and resumed automatically across nodes on the same branch.
  • Belief elicitation forks from the persisted research session instead of mutating it, so prior and posterior samples stay independent while still inheriting branch context.
  • Experiment execution uses the configured sandbox backend:
    • auto (default): host-native runner, no Docker required, GPU autodetection
    • docker: Docker-based sandbox for isolated execution (requires Docker + claude setup-token)
    • hf_jobs: one-shot Hugging Face Jobs execution path for remote batch runs
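
Switching backends is a config edit; an illustrative config.toml fragment using the sandbox keys documented in the Configuration section (values here are examples, not recommendations):

```toml
# Illustrative fragment: opt into the Docker sandbox.
[sandbox]
backend = "docker"             # auto | docker | hf_jobs
image = "surprisal-cpu:latest"
gpu = false                    # disable GPU passthrough
memory_limit = "16g"
cpu_limit = 4
timeout = 1800                 # seconds
network = true
```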

Commands

Each command supports machine-readable output via the flag in parentheses.

  • surprisal init: create or reuse an exploration for a domain (--json)
  • surprisal explore: run exploration on the latest or a specific exploration (--json)
  • surprisal status: show exploration summary and optional tree (--json)
  • surprisal export: export results as Markdown, CSV, JSON, or JSONL training data (--format json or --json)
  • surprisal resume: alias for explore against the latest or a specific exploration (--json)
  • surprisal prune: mark low-value branches as pruned (--json)
  • surprisal config: show, set, or reset config (--json)

resume resumes an exploration, not a per-agent conversational session.

Architecture

Three layers:

  1. src/surprisal/mcts.py Deterministic tree policy, UCT scoring, progressive widening, and backpropagation.
  2. src/surprisal/db.py, src/surprisal/exploration.py, src/surprisal/workspace.py SQLite WAL persistence plus per-branch workspaces.
  3. src/surprisal/orchestrator.py, src/surprisal/fsm_runner.py Async worker orchestration and the multi-agent experiment FSM.

Key files:

  • src/surprisal/fsm_runner.py: per-node live FSM
  • src/surprisal/orchestrator.py: worker pool, selection, branching, and dedup scheduling
  • src/surprisal/bayesian.py: Beta posterior updates and belief-shift scoring
  • src/surprisal/prompts/: prompt contracts for generator, runner, analyst, reviewer, reviser, and belief stages
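
The belief-shift scoring in src/surprisal/bayesian.py is described as Beta posterior updates plus KL-based surprise; a minimal sketch of that computation, assuming surprise is the KL divergence from posterior to prior Beta scaled by belief.kl_scale. The function names and the finite-difference digamma are illustrative, not the shipped code:

```python
from math import lgamma

def _digamma(x: float, h: float = 1e-6) -> float:
    # Central finite difference on lgamma; accurate enough for a sketch.
    return (lgamma(x + h) - lgamma(x - h)) / (2 * h)

def _log_beta(a: float, b: float) -> float:
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def kl_beta(a1: float, b1: float, a2: float, b2: float) -> float:
    """KL( Beta(a1, b1) || Beta(a2, b2) ), the standard closed form."""
    return (_log_beta(a2, b2) - _log_beta(a1, b1)
            + (a1 - a2) * _digamma(a1)
            + (b1 - b2) * _digamma(b1)
            + (a2 - a1 + b2 - b1) * _digamma(a1 + b1))

def bayesian_surprise(prior: tuple, posterior: tuple,
                      kl_scale: float = 5.0) -> float:
    """Scaled belief shift between prior and posterior Beta parameters."""
    a1, b1 = posterior
    a2, b2 = prior
    return kl_scale * kl_beta(a1, b1, a2, b2)
```

Bigger belief shifts between prior and posterior yield larger KL, hence larger reward for that branch.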

Configuration

Exploration state defaults to ~/.surprisal.

Config is loaded from:

  • ${SURPRISAL_HOME}/config.toml when SURPRISAL_HOME is set
  • ~/.surprisal/config.toml when that file exists
  • otherwise ${XDG_CONFIG_HOME:-~/.config}/surprisal/config.toml
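
The precedence above can be sketched as path logic (function name illustrative):

```python
import os
from pathlib import Path

def resolve_config_path() -> Path:
    """Return the config.toml path using the documented precedence."""
    surprisal_home = os.environ.get("SURPRISAL_HOME")
    if surprisal_home:
        return Path(surprisal_home) / "config.toml"
    legacy = Path.home() / ".surprisal" / "config.toml"
    if legacy.exists():
        return legacy
    xdg = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(xdg) / "surprisal" / "config.toml"
```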

Show the active config:

uv run surprisal config --show

Live config knobs:

  • general.default_budget (default: 100): default exploration budget
  • general.default_concurrency (default: 2): default worker count
  • mcts.c_explore (default: 1.414): UCT exploration constant
  • mcts.k_progressive (default: 1.0): progressive widening coefficient
  • mcts.alpha_progressive (default: 0.5): progressive widening exponent
  • mcts.max_depth (default: 30): maximum tree depth
  • mcts.belief_samples (default: 10): samples per prior and posterior belief phase (set higher for publication-grade runs)
  • mcts.virtual_loss (default: 2): virtual loss applied during parallel selection
  • mcts.dedup_interval (default: 50): run deduplication every N completed expansions
  • agents.claude_model (default: opus): Claude model for research roles
  • agents.codex_model (default: gpt-5.4): Codex model for analysis, review, and revision roles
  • agents.max_turns (default: 20): max Claude turns per invocation
  • agents.code_attempts (default: 6): total runner attempts before failure
  • agents.revision_attempts (default: 1): total plan revisions after rejection
  • agents.generator_timeout (default: 180): generator timeout in seconds
  • sandbox.backend (default: auto): auto (host-native, recommended), docker (sandboxed), or hf_jobs (remote)
  • sandbox.image (default: auto): Docker sandbox image tag (only used with backend = "docker")
  • sandbox.gpu (default: true): enable GPU passthrough for the Docker sandbox
  • sandbox.memory_limit (default: 16g): Docker sandbox memory limit
  • sandbox.cpu_limit (default: 4): Docker sandbox CPU limit
  • sandbox.timeout (default: 1800): sandbox timeout in seconds
  • sandbox.network (default: true): allow public network access in the sandbox
  • sandbox.hf_flavor (default: a10g-small): HF Jobs hardware flavor
  • sandbox.hf_timeout (default: 2h): HF Jobs timeout
  • belief.provider (default: claude): belief elicitation provider, claude (Likert sampling) or openrouter (logprob-based)
  • belief.model (default: ""): OpenRouter model ID for belief elicitation (e.g., minimax/minimax-m2.5)
  • belief.samples (default: 30): samples per prior and posterior belief phase
  • belief.kl_scale (default: 5.0): KL divergence scaling factor for Bayesian surprise
  • belief.evidence_weight (default: 2.0): evidence weight for posterior Beta fitting
  • credentials.wandb_api_key (default: ""): optional W&B API key
  • credentials.hf_token (default: ""): optional HuggingFace token
  • credentials.claude_oauth_token (default: ""): cached Claude OAuth token for the Docker runner (auto-prompted on first run)

Belief calibration

Surprisal computes Bayesian surprise by comparing prior and posterior belief distributions. Two providers are available:

  • Claude (default): Samples Likert-scale judgments (definitely_true through definitely_false) via concurrent Claude calls. Higher fidelity but more API calls.
  • OpenRouter: Single-call logprob-based estimation. Faster and cheaper. Requires an OpenRouter API key.
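
A minimal sketch of the Claude path, assuming a five-point Likert scale mapped to point probabilities and a Beta fit via evidence-weighted pseudo-counts. The mapping values and the use of evidence_weight here are assumptions for illustration, not the shipped implementation:

```python
# Hypothetical mapping from Likert categories to point probabilities.
LIKERT_P = {
    "definitely_true": 0.95,
    "probably_true": 0.75,
    "uncertain": 0.50,
    "probably_false": 0.25,
    "definitely_false": 0.05,
}

def fit_beta(samples: list[str], evidence_weight: float = 2.0) -> tuple[float, float]:
    """Turn Likert samples into Beta(alpha, beta) pseudo-counts."""
    probs = [LIKERT_P[s] for s in samples]
    mean = sum(probs) / len(probs)
    # Start from a uniform Beta(1, 1) and add weighted evidence.
    alpha = 1.0 + evidence_weight * len(probs) * mean
    beta = 1.0 + evidence_weight * len(probs) * (1.0 - mean)
    return alpha, beta
```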

To use OpenRouter belief elicitation:

cp .env.example .env
# Add your OpenRouter API key to .env

uv run surprisal config --set belief.provider openrouter
uv run surprisal config --set belief.model minimax/minimax-m2.5

Prior beliefs are clamped to [0.1, 0.9] to prevent degenerate Beta distributions from overconfident models. A calibration warning is logged when clamping shifts the prior mean by more than 0.05.
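
The clamping rule reads as the following sketch (the warning hook and function name are illustrative):

```python
import logging

logger = logging.getLogger("surprisal.belief")

def clamp_priors(priors: list[float], lo: float = 0.1, hi: float = 0.9,
                 warn_shift: float = 0.05) -> list[float]:
    """Clamp prior beliefs to [lo, hi] and warn when clamping moves
    the prior mean by more than warn_shift."""
    clamped = [min(hi, max(lo, p)) for p in priors]
    shift = abs(sum(clamped) / len(clamped) - sum(priors) / len(priors))
    if shift > warn_shift:
        logger.warning("prior clamping shifted mean by %.3f", shift)
    return clamped
```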

Literature grounding

The generator prefers alphaxiv MCP when available and falls back to the HuggingFace Papers API otherwise.

One-time alphaxiv setup:

claude mcp add --transport http alphaxiv https://api.alphaxiv.org/mcp/v1

Each hypothesis stores the papers that motivated it.

Validation

Run the test suite:

uv run pytest tests/ -q --tb=short

References

License

MIT
