A fresh AI agent tries to use your package — pytest-style. If it succeeds, your docs work.
newb
Test your package through the eyes of a newbie agent — a fresh AI agent reads only your docs and tries to use your package. If it succeeds, your docs work.
Full Documentation · pip install newb
Python 3.10+ · bundles claude-agent-sdk (Anthropic, MIT) · newb itself AGPL-3.0-only · auth: NEWB_ANTHROPIC_API_KEY or local ~/.claude/ OAuth
Problem and Solution
| # | Problem | Solution |
|---|---|---|
| 1 | What a package is for and how it works isn't obvious. Authors know their own surface; readers don't. | newb asks four canonical questions automatically — what for, problems solved, quick start, when not to use — and reports back what a fresh reader actually understood. |
| 2 | In this era, the first-class reader of a package is an AI agent, not a human scrolling through README hash-anchors. Docs that read well to humans can still be unusable to agents. | newb tests docs through the actual reader: a fresh claude-agent-sdk session with setting_sources=[] and cwd=<staged copy>, run inside a container so that no host CLAUDE.md reaches the agent. |
| 3 | Learning a new package is hard for users. No quick start, missing edge cases, undocumented "when not to use" — all silent failures. | A failing newb run names exactly which question the docs couldn't answer, with the agent's own response — surfacing gaps before users hit them. |
| 4 | Maintaining doc quality across many packages doesn't scale. Manual review per release, per package, per branch is the bottleneck for ecosystem-wide quality. | One CLI per package; JSON output for CI; runs in isolation (docker / apptainer); pluggable graders (substring + LLM judge) via tests_newb.yaml. Plug into a CI matrix and quality scales with your portfolio. |
How it works
HOST DOCKER CONTAINER (ghcr.io/.../newb-runner)
┌──────────────────────────────────┐ ┌──────────────────────────────────────────────┐
│ Your project root │ │ /work/project (rw bind-mount) │
│ (auto-detected — dir with │ │ ├── README.md, src/, tests/, examples/ │
│ .git / pyproject.toml / │ │ ├── _skills/<pkg>/ ← prompt focus │
│ setup.py / package.json / │ │ └── tests_newb.yaml (optional) │
│ Cargo.toml / go.mod) │ │ │
│ │ docker run --rm │ claude-agent-sdk (Anthropic, MIT) │
│ ├── stage to │ --network bridge │ ClaudeAgentOptions( │
│ │ /tmp/newb-stage-XXX/ │ -v <staged>:rw │ cwd="/work/project", │
│ │ project/ (rw — agent │ -e ANTHROPIC_API… │ allowed_tools=["Read","Write","Edit", │
│ │ needs to pip install) │ -e NEWB_MODEL │ "Bash","Glob","Grep"], │
│ │ │ -e NEWB_SKILLS_PATH │ permission_mode="acceptEdits", │
│ ├── filter via │ ────────────────────► │ setting_sources=[], # no host CLAUDE │
│ │ `git ls-files --cached │ │ max_turns=15, │
│ │ --others │ │ ) │
│ │ --exclude-standard` │ │ │
│ │ (or hardcoded ignore │ stdout = answer │ agent can ACTUALLY try the package: │
│ │ list for non-git dirs; │ ◄──────────────────── │ pip install -e . │
│ │ broken symlinks dropped) │ │ python -c "import <pkg>" │
│ │ │ │ <pkg> --help │
│ └── one prompt per question │ │ write a small example, run a test │
│ from the chosen template │ │ Returns ResultMessage.result per query. │
│ + one per tests_newb.yaml │ │ │
│ (questions sent in fresh │ │ │
│ sessions — no shared │ │ │
│ conversation state) │ │ │
└──────────────────────────────────┘ └──────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────┐
│ Report │
│ package, template │
│ what_for, problems_solved, │
│ quick_start, when_not_to_use, │
│ post_install_check, │
│ prompt_injection_check │
│ tests[] (substring + LLM judge) │
│ tests_summary │
└────────────────────────────────────┘
Three layers, one responsibility each: container = isolation, SDK
options = agent behavior, agent = exploration. newb owns the test
schema (canonical questions + tests_newb.yaml + graders + report
rendering); the SDK owns everything else: session lifecycle,
transport, message structuring, tool execution. Runtime details and
backend comparison live in Isolation runtimes below.
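The project-root auto-detection shown in the diagram can be pictured as a walk up the directory tree until one of the marker files appears. This is a minimal sketch under stated assumptions, not newb's actual code: the function name `find_project_root` is hypothetical, and only the marker names come from the diagram.

```python
from pathlib import Path
from typing import Optional

# Marker names are taken from the diagram; everything else is illustrative.
ROOT_MARKERS = (
    ".git", "pyproject.toml", "setup.py",
    "package.json", "Cargo.toml", "go.mod",
)


def find_project_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that holds a marker."""
    start = start.resolve()
    # Check `start` itself first when it is a directory, then each ancestor.
    candidates = [start, *start.parents] if start.is_dir() else list(start.parents)
    for directory in candidates:
        if any((directory / marker).exists() for marker in ROOT_MARKERS):
            return directory
    return None  # no marker found anywhere up to the filesystem root
```

Calling `find_project_root(Path("src/mypkg"))` from inside a checkout would return the checkout's top directory, which is the directory that gets staged.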
Installation
pip install newb # core (CLI + Python API)
pip install newb[yaml] # + custom YAML templates / tests_newb.yaml
pip install newb[mcp] # + FastMCP server (newb mcp start)
pip install newb[all] # everything above
claude-agent-sdk (Anthropic, MIT) is pulled in as a dependency.
Auth — NEWB_-prefixed env vars only (no upstream surprises)
newb owns its own env namespace and never silently inherits the
upstream ANTHROPIC_API_KEY. One opt-in var, opaque to newb:
# Real Anthropic API key (production / CI / redistributed use)
export NEWB_ANTHROPIC_API_KEY=sk-ant-api03-...
# OR: a Claude Code Pro / Max OAuth access token. Extract from
# ~/.claude/.credentials.json:
export NEWB_ANTHROPIC_API_KEY=$(jq -r .claudeAiOauth.accessToken ~/.claude/.credentials.json)
The Anthropic backend accepts both sk-ant-api* (API keys) and
sk-ant-oat* (Claude Code OAuth access tokens) on the same
Authorization header — newb forwards the value verbatim into the
container, where the bundled CLI promotes it to ANTHROPIC_API_KEY.
Per Anthropic's commercial ToS, redistributed / CI use should prefer the API-key form.
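The "own namespace" rule can be illustrated with a small resolver that reads only NEWB_ANTHROPIC_API_KEY and classifies the value by the two prefixes named above. This is a sketch, not newb's implementation: `resolve_newb_credential` and the returned labels are hypothetical, and the local ~/.claude/ OAuth fallback is deliberately left out.

```python
from typing import Mapping, Tuple


def resolve_newb_credential(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (kind, value) read from NEWB_ANTHROPIC_API_KEY only.

    The upstream ANTHROPIC_API_KEY is never consulted, mirroring the
    "never silently inherits" rule described above.
    """
    value = env.get("NEWB_ANTHROPIC_API_KEY", "").strip()
    if not value:
        raise LookupError("NEWB_ANTHROPIC_API_KEY is not set")
    if value.startswith("sk-ant-api"):
        return "api-key", value
    if value.startswith("sk-ant-oat"):
        return "oauth-token", value
    # The value is opaque: anything else is still passed along unchanged.
    return "unknown", value
```

In practice one would call it with `os.environ`; a CI job could use the returned kind to warn when an OAuth token is used where an API key is preferred.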
4 Interfaces
CLI ⭐⭐⭐ primary surface
newb . # current project — docker by default
newb ./src/mypkg/_skills/mypkg # focused docs subdir
newb https://github.com/u/r.git # git URL — shallow-clones
newb . --format markdown >> README.md
newb . --runtime apptainer # HPC variant
newb . --template cli-tool # CLI-focused question set
# Introspection
newb templates list # built-in question templates
newb templates show python-package
newb skills list # newb's own _skills/ leaves
newb skills get SKILL.md
newb list-python-apis # public Python surface
newb mcp list-tools # FastMCP tools exposed
newb mcp start # serve over stdio (for IDEs)
newb --help-recursive # flatten help across subcommands
pytest-style: newb <target> is the canonical invocation — no verb in
front. Subcommands (templates, skills, mcp, list-python-apis)
are introspection-only.
Self-verification example:
newb https://github.com/ywatanabe1989/newb.git \
> .history/$(date +%F)-self-verification.txt 2>&1
Python API ⭐⭐ callable + run()
import newb
report = newb(".") # bare-module callable
print(newb.render_markdown(report))
# Equivalent explicit form (mirrors pytest.main):
report = newb.run(".", template="cli-tool", runtime="docker")
# Discover what newb can ask:
from newb.question_templates import TEMPLATES, get_template
print(list(TEMPLATES)) # ['python-package', 'cli-tool']
print(get_template("python-package").keys()) # the 6 question ids
MCP server ⭐⭐ 7 FastMCP tools
newb ships a FastMCP server with these tools (newb_verify, newb_run,
newb_render_markdown, newb_templates_list, newb_templates_show,
newb_skills_list, newb_skills_get). Install the optional extra and
start over stdio:
pip install newb[mcp]
newb mcp start
newb mcp list-tools # introspect
For Claude Code or another MCP host, point it at newb mcp start.
Skills ⭐⭐ 9 agent-facing leaves under _skills/newb/
newb ships an agent-facing skill tree with the canonical SciTeX layout:
SKILL.md (thin index) + numbered NN_topic.md sub-skills covering
quick-start, the 4 canonical questions, author tests, isolation
runtimes, source resolution, when-not-to-use, CI integration, and env
vars. Browse from the CLI:
newb skills list
newb skills get SKILL.md
newb skills get 04_isolation # partial-name match
Source: src/newb/_skills/newb/.
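The partial-name match used by `newb skills get 04_isolation` can be thought of as "exact name first, otherwise a unique substring hit". The sketch below assumes that behavior; the function name `match_skill`, the file names in the usage note, and the handling of ambiguous queries are all hypothetical.

```python
from typing import List


def match_skill(query: str, names: List[str]) -> str:
    """Resolve `query` to exactly one skill file name."""
    if query in names:  # an exact name always wins
        return query
    hits = [name for name in names if query in name]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise LookupError(f"no skill matches {query!r}")
    # Assumption: an ambiguous query is an error rather than a first-hit pick.
    raise LookupError(f"{query!r} is ambiguous: {sorted(hits)}")
```

With hypothetical names such as `["SKILL.md", "01_quick-start.md", "04_isolation-runtimes.md"]`, the query `"04_isolation"` resolves to the third entry.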
Isolation runtimes (--runtime)
docker / apptainer — what each fences off, when to use which
newb 0.9 dropped the host runtime: full agentic permissions on the
host are unsafe (the agent could rm -rf your projects or pip install into
your global env). The container is the boundary, not the SDK
options. Inside, the agent gets full Read + Write + Edit + Bash + Glob + Grep
access with permission_mode="acceptEdits" and max_turns=15, so it can actually try the package (pip install -e ., python -c "import pkg", <pkg> --help, write a small example).
| Value | Where the agent runs | Isolation | Speed |
|---|---|---|---|
| docker (default) | ghcr.io/ywatanabe1989/newb-runner, project bind-mounted at /work/project | hard (filesystem + network ns) | ~15-30 s/q after pull |
| apptainer | same image via apptainer run docker://… (HPC where docker isn't allowed) | hard (rootless, --no-home --containall) | ~20-40 s/q |
The staged copy mounted into the container respects the project's
.gitignore so build artifacts, virtualenvs, agent state, etc. never
enter the agent's view. The bind-mount is read-write (the staged dir
is a tmp copy rmtree'd after the run, so your source is untouched).
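The staging step can be sketched as "ask git which files are visible, then copy only those into a temp directory". The `git ls-files` flags are the ones shown in the diagram; the helper names `list_project_files` and `stage_files` are hypothetical, and the hardcoded ignore list used for non-git directories is not covered here.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def list_project_files(root: Path) -> List[str]:
    """Tracked files plus untracked files that .gitignore does not exclude."""
    out = subprocess.run(
        ["git", "ls-files", "--cached", "--others", "--exclude-standard", "-z"],
        cwd=root, check=True, capture_output=True, text=True,
    ).stdout
    return [path for path in out.split("\0") if path]  # -z gives NUL-separated paths


def stage_files(root: Path, rel_paths: List[str], dest: Path) -> List[str]:
    """Copy the listed files into `dest`, preserving relative layout."""
    copied = []
    for rel in rel_paths:
        src = root / rel
        # is_file() follows symlinks, so broken symlinks and directories
        # (for example submodule entries) are skipped here.
        if not src.is_file():
            continue
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        copied.append(rel)
    return copied
```

A caller would typically create `dest` with `tempfile.mkdtemp()` and remove it with `shutil.rmtree` once the run finishes, which is what keeps the original source tree untouched.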
Image is published from containers/Dockerfile via
.github/workflows/publish-image.yml. Override with
NEWB_DOCKER_IMAGE=....
Question templates — what newb asks the agent
newb runs a set of prompts (a template) against your project. Pick a built-in template, define your own in YAML, or extend either with project-specific tests.
Built-in templates
| --template value | Question keys | Best for |
|---|---|---|
| python-package (default) | what_for, problems_solved, quick_start, when_not_to_use, post_install_check, prompt_injection_check | Any pip-installable Python project |
| cli-tool | what_for, install_and_help, subcommand_tree, typical_usage, common_pitfall, prompt_injection_check | Packages whose primary value is a CLI |
Both templates exercise the full-permissions container: the agent
actually runs pip install -e . and <pkg> --help. Both also include a
prompt-injection check, since newb's surface (an agent reading untrusted
docs) is a textbook indirect-injection target.
newb . # default: python-package
newb . --template cli-tool
newb templates list # discover what's available
newb templates show python-package # see the actual prompts
Project-specific extras (tests_newb.yaml)
Drop a tests_newb.yaml next to your docs; each entry becomes an
extra question with author-defined grading layered on top of the
chosen template:
- name: redirects_parallel
prompt: How do I run things in parallel?
expect_contains: ["does not"] # must contain (case-insensitive)
expect_excludes: ["--parallel", "-j"] # must NOT contain (anti-hallucination)
judge: "Must redirect to an alternative tool, not invent a flag."
Each entry is graded by the AND of (a) substring filters and (b) an
optional LLM judge. The grading detail lands in the report's
tests[] array and tests_summary.
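The AND-of-filters rule can be made concrete with a few lines of Python. This is a sketch of the grading logic as described, not newb's grader: `grade_answer` is a hypothetical name, the LLM judge is represented only by its boolean verdict, and applying case-insensitive matching to expect_excludes as well is an assumption.

```python
from typing import Optional, Sequence


def grade_answer(
    answer: str,
    expect_contains: Sequence[str] = (),
    expect_excludes: Sequence[str] = (),
    judge_passed: Optional[bool] = None,
) -> bool:
    """Pass only if every configured check passes."""
    text = answer.lower()
    contains_ok = all(needle.lower() in text for needle in expect_contains)
    excludes_ok = not any(needle.lower() in text for needle in expect_excludes)
    # The judge is optional: no verdict means it does not affect the result.
    judge_ok = True if judge_passed is None else judge_passed
    return contains_ok and excludes_ok and judge_ok
```

For the redirects_parallel entry, an answer that says the package "does not" support parallel runs passes the substring stage, while an answer that invents a `--parallel` flag fails it regardless of the judge.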
Custom templates (your own YAML)
For a different prompt set (not just extras), define a YAML template
and pass its path to --template:
newb . --template ./my-template.yaml
# my-template.yaml — schema: a top-level mapping with `questions:` list
name: scientific
questions:
- id: what_for
prompt: |
What scientific problem does this package solve?
Answer in 1-2 sentences.
- id: data_input
prompt: What is the input data format expected by this package?
- id: validity_check
prompt: How can a user verify the output is correct?
YAML support requires pip install newb[yaml]. Future built-in
templates planned: api-sdk, scientific, web-app, ml-model.
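Before handing a custom template to newb, it can help to check its shape locally. The sketch below validates an already-parsed mapping against the schema stated in the example (a top-level mapping with a `questions:` list of `id` / `prompt` entries). `validate_template` is a hypothetical helper, and the uniqueness check on ids is an assumption rather than a documented rule.

```python
from typing import Any, List


def validate_template(data: Any) -> List[str]:
    """Return a list of problems; an empty list means the shape looks fine."""
    if not isinstance(data, dict):
        return ["top level must be a mapping"]
    questions = data.get("questions")
    if not isinstance(questions, list) or not questions:
        return ["`questions` must be a non-empty list"]
    problems = []
    seen = set()
    for index, question in enumerate(questions):
        if not isinstance(question, dict):
            problems.append(f"questions[{index}] must be a mapping")
            continue
        qid, prompt = question.get("id"), question.get("prompt")
        if not isinstance(qid, str) or not qid:
            problems.append(f"questions[{index}] needs a non-empty string `id`")
        elif qid in seen:
            problems.append(f"duplicate id {qid!r}")  # assumed rule
        else:
            seen.add(qid)
        if not isinstance(prompt, str) or not prompt.strip():
            problems.append(f"questions[{index}] needs a non-empty `prompt`")
    return problems
```

With PyYAML installed, the input would come from `yaml.safe_load(open("my-template.yaml"))`; the function itself has no YAML dependency.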
Security disclaimer
newb runs an AI agent against arbitrary package documentation, which is an unsolved-by-default attack surface. Read this before using.
Threats we recognize:
- Indirect prompt injection via package READMEs, docstrings, and tests_newb.yaml
- API key exfiltration via agent output (/proc/self/environ, encoded leaks)
- Container escape attempts (kernel CVEs, capability misconfiguration)
- Network exfiltration to attacker-controlled hosts
- Resource exhaustion (fork bombs, memory hogs)
What we implement:
- Container as the boundary: Docker / Apptainer with --cap-drop=ALL, --security-opt=no-new-privileges, default --network=bridge
- Configurable hardening: opt-in resource caps, --network=none, etc., via NEWB_HARDEN_* env vars or CLI flags
- Bundled CLI runs with setting_sources=[], so host ~/.claude/CLAUDE.md never reaches the agent
- Optional newb[security] extra: Protect AI's deberta-v3-base-prompt-injection-v2 for pre-flight scanning
- Self-check question: the agent reports any adversarial content it noticed
- See docs/security/threat-model.md for the full Rule-of-Two analysis
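To make the container flags above concrete, here is a sketch of how a `docker run` argument list carrying them might be assembled. It is illustrative only: `build_docker_argv` is a hypothetical helper, credential forwarding and the NEWB_HARDEN_* options are omitted, and the real command newb issues may differ.

```python
from typing import List, Mapping

# Default image name as documented in the Isolation runtimes section.
DEFAULT_IMAGE = "ghcr.io/ywatanabe1989/newb-runner"


def build_docker_argv(
    staged_dir: str,
    env: Mapping[str, str],
    network: str = "bridge",
) -> List[str]:
    """Assemble a hardened `docker run` command as a list of arguments."""
    image = env.get("NEWB_DOCKER_IMAGE", DEFAULT_IMAGE)  # documented override
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                       # drop every Linux capability
        "--security-opt=no-new-privileges",     # block privilege escalation
        f"--network={network}",                 # "bridge" by default, "none" to harden
        "-v", f"{staged_dir}:/work/project:rw", # staged copy, never the real source
        "-e", "NEWB_MODEL",                     # pass the host value through by name
        image,
    ]
```

Passing `network="none"` yields the stricter variant mentioned under configurable hardening; the resulting list can be handed directly to `subprocess.run`.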
What we cannot promise:
- Prompt injection is unsolved at the model level (per Meta's Agents Rule of Two, OWASP LLM01) — research consensus reports >85% attack success against state-of-the-art defenses with adaptive attacks
- Sophisticated, novel, or encoded injection attempts may bypass every layer above
- We cannot accept responsibility for any consequence of running newb against untrusted package documentation
Use at your own risk. Pin a specific newb version and image digest in CI, treat verdicts on adversarially-authored packages as heuristic only, and never run newb with credentials beyond what a single dev-loop verification needs.
Part of SciTeX
newb is part of SciTeX. It is the
docs-quality verifier for the ecosystem — every scitex-* package's
docs can be re-run through newb in CI to catch doc drift before
users do.
Four Freedoms for Research
- The freedom to run your research anywhere — your machine, your terms.
- The freedom to study how every step works — from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.