Test your package through the eyes of a newbie agent — a fresh AI agent reads only your docs/skills and tries to use your package.
newb
Test your package through the eyes of a newbie agent.
A fresh AI agent reads only your _skills/ (or equivalent docs) and tries
to use your package. If it succeeds — your docs work. If it fails — your CI
tells you why.
Install
pip install newb
Use
newb run ./src/mypkg/_skills/mypkg
newb run ./_skills --format markdown >> README.md
newb run spins up a clean Docker container with only your skills
mounted (no host ~/.claude leak), then asks a fresh Claude agent four
canonical questions, plus any boundary tests you supply:
- Identity — "What is this package for?" / "What problems does it solve?"
- Usage — "Show a working example" / "When should I NOT use this?"
- Boundary — author-supplied red tests in
_red_tests.yaml ("Can this do ?" → must redirect, not hallucinate)
Output: JSON (for CI) or markdown (for README injection).
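The isolation described above can be pictured with a short sketch. This is not newb's actual implementation: the image name `example/agent-sandbox` and the `/skills` mount point are hypothetical, chosen only for illustration. The function builds a `docker run` argument list in which the skills directory is the only bind mount (read-only) and the API key is forwarded by name, so its value never appears on the command line.

```python
from pathlib import Path


def sandbox_argv(skills_dir: str, image: str = "example/agent-sandbox") -> list[str]:
    """Build a `docker run` command that exposes only the skills directory.

    ASSUMPTION: the image name and the /skills mount point are hypothetical.
    """
    skills = Path(skills_dir).resolve()
    return [
        "docker", "run",
        "--rm",                        # discard the container when it exits
        "-v", f"{skills}:/skills:ro",  # the only bind mount, read-only
        "-e", "ANTHROPIC_API_KEY",     # forward the key from the host env by name
        image,
    ]


if __name__ == "__main__":
    print(" ".join(sandbox_argv("./_skills")))
```

Because nothing else from the host is mounted, the agent inside such a container can only learn about the package from the mounted skills.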
Requirements
- Docker on PATH (for the agent sandbox).
- ANTHROPIC_API_KEY in env (used inside the container).
- Python 3.10+.
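A small preflight check can report unmet requirements before a run starts. The sketch below is an illustration only and is not part of newb; the function name `missing_requirements` is hypothetical. It checks the three requirements listed above using the standard library.

```python
import os
import shutil
import sys


def missing_requirements(env=None, which=shutil.which, version=None) -> list[str]:
    """Return a list of unmet requirements; an empty list means ready to run."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []
    if which("docker") is None:
        problems.append("docker not found on PATH")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    if version < (3, 10):
        problems.append("Python 3.10+ is required")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("missing:", problem)
```

The `env`, `which`, and `version` parameters exist only so the check can be exercised in tests without touching the real environment.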
Library API
import newb
# Module-callable shortcut — for quick scripts:
report = newb("./src/mypkg/_skills/mypkg")
# Explicit form — identical behaviour, clearer intent:
report = newb.run("./src/mypkg/_skills/mypkg")
# Render the report as a README-ready markdown block:
print(newb.render_markdown(report))
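Appending the rendered markdown to a README with `>>` adds a new copy on every run. One way to keep the README stable is to replace a marked region instead. The sketch below assumes the markdown string comes from `newb.render_markdown(report)`; the two HTML comment markers and the `inject_block` helper are hypothetical and not defined by newb.

```python
BEGIN = "<!-- newb:begin -->"  # hypothetical marker, not defined by newb
END = "<!-- newb:end -->"      # hypothetical marker, not defined by newb


def inject_block(readme: str, block: str) -> str:
    """Replace the marked region with `block`, or append a marked region if absent."""
    wrapped = f"{BEGIN}\n{block.strip()}\n{END}"
    start = readme.find(BEGIN)
    end = readme.find(END, start + len(BEGIN)) if start != -1 else -1
    if start == -1 or end == -1:
        # No complete marked region yet: append one at the end of the file.
        return readme.rstrip("\n") + "\n\n" + wrapped + "\n"
    # Keep everything outside the markers, swap what is between them.
    return readme[:start] + wrapped + readme[end + len(END):]


if __name__ == "__main__":
    text = "# mypkg\n"
    text = inject_block(text, "first report")
    text = inject_block(text, "second report")  # replaces, does not duplicate
    print(text)
```

Running the helper twice leaves a single marked region containing only the latest report.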
newb.verify and newb.self_explain are backward-compat aliases for
newb.run (kept through one minor release; removed in 1.0).
The verb mirrors pytest.main() — neutral, importable, no implication
that the agent's success "proves" anything beyond what the asserts say.
Why no aggregate "score"?
Principle #1: No verification without specification. If you want a score, you must define what counts as correct.
Today newb returns the agent's actual answers (text) and per-test
boundary results (boolean from _red_tests.yaml). It does not
emit an aggregate "0.85" score because:
- A single number invites gaming (people optimise for the score, not for actually-better docs).
- Different tasks (description, usage, boundary) measure different things; averaging them is dishonest.
- Without an explicit expected answer per question, the "score" is whatever the LLM judge feels like that minute — non-reproducible.
A scoring system based on author-provided expected answers (pytest-style
discovery: tests_newb.py with def test_X(agent): assert ...)
is planned for v0.3.0. Until then newb deliberately gives you
the raw evidence and lets you decide what "good enough" means for
your package.
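Deciding what "good enough" means can be as simple as failing CI when any boundary test fails, with no aggregate number involved. The sketch below is an assumption-heavy illustration: the report layout (a `boundary` list of entries with `question` and `passed` keys) is hypothetical, since the actual JSON schema is not described here, and both function names are invented for this example.

```python
def failed_boundary_tests(report: dict) -> list[str]:
    """Return the questions of boundary tests that did not pass."""
    # ASSUMPTION: report["boundary"] is a list of {"question": str, "passed": bool}.
    # The real newb JSON layout may differ; adapt the keys to what it emits.
    return [t["question"] for t in report.get("boundary", []) if not t["passed"]]


def exit_code(report: dict) -> int:
    """Return 0 when every boundary test passed, 1 otherwise."""
    return 1 if failed_boundary_tests(report) else 0


if __name__ == "__main__":
    example = {
        "boundary": [
            {"question": "hypothetical question A", "passed": True},
            {"question": "hypothetical question B", "passed": False},
        ]
    }
    for question in failed_boundary_tests(example):
        print("boundary test failed:", question)
    print("exit code:", exit_code(example))
```

Each failure is reported by its question text, so the CI log points at the specific gap in the docs.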
Aliases
Also available as pip install newbie-test and pip install agentic-test
(same package, defensive name reservations that depend on newb).
Heritage
newb was extracted from scitex-dev, where the canonical integration still lives:
scitex-dev skills self-explain <package-name>
License
AGPL-3.0-only. Same as the SciTeX ecosystem from which newb was extracted.