Test your package through the eyes of a newbie agent — a fresh AI agent reads only your docs/skills and tries to use your package.

newb

Test your package through the eyes of a newbie agent.

A fresh AI agent reads only your _skills/ (or equivalent docs) and tries to use your package. If it succeeds — your docs work. If it fails — your CI tells you why.

Install

pip install newb

Use

newb run ./src/mypkg/_skills/mypkg
newb run ./_skills --format markdown >> README.md
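Appending with >> adds a new copy of the report on every run. One way to keep the README stable is to splice the markdown between two marker comments. The sketch below shows that idea only; the marker strings and the inject_block helper are hypothetical and are not part of newb.

```python
import re

# Hypothetical marker comments; newb itself does not define these.
START = "<!-- newb:start -->"
END = "<!-- newb:end -->"


def inject_block(readme_text: str, block: str) -> str:
    """Replace the marked region with `block`, or append a marked region if none exists."""
    wrapped = f"{START}\n{block.strip()}\n{END}"
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if pattern.search(readme_text):
        # A function replacement keeps backslashes in `block` from being
        # interpreted as regex escapes.
        return pattern.sub(lambda _match: wrapped, readme_text, count=1)
    return readme_text.rstrip("\n") + "\n\n" + wrapped + "\n"
```

Running it twice with the same report leaves the README unchanged, so it is safe to call from CI.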

newb run spins up a clean Docker container with only your skills mounted (no host ~/.claude leak), then asks a fresh Claude agent four canonical questions, plus any author-supplied boundary tests, grouped into three categories:

  1. Identity — "What is this package for?" / "What problems does it solve?"
  2. Usage — "Show a working example" / "When should I NOT use this?"
  3. Boundary — author-supplied red tests in _red_tests.yaml ("Can this do ?" → must redirect, not hallucinate)
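To make the isolation idea concrete, here is a rough sketch of a docker run invocation that mounts only the skills directory, read-only, and forwards the API key. It is not newb's actual command: the image name, the /skills mount point, and the sandbox_command helper are assumptions for illustration. Only the docker flags (--rm, -v, -e) are standard.

```python
from pathlib import Path


def sandbox_command(skills_dir: str, image: str = "example/agent-sandbox:latest") -> list[str]:
    """Build (but do not run) a docker command that exposes only the skills directory."""
    skills = Path(skills_dir).resolve()
    return [
        "docker", "run", "--rm",
        # Mount the skills directory read-only; nothing else from the host is shared.
        "-v", f"{skills}:/skills:ro",
        # Forward the key from the host environment without writing its value here.
        "-e", "ANTHROPIC_API_KEY",
        image,  # hypothetical image name
    ]
```

Because the only volume is the skills directory, host-side agent configuration such as ~/.claude is never visible inside the container.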

Output: JSON (for CI) or markdown (for README injection).
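A CI job could turn the JSON output into a pass/fail signal along these lines. The report schema is not documented here, so the field names below ("boundary", "name", "passed") are assumptions; adjust them to whatever newb actually emits.

```python
import json
import sys


def failed_boundary_tests(report: dict) -> list[str]:
    """Return the names of boundary tests that did not pass."""
    # ASSUMPTION: report["boundary"] is a list of {"name": str, "passed": bool}.
    return [t["name"] for t in report.get("boundary", []) if not t["passed"]]


def exit_code(report_json: str) -> int:
    """Return 1 if any boundary test failed, else 0, logging failures to stderr."""
    failures = failed_boundary_tests(json.loads(report_json))
    for name in failures:
        print(f"boundary test failed: {name}", file=sys.stderr)
    return 1 if failures else 0
```

A wrapper script would pass the captured newb output to exit_code and hand the result to sys.exit.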

Requirements

  • Docker on PATH (for the agent sandbox).
  • ANTHROPIC_API_KEY in env (used inside the container).
  • Python 3.10+.
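The three requirements can be checked up front with the standard library alone. This is a convenience sketch, not something newb ships; the missing_requirements name is made up for the example.

```python
import os
import shutil
import sys


def missing_requirements(env=None) -> list[str]:
    """List unmet requirements; an empty list means the environment looks ready."""
    env = os.environ if env is None else env
    problems = []
    if shutil.which("docker") is None:
        problems.append("docker not found on PATH")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    return problems
```

Calling it before newb run gives a readable error instead of a failure inside the container.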

Library API

import newb

# Module-callable shortcut — for quick scripts:
report = newb("./src/mypkg/_skills/mypkg")

# Explicit form — identical behaviour, clearer intent:
report = newb.run("./src/mypkg/_skills/mypkg")

# Render the report as a README-ready markdown block:
print(newb.render_markdown(report))

newb.verify and newb.self_explain are backward-compat aliases for newb.run (kept through one minor release; removed in 1.0).

The verb mirrors pytest.main() — neutral, importable, no implication that the agent's success "proves" anything beyond what the asserts say.
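Calling an imported module directly, as in the shortcut above, is not default Python behaviour. One common way to get it is a ModuleType subclass that defines __call__. The sketch below demonstrates the technique on a throwaway module named demo_pkg; how newb implements its own shortcut is not stated here, so treat this as an illustration rather than its source.

```python
import sys
import types


class _CallableModule(types.ModuleType):
    """A module object that forwards calls to its own `run` function."""

    def __call__(self, *args, **kwargs):
        return self.run(*args, **kwargs)


def _run(skills_path):
    # Stand-in for the real work; just echoes its input.
    return {"skills_path": skills_path}


demo = _CallableModule("demo_pkg")
demo.run = _run
demo.verify = _run  # a backward-compat alias is just a second name for the same function
sys.modules["demo_pkg"] = demo  # register so `import demo_pkg` finds it

import demo_pkg

result = demo_pkg("./_skills")  # same as demo_pkg.run("./_skills")
```

The explicit run form stays available, so scripts that prefer clarity over brevity lose nothing.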

Why no aggregate "score"?

Principle #1: No verification without specification. If you want a score, you must define what counts as correct.

Today newb returns the agent's actual answers (text) and per-test boundary results (boolean from _red_tests.yaml). It does not emit an aggregate "0.85" score because:

  • A single number invites gaming (people optimise for the score, not for actually-better docs).
  • Different tasks (identity, usage, boundary) measure different things; averaging them is dishonest.
  • Without an explicit expected answer per question, the "score" is whatever the LLM judge feels like that minute — non-reproducible.

A scoring system based on author-provided expected answers (pytest-style discovery: tests_newb.py with def test_X(agent): assert ...) is planned for v0.3.0. Until then, newb deliberately gives you the raw evidence and lets you decide what "good enough" means for your package.

Aliases

Also available as pip install newbie-test and pip install agentic-test. Both are defensive name reservations that depend on newb, so they install the same package.

Heritage

newb was extracted from scitex-dev, where the canonical integration still lives:

scitex-dev skills self-explain <package-name>

License

AGPL-3.0-only. Same as the SciTeX ecosystem from which newb was extracted.
