Test your package through the eyes of a newbie agent — a fresh AI agent reads only your docs/skills and tries to use your package.

newb

Test your package through the eyes of a newbie agent.

A fresh AI agent reads only your _skills/ (or equivalent docs) and tries to use your package. If it succeeds — your docs work. If it fails — your CI tells you why.

Install

pip install newb

Use

newb run ./src/mypkg/_skills/mypkg
newb run ./_skills --format markdown >> README.md
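For CI, the same command can be driven from a script. The following is a minimal sketch, assuming newb run prints its JSON report to stdout when --format is omitted (the text here only says the output is JSON or markdown). build_newb_command and run_newb are hypothetical helper names, and nothing is assumed about the shape of the report.

```python
import json
import subprocess
from typing import List, Optional


def build_newb_command(skills_dir: str, fmt: Optional[str] = None) -> List[str]:
    """Build the argv for `newb run`, optionally adding `--format <fmt>`."""
    cmd = ["newb", "run", skills_dir]
    if fmt is not None:
        cmd += ["--format", fmt]
    return cmd


def run_newb(skills_dir: str):
    """Run newb and parse stdout as JSON.

    Assumption: with no --format flag, the JSON report is written to stdout.
    """
    proc = subprocess.run(
        build_newb_command(skills_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

Keeping the argv construction separate from the subprocess call makes the command easy to check in a unit test without Docker or an API key.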

newb run spins up a clean Docker container with only your skills mounted (no host ~/.claude leak), then asks a fresh Claude agent four canonical questions plus any author-supplied boundary tests, in three groups:

  1. Identity — "What is this package for?" / "What problems does it solve?"
  2. Usage — "Show a working example" / "When should I NOT use this?"
  3. Boundary — author-supplied red tests in _red_tests.yaml ("Can this do ?" → must redirect, not hallucinate)

Output: JSON (for CI) or markdown (for README injection).
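The isolation described above can be pictured as a docker run invocation that mounts only the skills directory, read-only, and forwards the API key. This is a sketch of the idea, not newb's actual command line: the image name, the /skills mount point, and the sandbox_argv helper are all hypothetical.

```python
import os
from typing import List


def sandbox_argv(skills_dir: str, image: str = "example/agent-sandbox:latest") -> List[str]:
    """Build a `docker run` argv that exposes only the skills directory.

    Hypothetical: the image name and the /skills mount point are placeholders.
    """
    host_path = os.path.abspath(skills_dir)
    return [
        "docker", "run", "--rm",          # remove the container when it exits
        "-v", f"{host_path}:/skills:ro",  # the only host path mounted, read-only
        "-e", "ANTHROPIC_API_KEY",        # forward the key from the host environment
        image,
    ]
```

Because no home directory is mounted, host configuration such as ~/.claude is simply not visible inside the container.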

Requirements

  • Docker on PATH (for the agent sandbox).
  • ANTHROPIC_API_KEY in env (used inside the container).
  • Python 3.10+.
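The three requirements can be checked up front before starting a run. A minimal preflight sketch using only the standard library; missing_requirements is a hypothetical helper, not part of newb.

```python
import os
import shutil
import sys
from typing import List


def missing_requirements(env=None, which=shutil.which, version=None) -> List[str]:
    """Return a human-readable list of unmet requirements (empty if all are met).

    The parameters exist only so the check can be exercised without
    touching the real environment.
    """
    env = os.environ if env is None else env
    version = sys.version_info if version is None else version
    problems = []
    if which("docker") is None:
        problems.append("docker not found on PATH")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    return problems
```

In a CI job, a non-empty result could be printed and turned into a non-zero exit code before any container is started.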

Library API

import newb

# Module-callable shortcut — for quick scripts:
report = newb("./src/mypkg/_skills/mypkg")

# Explicit form — identical behaviour, clearer intent:
report = newb.run("./src/mypkg/_skills/mypkg")

# Render the report as a README-ready markdown block:
print(newb.render_markdown(report))
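Python modules are not callable by default, so a module-callable shortcut needs a small trick. The sketch below shows one common way to get that behaviour, using a ModuleType subclass with __call__. It illustrates the general technique with a stand-in module named demo_pkg and a fake run function. Whether newb implements its shortcut this way is an assumption.

```python
import sys
import types


class CallableModule(types.ModuleType):
    """A module type whose instances forward calls to their own `run`."""

    def __call__(self, *args, **kwargs):
        return self.run(*args, **kwargs)


# Stand-in for a real package. Inside a package's __init__.py the usual form
# is: sys.modules[__name__].__class__ = CallableModule
demo = CallableModule("demo_pkg")
demo.run = lambda skills_dir: {"skills_dir": skills_dir}  # fake run()
sys.modules["demo_pkg"] = demo

import demo_pkg  # resolved from sys.modules, no file needed

# Both forms go through the same function.
same = demo_pkg("./_skills") == demo_pkg.run("./_skills")
```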

newb.verify and newb.self_explain are backward-compat aliases for newb.run (kept through one minor release; removed in 1.0).

The verb mirrors pytest.main() — neutral, importable, no implication that the agent's success "proves" anything beyond what the asserts say.

Why no aggregate "score"?

Principle #1: No verification without specification. If you want a score, you must define what counts as correct.

Today newb returns the agent's actual answers (text) and per-test boundary results (boolean from _red_tests.yaml). It does not emit an aggregate "0.85" score because:

  • A single number invites gaming (people optimise for the score, not for actually-better docs).
  • Different tasks (description, usage, boundary) measure different things; averaging them is dishonest.
  • Without an explicit expected answer per question, the "score" is whatever the LLM judge feels like that minute — non-reproducible.

A scoring system based on author-provided expected answers (pytest-style discovery: tests_newb.py with def test_X(agent): assert ...) is planned for v0.3.0. Until then newb deliberately gives you the raw evidence and lets you decide what "good enough" means for your package.

Aliases

Also available as pip install newbie-test and pip install agentic-test (same package, defensive name reservations that depend on newb).

Heritage

newb was extracted from scitex-dev where the canonical integration still lives:

scitex-dev skills self-explain <package-name>

License

AGPL-3.0-only. Same as the SciTeX ecosystem from which newb was extracted.
