newb

Test your package through the eyes of a newbie agent.

A fresh AI agent reads only your _skills/ (or equivalent docs) and tries to use your package. If it succeeds — your docs work. If it fails — your CI tells you why.

Install

pip install newb

Use

newb verify ./src/mypkg/_skills/mypkg
newb verify ./_skills --format markdown >> README.md

newb verify spins up a clean Docker container with only your skills mounted (no host ~/.claude leak), then asks a fresh Claude agent four canonical questions, grouped into three checks:

  1. Identity — "What is this package for?" / "What problems does it solve?"
  2. Usage — "Show a working example" / "When should I NOT use this?"
  3. Boundary — author-supplied red tests in _red_tests.yaml ("Can this do ?" → must redirect, not hallucinate)
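The boundary check is driven by the author's red-test file. Its exact schema isn't documented in this page, so the entry below is only a hypothetical sketch (the `question` and `must_not` keys are illustrative, not the real format):

```yaml
# _red_tests.yaml — hypothetical shape, for illustration only
- question: "Can this package train a neural network?"
  must_not: "hallucinate a training API"   # the agent should redirect instead
```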

Output: JSON (for CI) or markdown (for README injection).
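The JSON output makes the check scriptable in CI. The report schema is not documented here, so the gate below is a sketch that assumes (hypothetically) a `boundary` list of `{"name": ..., "passed": ...}` entries:

```python
import json

def boundary_failures(report: dict) -> list[str]:
    """Names of failed boundary tests, assuming a hypothetical
    ``boundary`` list of ``{"name": ..., "passed": ...}`` entries."""
    return [t["name"] for t in report.get("boundary", []) if not t["passed"]]

# In CI: parse the report newb wrote and fail the build on any red-test miss.
report = json.loads('{"boundary": [{"name": "no-train-api", "passed": false}]}')
assert boundary_failures(report) == ["no-train-api"]
```

In a real pipeline you would `json.load` the file written by `newb verify ... > report.json` and exit nonzero when the list is non-empty.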

Requirements

  • Docker on PATH (for the agent sandbox).
  • ANTHROPIC_API_KEY in env (used inside the container).
  • Python 3.10+.
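A quick preflight for these three requirements can save a confusing failure inside the container. This is a sketch, not part of newb:

```python
import os
import shutil
import sys

def preflight() -> list[str]:
    """Return a list of unmet requirements for `newb verify`."""
    missing = []
    if shutil.which("docker") is None:
        missing.append("Docker not found on PATH")
    if not os.environ.get("ANTHROPIC_API_KEY"):
        missing.append("ANTHROPIC_API_KEY is not set")
    if sys.version_info < (3, 10):
        missing.append("Python 3.10+ required")
    return missing
```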

Library API

import newb

# Module-callable shortcut — for quick scripts:
report = newb("./src/mypkg/_skills/mypkg")

# Explicit form — identical behaviour, clearer intent:
report = newb.verify("./src/mypkg/_skills/mypkg")

# Render the report as a README-ready markdown block:
print(newb.render_markdown(report))
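Since render_markdown returns a README-ready block, one simple way to keep a README section current is marker-based replacement. The `<!-- newb:start -->` / `<!-- newb:end -->` markers here are a convention of this sketch, not something newb defines:

```python
import re

def inject(readme: str, block: str) -> str:
    """Replace the text between the (hypothetical) newb markers
    with a freshly rendered report block."""
    pattern = r"(<!-- newb:start -->).*?(<!-- newb:end -->)"
    return re.sub(
        pattern,
        lambda m: m.group(1) + "\n" + block + "\n" + m.group(2),
        readme,
        flags=re.DOTALL,
    )

readme = "# mypkg\n<!-- newb:start -->\nold report\n<!-- newb:end -->\n"
updated = inject(readme, "new report")
assert "new report" in updated and "old report" not in updated
```

Using a function for the replacement (rather than a replacement string) avoids `re.sub` misreading backslashes in the rendered block.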

newb.self_explain is retained as a backward-compatibility alias for verify; it will be removed in 1.0.

Why no aggregate "score"?

Principle #1: No verification without specification. If you want a score, you must define what counts as correct.

Today newb returns the agent's actual answers (text) and per-test boundary results (boolean from _red_tests.yaml). It does not emit an aggregate "0.85" score because:

  • A single number invites gaming (people optimise for the score, not for actually-better docs).
  • Different tasks (description, usage, boundary) measure different things; averaging them is dishonest.
  • Without an explicit expected answer per question, the "score" is whatever the LLM judge feels like that minute — non-reproducible.

A scoring system based on author-provided expected answers (pytest-style discovery: tests_newb.py with def test_X(agent): assert ...) is planned for v0.3.0. Until then, newb deliberately gives you the raw evidence and lets you decide what "good enough" means for your package.

Aliases

Also available as pip install newbie-test and pip install agentic-test (the same package; both are defensive name reservations that depend on newb).

Heritage

newb was extracted from scitex-dev where the canonical integration still lives:

scitex-dev skills self-explain <package-name>

License

AGPL-3.0-only. Same as the SciTeX ecosystem from which newb was extracted.
