Change-coupled eval probe generation for LLM systems

Probegen

Probegen detects behaviorally significant pull request changes in LLM systems and proposes targeted evaluation probes for review before writing them to an evaluation platform. Probegen is non-blocking — it runs as a parallel CI job and never prevents PR merges.

What it does

Probegen runs in CI on pull requests. It:

  1. Detects changes to prompts, instructions, guardrails, validators, tool descriptions, classifiers, retry policies, output schemas, and other agent harness artifacts that are likely to alter agent behavior.
  2. Retrieves nearby evaluation coverage from your existing eval stack when mappings exist.
  3. Falls back to starter probe generation when no eval corpus exists yet.
  4. Generates ranked probe proposals tailored to the specific change, including multi-turn conversational probes when the agent is conversational.
  5. Exports those probes as files and, after explicit approval, writes them to the configured platform.

Probegen is not an eval runner. It generates eval inputs that plug into LangSmith, Braintrust, Arize Phoenix, Promptfoo, or file-based workflows.

Probegen works out of the box even if you have no evals yet. In that case it generates plausible starter probes from the diff, system prompt or guardrails, and whatever product context you provide. The more eval coverage and product detail you give it, the sharper its novelty detection and boundary analysis become.
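In starter mode, a proposed probe might look like the sketch below. The schema is purely illustrative; the field names are assumptions for this example, not Probegen's actual output format.

```yaml
# Hypothetical probe proposal (illustrative schema, not Probegen's real format).
id: refund-policy-boundary-01
kind: multi_turn
rationale: PR tightens the refund guardrail; probe the new boundary.
turns:
  - user: "I bought this 29 days ago. Can I still get a refund?"
  - user: "What if the item was opened but unused?"
expected: Agent cites the 30-day window and the opened-item exception.
```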

Prerequisites

  • Python 3.11+
  • Node.js 22+ — required in CI by the GitHub Action (installed automatically). Only needed locally if running probegen run-stage directly.
  • An Anthropic API key
  • An eval platform API key only if you want direct platform integration or automatic writeback

Quick Start (GitHub Action)

  1. Install the package: pip install probegen

  2. Run interactive setup: probegen init — this generates probegen.yaml, the workflow file, and context/ stubs

  3. Fill in context/product.md and context/bad_examples.md (and other context files for best results)

  4. Add GitHub secrets:

    Secret             | Purpose                            | Where to get it
    ANTHROPIC_API_KEY  | Required — powers all three stages | console.anthropic.com → API Keys
    OPENAI_API_KEY     | Required for coverage-aware mode   | platform.openai.com → API Keys
    LANGSMITH_API_KEY  | If using LangSmith                 | smith.langchain.com → Settings
    BRAINTRUST_API_KEY | If using Braintrust                | braintrust.dev → Settings
    PHOENIX_API_KEY    | If using Arize Phoenix             | app.phoenix.arize.com → Settings
  5. Create the approval label in GitHub:

    gh label create "probegen:approve" --color 0075ca --description "Approve Probegen probe writeback"
    
  6. Commit probegen.yaml, .github/workflows/probegen.yml, and context/.

  7. Run probegen doctor to verify your setup.

  8. Open a PR that touches a prompt or guardrail.
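probegen init generates the workflow file for you; the sketch below only illustrates the non-blocking shape such a job might take. Step names, the run-stage argument, and other details here are assumptions, not the generated file.

```yaml
# Illustrative sketch of a non-blocking Probegen CI job.
# probegen init generates the real workflow; names and inputs here are assumptions.
name: probegen
on:
  pull_request:

jobs:
  probegen:
    runs-on: ubuntu-latest
    continue-on-error: true   # failures never block the PR merge
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install probegen
      - run: probegen run-stage all   # hypothetical invocation
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

The continue-on-error flag is one way GitHub Actions expresses the "parallel, never blocks merges" behavior described above.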

Cost control

Each stage has a configurable Anthropic API spend budget (see budgets: in probegen.yaml). Typical costs per PR:

  • Stage 1 (change detection): $0.05–0.30
  • Stage 2 (coverage analysis): $0.10–0.50
  • Stage 3 (probe generation): $0.10–0.60

Increase budget limits if stages time out on large diffs or complex repos.
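The budgets: key mentioned above might be configured along these lines. The key names below are assumptions for illustration; probegen.yaml.example is the authoritative reference.

```yaml
# Illustrative budgets: fragment; exact key names are assumptions.
# See probegen.yaml.example for the real configuration reference.
budgets:
  stage1: 0.30   # change detection, USD per PR
  stage2: 0.50   # coverage analysis, USD per PR
  stage3: 0.60   # probe generation, USD per PR
```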

Advanced Configuration

The full configuration reference is available in probegen.yaml.example.

Real example quickstart

If you want to test Probegen against a real LangGraph repo instead of wiring everything from scratch, use the in-repo demo under examples/langgraph-agentic-rag and follow examples/langgraph-agentic-rag/docs/quickstart.md.

Context pack and trace safety

Probegen works without a context pack, but probe quality drops significantly. At minimum, fill in product context and known failure modes. This matters even more in starter mode, where Probegen has no existing eval corpus to compare against.

Production traces are never sanitized by the tool. If you add files under context/traces/, anonymize them first. Remove names, emails, account IDs, and any other sensitive data before committing them.
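As a starting point for scrubbing traces before committing them, a minimal regex pass over obvious identifiers might look like the sketch below. The patterns are assumptions about what your traces contain; no automated scrub catches everything, so always review the output by hand.

```python
import re

# Minimal, illustrative trace scrubber: replaces obvious emails,
# account-ID-like tokens, and SSN-shaped numbers before a trace is
# committed under context/traces/. Patterns are assumptions about
# your data, not an exhaustive anonymizer.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\bacct_[A-Za-z0-9]+\b"), "<ACCOUNT_ID>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
]

def scrub(text: str) -> str:
    """Apply every redaction pattern to the trace text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

if __name__ == "__main__":
    sample = "User jane.doe@example.com (acct_9f3K2) asked about refunds."
    print(scrub(sample))
```

Run it over each file before adding it to context/traces/, then diff the result to confirm nothing sensitive survived.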
