Open experiment runner for LLM behavior changes. Fork production traces, replay with a proposed change, score the diff, emit a PR-ready verdict report.

Maintainers

voseghale

Project description

whatifd

whatifd's product is the verdict's defensibility. Fork production traces, replay them with a proposed change, score the diff, and ship a Ship / Don't Ship / Inconclusive verdict that a reviewer can read, follow the reasoning behind, and then either trust or know exactly which assumption to challenge.

When you change a prompt, model, or tool in an LLM system, you don't actually know whether it improves behavior; you guess, from a handful of cherry-picked traces and inconsistent evaluation. Every other step in the workflow has a tool: Langfuse for traces, Inspect AI for scoring, GitHub for PRs. The experiment itself doesn't.

whatifd is the experiment runner. Fork production traces (failed cases plus a representative baseline), replay them with your proposed change (original tool outputs cached so side effects don't re-fire), score with the judge of your choice, and produce a Markdown + JSON verdict report you can attach to the PR. You stop shipping changes that fix one failure while silently regressing ten others. You go from "this feels better" to "this improved 14/20, regressed 3 — here's exactly where, and here's the evidence I'd defend in review."
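To make the "improved 14/20, regressed 3" style of summary concrete, here is a minimal sketch of tallying per-trace score deltas. It is not whatifd's implementation: `TraceDelta`, `tally`, `regressions`, and the `tolerance` dead band are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceDelta:
    """Hypothetical per-trace result: score after the change minus score before."""
    trace_id: str
    cohort: str  # "failure" or "baseline"
    delta: float


def tally(deltas, tolerance=0.05):
    """Count improved / regressed / unchanged traces.

    `tolerance` is an assumed dead band: deltas inside it count as unchanged.
    """
    counts = {"improved": 0, "regressed": 0, "unchanged": 0}
    for d in deltas:
        if d.delta > tolerance:
            counts["improved"] += 1
        elif d.delta < -tolerance:
            counts["regressed"] += 1
        else:
            counts["unchanged"] += 1
    return counts


def regressions(deltas, tolerance=0.05):
    """Return the IDs of regressed traces, worst first, as evidence for review."""
    bad = [d for d in deltas if d.delta < -tolerance]
    return [d.trace_id for d in sorted(bad, key=lambda d: d.delta)]
```

Listing the regressed trace IDs next to the counts is what lets a reviewer go from the headline number to the specific cases that got worse.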

Stop shipping LLM changes on gut feel.

Status

Pre-alpha; v0.1 release candidate. The library API runs end-to-end against the synthetic stub adapter and against the real whatifd-langfuse + whatifd-inspect-ai adapters; the whatifd fork CLI dispatcher is wired through the full factory → runner-loader → delta_fn → run_pipeline → render path. PyPI publication is pending.

| Version | Target | What it does |
| --- | --- | --- |
| v0.1 | M10 (release candidate) | Langfuse ingest, prompt override, cached-tool replay, Inspect AI scorer, evidence-first Markdown + JSON reports, CI exit codes. |
| v0.2 | M11 | Stratified bootstrap CI, scorer cache wiring, second tracer adapter, model swap, GitHub Action wrapper. |
| v0.3 | M12 | Live-tool replay (opt-in, allowlist), worked CI sample repo. |
| v1.0 | year 2 | The pre-merge regression gate for LLM behavior. |
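The v0.2 row lists a stratified bootstrap CI. As a rough illustration of the general technique (not whatifd's planned implementation), the sketch below computes a percentile confidence interval for the mean per-trace delta while resampling inside each cohort separately. The function name, its parameters, and the input shape are assumptions.

```python
import random
from statistics import mean


def stratified_bootstrap_ci(deltas_by_cohort, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for the mean per-trace delta, resampling within each cohort.

    `deltas_by_cohort` maps a cohort name (for example "failure", "baseline")
    to its list of per-trace score deltas. Each cohort is resampled with
    replacement at its original size, so the cohort mix is the same in every
    resample.
    """
    rng = random.Random(seed)
    cohorts = [d for d in deltas_by_cohort.values() if d]
    if not cohorts:
        raise ValueError("need at least one non-empty cohort")
    means = []
    for _ in range(n_resamples):
        resample = []
        for deltas in cohorts:
            resample.extend(rng.choices(deltas, k=len(deltas)))
        means.append(mean(resample))
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]
```

Resampling within cohorts keeps a small failure cohort from being dropped or over-represented by chance, which a plain bootstrap over the pooled traces would allow.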

Install

```shell
# Once published to PyPI:
uv pip install whatifd whatifd-langfuse whatifd-inspect-ai

# From source (uv workspace):
git clone https://github.com/victoralfred/whatifd
cd whatifd
uv sync --all-extras --dev --group workspace
```

Quickstart (programmatic — works today)

The library API is the load-bearing surface. The snippet below is shape-only — it omits RunManifest, MethodologyDisclosure, and CacheSummary construction plus the actual run_pipeline(...) call to keep the README focused. The full runnable end-to-end example lives at docs/getting-started.md. Minimal shape:

```python
from whatifd.adapters.stub import StubTraceSource, StubTraceSpec
from whatifd.adapters.factory import build_scorer
from whatifd.cli_pipeline import build_delta_fn
from whatifd.config import ChangeConfig, ScorerConfig
from whatifd.pipeline import run_pipeline
from whatifd.runner_loader import load_runner

# Your runner satisfies the contract Protocol; see docs/runner-contract.md
loaded_runner = load_runner("python:my_agent.replay:run")

scorer = build_scorer(ScorerConfig(adapter="stub"))  # or wire a real Inspect AI scorer

trace_source = StubTraceSource(specs=[
    StubTraceSpec(trace_id="f-1", user_message="...", original_response="...", cohort="failure"),
    # ...
])

delta_fn = build_delta_fn(
    loaded_runner=loaded_runner,
    scorer=scorer,
    change=ChangeConfig(system_prompt="new prompt"),
    replay_timeout_seconds=60.0,
)

# Construct floor / policy / runtime / methodology / cache_summary,
# then call run_pipeline → ReportV01.
# Full worked example: docs/getting-started.md.
```
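The runner is named by a spec string such as `python:my_agent.replay:run`. The sketch below shows one way a string of that shape could be resolved to a callable with the standard library. It assumes the format is `python:<module path>:<attribute>`, inferred only from the example above; `resolve_runner` is a hypothetical helper, and whatifd's own `load_runner` may validate more than this.

```python
import importlib


def resolve_runner(spec: str):
    """Resolve 'python:<module.path>:<attribute>' to a callable.

    The format is an assumption inferred from the example spec string.
    """
    scheme, sep, rest = spec.partition(":")
    if scheme != "python" or not sep:
        raise ValueError(f"unsupported runner spec: {spec!r}")
    module_path, sep, attr = rest.rpartition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"expected 'python:<module>:<attribute>', got {spec!r}")
    module = importlib.import_module(module_path)  # raises ImportError if missing
    target = getattr(module, attr)                 # raises AttributeError if missing
    if not callable(target):
        raise TypeError(f"{spec!r} does not resolve to a callable")
    return target
```

For instance, `resolve_runner("python:json:dumps")` returns the standard library's `json.dumps`, while a spec without the attribute part raises `ValueError` before anything is imported.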

Quickstart (CLI — stub adapters work today)

```shell
# Write a config:
cat > whatifd.config.yaml <<EOF
source:
  adapter: stub
target:
  runner: python:examples.minimal_agent.replay:run
selection:
  failure_cohort: { limit: 5 }
  baseline_cohort: { limit: 5 }
change:
  system_prompt: my new prompt
scorer:
  adapter: stub
decision: {}
reporting: {}
timeouts: {}
EOF

# Run the fork:
uv run whatifd fork --config whatifd.config.yaml

# Exit codes:
#   0 = Ship verdict
#   1 = Don't Ship verdict
#   2 = Inconclusive verdict / setup failure / floor violation
```
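In a CI step, the exit codes listed above can be mapped to a named outcome before deciding whether to block a merge. This is a minimal sketch: `EXIT_MEANINGS`, `interpret_exit_code`, and `run_and_interpret` are hypothetical helpers, not part of whatifd, and the mapping simply restates the comments above.

```python
import subprocess

# Mapping restated from the exit-code comments in the CLI quickstart.
EXIT_MEANINGS = {
    0: "ship",
    1: "dont_ship",
    2: "inconclusive_or_setup_failure",  # also covers floor violations
}


def interpret_exit_code(code: int) -> str:
    """Translate a process exit code into a named outcome."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_and_interpret(command):
    """Run a command (a list of arguments) and interpret its exit code.

    check=False keeps a non-zero exit from raising, since non-zero codes
    carry a verdict here rather than signalling a crash.
    """
    completed = subprocess.run(command, check=False)
    return interpret_exit_code(completed.returncode)
```

A CI script could then call `run_and_interpret(["uv", "run", "whatifd", "fork", "--config", "whatifd.config.yaml"])` and treat anything other than `"ship"` as blocking. Note that code 2 mixes an Inconclusive verdict with setup failures, so a gate may want to surface the report rather than only the code.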

Real Langfuse traces require LANGFUSE_HOST (or LANGFUSE_BASE_URL), LANGFUSE_PUBLIC_KEY, and LANGFUSE_SECRET_KEY in the environment. Real Inspect AI scoring requires the programmatic API in v0.1 (config-loaded score_fn is a v0.2 cascade entry; see phases.md).
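Before pointing a run at real Langfuse traces, a quick preflight can report which of the required variables are missing. The sketch below is an illustration only; `missing_langfuse_env` is a hypothetical helper, and it checks nothing beyond the variable names given above.

```python
import os


def missing_langfuse_env(env=None):
    """Return descriptions of the required Langfuse settings that are unset.

    `env` defaults to os.environ; pass a dict to check a different mapping.
    """
    env = os.environ if env is None else env
    missing = []
    # Either host variable is accepted.
    if not (env.get("LANGFUSE_HOST") or env.get("LANGFUSE_BASE_URL")):
        missing.append("LANGFUSE_HOST or LANGFUSE_BASE_URL")
    for name in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"):
        if not env.get(name):
            missing.append(name)
    return missing
```

An empty list means all three requirements are satisfied; anything else can be printed as a setup error before a run starts.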

How it composes

whatifd doesn't replace your tracer or your eval framework — it composes them into an experiment.

- Tracers (reads from): Langfuse (v0.1, real adapter shipped); Phoenix / LangSmith / OpenTelemetry GenAI (v0.2+).
- Scorers (wraps): Inspect AI (v0.1, real adapter shipped); pluggable via the scorer registry.
- Your agent (calls back into): any Python callable matching the runner contract.
- Downstream of whatifd's decisions: your existing CI (GitHub Actions, GitLab CI), SLO platforms (Nobl9, Sloth, Honeycomb), incident tooling.

What `whatifd` is not

- Not a tracer (use Langfuse / Phoenix / LangSmith / OpenTelemetry GenAI).
- Not an offline eval harness (use Inspect AI / Promptfoo; whatifd wraps them).
- Not an SLO platform (use Nobl9 / Sloth / Honeycomb downstream of whatifd's decisions).
- Not an agent runtime; the runner contract is the boundary.
- Not a UI or dashboard.
- Not a substitute for production monitoring; not a benchmark suite; not a load test; not a causal estimator beyond replay association; not a judge-quality validator (see docs/concepts.md).

Documentation

- docs/concepts.md: the conceptual model (defensible verdicts, non-claims, trust floor vs decision policy, failure-as-data, evidence and audit bundle)
- docs/getting-started.md: worked end-to-end example
- docs/runner-contract.md: the user-facing extension point reference
- docs/schema/v0.1.md: ReportV01 consumer compatibility guide
- docs/walkthroughs/: six rendered scenarios as reference (Ship, Don't Ship, Inconclusive)
- examples/minimal-agent/: copy-paste reference Runner

Design

The full design — problem framing, prior art, runner contract, report shape, eval target, milestones, risks — lives in DESIGN.md. The doctrine and cardinal rules are in .claude/skills/whatifd-design/SKILL.md.

Contributing

Pre-alpha. Issues and design discussion welcome; pull requests deferred until v0.1 ships.

License

Apache 2.0. See LICENSE.

Release history

- 0.2.0: May 10, 2026
- 0.1.0: May 9, 2026 (this version)

Download files

Source Distribution

whatifd-0.1.0.tar.gz (5.3 MB)

Uploaded May 9, 2026 Source

Built Distribution

whatifd-0.1.0-py3-none-any.whl (211.6 kB)

Uploaded May 9, 2026 Python 3

File details

Details for the file whatifd-0.1.0.tar.gz.

File metadata

Download URL: whatifd-0.1.0.tar.gz
Upload date: May 9, 2026
Size: 5.3 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for whatifd-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `348b5d696fe9eaefdb095628fbb01ba19a33c0d58e4a4e84e9596e4208f522e1` |
| MD5 | `14890e14fadd746ae4f892fb1abd18b5` |
| BLAKE2b-256 | `e850c7e524dfa00e07a0c30606eba3e0b02185d9340678973cb416a238f6b6cb` |
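A downloaded file can be checked against the published SHA256 digest with Python's standard library. The sketch below is one way to do that; `sha256_of` and `matches_published` are hypothetical helper names, and the file path is whatever location the archive was saved to.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For the source distribution, the call would be `matches_published("whatifd-0.1.0.tar.gz", "348b5d696fe9eaefdb095628fbb01ba19a33c0d58e4a4e84e9596e4208f522e1")`, which should return `True` for an intact download.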

Provenance

The following attestation bundles were made for whatifd-0.1.0.tar.gz:

Publisher: release.yml on victoralfred/whatifd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whatifd-0.1.0.tar.gz
- Subject digest: 348b5d696fe9eaefdb095628fbb01ba19a33c0d58e4a4e84e9596e4208f522e1
- Sigstore transparency entry: 1485316607
- Sigstore integration time: May 9, 2026
Source repository:
- Permalink: victoralfred/whatifd@12a263c7609dab10db1b2fbbd5f4b55d819f1a6d
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/victoralfred
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@12a263c7609dab10db1b2fbbd5f4b55d819f1a6d
- Trigger Event: push

File details

Details for the file whatifd-0.1.0-py3-none-any.whl.

File metadata

Download URL: whatifd-0.1.0-py3-none-any.whl
Upload date: May 9, 2026
Size: 211.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for whatifd-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `0f1f499245aca7c3bb2705da7241c158553e73d7a0a17e7e3ce1839eb6f25fc4` |
| MD5 | `0d28805e8a7c8c3cfd5021bcf8f82c0b` |
| BLAKE2b-256 | `d52195b0756b7c46e4b7eeb36430c0924587298fe5dd3029aba6ee386f8fbbcd` |

Provenance

The following attestation bundles were made for whatifd-0.1.0-py3-none-any.whl:

Publisher: release.yml on victoralfred/whatifd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whatifd-0.1.0-py3-none-any.whl
- Subject digest: 0f1f499245aca7c3bb2705da7241c158553e73d7a0a17e7e3ce1839eb6f25fc4
- Sigstore transparency entry: 1485316718
- Sigstore integration time: May 9, 2026
Source repository:
- Permalink: victoralfred/whatifd@12a263c7609dab10db1b2fbbd5f4b55d819f1a6d
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/victoralfred
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@12a263c7609dab10db1b2fbbd5f4b55d819f1a6d
- Trigger Event: push
