Open experiment runner for LLM behavior changes. Fork production traces, replay with a proposed change, score the diff, emit a PR-ready verdict report.

Maintainers

voseghale

Project description

whatifd

whatifd's product is the verdict's defensibility. Fork production traces, replay with a proposed change, score the diff, and ship a Ship / Don't Ship / Inconclusive verdict whose reasoning a reviewer can read and follow, then either trust or know exactly which assumption to challenge.

whatifd workflow

When you change a prompt, model, or tool in an LLM system, you don't actually know whether it improves behavior — you guess, with a handful of cherry-picked traces and inconsistent evaluation. Every step in the workflow has a tool: Langfuse for traces, Inspect AI for scoring, GitHub for PRs. The experiment doesn't.

whatifd is the experiment runner. Fork production traces (failed cases plus a representative baseline), replay them with your proposed change (original tool outputs cached so side effects don't re-fire), score with the judge of your choice, and produce a Markdown + JSON verdict report you can attach to the PR. You stop shipping changes that fix one failure while silently regressing ten others. You go from "this feels better" to "this improved 14/20, regressed 3 — here's exactly where, and here's the evidence I'd defend in review."
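
To make the "improved 14/20, regressed 3" style of summary concrete, here is a minimal sketch of tallying per-trace score changes. It is an illustration only, not whatifd's scoring or verdict logic: the `tally_deltas` helper, the `tolerance` parameter, and the score values are all hypothetical.

```python
from collections import Counter


def tally_deltas(baseline, candidate, tolerance=0.0):
    """Count traces whose score improved, regressed, or stayed unchanged.

    `baseline` and `candidate` map trace IDs to scores (hypothetical shape).
    A change within `tolerance` of zero counts as unchanged.
    """
    counts = Counter(improved=0, regressed=0, unchanged=0)
    for trace_id, before in baseline.items():
        delta = candidate[trace_id] - before
        if delta > tolerance:
            counts["improved"] += 1
        elif delta < -tolerance:
            counts["regressed"] += 1
        else:
            counts["unchanged"] += 1
    return dict(counts)


# Made-up scores for two failure-cohort and two baseline-cohort traces.
baseline = {"f-1": 0.2, "f-2": 0.4, "b-1": 0.9, "b-2": 0.8}
candidate = {"f-1": 0.7, "f-2": 0.4, "b-1": 0.6, "b-2": 0.9}

summary = tally_deltas(baseline, candidate)
print(summary)
```

Counting regressions on the baseline cohort alongside fixes on the failure cohort is what surfaces a change that repairs one failure while breaking previously good traces.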

Stop shipping LLM changes on gut feel.

whatifd on one page

Install

# Core + the adapters you use (each is an optional package):
uv pip install whatifd whatifd-langfuse whatifd-inspect-ai whatifd-phoenix

# From source (uv workspace) — includes the in-development whatifd-datadog adapter:
git clone https://github.com/victoralfred/whatifd
cd whatifd
uv sync --all-extras --dev --group workspace

Quickstart (programmatic)

The library API is the load-bearing surface. The snippet below is shape-only — it omits RunManifest, MethodologyDisclosure, and CacheSummary construction plus the actual run_pipeline(...) call to keep the README focused. The full runnable end-to-end example lives at docs/getting-started.md. Minimal shape:

from whatifd.adapters.stub import StubTraceSource, StubTraceSpec
from whatifd.adapters.factory import build_scorer
from whatifd.cli_pipeline import build_delta_fn
from whatifd.config import ChangeConfig, ScorerConfig
from whatifd.pipeline import run_pipeline
from whatifd.runner_loader import load_runner

# Your runner satisfies the contract Protocol — see docs/runner-contract.md
loaded_runner = load_runner("python:my_agent.replay:run")

scorer = build_scorer(ScorerConfig(adapter="stub"))  # or wire a real Inspect AI scorer

trace_source = StubTraceSource(specs=[
    StubTraceSpec(trace_id="f-1", user_message="...", original_response="...", cohort="failure"),
    # ...
])

delta_fn = build_delta_fn(
    loaded_runner=loaded_runner,
    scorer=scorer,
    change=ChangeConfig(system_prompt="new prompt"),
    replay_timeout_seconds=60.0,
)

# Construct floor / policy / runtime / methodology / cache_summary,
# then call run_pipeline → ReportV01.
# Full worked example: docs/getting-started.md.
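
The runner is referenced by a string of the form python:<module>:<attr>. As a rough sketch of what resolving such a reference could involve, the hypothetical helper below uses only the standard library. It is not the implementation of `load_runner`, which may validate the runner contract and do considerably more.

```python
import importlib


def resolve_python_ref(ref):
    """Resolve a 'python:<module>:<attr>' string to the object it names.

    Hypothetical helper for illustration; not part of whatifd.
    """
    scheme, _, rest = ref.partition(":")
    module_name, _, attr = rest.partition(":")
    if scheme != "python" or not module_name or not attr:
        raise ValueError(f"expected 'python:<module>:<attr>', got {ref!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated with a standard-library callable so the sketch runs anywhere.
dumps = resolve_python_ref("python:json:dumps")
print(dumps({"trace_id": "f-1"}))
```

In a real project the reference would name your own replay entry point, such as python:my_agent.replay:run, and the module must be importable from the environment where whatifd runs.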

Quickstart (CLI — stub adapters, no credentials needed)

# Write a config:
cat > whatifd.config.yaml <<EOF
source:
  adapter: stub
target:
  runner: python:examples.minimal_agent.replay:run
selection:
  failure_cohort: { limit: 5 }
  baseline_cohort: { limit: 5 }
change:
  system_prompt: my new prompt
scorer:
  adapter: stub
decision: {}
reporting: {}
timeouts: {}
EOF

# Run the fork:
uv run whatifd fork --config whatifd.config.yaml

# Exit codes:
#   0 = Ship verdict
#   1 = Don't Ship verdict
#   2 = Inconclusive verdict / setup failure / floor violation
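
A CI wrapper can branch on these exit codes. The sketch below assumes only the mapping listed above and the `uv run whatifd fork --config ...` command shown earlier; the function names and the merge-blocking policy are hypothetical choices, not something whatifd prescribes.

```python
import subprocess

# Exit-code meanings as listed in the quickstart above.
EXIT_CODE_MEANING = {
    0: "Ship",
    1: "Don't Ship",
    2: "Inconclusive verdict / setup failure / floor violation",
}


def interpret_exit_code(code):
    """Return the documented meaning of an exit code, or 'unknown'."""
    return EXIT_CODE_MEANING.get(code, "unknown")


def should_block_merge(code):
    """Hypothetical policy: anything other than a Ship verdict blocks the merge."""
    return code != 0


def run_fork(config_path="whatifd.config.yaml"):
    """Run the fork command from the quickstart and return its exit code."""
    result = subprocess.run(
        ["uv", "run", "whatifd", "fork", "--config", config_path]
    )
    return result.returncode


for code in (0, 1, 2):
    print(code, interpret_exit_code(code), should_block_merge(code))
```

Exit code 2 covers both an Inconclusive verdict and setup problems, so a pipeline that wants to tell them apart would need to inspect the report rather than the exit code alone.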

Real Langfuse traces require LANGFUSE_HOST (or LANGFUSE_BASE_URL) + LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY in the environment. Real Inspect AI scoring is reachable from YAML via scorer.score_fn: python:<module>:<attr>.
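
Before switching from the stub adapter to real Langfuse traces, it can help to check that the required variables are present. This is a small preflight sketch built only from the variable names above; the `missing_langfuse_settings` helper is hypothetical and whatifd may report missing credentials in its own way.

```python
import os


def missing_langfuse_settings(env):
    """Return the names of required Langfuse settings absent from `env`.

    Either LANGFUSE_HOST or LANGFUSE_BASE_URL satisfies the host requirement.
    """
    missing = []
    if not (env.get("LANGFUSE_HOST") or env.get("LANGFUSE_BASE_URL")):
        missing.append("LANGFUSE_HOST or LANGFUSE_BASE_URL")
    for name in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"):
        if not env.get(name):
            missing.append(name)
    return missing


# Check the current process environment and report anything missing.
problems = missing_langfuse_settings(os.environ)
if problems:
    print("Missing Langfuse settings:", ", ".join(problems))
else:
    print("Langfuse settings present.")
```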

How it composes

whatifd doesn't replace your tracer or your eval framework — it composes them into an experiment.

Tracers (reads from): Langfuse, Arize Phoenix / OpenInference, and Datadog LLM Observability (each a small read-only adapter package; the Datadog adapter is still in development and installs from source). LangSmith / OpenTelemetry GenAI are candidates for future adapters.
Scorers (wraps): Inspect AI (real adapter shipped); pluggable via the scorer registry.
Your agent (calls back into): any Python callable matching the runner contract.
Downstream of whatifd's decisions: your CI gates on the exit code — a whatifd-fork GitHub Action (.github/actions/whatifd-fork/) and a GitLab CI/CD component (integrations/gitlab/) wrap it with verdict comments + artifacts. Also composes with SLO platforms (Nobl9, Sloth, Honeycomb) and incident tooling.

What `whatifd` is not

Not a tracer (use Langfuse / Phoenix / LangSmith / OpenTelemetry GenAI).
Not an offline eval harness (use Inspect AI / Promptfoo; whatifd wraps them).
Not an SLO platform (use Nobl9 / Sloth / Honeycomb downstream of whatifd's decisions).
Not an agent runtime — the runner contract is the boundary.
Not a UI or dashboard.
Not a substitute for production monitoring; not a benchmark suite; not a load test; not a causal estimator beyond replay association; not a judge-quality validator (see docs/concepts.md).

Documentation

docs/concepts.md — the conceptual model: defensible verdicts, non-claims, trust floor vs decision policy, failure-as-data, evidence and audit bundle
docs/getting-started.md — worked end-to-end example
docs/runner-contract.md — the user-facing extension point reference
docs/schema/v0.1.md — ReportV01 consumer compatibility guide
docs/walkthroughs/ — six rendered scenarios as reference (Ship, Don't Ship, Inconclusive)
examples/minimal_agent/ — copy-paste reference Runner

Design

The full design — problem framing, prior art, runner contract, report shape, eval target, milestones, risks — lives in DESIGN.md. The doctrine and cardinal rules are in .claude/skills/whatifd-design/SKILL.md.

Contributing

Alpha. Issues, design discussion, and pull requests welcome. The design doctrine and cardinal rules (in .claude/skills/whatifd-design/SKILL.md) are load-bearing — read them before proposing changes to the trust floor, schema, or verdict logic.

License

Apache 2.0. See LICENSE.

Project details

Release history

0.3.0 (this version): Jun 4, 2026
0.2.1: May 30, 2026
0.2.0: May 10, 2026
0.1.0: May 9, 2026

Download files

Download the file for your platform.

Source Distribution

whatifd-0.3.0.tar.gz (5.5 MB)

Uploaded Jun 4, 2026 Source

Built Distribution

whatifd-0.3.0-py3-none-any.whl (250.5 kB)

Uploaded Jun 4, 2026 Python 3

File details

Details for the file whatifd-0.3.0.tar.gz.

File metadata

Download URL: whatifd-0.3.0.tar.gz
Upload date: Jun 4, 2026
Size: 5.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for whatifd-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`ad582c74b6efcbc85740af05434cf93246a09fdefb2db3dd4de9977a76ed59f1`
MD5	`71ae4f7b389aa08678e903319c7d90bb`
BLAKE2b-256	`076e5151855eb83a453b283e22e41ebea36621d6e1e6a0dc7e30847e013111a9`
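
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names and the local file path are assumptions, and the expected digest is copied from the table for whatifd-0.3.0.tar.gz.

```python
import hashlib

# SHA256 digest for whatifd-0.3.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "ad582c74b6efcbc85740af05434cf93246a09fdefb2db3dd4de9977a76ed59f1"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()


# Usage, assuming the sdist was downloaded to the current directory:
# print(matches_expected("whatifd-0.3.0.tar.gz"))
```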

Provenance

The following attestation bundles were made for whatifd-0.3.0.tar.gz:

Publisher: release.yml on victoralfred/whatifd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whatifd-0.3.0.tar.gz
- Subject digest: ad582c74b6efcbc85740af05434cf93246a09fdefb2db3dd4de9977a76ed59f1
- Sigstore transparency entry: 1725303587
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: victoralfred/whatifd@47869c1d0653ebe9d95106ca9e5d263ff58ee5e0
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/victoralfred
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@47869c1d0653ebe9d95106ca9e5d263ff58ee5e0
- Trigger Event: push

File details

Details for the file whatifd-0.3.0-py3-none-any.whl.

File metadata

Download URL: whatifd-0.3.0-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 250.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for whatifd-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3efed6286d804a31c1bd7f2806497202c862bd0a72900f3ef2d23e96a60e4944`
MD5	`04881f87514d8d9bf7bb5d49e817e868`
BLAKE2b-256	`a7e3ea2bc814d99f26d8672a97d3c97d71dde4bcd293acc9b3fd267b0a24ab3a`

Provenance

The following attestation bundles were made for whatifd-0.3.0-py3-none-any.whl:

Publisher: release.yml on victoralfred/whatifd

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whatifd-0.3.0-py3-none-any.whl
- Subject digest: 3efed6286d804a31c1bd7f2806497202c862bd0a72900f3ef2d23e96a60e4944
- Sigstore transparency entry: 1725303771
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: victoralfred/whatifd@47869c1d0653ebe9d95106ca9e5d263ff58ee5e0
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/victoralfred
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@47869c1d0653ebe9d95106ca9e5d263ff58ee5e0
- Trigger Event: push
