Platform-agnostic A/B experiment readout auditor with SRM, peeking, MDE, practical significance, Welch t-test, guardrail, and pre-period balance checks.

TrialCheck

Platform-agnostic A/B experiment readout auditor.

TrialCheck does not run experiments. It audits completed readouts from any experimentation platform, spreadsheet, or warehouse export and returns a structured report in which each check is marked PASS, WARN, FAIL, or INSUFFICIENT_INPUT.

About

Most experimentation platforms surface a p-value and a lift estimate. That is not enough information to make a trustworthy ship decision.

Before shipping an experiment result, a senior data scientist checks a consistent set of questions: Did assignment work correctly? Was the result called early? Is the effect large enough to matter in practice? Did any guardrail metrics move harmfully? Were the variants balanced before the test started? These checks are well-understood, but they are rarely automated — they live in runbooks, reviewer checklists, or institutional memory.

TrialCheck packages those checks into a single library call. It accepts a structured experiment summary (assignment counts, metric data, optional guardrails and pre-period covariates) and returns a per-check PASS / WARN / FAIL / INSUFFICIENT_INPUT report with recommendations. The result is readable by humans and parseable by machines (JSON, Markdown, HTML output).

The intended use case: a data scientist or analytics lead runs TrialCheck at readout time, reviews the report, and makes a better-informed decision. TrialCheck is decision support — not a decision-maker.

Architecture

flowchart TD
    IN["ExperimentSummary\nassignment counts · metric data\nguardrails · pre-period covariates"]

    IN --> SRM
    IN --> PMC
    IN --> CMC
    IN --> PSC
    IN --> MDE
    IN --> PKG
    IN --> GRD
    IN --> PPB

    SRM["SRM Check\nchi-square df=1\nerfc(sqrt(x/2))"]
    PMC["Primary Metric\ntwo-proportion z-test\npooled SE under H0"]
    CMC["Continuous Metric\nWelch t-test\nWelch-Satterthwaite dof"]
    PSC["Practical Significance\nobserved lift vs\nbusiness threshold"]
    MDE["MDE Context\nobserved lift vs\nplanned MDE"]
    PKG["Peeking Risk\nduration ratio\n+ interim looks"]
    GRD["Guardrail Movement\nbad direction\n+ tolerance"]
    PPB["Pre-period Balance\nSMD per covariate\npooled SD"]

    SRM --> AGG
    PMC --> AGG
    CMC --> AGG
    PSC --> AGG
    MDE --> AGG
    PKG --> AGG
    GRD --> AGG
    PPB --> AGG

    AGG["Overall Status\nFAIL > WARN > INSUFFICIENT_INPUT > PASS"]
    AGG --> OUT

    OUT["TrialReport\nJSON · Markdown · HTML\nexplicit claim boundary"]
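The aggregation step in the flowchart can be read as "the most severe per-check status wins". Below is a minimal sketch of that precedence rule. The `Status` enum, the `overall_status` function, and the handling of an empty input are assumptions made for illustration; they are not taken from TrialCheck's source.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical status enum; the member names follow the flowchart."""
    PASS = "PASS"
    INSUFFICIENT_INPUT = "INSUFFICIENT_INPUT"
    WARN = "WARN"
    FAIL = "FAIL"


# Higher rank wins, mirroring FAIL > WARN > INSUFFICIENT_INPUT > PASS.
_SEVERITY = {
    Status.PASS: 0,
    Status.INSUFFICIENT_INPUT: 1,
    Status.WARN: 2,
    Status.FAIL: 3,
}


def overall_status(check_statuses):
    """Return the most severe status among the per-check results."""
    statuses = list(check_statuses)
    if not statuses:
        # Assumption: with no checks run, there is nothing to assert.
        return Status.INSUFFICIENT_INPUT
    return max(statuses, key=_SEVERITY.__getitem__)
```

Under this rule a single FAIL dominates any number of PASS results, and a missing input cannot mask a WARN.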

Why this exists

A p-value alone is not enough to ship an experiment. Before acting on a readout, teams should check whether the result is trustworthy and decision-ready:

  • Did assignment drift? (SRM — chi-square df=1)
  • Was the result called early or monitored repeatedly? (peeking risk)
  • Is the lift large enough to matter? (practical significance)
  • Is the lift below the planned MDE? (MDE context)
  • Is the continuous metric difference statistically significant? (Welch's t-test, no equal-variance assumption)
  • Did a guardrail move in the wrong direction? (guardrail movement)
  • Were variants balanced before the test? (pre-period covariate balance — SMD)
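As an illustration of the first check in the list, here is a rough sketch of a two-arm SRM test using only the standard library. For a chi-square statistic x with one degree of freedom, the p-value equals erfc(sqrt(x/2)), which is the identity named in the architecture diagram. The function name, its parameters, and the counts in the usage note are hypothetical and do not reflect TrialCheck's actual API.

```python
import math


def srm_chi_square(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square goodness-of-fit test (df=1) for a two-arm split.

    Returns (chi2, p_value). Hypothetical helper, not the library's API.
    """
    total = n_control + n_treatment
    if total <= 0:
        raise ValueError("need at least one assigned unit")
    if not 0.0 < expected_control_share < 1.0:
        raise ValueError("expected_control_share must be strictly between 0 and 1")

    expected_control = total * expected_control_share
    expected_treatment = total - expected_control

    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # Survival function of chi-square with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, `srm_chi_square(5600, 4400)` (made-up counts for a planned 50/50 split) gives a chi-square statistic of 144 and a p-value far below any conventional threshold, while a perfectly balanced `srm_chi_square(5000, 5000)` gives a statistic of 0 and a p-value of 1.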

TrialCheck packages those checks into one lightweight Python library with zero dependencies.
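To show why these checks need no third-party packages, the sketch below implements three of the statistics named above with `math` alone: the two-proportion z-test with a pooled standard error, the Welch t statistic with Welch-Satterthwaite degrees of freedom, and a standardized mean difference (SMD). All function names and signatures are assumptions for illustration. The Welch sketch stops at the statistic and degrees of freedom because a t-distribution p-value needs an incomplete beta function that the standard library does not provide. The SMD uses the unweighted pooled SD, which is one common convention and may differ from the library's.

```python
import math


def two_proportion_z(x_control, n_control, x_treatment, n_treatment):
    """Two-sided z-test for a difference in proportions, pooled SE under H0."""
    p_control = x_control / n_control
    p_treatment = x_treatment / n_treatment
    pooled = (x_control + x_treatment) / (n_control + n_treatment)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_control + 1.0 / n_treatment))
    if se == 0.0:
        raise ValueError("pooled rate of 0 or 1 gives a zero standard error")
    z = (p_treatment - p_control) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def welch_t(mean_control, var_control, n_control,
            mean_treatment, var_treatment, n_treatment):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    Variances are sample variances (n - 1 denominator).
    """
    a = var_control / n_control
    b = var_treatment / n_treatment
    t = (mean_treatment - mean_control) / math.sqrt(a + b)
    dof = (a + b) ** 2 / (a ** 2 / (n_control - 1) + b ** 2 / (n_treatment - 1))
    return t, dof


def standardized_mean_difference(mean_control, sd_control,
                                 mean_treatment, sd_treatment):
    """SMD using the unweighted pooled SD of the two arms."""
    pooled_sd = math.sqrt((sd_control ** 2 + sd_treatment ** 2) / 2.0)
    return (mean_treatment - mean_control) / pooled_sd
```

With equal variances and equal sample sizes, the Welch degrees of freedom reduce to the familiar n_control + n_treatment - 2, which is a quick sanity check on the formula.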

Claim boundary

TrialCheck is an audit helper. It does not:

  • run experiments
  • assign users
  • replace an experimentation platform
  • prove causal validity
  • perform CUPED or sequential testing
  • make automatic ship/no-ship decisions

It surfaces readout risks so a data scientist, experiment owner, or analytics lead can make a better decision.

Install locally

cd trialcheck_v0
python -m pip install -e .

Quickstart

from trialcheck import TrialCheck, write_report
from trialcheck.io import load_experiment_json

experiment = load_experiment_json("sample_data/checkout_experiment_summary.json")
report = TrialCheck(experiment).run()

print(report.overall_status.value)
print(report.interpretation)

write_report(report, "outputs/trialcheck_report.json")
write_report(report, "outputs/trialcheck_report.md")
write_report(report, "outputs/trialcheck_report.html")

Run the demo

set -e  # stop at the first failing command, including a failed cd
cd trialcheck_v0
python -m pip install -e .
python scripts/generate_demo_reports.py
open outputs/trialcheck_report.html  # macOS; on Linux use xdg-open instead

Run tests

set -e  # stop at the first failing command, including a failed cd
cd trialcheck_v0
python -m unittest discover -s tests -v

Four canonical scenarios

The demo ships four pre-built scenarios that each exercise a different failure mode:

| Scenario | Overall | What fires |
| --- | --- | --- |
| clean_pass | PASS | All checks green; full data supplied |
| srm_fail | FAIL | 56/44 split observed vs 50/50 planned |
| peeking_warn | WARN | 36% of planned duration, 2 interim looks |
| guardrail_harm | FAIL | Revenue/user drops 4.4%; refund rate doubles |
Run all four with python scripts/generate_demo_reports.py.
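The peeking_warn and guardrail_harm scenarios rest on simple arithmetic: a duration ratio plus a count of interim looks, and a relative movement in the harmful direction compared against a tolerance. The sketch below shows one way to express both. The function names, the flagging rules, and the tolerance default are hypothetical; TrialCheck's actual thresholds are not stated in this README.

```python
def peeking_risk(observed_days, planned_days, interim_looks):
    """Return (duration_ratio, flagged).

    Assumption: flag when the test stopped before its planned duration
    or when the result was inspected at least once before the end.
    """
    if planned_days <= 0:
        raise ValueError("planned_days must be positive")
    ratio = observed_days / planned_days
    return ratio, (ratio < 1.0 or interim_looks > 0)


def guardrail_harm(control, treatment, higher_is_better, tolerance=0.0):
    """Return (harm, flagged) for one guardrail metric.

    harm is the relative change measured in the bad direction, so a
    positive value always means the guardrail got worse.
    """
    if control == 0:
        raise ValueError("control value must be non-zero for a relative change")
    relative_change = (treatment - control) / control
    harm = -relative_change if higher_is_better else relative_change
    return harm, harm > tolerance
```

With made-up numbers shaped like the scenarios, `peeking_risk(5, 14, 2)` reports a ratio of about 0.36 and flags the readout, and `guardrail_harm(10.00, 9.56, higher_is_better=True, tolerance=0.01)` reports a harm of about 0.044 and flags the guardrail.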

Public resume-safe claim

Built TrialCheck, a platform-agnostic A/B experiment readout auditor that checks completed experiment summaries for sample-ratio mismatch (chi-square), peeking risk, MDE context, practical and statistical significance (two-proportion z-test and Welch's t-test), guardrail movement, and pre-period covariate imbalance (SMD), producing JSON/Markdown/HTML audit reports with explicit decision caveats. Zero dependencies. 17 tests. 4 canonical demo scenarios.

Roadmap

  • richer power/MDE utilities
  • CSV batch audit mode
  • optional report styling polish

License

MIT
