Statistical validity auditor for A/B tests — because significant != trustworthy.

Maintainer: aldair_ai

abaudit

Statistical Validity Auditor for A/B Tests

A significant p-value answers the wrong question.
abaudit asks: given that the result is significant, how likely is it to actually be real?

Why abaudit?

Most A/B testing tools tell you whether your result is significant.
Few tell you whether to trust it.

A p-value is P(data at least this extreme | no effect): the probability of seeing data at least as extreme as yours if there is no effect.
What you actually want is P(true effect | significant result) — the Positive Predictive Value (PPV).

These are not the same thing. With a low prior, several metrics tested, and a few interim peeks, a p = 0.03 result might have only a 20% chance of being real. Standard tools report it as significant and move on; abaudit does not.

The math comes from Ioannidis (2005):

$$\text{PPV} = \frac{(1-\beta) \cdot f}{(1-\beta) \cdot f + \alpha \cdot (1-f)}$$

Here $f$ is your prior probability that the effect exists, $1-\beta$ is the statistical power, and $\alpha$ is your significance threshold. This is Bayes' rule applied to hypothesis testing, and it is exactly what a p-value ignores.
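The formula is simple enough to evaluate directly. The following is a minimal standalone sketch, not part of abaudit's API (the function name `ppv` is hypothetical; abaudit exposes the value as `result.ppv`), showing how the PPV falls when only the effective α changes:

```python
def ppv(power: float, alpha: float, prior_f: float) -> float:
    """Positive Predictive Value from the Ioannidis (2005) formula.

    power   -- 1 - beta, probability of detecting a real effect
    alpha   -- significance threshold (or the effective alpha after peeking)
    prior_f -- prior probability that the effect exists
    """
    true_positives = power * prior_f            # real effects that reach significance
    false_positives = alpha * (1.0 - prior_f)   # null effects that reach significance
    return true_positives / (true_positives + false_positives)


# Same prior and power; only the effective alpha changes.
print(round(ppv(power=0.80, alpha=0.05, prior_f=0.2), 3))   # 0.8
print(round(ppv(power=0.80, alpha=0.226, prior_f=0.2), 3))  # 0.469
```

With power 0.99 and α = 0.05 the same function gives about 0.83, consistent with the PPV in the Quickstart report below; with α inflated to about 0.226, fewer than half of "significant" results would be real.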

Quickstart

```bash
pip install abaudit
```

```python
import numpy as np
import abaudit as ab

rng  = np.random.default_rng(42)
ctrl = rng.normal(0.0, 1.0, 500)
trt  = rng.normal(0.3, 1.0, 500)

result = ab.audit(
    ctrl, trt,
    prior_f = 0.2,
    metrics = ['conversion', 'revenue', 'session_time'],
    n_peeks = 5,
)

result.summary()
```

```text
         abaudit — Experiment Validity Report
┌──────────────────────────────────┬─────────────────┬────────┐
│ Check                            │ Result          │ Status │
├──────────────────────────────────┼─────────────────┼────────┤
│ p-value (primary)                │ 0.0000          │ ✅     │
│ p-value (Bonferroni corrected)   │ 0.0001          │ ✅     │
│ PPV — prob. effect is real       │ 0.83            │ ✅     │
│ Statistical power                │ 0.99            │ ✅     │
│ Sample Ratio Mismatch            │ p = 1.000       │ ✅     │
│ Metrics tested                   │ 3               │ ⚠️     │
│ Optional stopping (peeks)        │ eff. α = 0.226  │ ⚠️     │
│ Effect size (Cohen's d)          │ 0.271           │ ✅     │
└──────────────────────────────────┴─────────────────┴────────┘

Bias score: [███░░░░░░░░░░░░░░░░░] 0.15 / 1.0  🟢 Low concern

⚠️  Warnings:
   • 3 metrics tested — Bonferroni-corrected p = 0.0001 (raw p = 0.0000).
   • Optional stopping risk: p-value checked 5 times. Effective α ≈ 0.226 (nominal: 0.05).

💡 Recommendations:
   • Use sequential testing (SPRT) or an alpha-spending function
     when interim looks are necessary.
```
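The effective α in this report, 0.226 for five looks, matches 1 - (1 - 0.05)^5, the rate you would get if the five looks were independent tests. Interim looks reuse accumulating data, so they are positively correlated and that formula overstates the inflation. The hypothetical Monte Carlo sketch below (plain NumPy, not abaudit code) simulates A/A tests with no true effect, runs a two-sided z-test at each equally spaced look, and counts how often at least one look is significant:

```python
import numpy as np
from statistics import NormalDist


def simulate_peeking_alpha(n_peeks=5, n_per_look=100, alpha=0.05,
                           n_sims=100_000, seed=0):
    """Monte Carlo false-positive rate when an A/A test is checked n_peeks times.

    Each arm gets n_per_look new N(0, 1) observations between looks, and a
    two-sided z-test (known sigma = 1) is run at every look. A simulated
    experiment counts as a false positive if ANY look is significant.
    """
    rng = np.random.default_rng(seed)
    # Under H0 each paired difference trt_i - ctrl_i is N(0, 2), so the sum of
    # the n_per_look differences added between two looks is N(0, 2 * n_per_look).
    increments = rng.normal(0.0, np.sqrt(2.0 * n_per_look), size=(n_sims, n_peeks))
    cum_sums = np.cumsum(increments, axis=1)          # running sum at each look
    n_at_look = n_per_look * np.arange(1, n_peeks + 1)
    # Difference in means is cum_sum / n with standard error sqrt(2 / n),
    # so z = cum_sum / sqrt(2 * n).
    z = cum_sums / np.sqrt(2.0 * n_at_look)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return float(np.mean(np.any(np.abs(z) > z_crit, axis=1)))


independent_bound = 1 - (1 - 0.05) ** 5
print(f"independent-looks formula: {independent_bound:.3f}")   # 0.226
print(f"simulated, 1 look:  {simulate_peeking_alpha(n_peeks=1):.3f}")
print(f"simulated, 5 looks: {simulate_peeking_alpha(n_peeks=5):.3f}")
```

In this model the answer does not depend on `n_per_look`. A single look should come out near 0.05 and five looks near 0.14: lower than 0.226, but still nearly three times the nominal α, which is why the report recommends sequential testing or an alpha-spending function.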

```python
# Save a shareable HTML report
ab.generate_report(result, path="audit_report.html")

# Pre-experiment: is this worth running?
plan = ab.design_summary(effect_size=0.3, prior_f=0.2)
plan.summary()

# During-experiment: health checks
ab.check_srm(n_control=4850, n_treatment=5150)
ab.check_optional_stopping([0.12, 0.08, 0.04, 0.06, 0.03])
```
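The SRM call above uses a 4,850 / 5,150 split. The usual SRM test, discussed in Kohavi, Tang & Xu (2020), is a chi-square goodness-of-fit test of the observed counts against the intended split. A minimal standard-library sketch, assuming abaudit uses the same kind of test (the function `srm_pvalue` is hypothetical, and `expected_split` is taken here to be the control share): with one degree of freedom the chi-square tail probability is `erfc(sqrt(chi2 / 2))`, so SciPy is not required.

```python
import math


def srm_pvalue(n_control: int, n_treatment: int, expected_split: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-group sample ratio mismatch.

    expected_split is the intended fraction of traffic sent to control.
    """
    total = n_control + n_treatment
    expected = (total * expected_split, total * (1 - expected_split))
    observed = (n_control, n_treatment)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


print(round(srm_pvalue(4850, 5150), 4))  # 0.0027 -> traffic split is suspicious
print(srm_pvalue(500, 500))              # 1.0    -> no mismatch
```

With 10,000 users and an intended 50/50 split, a 150-user imbalance gives chi2 = 9 and p ≈ 0.0027, small enough that the randomization itself should be investigated before reading any metric. The 500 / 500 Quickstart data gives p = 1.0, consistent with the report's `p = 1.000`.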

What abaudit checks

| Module | Check | Answers |
|---|---|---|
| `validity` | PPV (Ioannidis 2005) | Given the significant result, what's the probability it's real? |
| `validity` | Multiple-metric correction | You tested 3 metrics: what's the Bonferroni-corrected p? |
| `validity` | Effect size plausibility | Is the reported effect suspiciously large (winner's curse)? |
| `validity` | Statistical power | Was the study large enough to detect the effect reliably? |
| `runtime` | Sample Ratio Mismatch | Was traffic split as intended? |
| `runtime` | Optional stopping | Was the p-value checked multiple times during collection? |
| `runtime` | Novelty effect | Did the effect fade after the initial novelty wore off? |
| `design` | PPV-aware power analysis | How large does n need to be for results to be trustworthy? |
| `report` | HTML report | Self-contained report for sharing with stakeholders |
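The PPV-aware power analysis follows from rearranging the PPV formula for power: to reach a target PPV, power must be at least target_ppv · α(1 - f) / ((1 - target_ppv) · f). The sketch below searches for the smallest n per group that meets both the power and the PPV target. The helper names are hypothetical, and it uses a normal approximation to the two-sample test, so its n may differ by one or two from a t-test-based calculator or from abaudit's own `minimum_trustworthy_n`.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Power of a two-sided two-sample z-test for Cohen's d (normal approximation).

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(d * math.sqrt(n_per_group / 2) - z_crit)


def power_needed_for_ppv(prior_f: float, alpha: float, target_ppv: float) -> float:
    """Solve PPV = power*f / (power*f + alpha*(1-f)) for power."""
    return target_ppv * alpha * (1 - prior_f) / ((1 - target_ppv) * prior_f)


def min_trustworthy_n(d, prior_f, alpha=0.05, target_power=0.80, target_ppv=0.80):
    """Smallest n per group whose power meets both the power and PPV targets."""
    needed = max(target_power, power_needed_for_ppv(prior_f, alpha, target_ppv))
    if needed >= 1.0:
        # No sample size can reach this PPV at this alpha and prior.
        raise ValueError(f"target PPV unreachable: would need power {needed:.2f}")
    n = 2
    while approx_power(d, n, alpha) < needed:
        n += 1
    return n


print(min_trustworthy_n(0.3, prior_f=0.2))   # here both targets need power ~0.80
try:
    min_trustworthy_n(0.3, prior_f=0.2, target_ppv=0.90)
except ValueError as err:
    print(err)
```

The second call shows why the PPV framing matters: with a 20% prior and α = 0.05, no amount of data lifts the PPV to 0.90, because even power 1.0 caps it at 0.2 / (0.2 + 0.04) ≈ 0.83. The remaining levers are a stricter α or a better-founded prior.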

What abaudit gives you that standard tools don't

| Standard tool | abaudit |
|---|---|
| Reports p-value | Reports p-value and PPV |
| Ignores your prior | Uses the Ioannidis PPV framework |
| Ignores multiple metrics | Applies Bonferroni correction automatically |
| Ignores peeking | Diagnoses optional stopping and inflated α |
| Ignores traffic split | Runs a Sample Ratio Mismatch test |
| No composite score | Bias score 0–1 with breakdown |
| No HTML output | Self-contained shareable report |
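Bonferroni correction multiplies a p-value by the number of metrics tested (capped at 1), which is equivalent to testing each metric at α / m. A minimal sketch with hypothetical helper names, alongside the family-wise error rate that motivates the correction (assuming independent tests under the null):

```python
def bonferroni(p_value: float, n_metrics: int) -> float:
    """Bonferroni-adjusted p-value for one of n_metrics tests, capped at 1."""
    return min(1.0, p_value * n_metrics)


def familywise_error(alpha: float, n_metrics: int) -> float:
    """P(at least one false positive) across n_metrics independent null tests."""
    return 1 - (1 - alpha) ** n_metrics


print(round(bonferroni(0.03, 3), 2))            # 0.09 -> no longer significant at 0.05
print(round(familywise_error(0.05, 3), 3))      # 0.143 with no correction
print(round(familywise_error(0.05 / 3, 3), 4))  # 0.0492 -> back under 0.05
```

The p = 0.03 result from the example above stops being significant once three metrics are accounted for. Bonferroni is conservative, especially when the metrics are correlated.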

Full API

```python
import abaudit as ab

# ctrl, trt: control and treatment arrays, e.g. from the Quickstart

# ── Post-experiment audit ─────────────────────────────────────
result = ab.audit(
    control        = ctrl,          # array-like, control group
    treatment      = trt,           # array-like, treatment group
    prior_f        = 0.2,           # prior probability effect is real
    alpha          = 0.05,          # significance threshold
    metrics        = ['conversion', 'revenue'],  # all metrics tested
    primary        = 'conversion',  # the one being reported
    n_peeks        = 3,             # number of interim looks
    expected_split = 0.5,           # intended traffic split
)
result.summary()                    # traffic-light table
result.ppv                          # float: prob. effect is real
result.bias_score                   # float 0–1: composite red flags
result.flags                        # list[str]: warnings
ab.generate_report(result, "report.html")

# ── Pre-experiment planning ───────────────────────────────────
plan = ab.design_summary(
    effect_size  = 0.3,             # expected Cohen's d
    prior_f      = 0.2,             # prior probability
    target_power = 0.80,
    target_ppv   = 0.80,
)
plan.summary()
plan.n_recommended                  # n per group to achieve both targets

ab.power_analysis(effect_size=0.3)
ab.ppv_given_design(effect_size=0.3, n_per_group=176, prior_f=0.2)
ab.minimum_trustworthy_n(effect_size=0.3, prior_f=0.2, target_ppv=0.80)

# ── During-experiment checks ──────────────────────────────────
ab.check_srm(n_control=4850, n_treatment=5150)
ab.check_optional_stopping(p_value_history=[0.12, 0.08, 0.04])
# early_* / late_*: observations from the early and late periods of the test
ab.check_novelty_effect(
    early_control, early_treatment,
    late_control,  late_treatment,
)
```
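The novelty check compares the treatment effect early in the experiment with the effect later on. One simple way to do that, sketched here as an assumption about the idea rather than abaudit's actual test, is a z-test on the difference between the early lift and the late lift, using Welch (unpooled) standard errors and treating the two periods as independent. The helper `novelty_z_test` and the simulated data are hypothetical.

```python
import math

import numpy as np


def _lift_and_se(control, treatment):
    """Difference in means and its Welch (unpooled) standard error."""
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    lift = treatment.mean() - control.mean()
    se = math.sqrt(treatment.var(ddof=1) / treatment.size
                   + control.var(ddof=1) / control.size)
    return lift, se


def novelty_z_test(early_control, early_treatment, late_control, late_treatment):
    """Return (early lift, late lift, two-sided p-value that the lifts differ)."""
    early_lift, early_se = _lift_and_se(early_control, early_treatment)
    late_lift, late_se = _lift_and_se(late_control, late_treatment)
    z = (early_lift - late_lift) / math.sqrt(early_se ** 2 + late_se ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return early_lift, late_lift, p


# Simulated example: a 0.4 lift early on that fades to 0.05 later.
rng = np.random.default_rng(7)
early_control, early_treatment = rng.normal(0, 1, 2000), rng.normal(0.4, 1, 2000)
late_control, late_treatment = rng.normal(0, 1, 2000), rng.normal(0.05, 1, 2000)
early, late, p = novelty_z_test(early_control, early_treatment,
                                late_control, late_treatment)
print(f"early lift {early:.2f}, late lift {late:.2f}, p = {p:.2g}")
```

A small p here says the lift measured early is not a reliable estimate of the long-run effect, so a launch decision should lean on the late-period lift.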

Demo notebook

See examples/demo.ipynb for a complete end-to-end walkthrough: a realistic e-commerce A/B test from experiment design to HTML audit report, with visualizations of PPV vs. prior, peeking inflation, and the bias score breakdown.

Statistical foundation

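Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy online controlled experiments: A practical guide to A/B testing. Cambridge University Press.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57(1), 289–300.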

Development

```bash
git clone https://github.com/aldair-ai/abaudit.git
cd abaudit
pip install -e ".[dev]"
pytest tests/ -v
```

| Phase | Module | Tests | Status |
|---|---|---|---|
| 0 | Scaffold + `_stats.py` | 27 | ✅ Complete |
| 1 | `validity.py`: core audit | 42 | ✅ Complete |
| 2 | `design.py`: pre-experiment | 35 | ✅ Complete |
| 3 | `runtime.py`: health checks | 35 | ✅ Complete |
| 4 | `report.py`: HTML reports | 11 | ✅ Complete |
Total: 184 tests · 99% coverage · Python 3.9 – 3.12

License

MIT © Edwin Aldair Espinoza Zegarra


Release history

- 0.1.2 (this version): May 21, 2026
- 0.1.1: May 20, 2026
- 0.1.0: May 20, 2026

Download files


Source Distribution

abaudit-0.1.2.tar.gz (722.8 kB)

Uploaded May 21, 2026 Source

Built Distribution


abaudit-0.1.2-py3-none-any.whl (27.7 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file abaudit-0.1.2.tar.gz.

File metadata

Download URL: abaudit-0.1.2.tar.gz
Upload date: May 21, 2026
Size: 722.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for abaudit-0.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `686b3ffb2ae3c067c7732e4ce377528dc0b3ee535d25946810c382828e124aa1` |
| MD5 | `e4a85db5c7151c3f5f742a08a73f8c7f` |
| BLAKE2b-256 | `59c53f1614f95f9e33a11ef2c056d5ac8c16b0a626d4b68ff80ff41baa555c6d` |


Provenance

The following attestation bundles were made for abaudit-0.1.2.tar.gz:

Publisher: publish.yml on aldair-ai/abaudit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: abaudit-0.1.2.tar.gz
- Subject digest: 686b3ffb2ae3c067c7732e4ce377528dc0b3ee535d25946810c382828e124aa1
- Sigstore transparency entry: 1589695708
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: aldair-ai/abaudit@52bc1512f758427dc913e738079afe832d11df6d
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/aldair-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@52bc1512f758427dc913e738079afe832d11df6d
- Trigger Event: release

File details

Details for the file abaudit-0.1.2-py3-none-any.whl.

File metadata

Download URL: abaudit-0.1.2-py3-none-any.whl
Upload date: May 21, 2026
Size: 27.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for abaudit-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c5355ddd95dea6b99cea81f2a41bfacd3ed55244e709b6f343a649ffc354483f` |
| MD5 | `c479859ff3a5a235fd421fec3f816631` |
| BLAKE2b-256 | `3301624cd9a26a436a25db428a96cc9287a97b504eec1058e12c321bfac6981e` |


Provenance

The following attestation bundles were made for abaudit-0.1.2-py3-none-any.whl:

Publisher: publish.yml on aldair-ai/abaudit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: abaudit-0.1.2-py3-none-any.whl
- Subject digest: c5355ddd95dea6b99cea81f2a41bfacd3ed55244e709b6f343a649ffc354483f
- Sigstore transparency entry: 1589695780
- Sigstore integration time: May 21, 2026
Source repository:
- Permalink: aldair-ai/abaudit@52bc1512f758427dc913e738079afe832d11df6d
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/aldair-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@52bc1512f758427dc913e738079afe832d11df6d
- Trigger Event: release
