An autonomous data-analysis agent that red-teams its own conclusions and reports what it cannot prove.
statskeptic
A data-analysis agent that red-teams its own conclusions.
Give it a dataset and a question. statskeptic profiles the data, picks a vetted statistical method, runs it, and then turns on the result: it attacks its own analysis against a methodological rubric (assumption violations, multiple comparisons, confounding, underpowered samples, data leakage, outlier sensitivity), revises what it can, and reports what the data shows and what it cannot conclude.
Two rules make it different from the fluent-but-wrong tools it competes with:
- The model never produces a number. Every statistic comes from real, tested code (scipy / statsmodels) and ships with the exact call that produced it, so any figure can be re-run and checked. statskeptic selects methods and interprets them; it does not invent them.
- "Cannot conclude" is a success state. Over-claiming is the cardinal sin here. When the data does not support a reliable answer, statskeptic says so plainly, and a non-zero exit code lets a pipeline act on it.
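The first rule, that every figure ships with the call that produced it, can be illustrated with a small provenance wrapper. This is only a sketch of the idea, not statskeptic's actual report type: `TracedStat` and `traced` are hypothetical names, and `statistics.fmean` from the standard library stands in for a scipy / statsmodels routine.

```python
import statistics
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TracedStat:
    """A computed value together with the call that produced it (hypothetical type)."""
    value: Any
    func: Callable[..., Any]
    args: tuple
    kwargs: dict

    @property
    def call(self) -> str:
        # Render the call as text so it can be printed next to the number.
        parts = [repr(a) for a in self.args]
        parts += [f"{k}={v!r}" for k, v in self.kwargs.items()]
        return f"{self.func.__module__}.{self.func.__qualname__}({', '.join(parts)})"

    def rerun(self) -> Any:
        # Re-execute the recorded call so a reader can check the figure.
        return self.func(*self.args, **self.kwargs)


def traced(func: Callable[..., Any], *args: Any, **kwargs: Any) -> TracedStat:
    """Run a tested routine and keep the exact call alongside its result."""
    return TracedStat(func(*args, **kwargs), func, args, kwargs)


stat = traced(statistics.fmean, [1.0, 2.0, 3.0])
print(stat.value, "from", stat.call)
```

The point of the design is that the number and its provenance travel together, so a reviewer never has to trust a figure that cannot be recomputed.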
A trap a naive tool walks into
examples/skewed_trial.csv is a two-arm trial where recovery time is heavily
right-skewed and there is no real difference between the arms. Point a tool that reaches
straight for a t-test at it and you get a confident false positive: p = 0.014,
"significant," ship it.
$ statskeptic analyze examples/skewed_trial.csv -q "Does the drug reduce recovery hours?"
## Mann-Whitney U
comparing 'recovery_hours' across 'arm': two groups, so a t-test is the usual first pass
- Result: U = 814, p = 0.110 (not significant at alpha=0.05)
- Effect: rank_biserial_r = -0.196
- location shift (drug - placebo): 95% CI [-17.5, 1.1]
- n = 90
### Revisions
- Switched from Student's t-test to Mann-Whitney U (assumption.normality): data is
non-normal; the rank-based test is valid here. p 0.014 -> 0.110.
### Objections raised
- None outstanding.
## What this cannot conclude
- Nothing beyond the assumptions and caveats noted above.
statskeptic planned the same t-test a careful analyst would reach for first, then its normality check fired, the revision loop switched to the rank-based test, and the "significant" result evaporated. The audit trail shows the switch and the p-value before and after. The false positive never leaves the building.
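The plan-check-revise sequence described above can be sketched with scipy directly. This is a minimal illustration under stated assumptions, not statskeptic's implementation: the function name `compare_two_groups`, the `skew_threshold` of 1.0, and the shape of the returned dictionary are all made up for the example. Only the `scipy.stats` calls are real.

```python
from scipy import stats


def compare_two_groups(a, b, alpha=0.05, skew_threshold=1.0):
    """Plan Student's t-test, then revise if a normality objection fires."""
    trail = []  # (method name, p-value) for each attempt, oldest first

    # First pass: the test a careful analyst would reach for.
    t_res = stats.ttest_ind(a, b, equal_var=True)
    trail.append(("ttest_ind", float(t_res.pvalue)))

    # Objection: Shapiro-Wilk rejects AND the skew is material (assumed threshold).
    non_normal = any(
        stats.shapiro(g).pvalue < alpha and abs(stats.skew(g)) > skew_threshold
        for g in (a, b)
    )

    # Revision: switch to the rank-based test and record the new p-value.
    if non_normal:
        u_res = stats.mannwhitneyu(a, b, alternative="two-sided")
        trail.append(("mannwhitneyu", float(u_res.pvalue)))

    method, p_value = trail[-1]
    return {
        "method": method,
        "p": p_value,
        "significant": p_value < alpha,
        "trail": trail,
    }


# Heavily right-skewed toy data (not the example CSV).
skewed_a = [2.0 ** i for i in range(30)]
skewed_b = [3.0 * x for x in skewed_a]
print(compare_two_groups(skewed_a, skewed_b)["trail"])
```

Keeping both entries in `trail` is what makes the before-and-after p-values visible in the audit output.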
What it catches
Each objection is grounded in the actual numbers and carries a concrete remedy. Some are fixed automatically by re-running; others can only be flagged, and those push the verdict toward "cannot conclude."
| Objection | What fires it | What statskeptic does |
|---|---|---|
| Non-normality | Shapiro-Wilk rejects normality and the skew is material, not a trivial deviation | switch to the rank test (Mann-Whitney, Kruskal-Wallis, Spearman) |
| Unequal variance | Levene's test rejects equal variances under a pooled-variance t-test | switch to Welch's t-test |
| Sparse contingency cells | expected counts below Cochran's threshold | switch a 2x2 to Fisher's exact test |
| Multiple comparisons | many tests run against one outcome | apply a Holm correction and re-read significance |
| Confounding | a causal question on observational data | name a candidate confounder; refuse the causal claim |
| Low power | a non-significant result where only a large effect was detectable | report the minimum detectable effect; refuse to read "no effect" |
| Data leakage | an identifier used as a predictor | drop it and re-fit |
| Outlier sensitivity | dropping extreme points flips significance | switch to a rank-based test |
The vetted toolset covers two-group comparisons (Student's t, Welch, Mann-Whitney), k-group comparisons (one-way ANOVA, Kruskal-Wallis), association (Pearson, Spearman, chi-square, Fisher's exact), and regression (OLS, logistic). Each routine reports an effect size and, where one is defined, a confidence interval, and lists the assumptions it checked against your data.
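To make the "Low power" row concrete, the minimum detectable effect for a two-arm comparison can be approximated as d_min ≈ (z_{1-alpha/2} + z_{power}) * sqrt(2 / n), where n is the per-arm sample size and d is a standardized mean difference. The sketch below uses that normal approximation with the standard library only. The 80% power target and the function name are assumptions for illustration, and the approximation somewhat understates the t-based figure at small n. It is not the routine statskeptic itself runs.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_d(n_per_arm, alpha=0.05, power=0.80):
    """Smallest standardized difference detectable with the given power.

    Normal approximation for a two-sided, two-sample comparison with
    equal arm sizes; treat the result as a rough lower bound.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the power target
    return (z_alpha + z_power) * sqrt(2 / n_per_arm)


for n in (9, 45, 200):
    print(n, round(minimum_detectable_d(n), 2))
```

With nine per arm the approximation gives roughly d = 1.32, well above the usual 0.8 benchmark for a large effect, which is why a non-significant result at that size cannot be read as "no effect."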
Install
git clone https://github.com/Burton-David/statskeptic
cd statskeptic
pip install -e .
Requires Python 3.10 or newer. The core needs no API key and makes no network calls.
Usage
statskeptic analyze data.csv --question "Does the treatment change recovery?"
The reader detects the file's dialect (delimiter, quoting, encoding) with CleverCSV, so semicolon-delimited, tab-delimited, or non-UTF-8 files load as the table they actually are rather than a single mangled column, and infinities are treated as missing data.
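The loading behaviour can be approximated with the standard library for readers who want to see the idea in isolation. The sketch below swaps in `csv.Sniffer` for CleverCSV (a simpler, less robust detector) and handles only the delimiter and infinities. `load_table` and `_to_number` are hypothetical helper names, not part of statskeptic.

```python
import csv
import io
import math


def _to_number(cell):
    """Convert a cell to float where possible; map infinities to None (missing)."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return cell  # leave non-numeric text untouched
    return None if math.isinf(value) else value


def load_table(text, candidate_delimiters=";,\t|"):
    """Detect the delimiter, then parse rows into dictionaries."""
    # csv.Sniffer is a stand-in here; the README's reader uses CleverCSV.
    dialect = csv.Sniffer().sniff(text, delimiters=candidate_delimiters)
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    rows = [{key: _to_number(val) for key, val in raw.items()} for raw in reader]
    return dialect.delimiter, rows


sample = "arm;recovery_hours\ndrug;12.5\nplacebo;inf\ndrug;9.0\n"
delimiter, rows = load_table(sample)
print(repr(delimiter), rows)
```

Treating an infinity as missing rather than as a number keeps one bad cell from dominating a mean or a variance downstream.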
Options:
- --json emits the full typed report, every number traceable to its computation.
- --outcome, --group / --by, --predictors name columns when the question is ambiguous (the planner declines rather than guess).
- --alpha sets the significance level (default 0.05).
- --quiet suppresses the report body and returns only the exit code.
Exit codes make it scriptable as a gate:
| code | meaning |
|---|---|
| 0 | a defensible result (with caveats counts as defensible) |
| 2 | the data cannot support a reliable answer |
| 3 | the question does not map to a vetted method |
| 64 | usage error (bad flags, missing file, unknown column) |
| 70 | a statistical routine failed and the cause is reported, not hidden |
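A pipeline step that gates on these codes might look like the sketch below. The exit-code meanings and the `analyze`, `--question`, and `--quiet` arguments come from this README. The helper names (`run_gate`, `should_proceed`, `describe`) and the policy of proceeding only on 0 are assumptions for illustration.

```python
import subprocess

# Meanings copied from the exit-code table.
EXIT_MEANINGS = {
    0: "a defensible result",
    2: "the data cannot support a reliable answer",
    3: "the question does not map to a vetted method",
    64: "usage error",
    70: "a statistical routine failed",
}


def run_gate(csv_path, question):
    """Run the CLI quietly and return its exit code (needs statskeptic installed)."""
    proc = subprocess.run(
        ["statskeptic", "analyze", csv_path, "--question", question, "--quiet"],
        check=False,  # non-zero codes are expected outcomes, not exceptions
    )
    return proc.returncode


def should_proceed(code):
    """Assumed policy: only a defensible result lets the pipeline continue."""
    return code == 0


def describe(code):
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


# Example of interpreting a code without invoking the CLI:
print(should_proceed(2), "-", describe(2))
```

Passing `check=False` matters here: exit code 2 is a legitimate verdict that the caller should branch on, not an error to raise.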
As a library:
from statskeptic import analyze
report = analyze("data.csv", "Does exercise cause better health?")
print(report.explain()) # markdown
report.to_json() # the full typed report
report.verdict # defensible / defensible_with_caveats / cannot_conclude / declined
Try the planted-trap corpus
examples/ ships five datasets, each with one planted flaw, generated by a seeded script
so the numbers in this README are reproducible (python examples/make_demo_data.py):
statskeptic analyze examples/biomarker_screen.csv -q "Which markers are associated with the outcome?"
statskeptic analyze examples/exercise_health.csv -q "Does more exercise cause a better health score?"
statskeptic analyze examples/small_trial.csv -q "Does the treatment change the test score?"
statskeptic analyze examples/clean_ab_test.csv -q "Does the variant change order value?"
The biomarker screen finds 4 markers significant at p<0.05, then a Holm correction
across the 24 tests leaves only the one real signal standing. The exercise question
reports a strong correlation and still refuses to call it causal, naming age as the
likely confounder. The small trial returns "cannot conclude": at nine per arm, only a
large effect was ever detectable. The clean A/B test returns a plain, defensible yes.
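The Holm step in the biomarker example can be shown in a few lines. The sketch below is a plain-Python version of the Holm step-down procedure. The 24 p-values are invented to mirror the pattern described (four below 0.05, one surviving) and are not the values in biomarker_screen.csv. For real work, statsmodels provides `multipletests(..., method="holm")`; whether statskeptic calls that routine is not stated here.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down: return a reject/keep flag for each p-value, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return reject


# Illustrative p-values only: 4 nominally significant, 20 clearly not.
p_values = [0.0004, 0.012, 0.031, 0.044] + [0.06 + 0.04 * i for i in range(20)]

nominal = sum(p < 0.05 for p in p_values)
surviving = sum(holm_reject(p_values))
print(f"{nominal} significant before correction, {surviving} after Holm")
```

The smallest p-value must clear 0.05 / 24, about 0.0021, and the second smallest must clear 0.05 / 23. In this toy set only the first does, so the step-down stops there.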
Honest limits
- Causal critique is a flag, not an engine. statskeptic names a candidate confounder and declines the causal claim; it does not estimate causal effects.
- The rule-based planner maps a question to a method by keywords and column structure. It declines ambiguous questions rather than guess, so you may need --outcome / --group to point it at the right columns.
- Independence is assumed and stated, not tested. It is a property of the study design, which the data alone cannot reveal.
- An optional LLM critic (for context-specific objections the static rubric cannot encode) and clinical / financial domain packs are planned extensions, not yet shipped. The check registry and the planner are built as the seams for them.
License
MIT.