An autonomous data-analysis agent that red-teams its own conclusions and reports what it cannot prove.

Maintainers

databurton

statskeptic

A data-analysis agent that red-teams its own conclusions.

Give it a dataset and a question. statskeptic profiles the data, picks a vetted statistical method, runs it, and then turns on the result: it attacks its own analysis against a methodological rubric (assumption violations, multiple comparisons, confounding, underpowered samples, data leakage, outlier sensitivity), revises what it can, and reports what the data shows and what it cannot conclude.

Two rules make it different from the fluent-but-wrong tools it competes with:

The model never produces a number. Every statistic comes from real, tested code (scipy / statsmodels) and ships with the exact call that produced it, so any figure can be re-run and checked. statskeptic selects methods and interprets them; it does not invent them.
"Cannot conclude" is a success state. Over-claiming is the cardinal sin here. When the data does not support a reliable answer, statskeptic says so plainly, and a non-zero exit code lets a pipeline act on it.

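The first rule, that every figure ships with the call that produced it, can be pictured as a small record type. This is a minimal sketch, not statskeptic's actual report schema: `TracedStat`, `traced`, and `rerun` are hypothetical names, and `statistics.mean` stands in for a scipy / statsmodels routine.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable


@dataclass
class TracedStat:
    """A number plus everything needed to recompute it (hypothetical type)."""
    value: float
    func: Callable[..., float]
    args: tuple
    kwargs: dict

    @property
    def call(self) -> str:
        # Render the exact call as text so it can be logged next to the value.
        parts = [repr(a) for a in self.args]
        parts += [f"{k}={v!r}" for k, v in self.kwargs.items()]
        return f"{self.func.__module__}.{self.func.__name__}({', '.join(parts)})"

    def rerun(self) -> float:
        # Re-execute the recorded call; the result should match `value`.
        return self.func(*self.args, **self.kwargs)


def traced(func: Callable[..., float], *args: Any, **kwargs: Any) -> TracedStat:
    """Run `func` once and keep the call alongside its result."""
    return TracedStat(func(*args, **kwargs), func, args, kwargs)


stat = traced(mean, [4.0, 6.0, 11.0])
print(stat.value, "from", stat.call)
```

The point of the design is that the interpretation layer only ever reads `value`; it never gets a chance to write one.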
A trap a naive tool walks into

examples/skewed_trial.csv is a two-arm trial where recovery time is heavily right-skewed and there is no real difference between the arms. Point a tool that reaches straight for a t-test at it and you get a confident false positive: p = 0.014, "significant," ship it.

$ statskeptic analyze examples/skewed_trial.csv -q "Does the drug reduce recovery hours?"

## Mann-Whitney U
comparing 'recovery_hours' across 'arm': two groups, so a t-test is the usual first pass

- Result: U = 814, p = 0.110 (not significant at alpha=0.05)
- Effect: rank_biserial_r = -0.196
- location shift (drug - placebo): 95% CI [-17.5, 1.1]
- n = 90

### Revisions
- Switched from Student's t-test to Mann-Whitney U (assumption.normality): data is
  non-normal; the rank-based test is valid here. p 0.014 -> 0.110.

### Objections raised
- None outstanding.

## What this cannot conclude
- Nothing beyond the assumptions and caveats noted above.

statskeptic planned the same t-test a careful analyst would reach for first. Then its normality check fired, the revision loop switched to the rank-based test, and the "significant" result evaporated. The audit trail shows the switch and the p-value before and after. The false positive never leaves the building.
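The effect size in the report above can be checked by hand. A minimal sketch, assuming the common convention r = 2U / (n1 * n2) - 1 and an even 45 / 45 split of the 90 rows (the split is an assumption; the report only gives n = 90). `mann_whitney_u` and `rank_biserial` are illustrative helpers, not statskeptic functions.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (xi, yj) with xi > yj, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def rank_biserial(u, n1, n2):
    """Rank-biserial correlation from U; ranges from -1 to 1."""
    return 2.0 * u / (n1 * n2) - 1.0


# Tiny illustrative samples (not taken from skewed_trial.csv).
u_small = mann_whitney_u([1, 2, 3], [2, 4])
print(u_small, rank_biserial(u_small, 3, 2))

# Reported U = 814, assuming 45 rows per arm.
print(round(rank_biserial(814, 45, 45), 3))  # -> -0.196
```

Under that assumption the arithmetic reproduces the reported rank_biserial_r of -0.196.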

What it catches

Each objection is grounded in the actual numbers and carries a concrete remedy. Some are fixed automatically by re-running; others can only be flagged, and those push the verdict toward "cannot conclude."

Objection	What fires it	What statskeptic does
Non-normality	Shapiro plus a real skew magnitude, not a trivial deviation	switch to the rank test (Mann-Whitney, Kruskal-Wallis, Spearman)
Unequal variance	Levene on a pooled-variance t-test	switch to Welch's t-test
Sparse contingency cells	expected counts below Cochran's threshold	switch a 2x2 to Fisher's exact test
Multiple comparisons	many tests run against one outcome	apply a Holm correction and re-read significance
Confounding	a causal question on observational data	name a candidate confounder; refuse the causal claim
Low power	a non-significant result where only a large effect was detectable	report the minimum detectable effect; refuse to read "no effect"
Data leakage	an identifier used as a predictor	drop it and re-fit
Outlier sensitivity	dropping extreme points flips significance	switch to a rank-based test
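To make the multiple-comparisons row concrete, here is a plain-Python sketch of the Holm step-down procedure. It is written from the textbook definition, not lifted from statskeptic, and the p-values are made up for illustration.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a reject / keep flag per p-value under the Holm correction."""
    m = len(p_values)
    # Visit p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return reject


p_values = [0.001, 0.04, 0.03, 0.20]  # illustrative only
print(holm_reject(p_values))  # -> [True, False, False, False]
```

Three of the four raw p-values sit below 0.05, but only the first survives the correction, which is the "re-read significance" step in the table.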

The vetted toolset covers two-group comparisons (Student's t, Welch, Mann-Whitney), k-group comparisons (one-way ANOVA, Kruskal-Wallis), association (Pearson, Spearman, chi-square, Fisher's exact), and regression (OLS, logistic). Each routine reports an effect size and, where one is defined, a confidence interval, and lists the assumptions it checked against your data.
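An assumption check of the kind listed for chi-square can be sketched in a few lines. This assumes the common reading of Cochran's rule for a 2x2 table (every expected count at least 5); the threshold statskeptic actually applies is not stated here, and `expected_counts` and `is_sparse` are hypothetical helper names.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def is_sparse(table, min_expected=5.0):
    """True if any expected count falls below the assumed threshold."""
    return any(e < min_expected for row in expected_counts(table) for e in row)


small = [[3, 9], [2, 16]]      # illustrative counts
large = [[20, 30], [25, 25]]   # illustrative counts
print(expected_counts(small), is_sparse(small))
print(is_sparse(large))
```

When the check fires on a 2x2 table, the remedy in the table above is to switch to Fisher's exact test rather than trust the chi-square approximation.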

Install

git clone https://github.com/Burton-David/statskeptic
cd statskeptic
pip install -e .

Python 3.10 or newer. The core needs no API key and makes no network calls.

Usage

statskeptic analyze data.csv --question "Does the treatment change recovery?"

The reader detects the file's dialect (delimiter, quoting, encoding) with CleverCSV, so semicolon-delimited, tab-delimited, or non-UTF-8 files load as the table they actually are rather than as a single mangled column. Infinities are treated as missing data.
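The idea behind dialect detection can be shown with the standard library. This sketch swaps in `csv.Sniffer` for CleverCSV, so it is a stand-in rather than what statskeptic runs, and it does not cover encoding detection. `sniff_rows` and `clean_cell` are hypothetical names.

```python
import csv
import io
import math


def sniff_rows(text):
    """Guess the delimiter from the text itself, then parse into rows."""
    try:
        dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
    except csv.Error:
        dialect = csv.excel  # fall back to plain comma-separated
    return list(csv.reader(io.StringIO(text), dialect))


def clean_cell(cell):
    """Convert numeric strings to float; map infinities to None (missing)."""
    try:
        value = float(cell)
    except ValueError:
        return cell  # leave non-numeric text untouched
    return None if math.isinf(value) else value


sample = "arm;recovery_hours\ndrug;12.5\nplacebo;inf\n"  # illustrative data
for row in sniff_rows(sample):
    print([clean_cell(c) for c in row])
```

Without the sniffing step, a comma-only reader would return each of these lines as one column.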

Options:

--json emits the full typed report, every number traceable to its computation.
--outcome, --group / --by, --predictors name columns when the question is ambiguous (the planner declines rather than guess).
--alpha sets the significance level (default 0.05).
--quiet suppresses the report body and returns only the exit code.

Exit codes make it scriptable as a gate:

code	meaning
0	a defensible result (a result with caveats still counts as defensible)
2	the data cannot support a reliable answer
3	the question does not map to a vetted method
64	usage error (bad flags, missing file, unknown column)
70	a statistical routine failed and the cause is reported, not hidden
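A pipeline step might wrap the CLI roughly as follows. This is a sketch: the exit codes and the `--question` / `--quiet` flags come from this README, while `EXIT_MEANINGS`, `describe_exit`, `should_proceed`, and `run_gate` are hypothetical names chosen for illustration.

```python
import subprocess

# Exit codes as documented in the table above.
EXIT_MEANINGS = {
    0: "defensible result (caveats allowed)",
    2: "data cannot support a reliable answer",
    3: "question does not map to a vetted method",
    64: "usage error",
    70: "statistical routine failed",
}


def describe_exit(code: int) -> str:
    """Human-readable meaning of an exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def should_proceed(code: int) -> bool:
    """Only a defensible result lets the pipeline continue."""
    return code == 0


def run_gate(csv_path: str, question: str) -> int:
    """Run statskeptic quietly and return its exit code (needs the CLI installed)."""
    proc = subprocess.run(
        ["statskeptic", "analyze", csv_path, "--question", question, "--quiet"],
        check=False,  # non-zero codes are expected outcomes, not exceptions
    )
    return proc.returncode
```

Whether code 2 should fail a build or just raise a warning is a policy choice for the pipeline; the sketch treats anything other than 0 as a stop.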

As a library:

from statskeptic import analyze

report = analyze("data.csv", "Does exercise cause better health?")
print(report.explain())     # markdown
report.to_json()            # the full typed report
report.verdict              # defensible / defensible_with_caveats / cannot_conclude / declined

Try the planted-trap corpus

examples/ ships five datasets, each with one planted flaw, generated by a seeded script (python examples/make_demo_data.py) so the numbers above are reproducible. skewed_trial.csv is the trap shown earlier; the other four run as follows:

statskeptic analyze examples/biomarker_screen.csv  -q "Which markers are associated with the outcome?"
statskeptic analyze examples/exercise_health.csv   -q "Does more exercise cause a better health score?"
statskeptic analyze examples/small_trial.csv       -q "Does the treatment change the test score?"
statskeptic analyze examples/clean_ab_test.csv     -q "Does the variant change order value?"

The biomarker screen finds 4 markers significant at p<0.05, then a Holm correction across the 24 tests leaves only the one real signal standing. The exercise question reports a strong correlation and still refuses to call it causal, naming age as the likely confounder. The small trial returns "cannot conclude": at nine per arm, only a large effect was ever detectable. The clean A/B test returns a plain, defensible yes.
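The small-trial verdict can be sanity-checked with a back-of-envelope power calculation. This is a sketch under stated assumptions: a two-sided two-sample comparison, a normal approximation, and an 80% power target (the power target is an assumption; statskeptic's own setting is not given here). `minimum_detectable_d` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_d(n_per_arm, alpha=0.05, power=0.80):
    """Smallest standardized mean difference detectable with equal arms."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_arm)


print(round(minimum_detectable_d(9), 2))  # -> 1.32
```

At nine per arm the detectable effect is about 1.32 standard deviations, well above the 0.8 conventionally called "large", so a non-significant result says little about smaller effects. The normal approximation is slightly optimistic at this sample size; a t-based calculation gives a somewhat larger figure.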

Honest limits

Causal critique is a flag, not an engine. statskeptic names a candidate confounder and declines the causal claim; it does not estimate causal effects.
The rule-based planner maps a question to a method by keywords and column structure. It declines ambiguous questions rather than guess, so you may need --outcome / --group to point it at the right columns.
Independence is assumed and stated, not tested. It is a property of the study design, which the data alone cannot reveal.
An optional LLM critic (for context-specific objections the static rubric cannot encode) and clinical / financial domain packs are planned extensions, not yet shipped. The check registry and the planner are built as the seams for them.
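The "decline rather than guess" behaviour can be illustrated with a toy column matcher. This is purely a sketch of the idea: the matching rule and the name `match_outcome` are hypothetical and are not statskeptic's actual planner logic.

```python
import re


def _tokens(text):
    """Lowercase alphanumeric tokens; underscores and punctuation split words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_outcome(question, columns):
    """Return the single column fully named in the question, else None."""
    words = _tokens(question)
    hits = [c for c in columns if _tokens(c) and _tokens(c) <= words]
    # Zero hits or several hits are both ambiguous: decline instead of guessing.
    return hits[0] if len(hits) == 1 else None


print(match_outcome("Does the drug reduce recovery hours?", ["arm", "recovery_hours"]))
print(match_outcome("Does the treatment change the test score?", ["score", "test_score"]))
```

The second call returns None because two columns fit the question, which is the situation where --outcome / --group are needed.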

License

MIT.

Release history

This version

0.1.0

Jun 2, 2026

Download files

Source Distribution

statskeptic-0.1.0.tar.gz (74.8 kB)

Uploaded Jun 2, 2026 Source

Built Distribution

statskeptic-0.1.0-py3-none-any.whl (53.9 kB)

Uploaded Jun 2, 2026 Python 3

File details

Details for the file statskeptic-0.1.0.tar.gz.

File metadata

Download URL: statskeptic-0.1.0.tar.gz
Upload date: Jun 2, 2026
Size: 74.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for statskeptic-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0f1817c57287dbab8404ddf682438ed0a85e3d1e5f9057b197072ac44d561b2e`
MD5	`e132f3b28cbc0e064111b0fad8456d6b`
BLAKE2b-256	`a43c3a1d224c1f827094ef666e110d488efd05dfc5a1abbfb0a87cd588f098bf`

Provenance

The following attestation bundles were made for statskeptic-0.1.0.tar.gz:

Publisher: publish.yml on Burton-David/statskeptic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: statskeptic-0.1.0.tar.gz
- Subject digest: 0f1817c57287dbab8404ddf682438ed0a85e3d1e5f9057b197072ac44d561b2e
- Sigstore transparency entry: 1700125471
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: Burton-David/statskeptic@8c1ca56e55eba5d96e76607f34f2eb69afd6d9f3
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Burton-David
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8c1ca56e55eba5d96e76607f34f2eb69afd6d9f3
- Trigger Event: release

File details

Details for the file statskeptic-0.1.0-py3-none-any.whl.

File metadata

Download URL: statskeptic-0.1.0-py3-none-any.whl
Upload date: Jun 2, 2026
Size: 53.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for statskeptic-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8e0fd515b7c3aa112871d0752e7aef6cd7d34f4b5aec514279be1f29dfe551ce`
MD5	`3d29a3cfb3b2e02ed3b4ca34653b593b`
BLAKE2b-256	`a0c103c4fc0322d96c72828c2e057b326e49d668ed43dffa44186b7a2cf49169`

Provenance

The following attestation bundles were made for statskeptic-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Burton-David/statskeptic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: statskeptic-0.1.0-py3-none-any.whl
- Subject digest: 8e0fd515b7c3aa112871d0752e7aef6cd7d34f4b5aec514279be1f29dfe551ce
- Sigstore transparency entry: 1700125560
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: Burton-David/statskeptic@8c1ca56e55eba5d96e76607f34f2eb69afd6d9f3
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Burton-David
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8c1ca56e55eba5d96e76607f34f2eb69afd6d9f3
- Trigger Event: release
