Statistical detection tools for screening published research

bullshit-detector

"Bullshit is unavoidable whenever circumstances require someone to talk without knowing what he is talking about." (Harry Frankfurt, On Bullshit, 2005)

What this is

A Python toolkit for systematically screening research papers for statistical red flags. It is organized in four tiers (0 to 3), from quick API lookups to deep data analysis. It was developed for petroleum geoscience applications but applies to any field where correlations, p-values, and sample sizes are reported.

Tier	What it checks	What you need	Time
0 — Paper screening	Journal legitimacy, retractions, author credentials, ML explainability screening	DOI, journal name	Minutes
0 — HARKing screening	Hypothesis framing, breakpoint claims (text)	Paper text	Minutes
1 — Arithmetic	p-value consistency, GRIM/GRIMMER tests	Reported statistics	Minutes
2 — Plausibility	Spurious correlations, critical r, confidence intervals, cluster reification (unsupervised learning audit)	Summary stats (r, n, k)	Minutes
2/3 — HARKing breakpoint	Breakpoint stability (bootstrap CI, Davies test)	Raw/digitized data	Minutes–Hours
3 — Data analysis	Outlier leverage, distance correlation, reproducibility, SHAP/XAI model interpretation audit	Raw/digitized data	Hours

Installation

pip install bullshit-detector            # Core (Tiers 0–2)
pip install "bullshit-detector[full]"    # + Tier 3 tools (statsmodels, seaborn)
pip install "bullshit-detector[batch]"   # + statcheck for PDF batch scanning (GPL-3.0)
pip install "bullshit-detector[dev]"     # + pytest for development
# The quotes keep shells such as zsh from treating [extra] as a glob pattern.

Quick Start

Is the reported p-value correct?

from bullshit_detector.p_checker import check_p_value

check_p_value("t", 2.20, 28, reported_p=0.04)
# {'computed_p': 0.0362254847788378, 'reported_p': 0.04,
#  'consistent': True, 'decision_error': False, ...}

Reported p=0.04 is consistent with computed p=0.036 — within rounding tolerance.
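To make the idea behind this check concrete, here is a minimal standard-library sketch of a p-value recomputation: it derives a two-tailed p-value from a t statistic and asks whether the reported value could be a rounding of it. The names two_tailed_t_p, is_rounding_consistent, and is_decision_error, and the half-unit rounding tolerance, are assumptions made for this illustration. They are not the package's API, and check_p_value may apply different rules.

```python
import math


def two_tailed_t_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value for Student's t, integrated with Simpson's rule.

    Substituting t = sqrt(df) * tan(theta) turns the tail area into a constant
    times the integral of cos(theta) ** (df - 1) from theta0 to pi / 2.
    `steps` must be even.
    """
    theta0 = math.atan(abs(t) / math.sqrt(df))
    h = (math.pi / 2 - theta0) / steps

    def f(theta: float) -> float:
        return max(math.cos(theta), 0.0) ** (df - 1)

    total = f(theta0) + f(math.pi / 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(theta0 + i * h)
    integral = total * h / 3

    # Normalising constant Gamma((df + 1) / 2) / (sqrt(pi) * Gamma(df / 2)),
    # computed in log space so large df does not overflow.
    norm = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    return 2 * norm * integral


def is_rounding_consistent(reported_p: float, computed_p: float, decimals: int = 2) -> bool:
    """Assumed rule: the reported value lies within half a unit of its last decimal."""
    return abs(reported_p - computed_p) <= 0.5 * 10 ** -decimals + 1e-12


def is_decision_error(reported_p: float, computed_p: float, alpha: float = 0.05) -> bool:
    """True when reported and recomputed p fall on opposite sides of alpha."""
    return (reported_p < alpha) != (computed_p < alpha)


p = two_tailed_t_p(2.20, 28)
print(p, is_rounding_consistent(0.04, p), is_decision_error(0.04, p))
```

For t = 2.20 with 28 degrees of freedom, the sketch should land close to the computed_p shown above, so a reported p of 0.04 passes the rounding check and does not flip the significance decision at alpha = 0.05.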

Could this correlation be spurious?

from bullshit_detector.spurious import P_spurious

# r=0.60, n=5 observations, k=10 predictor variables tested
P_spurious(0.60, 5, 10)
# 0.9649622440458044

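With n=5 observations and k=10 candidate variables, there is a 96.5% probability that at least one variable shows |r| ≥ 0.60 purely by chance. A correlation of this size is therefore weak evidence of a real relationship.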
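The number above can be reproduced by hand. The sketch below assumes the Kalkomey-style form 1 - (1 - p)^k, where p is the two-tailed probability of seeing |r| at least this large from n uncorrelated points and the k variables are treated as independent. The function names are hypothetical and the package's own P_spurious may be implemented differently; this form simply matches the output shown in the example.

```python
import math


def prob_abs_r_at_least(r: float, n: int, steps: int = 2000) -> float:
    """P(|sample r| >= r) for n points when the true correlation is zero.

    Equivalent to the two-tailed t-test on r with n - 2 degrees of freedom.
    With r = sin(theta) the tail area becomes a constant times the integral
    of cos(theta) ** (n - 3), evaluated here with Simpson's rule (even steps).
    """
    if n < 3:
        raise ValueError("need at least 3 observations")
    theta0 = math.asin(min(abs(r), 1.0))
    h = (math.pi / 2 - theta0) / steps

    def f(theta: float) -> float:
        return max(math.cos(theta), 0.0) ** (n - 3)

    total = f(theta0) + f(math.pi / 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(theta0 + i * h)
    integral = total * h / 3

    # 1 / B(1/2, (n - 2) / 2), computed in log space.
    norm = math.exp(math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2)) / math.sqrt(math.pi)
    return 2 * norm * integral


def p_spurious_sketch(r: float, n: int, k: int) -> float:
    """Chance that at least one of k independent, truly uncorrelated
    variables reaches |r| or more in a sample of size n."""
    p_single = prob_abs_r_at_least(r, n)
    return 1.0 - (1.0 - p_single) ** k


print(p_spurious_sketch(0.60, 5, 1))   # a single variable tested
print(p_spurious_sketch(0.60, 5, 10))  # ten variables tested
```

Comparing k=1 with k=10 shows where the risk comes from: one test of r=0.60 at n=5 is already unconvincing, and screening ten candidate variables makes a chance hit of that size close to certain.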

Has this paper been retracted?

from bullshit_detector.paper_screening import check_retraction

check_retraction("10.2147/DMSO.S27665")
# {'retracted': True, 'corrections': [], 'pubpeer_comments': 0,
#  'pubpeer_url': 'https://pubpeer.com/publications/10.2147-DMSO.S27665'}

retracted: True — the paper behind this DOI has been formally retracted. Do not cite or act on its findings. (This is the Vinson et al. 2012 green coffee extract paper, retracted in 2014 after an FTC investigation.)

Intellectual foundations

This project stands on the shoulders of:

Carl T. Bergstrom & Jevin D. West — Calling Bullshit: The Art of Skepticism in a Data-Driven World (Random House, 2020) and their University of Washington course. The paper-level screening module (Tier 0) directly implements their legitimacy framework.
Harry Frankfurt — On Bullshit (Princeton University Press, 2005). Established the philosophical foundation.

Lawrence Weinstein & John A. Adam — Guesstimation: Solving the World's Problems on the Back of a Cocktail Napkin (Princeton University Press, 2008); and Lawrence Weinstein — Guesstimation 2.0: Solving Today's Problems on the Back of a Cocktail Napkin (Princeton University Press, 2012). Inspired the Fermi estimation sanity checks.

C.T. Kalkomey — "Potential risks when using seismic attributes as predictors of reservoir properties" (The Leading Edge, 1997). The spurious correlation probability formula.
N.J.L. Brown & J.A.J. Heathers — "The GRIM Test" (SPPS, 2017). Arithmetic consistency checking for means.
Aurélien Allard — "Analytic-GRIMMER" (2018). Extended GRIM to standard deviations. The Python port in this package is the first on PyPI.
Kristin Sainani — "How to Be a Statistical Detective" (PM&R, 2020). Pedagogical framework tying these tools together.
Thomas Speidel — GeoConvention 2018 R notebook. Variable selection methods (redundancy analysis, LASSO, sparse PCA, power analysis) on the Hunt dataset, complementing Niccoli's Python implementations. Inspired the redundancy and power modules.
Michèle Nuijten et al., statcheck (Behavior Research Methods, 2016). P-value recomputation methodology.
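Of the methods listed above, the GRIM test is the simplest to illustrate. If the underlying data are integers (counts, Likert responses), a mean over n observations must equal some integer total divided by n. The sketch below checks that condition; grim_consistent is a hypothetical name used for illustration, not necessarily the function this package exposes, and the sketch assumes single-item integer data.

```python
import math


def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if a mean reported to `decimals` places can arise as
    (integer total) / n, i.e. from n integer-valued observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    half_unit = 0.5 * 10 ** -decimals  # rounding half-width of the reported mean
    approx_total = mean * n
    # Only the two integer totals bracketing mean * n can be close enough.
    for total in (math.floor(approx_total), math.ceil(approx_total)):
        if abs(total / n - mean) <= half_unit + 1e-9:
            return True
    return False


print(grim_consistent(5.18, 28))  # True: 145 / 28 = 5.1786 rounds to 5.18
print(grim_consistent(5.19, 28))  # False: no integer total over 28 rounds to 5.19
```

The check only has teeth for small samples: once n reaches 100, every two-decimal mean is achievable, so a pass carries no information. Boundary cases that sit exactly on a rounding half are treated as consistent here, which errs on the side of not flagging a paper.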

Paper audits

The audits/ directory contains worked examples of the full framework applied to real papers with known statistical issues:

Carney, Cuddy & Yap (2010) — Power posing. Underpowered, multiple uncorrected outcomes, implausible effect sizes. Verdict: REVIEW.
Wansink et al. (2014–2016) — Pizza buffet papers. Impossible descriptive statistics caught by GRIMMER. Verdict: REJECT.
ORBITA / Al-Lamee et al. (2018) — PCI for stable angina. Adequately powered for design target, but CI compatible with meaningful benefit. Widely over-interpreted as "PCI doesn't work." Verdict: CAUTION.

For AI assistants

The skills/ directory contains detection heuristics and decision trees for each module. If you're a coding assistant (Copilot, Claude Code, etc.), read skills/OVERVIEW.md first.

Acknowledgments

Kristin Sainani — her paper "How to Be a Statistical Detective" (PM&R, 2020, 12(2):211–215, DOI: 10.1002/pmrj.12305) inspired the Tier 1 arithmetic consistency approach and the overall "statistical detective" framing of this package. The p_checker module's pedagogical structure follows her framework of treating statistical anomalies as clues that warrant further investigation. More broadly, her detective framing runs through the entire package lineage: it directly inspired Matteo Niccoli's "Be a geoscience and data science detective" project (MyCarta blog, GitHub repo, TRANSFORM 2021 lightning talk), which is the primary source for the Tier 3 modules (leverage.py, reproducibility.py).

Thomas Speidel — his GeoConvention 2018 R notebook, Data Science Tools for Petroleum Exploration and Production, provided the methodology for the power analysis and redundancy modules (Tier 2). The original GeoConvention 2018 presentation was a collaboration between Matteo Niccoli and Thomas Speidel; Speidel's R implementations of power analysis and variable redundancy (Hmisc::redun), applied to the Hunt (2013) 21-well dataset, were translated into the Python power and redundancy modules in this package.

License

Apache-2.0. The optional statcheck dependency is GPL-3.0.

Release history

0.3.0 (this version): Mar 29, 2026
0.2.2: Mar 9, 2026
0.2.1: Mar 5, 2026
0.2.0: Mar 5, 2026
0.1.1: Feb 27, 2026
0.1.0: Feb 27, 2026
