Python implementation of the R package 'statcheck' used for extracting and analysing statistical tests in scientific articles.

statcheck

Credits

This is a Python implementation of the R package statcheck (ver. 1.4.0-beta.4) published by Michèle B. Nuijten [MicheleNuijten]. The original package can be found on her GitHub page. The code relies heavily on Nuijten's work and is currently only a Python port of the original package, with the goal of making it more accessible to the Python community. Both packages are published under the GNU General Public License v3.0. The current implementation is published under the MIT License. To ensure usability, all of the original tests were ported to the Python version.

What is statcheck?

statcheck is a free, open source Python package that automatically extracts null-hypothesis significance testing (NHST) results from articles and recomputes the p-values from the reported test statistic and degrees of freedom to detect possible inconsistencies.

statcheck is mainly useful for:

  1. Self-checks: you can use statcheck to make sure your manuscript doesn’t contain copy-paste errors or other inconsistencies before you submit it to a journal.
  2. Peer review: editors and reviewers can use statcheck to check submitted manuscripts for statistical inconsistencies. They can ask authors for a correction or clarification before publishing a manuscript.
  3. Research: statcheck can be used to automatically extract statistical test results from articles so that they can be analyzed. You can, for instance, investigate whether statistical inconsistencies can be predicted (see e.g., Nuijten et al., 2017), or analyze p-value distributions (see e.g., Hartgerink et al., 2016).

How does statcheck work?

The algorithm behind statcheck consists of four basic steps:

  1. Convert pdf and html articles to plain text files.
  2. Search the text for instances of NHST results. Specifically, statcheck can recognize t-tests, F-tests, correlations, z-tests, χ²-tests, and Q-tests (from meta-analyses) if they are reported completely (test statistic, degrees of freedom, and p-value) and in APA style.
  3. Recompute the p-value using the reported test statistic and degrees of freedom.
  4. Compare the reported and recomputed p-value. If the reported p-value does not match the computed one, the result is marked as an inconsistency (Error in the output). If the reported p-value is significant and the computed is not, or vice versa, the result is marked as a gross inconsistency (DecisionError in the output).
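
Steps 2 to 4 can be illustrated with a small, self-contained sketch restricted to z-tests. It is not statcheck's own code: the regular expression, the function names (`extract_z_tests`, `two_tailed_p`, `check_z_test`), the `p < alpha` significance rule, and the simple "round and compare" check are assumptions made for illustration, and only the standard library is used.

```python
import math
import re

# Hypothetical pattern for results such as "z = 2.20, p = .03".
# statcheck itself recognizes more test types and more notational variants.
Z_PATTERN = re.compile(
    r"\bz\s*=\s*(-?\d*\.?\d+)\s*,\s*p\s*=\s*(\d?\.\d+)",
    re.IGNORECASE,
)


def extract_z_tests(text):
    """Step 2: find z-test results reported as 'z = ..., p = ...'."""
    results = []
    for match in Z_PATTERN.finditer(text):
        p_text = match.group(2)
        results.append({
            "raw": match.group(0),
            "z": float(match.group(1)),
            "reported_p": float(p_text),
            # Number of decimals the authors used for the p-value.
            "p_decimals": len(p_text.split(".")[1]),
        })
    return results


def two_tailed_p(z):
    """Step 3: two-tailed p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_test(result, alpha=0.05):
    """Step 4: compare the reported p-value with the recomputed one."""
    computed_p = two_tailed_p(result["z"])
    reported_p = result["reported_p"]
    # Round the recomputed value to the reported precision before comparing.
    rounded_p = round(computed_p, result["p_decimals"])
    error = abs(rounded_p - reported_p) > 1e-12
    # A gross inconsistency also changes the significance decision.
    decision_error = error and ((reported_p < alpha) != (computed_p < alpha))
    return {
        "computed_p": computed_p,
        "error": error,
        "decision_error": decision_error,
    }


sample = "A: z = 2.20, p = .03. B: z = 1.20, p = .03. C: z = 2.20, p = .04."
for found in extract_z_tests(sample):
    print(found["raw"], check_z_test(found))
```

In this sketch, result A is consistent, result B is flagged as both an inconsistency and a gross inconsistency, and result C is flagged as an inconsistency only, because the reported and recomputed p-values fall on the same side of the assumed 0.05 threshold.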

statcheck takes into account correct rounding of the test statistic, and has the option to take into account one-tailed testing. See the manual for details.
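
One way to account for rounding of the test statistic is to treat a reported value such as z = 1.96 as any value between 1.955 and 1.965, and to accept the reported p-value if it matches some value in the resulting p-value range. The sketch below shows that idea for z-tests, with an optional one-tailed check that halves the p-values. The function names and the exact comparison rule are assumptions for illustration; the manual describes what statcheck actually does.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_value_bounds(z_reported, z_decimals):
    """Range of two-tailed p-values compatible with a rounded z statistic."""
    half_unit = 0.5 * 10 ** (-z_decimals)
    low_z = max(abs(z_reported) - half_unit, 0.0)
    high_z = abs(z_reported) + half_unit
    # A larger |z| gives a smaller p-value, so the bounds swap.
    return two_tailed_p(high_z), two_tailed_p(low_z)


def is_consistent(z_reported, z_decimals, p_reported, p_decimals,
                  one_tailed=False):
    """True if the reported p-value fits some z within the rounding interval."""
    p_low, p_high = p_value_bounds(z_reported, z_decimals)
    if one_tailed:
        # Assumption: a one-tailed p-value is half the two-tailed one.
        p_low, p_high = p_low / 2, p_high / 2
    low = round(p_low, p_decimals)
    high = round(p_high, p_decimals)
    return low - 1e-12 <= p_reported <= high + 1e-12


# z = 1.96 with p = .049 is accepted once rounding of z is considered.
print(is_consistent(1.96, 2, 0.049, 3))
# z = 1.70 with p = .045 only fits under a one-tailed test.
print(is_consistent(1.70, 2, 0.045, 3))
print(is_consistent(1.70, 2, 0.045, 3, one_tailed=True))
```

Comparing against a range rather than a single recomputed value avoids flagging results whose only "error" is that the authors rounded the test statistic before reporting it.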

Installation and use

For detailed information about installing and using statcheck, see the Documentation file in the GitHub repository, or refer to the documentation of the original R package.

Installation

pip install statcheck

Example Usage

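from statcheck.checkdir import checkPDFdir

# Directory containing the PDF articles to check
# (named pdf_dir to avoid shadowing the built-in dir())
pdf_dir = 'path/to/pdf/directory'
Res, pRes = checkPDFdir(pdf_dir, subdir=False)

# Res is a pandas DataFrame with the analysis of the statistical results
print(Res)
# pRes is a pandas DataFrame with the extracted p-values
print(pRes)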

Running tests

pip install pytest
pytest tests/

statcheck.io is a web-based interface for statcheck.

Author of the Python implementation

Hubert Plisiecki

Citation

---
@misc{MicheleNuijten,  
  author = {Michèle B. Nuijten},  
  title = {statcheck},  
  year = {2021},  
  url = {{https://github.com/MicheleNuijten/statcheck}}  
}
---
