Python implementation of the R package 'statcheck' used for extracting and analysing statistical tests in scientific articles.
statcheck
Credits
This is a Python implementation of the R package statcheck
(ver. 1.4.0-beta.4) published by Michèle B. Nuijten [MicheleNuijten]. The original package can be found on her GitHub
page. The code relies heavily on Nuijten's work and is currently only a port of the original package, with the goal of making it more accessible to the
Python community. The original package was published under the GNU General Public License v3.0. The current implementation is published under the MIT
License. To ensure usability, all of the original tests were ported to the Python version.
What is statcheck?
statcheck is a free, open source Python package that can be used to
automatically extract statistical null-hypothesis significance testing
(NHST) results from articles and recompute the p-values based on the
reported test statistic and degrees of freedom to detect possible
inconsistencies.
statcheck is mainly useful for:
- Self-checks: you can use statcheck to make sure your manuscript doesn’t contain copy-paste errors or other inconsistencies before you submit it to a journal.
- Peer review: editors and reviewers can use statcheck to check submitted manuscripts for statistical inconsistencies. They can ask authors for a correction or clarification before publishing a manuscript.
- Research: statcheck can be used to automatically extract statistical test results from articles that can then be analyzed. You can for instance investigate whether you can predict statistical inconsistencies (see e.g., Nuijten et al., 2017), or use it to analyze p-value distributions (see e.g., Hartgerink et al., 2016).
How does statcheck work?
The algorithm behind statcheck consists of four basic steps:
- Convert PDF and HTML articles to plain text files.
- Search the text for instances of NHST results. Specifically, statcheck can recognize t-tests, F-tests, correlations, z-tests, χ²-tests, and Q-tests (from meta-analyses) if they are reported completely (test statistic, degrees of freedom, and p-value) and in APA style.
- Recompute the p-value using the reported test statistic and degrees of freedom.
- Compare the reported and recomputed p-values. If the reported p-value does not match the computed one, the result is marked as an inconsistency (Error in the output). If the reported p-value is significant and the computed one is not, or vice versa, the result is marked as a gross inconsistency (DecisionError in the output).
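As a rough illustration of the search step, the sketch below uses a single regular expression to pull APA-style results out of plain text. The names `APA_RESULT` and `extract_results` are hypothetical and written for this example only; they are not part of the statcheck API. The pattern is also much stricter than a real extractor would need to be: it ignores χ²-tests, Unicode minus signs, and unusual spacing.

```python
import re

# Hypothetical pattern for illustration; not the expression statcheck uses.
APA_RESULT = re.compile(
    r"""
    \b(?P<test>[tFrzQ])\s*                       # test-statistic symbol
    (?:\(\s*(?P<df1>\d+(?:\.\d+)?)\s*            # optional first df
       (?:,\s*(?P<df2>\d+(?:\.\d+)?)\s*)?\))?    # optional second df (F-tests)
    \s*=\s*(?P<value>-?\d*\.?\d+)\s*,\s*         # reported test statistic
    p\s*(?P<comparison>[<>=])\s*(?P<p>\d?\.\d+)  # reported p-value
    """,
    re.VERBOSE,
)


def extract_results(text):
    """Return one dict per APA-style result found in `text`."""
    results = []
    for match in APA_RESULT.finditer(text):
        df1, df2 = match.group("df1"), match.group("df2")
        results.append({
            "test": match.group("test"),
            "df1": float(df1) if df1 else None,
            "df2": float(df2) if df2 else None,
            "value": float(match.group("value")),
            "comparison": match.group("comparison"),
            "p": float(match.group("p")),
            "raw": match.group(0),  # the matched text, kept for inspection
        })
    return results


sample = ("We found t(28) = 2.20, p = .03 and F(2, 45) = 4.51, p < .05; "
          "also z = 2.05, p = .04.")
for result in extract_results(sample):
    print(result["raw"])
```

Keeping the matched text in `raw` makes it easy to trace a flagged result back to the sentence it came from.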
statcheck takes into account correct rounding of the test statistic,
and has the option to take into account one-tailed testing. See the
manual for details.
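To make the rounding and one-tailed options concrete, here is a minimal sketch for z-tests only, chosen because the normal distribution can be evaluated with the standard library. It is not the package's code: `z_to_p`, `check_z_result`, and the fixed `ALPHA = 0.05` are assumptions made for this example, and it only handles p-values reported with an equals sign. The idea is that a statistic printed as 2.05 could have been anything from 2.045 to 2.055, so the reported p-value is accepted if it falls within the range of p-values those bounds produce.

```python
import math
from decimal import Decimal

ALPHA = 0.05  # assumed significance level for this sketch


def z_to_p(z, one_tailed=False):
    """p-value of a z statistic under the standard normal distribution."""
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return p_two_tailed / 2 if one_tailed else p_two_tailed


def check_z_result(reported_z, reported_p, one_tailed=False):
    """Check a result reported as 'z = <reported_z>, p = <reported_p>'.

    Both arguments are strings exactly as printed, so that the number of
    reported decimals is preserved.
    """
    z = abs(Decimal(reported_z))
    # Half of the last reported decimal place, e.g. 0.005 for "2.05".
    half_unit = Decimal(1).scaleb(z.as_tuple().exponent) / 2
    z_low = max(0.0, float(z - half_unit))
    z_high = float(z + half_unit)

    # A larger |z| gives a smaller p, so the bounds swap.
    p_min = z_to_p(z_high, one_tailed)
    p_max = z_to_p(z_low, one_tailed)

    # Compare at the precision the authors used for the p-value.
    p_decimals = -Decimal(reported_p).as_tuple().exponent
    p_reported = float(reported_p)
    consistent = round(p_min, p_decimals) <= p_reported <= round(p_max, p_decimals)

    # Gross inconsistency: the significance decision differs as well.
    flips = (p_reported <= ALPHA < p_min) or (p_max <= ALPHA < p_reported)
    return {"Error": not consistent, "DecisionError": (not consistent) and flips}


print(check_z_result("2.05", ".04"))
print(check_z_result("2.05", ".02"))
print(check_z_result("2.05", ".02", one_tailed=True))
print(check_z_result("1.20", ".03"))
```

In this sketch, `z = 2.05, p = .02` is flagged as an Error under two-tailed testing but passes with `one_tailed=True`, while `z = 1.20, p = .03` is flagged as a DecisionError because the recomputed p-value is not significant at the assumed level.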
Installation and use
For detailed information about installing and using statcheck, see the
Documentation file in the GitHub repository, or refer to the R documentation.
Installation
pip install statcheck
Example Usage
from statcheck.checkdir import checkPDFdir

# Directory containing the PDF articles to check
# (named pdf_dir to avoid shadowing the built-in dir())
pdf_dir = 'path/to/pdf/directory'
Res, pRes = checkPDFdir(pdf_dir, subdir=False)
# Res is a pandas DataFrame with the analysis of statistical results
Res
# pRes is a pandas DataFrame with extracted p-values
pRes
Running tests
pip install pytest
pytest tests/
statcheck.io is a web-based interface for statcheck.
Author of the Python implementation
**Hubert Plisiecki**
Citation
---
@misc{MicheleNuijten,
author = {Michèle B. Nuijten},
title = {statcheck},
year = {2021},
url = {{https://github.com/MicheleNuijten/statcheck}}
}
---