Publication-ready statistical testing framework with 23 tests, effect sizes, power analysis, and MCP server
SciTeX Stats (scitex-stats)
Publication-ready statistical testing with 23 tests, effect sizes, power analysis, and APA formatting
Full Documentation · uv pip install "scitex-stats[all]"
Problem and Solution
| # | Problem | Solution |
|---|---|---|
| 1 | Bare scipy.stats returns (statistic, p): effect size, CI, normality check, and power each need manual follow-up calls. | One call, one dict: ss.run_test("ttest_ind", g1, g2) returns statistic, p, Cohen's d, power, and an APA string in a unified result dict. |
| 2 | Test selection requires expertise: parametric vs non-parametric, paired vs independent, one-way vs repeated-measures. | Auto-recommend: ss.recommend_tests(StatContext(...)) ranks the appropriate tests from the design alone. |
| 3 | APA formatting is manual: every paper spells out t(58) = 2.34, p = .021, d = 0.60 by hand. | result["formatted"]: APA / Nature / LaTeX strings live on the same result dict every test returns. |
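To make the third row concrete, here is a minimal sketch of what formatting an APA string by hand involves. The helper names `format_p_apa` and `apa_t_string` are hypothetical and are not part of scitex-stats; the numbers are the ones quoted in the table.

```python
def format_p_apa(p: float) -> str:
    """APA style: three decimals, no leading zero, floor at '< .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_t_string(t: float, df: int, p: float, d: float) -> str:
    """Assemble 't(df) = ..., p = ..., d = ...' from already-computed numbers."""
    return f"t({df}) = {t:.2f}, {format_p_apa(p)}, d = {d:.2f}"


print(apa_t_string(t=2.34, df=58, p=0.021, d=0.60))
# → t(58) = 2.34, p = .021, d = 0.60
```

This is the bookkeeping that result["formatted"] is described as taking over.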
Quick Start
import numpy as np
import scitex_stats as ss
rng = np.random.default_rng(42)
group1 = rng.normal(0.0, 1.0, 30)
group2 = rng.normal(0.5, 1.0, 30)
# Unified test API — same dict shape for every one of the 23 tests
result = ss.run_test("ttest_ind", data=group1, data2=group2)
assert result["stat_symbol"] == "t"
assert result["effect_size_metric"] == "Cohen's d"
assert result["significant"] is True
print(result["formatted"])
# → t = -3.2101, p = 0.0022, Cohen's d = -0.829, **
Unified result dictionary (every test returns this shape)
{
"test_method": "Student's t-test (independent)",
"statistic": -3.210,
"stat_symbol": "t",
"alternative": "two-sided",
"n_x": 30,
"n_y": 30,
"pvalue": 0.0022,
"stars": "**",
"alpha": 0.05,
"significant": true,
"effect_size": -0.829,
"effect_size_metric": "Cohen's d",
"effect_size_interpretation": "large",
"power": 0.884,
"H0": "μ(x) = μ(y)",
"formatted": "t = -3.210, p = 0.0022, Cohen's d = -0.829, **"
}
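Because the keys are the same for every test, reporting code does not need to branch on the test name. The sketch below is an illustration under assumptions: `report_line` is a hypothetical helper, and the dict literal is copied from the example above rather than produced by a live run_test call.

```python
# Hypothetical helper: builds a one-line report from the unified result dict.
def report_line(result: dict) -> str:
    """Format any unified result dict without knowing which test produced it."""
    verdict = "significant" if result["significant"] else "not significant"
    return (
        f'{result["test_method"]}: '
        f'{result["stat_symbol"]} = {result["statistic"]:.3f}, '
        f'p = {result["pvalue"]:.4f} {result["stars"]}, '
        f'{result["effect_size_metric"]} = {result["effect_size"]:.3f} '
        f'({result["effect_size_interpretation"]}), '
        f'{verdict} at alpha = {result["alpha"]}'
    )


# Subset of the documented example dict (not a live library call).
example = {
    "test_method": "Student's t-test (independent)",
    "statistic": -3.210,
    "stat_symbol": "t",
    "pvalue": 0.0022,
    "stars": "**",
    "alpha": 0.05,
    "significant": True,
    "effect_size": -0.829,
    "effect_size_metric": "Cohen's d",
    "effect_size_interpretation": "large",
}

print(report_line(example))
```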
Installation
uv pip install "scitex-stats[all]"
Per-module extras
| Extra | Pulls in |
|---|---|
| mcp | fastmcp (MCP server for AI agents) |
| plot | matplotlib (for the optional plotting helpers) |
| figrecipe | figrecipe (publication figures + auto CSV export) |
| all | mcp + plot + figrecipe (recommended) |
| dev | pytest, pytest-cov, nbconvert, ipykernel, plus every optional dependency so the test suite runs |
| docs | Sphinx + RTD theme + myst-parser (docs build only) |
uv pip install "scitex-stats[mcp]" # MCP server only
uv pip install -e ".[dev]" # editable install for contributors
pip install "scitex-stats[all]" # pip works too, just slower (quotes keep shells like zsh from globbing the brackets)
How it works
1. Describe the design, recommend the test
StatContext captures the experimental design — number of groups, sample
sizes, outcome type, paired vs between. recommend_tests ranks the
appropriate tests from that context alone, before any data is touched.
ctx = ss.StatContext(
    n_groups=2, sample_sizes=[30, 30],
    outcome_type="continuous", design="between", paired=False,
)
ss.recommend_tests(ctx, top_k=3)
# → ['ttest_ind', 'welch_t', 'brunner_munzel']
2. Run the test, get the unified result
run_test is the single dispatcher for all 23 tests. The same result
dict shape (statistic, pvalue, effect_size, power, formatted,
…) makes the downstream code test-agnostic.
flowchart TB
Data[Raw arrays / DataFrame] --> Ctx[StatContext]
Ctx --> Rec[recommend_tests]
Rec --> Run[run_test]
Run --> ES[effect_sizes]
Run --> Pw[power]
Run --> Res[Unified result dict]
Res --> Corr[correct: FDR/Bonferroni/Holm]
Res --> Post[posthoc: Tukey/Dunn/Nemenyi]
Res --> Fmt[format: APA/Nature/LaTeX]
Fmt --> Pub[Publication-ready string]
subgraph Surfaces ["Four surfaces — same engine"]
Py[Python API]
Cli[CLI]
Mcp[MCP server]
Sk[Skills]
end
Py -.-> Run
Cli -.-> Run
Mcp -.-> Run
Sk -.-> Run
style Pub fill:#27ae60,stroke:#2c3e50,color:#fff
style Res fill:#4a90d9,stroke:#2c3e50,color:#fff
Figure 1. Data flow and the four surfaces (Python, CLI, MCP, Skills) that share the same run_test engine. Every interface emits the unified result dict, which downstream formatters and corrections consume.
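One way to picture a single dispatcher is a name-to-function registry whose entries all return the same keys. The toy sketch below assumes nothing about scitex-stats internals: `REGISTRY`, `register`, `dispatch`, and the `mean_diff` pseudo-test are hypothetical names used only for illustration.

```python
from statistics import mean
from typing import Callable, Dict

# Hypothetical registry; the real package may be organised differently.
REGISTRY: Dict[str, Callable[..., dict]] = {}


def register(name: str):
    """Decorator that files a test function under a string name."""
    def wrap(func):
        REGISTRY[name] = func
        return func
    return wrap


@register("mean_diff")
def mean_diff(data, data2):
    """Toy 'test': difference of means, returned in a fixed dict shape."""
    return {
        "test_method": "Difference of means (toy)",
        "statistic": mean(data) - mean(data2),
        "n_x": len(data),
        "n_y": len(data2),
    }


def dispatch(name: str, **kwargs) -> dict:
    """Look up the test by name and call it; unknown names fail loudly."""
    if name not in REGISTRY:
        raise ValueError(f"unknown test {name!r}; known: {sorted(REGISTRY)}")
    return REGISTRY[name](**kwargs)


out = dispatch("mean_diff", data=[1.0, 2.0, 3.0], data2=[2.0, 3.0, 4.0])
print(out["statistic"])  # → -1.0
```

With this shape, adding a test means registering one more function; callers keep using the same entry point and the same result keys.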
3. Effect sizes, power, corrections
Every numeric result is built from the same primitives. Use them standalone when the dispatcher's defaults aren't quite right.
from scitex_stats import effect_sizes, power, correct
effect_sizes.cohens_d(group1, group2) # → -0.829
power.sample_size_ttest(effect_size=0.5,
alpha=0.05, power=0.8) # → required n per group
correct.correct_fdr(results, alpha=0.05,
method="bh") # BH adjusted p-values
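For readers who want to see what such primitives compute, here is a standard-library sketch of Cohen's d with a pooled standard deviation and of a per-group sample size from the normal approximation. The function names are hypothetical, and the numbers need not match scitex-stats exactly: an exact t-based power calculation typically asks for a slightly larger n.

```python
import math
from statistics import NormalDist, mean, variance


def cohens_d_pooled(x, y):
    """Cohen's d using the pooled sample variance (n - 1 denominators)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def n_per_group_normal_approx(effect_size, alpha=0.05, power=0.8):
    """Two-sided, two-sample size: n = 2 * ((z_(1-alpha/2) + z_power) / d)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(cohens_d_pooled([2, 4, 6], [1, 3, 5]))  # means differ by 1, pooled SD is 2
print(n_per_group_normal_approx(0.5))         # → 63
```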
4. Linter for migration and hooks
scitex-stats ships 6 stats-specific lint rules (STX-ST001..006). They are detected automatically by scitex-dev's linter, which is already a hard dependency, so no extra install is needed.
scitex-dev linter check-files src/ # lint a tree
scitex-dev linter list-rules --category stats # show live rule definitions
Rule reference (STX-ST001..006)
| Rule | Severity | Trigger |
|---|---|---|
| STX-ST001 | warning | scipy.stats.ttest_ind(); use ss.run_test("ttest_ind", ...) for auto effect size + CI + power |
| STX-ST002 | warning | scipy.stats.mannwhitneyu(); use ss.run_test("mannwhitneyu", ...) for auto effect size |
| STX-ST003 | warning | scipy.stats.pearsonr(); use ss.run_test("pearsonr", ...) for auto CI + power |
| STX-ST004 | warning | scipy.stats.f_oneway(); use ss.run_test("anova_oneway", ...) for post-hoc + effect sizes |
| STX-ST005 | warning | scipy.stats.wilcoxon(); use ss.run_test("wilcoxon", ...) for auto effect size |
| STX-ST006 | warning | scipy.stats.kruskal(); use ss.run_test("kruskal", ...) for post-hoc + effect sizes |
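Rules of this kind can be detected by walking a file's syntax tree. The sketch below uses only the standard-library `ast` module and illustrates the idea; it is not scitex-dev's implementation. The rule IDs and function names come from the table, while `find_scipy_calls` is a hypothetical helper that only catches the fully qualified scipy.stats.<name>(...) form.

```python
import ast

# Rule IDs and triggers from the table above; the detection logic is an
# illustrative assumption, not the linter's actual code.
RULES = {
    "ttest_ind": "STX-ST001",
    "mannwhitneyu": "STX-ST002",
    "pearsonr": "STX-ST003",
    "f_oneway": "STX-ST004",
    "wilcoxon": "STX-ST005",
    "kruskal": "STX-ST006",
}


def find_scipy_calls(source: str):
    """Return sorted (line, rule_id) pairs for literal scipy.stats.<func>() calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the attribute chain scipy.stats.<name>
        if (
            isinstance(func, ast.Attribute)
            and func.attr in RULES
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "stats"
            and isinstance(func.value.value, ast.Name)
            and func.value.value.id == "scipy"
        ):
            hits.append((node.lineno, RULES[func.attr]))
    return sorted(hits)


sample = (
    "import scipy.stats\n"
    "t, p = scipy.stats.ttest_ind(a, b)\n"
    "h = scipy.stats.kruskal(a, b, c)\n"
)
print(find_scipy_calls(sample))
# → [(2, 'STX-ST001'), (3, 'STX-ST006')]
```

Aliased imports such as `from scipy import stats` would need extra handling that this sketch leaves out.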
5. Etc.
Descriptive statistics, post-hoc, normality checks
from scitex_stats import describe, posthoc
describe(data) # mean, sd, median, IQR, skew, kurtosis
posthoc.posthoc_tukey([g1, g2, g3]) # pairwise Tukey HSD
ss.run_test("shapiro", data=group1) # normality check, same result dict shape
Available Tests
flowchart LR
All[23 tests] --> P[Parametric]
All --> N[Nonparametric]
All --> C[Correlation]
All --> Cat[Categorical]
All --> Norm[Normality]
P --> P1[t-test ind / paired / 1-samp]
P --> P2[ANOVA 1-way / RM / 2-way]
N --> N1[Mann-Whitney U]
N --> N2[Wilcoxon]
N --> N3[Kruskal-Wallis]
N --> N4[Friedman]
N --> N5[Brunner-Munzel]
C --> C1[Pearson]
C --> C2[Spearman]
C --> C3[Kendall]
C --> C4[Theil-Sen]
Cat --> Cat1[Chi-squared]
Cat --> Cat2[Fisher exact]
Cat --> Cat3[McNemar]
Cat --> Cat4[Cochran's Q]
Norm --> Norm1[Shapiro-Wilk]
Norm --> Norm2[Kolmogorov-Smirnov 1-samp]
Norm --> Norm3[Kolmogorov-Smirnov 2-samp]
style All fill:#4a90d9,stroke:#2c3e50,color:#fff
Figure 2. The 23 tests grouped by family. Every leaf is callable through the same run_test(name, ...) dispatcher and returns the unified result dict (Figure 1).
flowchart TB
Start([Choose a test]) --> Outcome{Outcome type?}
Outcome -->|Continuous| K{# groups?}
Outcome -->|Ordinal / ranked| K
Outcome -->|Categorical / counts| Cat{Design?}
Outcome -->|Correlation| Corr{Variable types?}
K -->|1| OneSamp{Normal?}
OneSamp -->|Yes| OS1[t-test 1-sample]
OneSamp -->|No| OS2[Wilcoxon signed-rank]
K -->|2| Two{Paired?}
Two -->|No| TwoInd{Normal + equal var?}
Two -->|Yes| TwoP{Normal diffs?}
TwoInd -->|Yes| TI1[t-test ind / Welch]
TwoInd -->|No| TI2["Brunner-Munzel <b>★ default</b>"]
TwoP -->|Yes| TP1[t-test paired]
TwoP -->|No| TP2[Wilcoxon signed-rank]
K -->|3+| Many{Design?}
Many -->|Between| MB{Normal + equal var?}
Many -->|Within| MW{Normal?}
Many -->|2-factor| M2[ANOVA 2-way]
MB -->|Yes| MB1[ANOVA 1-way]
MB -->|No| MB2[Kruskal-Wallis]
MW -->|Yes| MW1[ANOVA repeated-measures]
MW -->|No| MW2[Friedman]
Cat -->|"2×2 unpaired"| Cat1[Fisher exact]
Cat -->|"larger contingency"| Cat2[Chi-squared]
Cat -->|"2×2 paired"| Cat3[McNemar]
Cat -->|"3+ repeated binary"| Cat4["Cochran's Q"]
Corr -->|Continuous + linear| Co1[Pearson]
Corr -->|Monotonic / ranks| Co2[Spearman]
Corr -->|Small n, ties| Co3[Kendall τ]
Corr -->|With outliers| Co4[Theil-Sen]
style TI2 fill:#27ae60,stroke:#2c3e50,color:#fff
style Start fill:#4a90d9,stroke:#2c3e50,color:#fff
Figure 3. Decision flowchart for choosing a statistical test. Start from outcome type, branch by number of groups and study design. Brunner-Munzel (★) is the recommended default for two-group continuous comparisons — robust to unequal variances and non-normality.
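The continuous and ordinal branch of Figure 3 can be read as a small decision function. The sketch below is a hand transcription of that branch only (one, two, or three-plus groups, without the two-factor case); it is not the logic behind recommend_tests, and `choose_test` with its boolean parameters is a hypothetical name.

```python
def choose_test(n_groups: int, paired: bool = False, normal: bool = True,
                equal_var: bool = True) -> str:
    """Walk the continuous-outcome branch of the flowchart and name a test."""
    if n_groups < 1:
        raise ValueError("n_groups must be >= 1")
    if n_groups == 1:
        return "t-test 1-sample" if normal else "Wilcoxon signed-rank"
    if n_groups == 2:
        if paired:
            return "t-test paired" if normal else "Wilcoxon signed-rank"
        return "t-test ind / Welch" if (normal and equal_var) else "Brunner-Munzel"
    # Three or more groups: 'paired' stands in for a within-subject design.
    if paired:
        return "ANOVA repeated-measures" if normal else "Friedman"
    return "ANOVA 1-way" if (normal and equal_var) else "Kruskal-Wallis"


print(choose_test(2, paired=False, normal=False))  # → Brunner-Munzel
print(choose_test(3, paired=True, normal=True))    # → ANOVA repeated-measures
```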
Examples
Three runnable notebooks live under examples/. Each one executes end-to-end in CI and is the canonical reference for its workflow.
| Notebook | Workflow |
|---|---|
| 01_basic_ttest.ipynb | run_test("ttest_ind", ...) → unified result dict → APA string |
| 02_test_recommendation.ipynb | StatContext → recommend_tests → top recommendation through run_test |
| 03_multiple_comparison.ipynb | Family of comparisons → correct.correct_fdr (Benjamini-Hochberg) |
# Re-execute every notebook in place (refreshes outputs)
bash examples/00_run_all.sh
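The multiple-comparison workflow rests on the Benjamini-Hochberg procedure. As a reference for what the adjustment does, here is a standard-library sketch of BH-adjusted p-values; `bh_adjust` is a hypothetical name, and the library's correct_fdr may differ in input and output shape.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(pvals)
rejected = [p <= 0.05 for p in adj]  # reject H0 where the adjusted p is at most alpha
print(adj, rejected)
```

Each raw p-value is scaled by m / rank, then capped by the adjusted value of the next larger p, which is what keeps the adjusted sequence non-decreasing.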
Four Interfaces
Python API ⭐⭐⭐
import scitex_stats as ss
from scitex_stats import effect_sizes, power, correct, posthoc
ss.run_test("ttest_ind", data=g1, data2=g2) # 23 tests, one dispatcher
ss.recommend_tests(ss.StatContext(n_groups=2, ...)) # design-driven test selection
effect_sizes.cohens_d(g1, g2) # standalone effect size
power.sample_size_ttest(effect_size=0.5,
alpha=0.05, power=0.8) # power / sample size
correct.correct_fdr(results, alpha=0.05, method="bh") # multiple-comparison correction
posthoc.posthoc_tukey([g1, g2, g3]) # post-hoc pairwise tests
CLI Commands ⭐
scitex-stats --help-recursive # Show all commands
scitex-stats list-python-apis # List Python API tree
scitex-stats list-python-apis -v # With docstrings
scitex-stats mcp list-tools # List MCP tools
scitex-stats mcp doctor # Check server health
scitex-stats mcp start # Start MCP server
MCP Server ⭐⭐
AI agents can run statistical tests and format publication-ready results autonomously.
| Tool | Description |
|---|---|
| recommend_tests | Recommend appropriate tests from a StatContext |
| run_test | Execute any of the 23 statistical tests |
| format_results | Format results in journal style (APA, Nature, LaTeX) |
| power_analysis | Compute statistical power or required sample size |
| correct_pvalues | Apply multiple-comparison correction |
| describe | Compute descriptive statistics |
| effect_size | Compute effect size between groups |
| normality_test | Test whether data follows a normal distribution |
| posthoc_test | Run post-hoc pairwise comparisons |
| p_to_stars | Convert p-value to significance stars |
scitex-stats mcp start
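As an illustration of the simplest tool in the table, converting a p-value to stars is a threshold lookup. The cut-offs below (0.05, 0.01, 0.001) and the "ns" label are commonly used conventions and are assumptions here, not confirmed scitex-stats behaviour; they do agree with the Quick Start example, where p = 0.0022 is marked "**". `stars_for` is a hypothetical name.

```python
def stars_for(p: float) -> str:
    """Map a p-value to stars using common cut-offs (assumed, not library-confirmed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


print(stars_for(0.0022))  # → **
```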
Skills ⭐⭐
Skills are workflow-oriented guides AI agents query to discover package capabilities and usage patterns.
scitex-stats skills list # list available skill pages
scitex-stats skills get SKILL # show a skill page
scitex-dev skills export --package scitex-stats # export to Claude Code
| Skill | Content |
|---|---|
| quick-start | Basic usage and core patterns |
| test-catalog | All 23 statistical tests with categories |
| effect-sizes | Effect size measures and interpretation |
| workflows | Common analysis patterns |
| cli-reference | CLI commands |
| mcp-tools | MCP tools for AI agents |
Also available via MCP: stats_skills_list() / stats_skills_get(name).
Part of SciTeX
scitex-stats is part of SciTeX. Install via
the umbrella with pip install scitex[stats] to use as
scitex.stats (Python) or scitex stats ... (CLI).
import scitex

@scitex.session
def main(CONFIG=scitex.INJECTED, plt=scitex.INJECTED):
    data = scitex.io.load("measurements.csv")
    result = scitex.stats.run_test("ttest_ind", data=data["g1"], data2=data["g2"])
    scitex.io.save(result, "stats_result.csv")
    fig, ax = scitex.plt.subplots()
    ax.plot_box([data["g1"], data["g2"]], labels=["Control", "Treatment"])
    ax.set_xyt("Group", "Value", f"p = {result['pvalue']:.4f} {result['stars']}")
    scitex.io.save(fig, "comparison.png")  # saves plot + CSV data
    return 0
scitex.stats delegates to scitex_stats — same API, same registry.
The ecosystem modules compose:
| Module | Package | Role |
|---|---|---|
| scitex.stats | scitex-stats | Statistical testing, effect sizes, power analysis |
| scitex.plt | figrecipe | Publication-ready figures with auto CSV export |
| scitex.io | scitex-io | Universal file I/O (30+ formats) |
| scitex.clew | scitex-clew | Reproducibility verification via hash DAGs |
The SciTeX system follows the Four Freedoms for Research, inspired by the Free Software Definition:
Four Freedoms for Research
- The freedom to run your research anywhere — your machine, your terms.
- The freedom to study how every step works — from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.