Statistical reporting and table preparation framework for Python — the missing reporting layer.
Project description
PySofra turns datasets, fitted models, and summary statistics into publication-ready tables — across HTML · Markdown · LaTeX · DOCX · PPTX · XLSX · PNG — from a single immutable
SofraTableobject. It brings the practical workflows of R'stableone,gtsummary, andflextableinto a single coherent Pythonic API.
Baseline characteristics, by treatment arm. One line of code.
|
Adjusted ORs + inline forest plot tbl_regression(fit).with_forest_plot()
|
KM table + inline survival curve tbl_survival(...).with_km_plot()
|
Why PySofra
- One immutable object, seven output formats — build a
SofraTableonce, render to HTML / Markdown / LaTeX / DOCX / PPTX / XLSX / PNG, all byte-deterministic across processes - Auto-dispatched statistical tests — Welch, Wilcoxon, ANOVA, Kruskal–Wallis, Fisher, χ², Rao–Scott, design-adjusted t — picked by variable kind, overridable per-row
- Inline forest plots and KM curves — embed matplotlib figures directly into the table; the same
SofraTablerenders them across every backend - Statistically correct — every numeric output validated against
scipy/statsmodels/lifelinesat machine precision, with cross-checks against R'sgtsummary - Method-chainable and immutable — every modifier returns a new table; no in-place mutation, no global state, fully reproducible
Showcase notebook — 47 cells, every section a side-by-side numeric proof. Start here if you have 60 seconds.
End-to-end tutorial — 126 cells walking every public feature on a synthetic two-arm trial.
Quick start
import numpy as np
import pandas as pd
import pysofra as ps
# Toy two-arm trial; replace with your own DataFrame in real use.
rng = np.random.default_rng(0)
df = pd.DataFrame({
"arm": rng.choice(["Placebo", "Treatment"], 200),
"age": rng.normal(60, 10, 200).round(),
"bmi": rng.normal(28, 5, 200).round(1),
"event": rng.binomial(1, 0.3, 200),
})
# Table 1 — baseline characteristics by treatment arm
tbl = (
ps.tbl_one(df, by="arm")
.add_p()
.add_smd()
.add_overall()
.theme("clinical")
)
tbl # renders in Jupyter / Colab / VS Code
tbl.to_docx("table1.docx") # publication-quality Word
tbl.to_html() # standalone HTML fragment
tbl.to_markdown() # GitHub-flavored Markdown
The same workflow handles regression tables:
import statsmodels.api as sm
X = sm.add_constant(df[["age", "bmi"]])
fit = sm.Logit(df["event"], X).fit(disp=False)
(
ps.tbl_regression(fit, exponentiate=True)
.bold_p()
.theme("jama")
.to_docx("table2.docx")
)
The end-to-end worked example — baseline table by treatment arm, regression table with forest plot, and Kaplan-Meier survival summary — is in the showcase notebook.
What's in the box
| Feature | Function / object | Status |
|---|---|---|
| Baseline Table 1 | ps.tbl_one |
MVP |
| Descriptive summary | ps.tbl_summary |
MVP |
| Regression results | ps.tbl_regression |
MVP |
| Side-by-side merge | ps.tbl_merge |
MVP |
| Vertical stack | ps.tbl_stack |
MVP |
| HTML / Markdown | .to_html / .to_markdown |
MVP |
| DOCX export | .to_docx |
MVP |
| LaTeX export | .to_latex |
MVP |
| PPTX export | .to_pptx |
MVP (extras) |
| Excel export | .to_xlsx |
MVP |
| Inline forest plots | tbl_regression(...).with_forest_plot() |
MVP |
| Inline KM curves | tbl_survival(...).with_km_plot() |
MVP |
| Cross-backend plot embedding | DOCX/PPTX/LaTeX include the plot too | MVP |
| Rao–Scott chi-square | weighted Table 1 auto-route | MVP |
SurveyDesign (strata + cluster + FPC) |
Taylor-linearised variance | MVP |
| Themes | clinical, jama, nejm, compact, minimal |
MVP |
| Auto test selection | t-test / ANOVA / Wilcoxon / Kruskal / χ² / Fisher | MVP |
| Per-variable test overrides | tests={'age': 'wilcoxon', ...} |
MVP |
| Multiplicity adjustment | .add_q() — BH, BY, Bonferroni, Holm, Hommel, Šidák |
MVP |
| Multi-model regression | tbl_regression([m1, m2], model_labels=[...]) |
MVP |
| lifelines (Cox / AFT) | tbl_regression(cph) |
MVP |
| sklearn (linear models) | tbl_regression(clf) — point estimates only |
MVP |
| Kaplan–Meier summary | tbl_survival(df, time=, event=, by=, times=[...]) |
MVP |
| Survey-weighted Table 1 | tbl_one(..., weights='w') |
MVP |
| polars input | tbl_one(pl.DataFrame(...)) |
MVP |
| Conditional formatting | .bold_if, .highlight_if, .style_if |
MVP |
| Sticky-header notebook tables | .to_html(sticky_header=True) |
MVP |
| Standardised mean differences | continuous + categorical (Yang–Dalton) | MVP |
| Notebook rendering | _repr_html_ / _repr_markdown_ / _repr_latex_ |
MVP |
Design principles
- Backend-agnostic tables. A
SofraTableis the single source of truth; every renderer (HTML, Markdown, DOCX, …) reads the same object. - Immutable method chaining. Every modifier returns a new
SofraTable. No surprises, no global state. - Strong defaults, explicit overrides. Sensible journal-style output out of the box; per-variable type, label, and test overrides when you need them.
- Deterministic. The same input always produces the same output — critical for reproducible research.
- No magic. No nonstandard evaluation, no metaprogramming, no network calls, no telemetry.
Installation
pip install pysofra
PySofra requires Python ≥ 3.11. The core install only pulls numpy,
pandas, scipy, statsmodels, and python-docx. Domain extras unlock
the features that depend on heavier optional libraries:
pip install "pysofra[survival]" # tbl_survival + KM curves (lifelines, matplotlib)
pip install "pysofra[plot]" # forest plots, table-as-image (matplotlib)
pip install "pysofra[pptx]" # PowerPoint export (python-pptx)
pip install "pysofra[xlsx]" # Excel export (xlsxwriter)
pip install "pysofra[polars]" # accept polars DataFrames as input
pip install "pysofra[sklearn]" # tbl_regression on scikit-learn models
pip install "pysofra[all]" # everything above
pip install "pysofra[dev]" # testing + linting (pytest, ruff, mypy, hypothesis)
Status
PySofra is in alpha (0.1.0a3). The public API surface is pinned
by an explicit
API-stability test
so that any unintended rename, removal, or signature change surfaces as
a failed test. Quality bar at this release:
- More than 800 tests passing, 100% line coverage, mypy strict, ruff clean.
- Every numeric output is validated against
scipy,lifelines,statsmodels, or a hand-computed textbook formula (test_joss_statistical_correctness.py). - Universal invariants enforced via Hypothesis on 720 randomized examples per CI run (test_joss_property_invariants.py).
- Renderer output is byte-deterministic — identical input always produces identical HTML/Markdown/LaTeX, required for reproducible publication artifacts (test_joss_renderer_consistency.py).
Bug reports and use-case feedback are very welcome.
Contributing
Bug reports, feature requests, and pull requests are all very welcome.
Please read
CONTRIBUTING.md
for the workflow, the quality gates, and the
Code of Conduct.
License
GPL-3.0-or-later. See
LICENSE.
Citation
If you use PySofra in academic work, please cite the project — see
CITATION.cff.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pysofra-0.1.0a3.tar.gz.
File metadata
- Download URL: pysofra-0.1.0a3.tar.gz
- Upload date:
- Size: 221.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5e5ef8408983290c3ed617be621d0dfe1afd122288e6131c72b0e2b38895d580
|
|
| MD5 |
0e81c5712e46380a6425c8175ec94246
|
|
| BLAKE2b-256 |
04d1920bfb56d530896a2c5f9e858f2c5909bfe206abb6738d53296d151329e2
|
File details
Details for the file pysofra-0.1.0a3-py3-none-any.whl.
File metadata
- Download URL: pysofra-0.1.0a3-py3-none-any.whl
- Upload date:
- Size: 143.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a71d846f95972c5a3a0af5680e9e5e8a775caaa1d082958f987b57e77ab67ccc
|
|
| MD5 |
a21fb00154ceb358a9ff0e7630150ec8
|
|
| BLAKE2b-256 |
61b0de921f4814ff7b0d1869013667ac3aaf490ea3b88ccdc6924db5e5a7e6b0
|