Skip to main content

Statistical reporting and table preparation framework for Python — the missing reporting layer.

Project description

PySofra

The missing statistical reporting layer for Python

Coverage Python License: GPL-3.0+ Style: ruff Types: mypy strict Tests: 886

PySofra turns datasets, fitted models, and summary statistics into publication-ready tables — across HTML · Markdown · LaTeX · DOCX · PPTX · XLSX · PNG — from a single immutable SofraTable object. It brings the practical workflows of R's tableone, gtsummary, and flextable into a single coherent Pythonic API.

Baseline characteristics table by treatment arm — JAMA theme, with p-values, standardized mean differences, and Overall column
Baseline characteristics, by treatment arm. One line of code.
Adjusted odds ratios with inline forest plot
Adjusted ORs + inline forest plot
tbl_regression(fit).with_forest_plot()
Kaplan-Meier survival table with embedded KM curve
KM table + inline survival curve
tbl_survival(...).with_km_plot()

Why PySofra

  • One immutable object, seven output formats — build a SofraTable once, render to HTML / Markdown / LaTeX / DOCX / PPTX / XLSX / PNG, all byte-deterministic across processes
  • Auto-dispatched statistical tests — Welch, Wilcoxon, ANOVA, Kruskal–Wallis, Fisher, χ², Rao–Scott, design-adjusted t — picked by variable kind, overridable per-row
  • Inline forest plots and KM curves — embed matplotlib figures directly into the table; the same SofraTable renders them across every backend
  • Statistically correct — every numeric output validated against scipy / statsmodels / lifelines at machine precision, with cross-checks against R's gtsummary
  • Method-chainable and immutable — every modifier returns a new table; no in-place mutation, no global state, fully reproducible

Showcase notebook47 cells, every section a side-by-side numeric proof. Start here if you have 60 seconds.

End-to-end tutorial126 cells walking every public feature on a synthetic two-arm trial.


Quick start

import numpy as np
import pandas as pd
import pysofra as ps

# Toy two-arm trial; replace with your own DataFrame in real use.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "arm":   rng.choice(["Placebo", "Treatment"], 200),
    "age":   rng.normal(60, 10, 200).round(),
    "bmi":   rng.normal(28, 5, 200).round(1),
    "event": rng.binomial(1, 0.3, 200),
})

# Table 1 — baseline characteristics by treatment arm
tbl = (
    ps.tbl_one(df, by="arm")
      .add_p()
      .add_smd()
      .add_overall()
      .theme("clinical")
)

tbl                          # renders in Jupyter / Colab / VS Code
tbl.to_docx("table1.docx")   # publication-quality Word
tbl.to_html()                # standalone HTML fragment
tbl.to_markdown()            # GitHub-flavored Markdown

The same workflow handles regression tables:

import statsmodels.api as sm

X = sm.add_constant(df[["age", "bmi"]])
fit = sm.Logit(df["event"], X).fit(disp=False)

(
    ps.tbl_regression(fit, exponentiate=True)
      .bold_p()
      .theme("jama")
      .to_docx("table2.docx")
)

The end-to-end worked example — baseline table by treatment arm, regression table with forest plot, and Kaplan-Meier survival summary — is in the showcase notebook.


What's in the box

Feature Function / object Status
Baseline Table 1 ps.tbl_one MVP
Descriptive summary ps.tbl_summary MVP
Regression results ps.tbl_regression MVP
Side-by-side merge ps.tbl_merge MVP
Vertical stack ps.tbl_stack MVP
HTML / Markdown .to_html / .to_markdown MVP
DOCX export .to_docx MVP
LaTeX export .to_latex MVP
PPTX export .to_pptx MVP (extras)
Excel export .to_xlsx MVP
Inline forest plots tbl_regression(...).with_forest_plot() MVP
Inline KM curves tbl_survival(...).with_km_plot() MVP
Cross-backend plot embedding DOCX/PPTX/LaTeX include the plot too MVP
Rao–Scott chi-square weighted Table 1 auto-route MVP
SurveyDesign (strata + cluster + FPC) Taylor-linearised variance MVP
Themes clinical, jama, nejm, compact, minimal MVP
Auto test selection t-test / ANOVA / Wilcoxon / Kruskal / χ² / Fisher MVP
Per-variable test overrides tests={'age': 'wilcoxon', ...} MVP
Multiplicity adjustment .add_q() — BH, BY, Bonferroni, Holm, Hommel, Šidák MVP
Multi-model regression tbl_regression([m1, m2], model_labels=[...]) MVP
lifelines (Cox / AFT) tbl_regression(cph) MVP
sklearn (linear models) tbl_regression(clf) — point estimates only MVP
Kaplan–Meier summary tbl_survival(df, time=, event=, by=, times=[...]) MVP
Survey-weighted Table 1 tbl_one(..., weights='w') MVP
polars input tbl_one(pl.DataFrame(...)) MVP
Conditional formatting .bold_if, .highlight_if, .style_if MVP
Sticky-header notebook tables .to_html(sticky_header=True) MVP
Standardised mean differences continuous + categorical (Yang–Dalton) MVP
Notebook rendering _repr_html_ / _repr_markdown_ / _repr_latex_ MVP

Design principles

  • Backend-agnostic tables. A SofraTable is the single source of truth; every renderer (HTML, Markdown, DOCX, …) reads the same object.
  • Immutable method chaining. Every modifier returns a new SofraTable. No surprises, no global state.
  • Strong defaults, explicit overrides. Sensible journal-style output out of the box; per-variable type, label, and test overrides when you need them.
  • Deterministic. The same input always produces the same output — critical for reproducible research.
  • No magic. No nonstandard evaluation, no metaprogramming, no network calls, no telemetry.

Installation

pip install pysofra

PySofra requires Python ≥ 3.11. The core install only pulls numpy, pandas, scipy, statsmodels, and python-docx. Domain extras unlock the features that depend on heavier optional libraries:

pip install "pysofra[survival]"   # tbl_survival + KM curves (lifelines, matplotlib)
pip install "pysofra[plot]"       # forest plots, table-as-image (matplotlib)
pip install "pysofra[pptx]"       # PowerPoint export (python-pptx)
pip install "pysofra[xlsx]"       # Excel export (xlsxwriter)
pip install "pysofra[polars]"     # accept polars DataFrames as input
pip install "pysofra[sklearn]"    # tbl_regression on scikit-learn models
pip install "pysofra[all]"        # everything above
pip install "pysofra[dev]"        # testing + linting (pytest, ruff, mypy, hypothesis)

Status

PySofra is in alpha (0.1.0a3). The public API surface is pinned by an explicit API-stability test so that any unintended rename, removal, or signature change surfaces as a failed test. Quality bar at this release:

  • More than 800 tests passing, 100% line coverage, mypy strict, ruff clean.
  • Every numeric output is validated against scipy, lifelines, statsmodels, or a hand-computed textbook formula (test_joss_statistical_correctness.py).
  • Universal invariants enforced via Hypothesis on 720 randomized examples per CI run (test_joss_property_invariants.py).
  • Renderer output is byte-deterministic — identical input always produces identical HTML/Markdown/LaTeX, required for reproducible publication artifacts (test_joss_renderer_consistency.py).

Bug reports and use-case feedback are very welcome.

Contributing

Bug reports, feature requests, and pull requests are all very welcome. Please read CONTRIBUTING.md for the workflow, the quality gates, and the Code of Conduct.

License

GPL-3.0-or-later. See LICENSE.

Citation

If you use PySofra in academic work, please cite the project — see CITATION.cff.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pysofra-0.1.0a3.tar.gz (221.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pysofra-0.1.0a3-py3-none-any.whl (143.7 kB view details)

Uploaded Python 3

File details

Details for the file pysofra-0.1.0a3.tar.gz.

File metadata

  • Download URL: pysofra-0.1.0a3.tar.gz
  • Upload date:
  • Size: 221.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for pysofra-0.1.0a3.tar.gz
Algorithm Hash digest
SHA256 5e5ef8408983290c3ed617be621d0dfe1afd122288e6131c72b0e2b38895d580
MD5 0e81c5712e46380a6425c8175ec94246
BLAKE2b-256 04d1920bfb56d530896a2c5f9e858f2c5909bfe206abb6738d53296d151329e2

See more details on using hashes here.

File details

Details for the file pysofra-0.1.0a3-py3-none-any.whl.

File metadata

  • Download URL: pysofra-0.1.0a3-py3-none-any.whl
  • Upload date:
  • Size: 143.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for pysofra-0.1.0a3-py3-none-any.whl
Algorithm Hash digest
SHA256 a71d846f95972c5a3a0af5680e9e5e8a775caaa1d082958f987b57e77ab67ccc
MD5 a21fb00154ceb358a9ff0e7630150ec8
BLAKE2b-256 61b0de921f4814ff7b0d1869013667ac3aaf490ea3b88ccdc6924db5e5a7e6b0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page