Skip to main content

Statistical reporting and table preparation framework for Python — the missing reporting layer.

Project description

PySofra

The missing statistical reporting layer for Python

Coverage Python License: GPL-3.0+ Style: ruff Types: mypy strict Tests: 886

PySofra turns datasets, fitted models, and summary statistics into publication-ready tables — across HTML · Markdown · LaTeX · DOCX · PPTX · XLSX · PNG — from a single immutable SofraTable object. It brings the practical workflows of R's tableone, gtsummary, and flextable into a single coherent Pythonic API.

Baseline characteristics table by treatment arm — JAMA theme, with p-values, standardized mean differences, and Overall column
Baseline characteristics, by treatment arm. One line of code.
Adjusted odds ratios with inline forest plot
Adjusted ORs + inline forest plot
tbl_regression(fit).with_forest_plot()
Kaplan-Meier survival table with embedded KM curve
KM table + inline survival curve
tbl_survival(...).with_km_plot()

Why PySofra

  • One immutable object, seven output formats — build a SofraTable once, render to HTML / Markdown / LaTeX / DOCX / PPTX / XLSX / PNG, all byte-deterministic across processes
  • Auto-dispatched statistical tests — Welch, Wilcoxon, ANOVA, Kruskal–Wallis, Fisher, χ², Rao–Scott, design-adjusted t — picked by variable kind, overridable per-row
  • Inline forest plots and KM curves — embed matplotlib figures directly into the table; the same SofraTable renders them across every backend
  • Statistically correct — every numeric output validated against scipy / statsmodels / lifelines at machine precision (and cross-checked against R's gtsummary for the JSS paper)
  • Method-chainable and immutable — every modifier returns a new table; no in-place mutation, no global state, fully reproducible

Showcase notebook47 cells, every section a side-by-side numeric proof. Start here if you have 60 seconds.

End-to-end tutorial126 cells walking every public feature on a synthetic two-arm trial.


Quick start

import numpy as np
import pandas as pd
import pysofra as ps

# Toy two-arm trial; replace with your own DataFrame in real use.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "arm":   rng.choice(["Placebo", "Treatment"], 200),
    "age":   rng.normal(60, 10, 200).round(),
    "bmi":   rng.normal(28, 5, 200).round(1),
    "event": rng.binomial(1, 0.3, 200),
})

# Table 1 — baseline characteristics by treatment arm
tbl = (
    ps.tbl_one(df, by="arm")
      .add_p()
      .add_smd()
      .add_overall()
      .theme("clinical")
)

tbl                          # renders in Jupyter / Colab / VS Code
tbl.to_docx("table1.docx")   # publication-quality Word
tbl.to_html()                # standalone HTML fragment
tbl.to_markdown()            # GitHub-flavored Markdown

The same workflow handles regression tables:

import statsmodels.api as sm

X = sm.add_constant(df[["age", "bmi"]])
fit = sm.Logit(df["event"], X).fit(disp=False)

(
    ps.tbl_regression(fit, exponentiate=True)
      .bold_p()
      .theme("jama")
      .to_docx("table2.docx")
)

The end-to-end worked example — baseline table by treatment arm, regression table with forest plot, and Kaplan-Meier survival summary — is in the showcase notebook.


What's in the box

Feature Function / object Status
Baseline Table 1 ps.tbl_one MVP
Descriptive summary ps.tbl_summary MVP
Regression results ps.tbl_regression MVP
Side-by-side merge ps.tbl_merge MVP
Vertical stack ps.tbl_stack MVP
HTML / Markdown .to_html / .to_markdown MVP
DOCX export .to_docx MVP
LaTeX export .to_latex MVP
PPTX export .to_pptx MVP (extras)
Excel export .to_xlsx MVP
Inline forest plots tbl_regression(...).with_forest_plot() MVP
Inline KM curves tbl_survival(...).with_km_plot() MVP
Cross-backend plot embedding DOCX/PPTX/LaTeX include the plot too MVP
Rao–Scott chi-square weighted Table 1 auto-route MVP
SurveyDesign (strata + cluster + FPC) Taylor-linearised variance MVP
Themes clinical, jama, nejm, compact, minimal MVP
Auto test selection t-test / ANOVA / Wilcoxon / Kruskal / χ² / Fisher MVP
Per-variable test overrides tests={'age': 'wilcoxon', ...} MVP
Multiplicity adjustment .add_q() — BH, BY, Bonferroni, Holm, Hommel, Šidák MVP
Multi-model regression tbl_regression([m1, m2], model_labels=[...]) MVP
lifelines (Cox / AFT) tbl_regression(cph) MVP
sklearn (linear models) tbl_regression(clf) — point estimates only MVP
Kaplan–Meier summary tbl_survival(df, time=, event=, by=, times=[...]) MVP
Survey-weighted Table 1 tbl_one(..., weights='w') MVP
polars input tbl_one(pl.DataFrame(...)) MVP
Conditional formatting .bold_if, .highlight_if, .style_if MVP
Sticky-header notebook tables .to_html(sticky_header=True) MVP
Standardised mean differences continuous + categorical (Yang–Dalton) MVP
Notebook rendering _repr_html_ / _repr_markdown_ / _repr_latex_ MVP

Design principles

  • Backend-agnostic tables. A SofraTable is the single source of truth; every renderer (HTML, Markdown, DOCX, …) reads the same object.
  • Immutable method chaining. Every modifier returns a new SofraTable. No surprises, no global state.
  • Strong defaults, explicit overrides. Sensible journal-style output out of the box; per-variable type, label, and test overrides when you need them.
  • Deterministic. The same input always produces the same output — critical for reproducible research.
  • No magic. No nonstandard evaluation, no metaprogramming, no network calls, no telemetry.

Installation

pip install pysofra

PySofra requires Python ≥ 3.11. The core install only pulls numpy, pandas, scipy, statsmodels, and python-docx. Domain extras unlock the features that depend on heavier optional libraries:

pip install "pysofra[survival]"   # tbl_survival + KM curves (lifelines, matplotlib)
pip install "pysofra[plot]"       # forest plots, table-as-image (matplotlib)
pip install "pysofra[pptx]"       # PowerPoint export (python-pptx)
pip install "pysofra[xlsx]"       # Excel export (xlsxwriter)
pip install "pysofra[polars]"     # accept polars DataFrames as input
pip install "pysofra[sklearn]"    # tbl_regression on scikit-learn models
pip install "pysofra[all]"        # everything above
pip install "pysofra[dev]"        # testing + linting (pytest, ruff, mypy, hypothesis)

Status

PySofra is in alpha (0.1.0a2). The public API surface is pinned by an explicit API-stability test so that any unintended rename, removal, or signature change surfaces as a failed test. Quality bar at this release:

  • More than 800 tests passing, 100% line coverage, mypy strict, ruff clean.
  • Every numeric output is validated against scipy, lifelines, statsmodels, or a hand-computed textbook formula (test_joss_statistical_correctness.py).
  • Universal invariants enforced via Hypothesis on 720 randomized examples per CI run (test_joss_property_invariants.py).
  • Renderer output is byte-deterministic — identical input always produces identical HTML/Markdown/LaTeX, required for reproducible publication artifacts (test_joss_renderer_consistency.py).

Bug reports and use-case feedback are very welcome.

Contributing

Bug reports, feature requests, and pull requests are all very welcome. Please read CONTRIBUTING.md for the workflow, the quality gates, and the Code of Conduct.

License

GPL-3.0-or-later. See LICENSE.

Citation

If you use PySofra in academic work, please cite the project — see CITATION.cff.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pysofra-0.1.0a2.tar.gz (222.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pysofra-0.1.0a2-py3-none-any.whl (143.7 kB view details)

Uploaded Python 3

File details

Details for the file pysofra-0.1.0a2.tar.gz.

File metadata

  • Download URL: pysofra-0.1.0a2.tar.gz
  • Upload date:
  • Size: 222.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for pysofra-0.1.0a2.tar.gz
Algorithm Hash digest
SHA256 da8e7acd8cbb4894efe1f15edd98c0abd0f1817f9083a5b692b61d334721eb4c
MD5 ab182b63fe3417105fb4464a18be7872
BLAKE2b-256 7cdddf6458160e8992f22ec1877dc696de04aefbe3de31bb7a29b5ca6a446560

See more details on using hashes here.

File details

Details for the file pysofra-0.1.0a2-py3-none-any.whl.

File metadata

  • Download URL: pysofra-0.1.0a2-py3-none-any.whl
  • Upload date:
  • Size: 143.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for pysofra-0.1.0a2-py3-none-any.whl
Algorithm Hash digest
SHA256 ad3d200ac5a7398f1bdff8c83d6c9e1fa67a943b6344a9a7940e603f5102908c
MD5 b92062ff709ed497ad5be3a927ee47af
BLAKE2b-256 6d43dd5de6785eb1a78e812b29d2feffa67e46f7050bb61f8c5f4d59190ea97d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page