Transform clinical dataframes into publication-ready, beautifully styled tables for medical journals and manuscripts.
clinipub
clinipub is a modern Python package designed for data scientists in the life sciences and clinical research. It bridges the gap between raw data analysis (pandas.DataFrame) and the reporting standards expected in medical writing (NEJM, JAMA, The Lancet, and the CDISC and STROBE guidelines).
Unlike existing packages, clinipub offers autonomous statistical decision-making, impeccable typography, and native exports to formats that medical writers can use without manual reformatting.
Key Advantages Over Other Packages
- Autonomous Statistics: Automatically detects variable types and evaluates normality (Shapiro-Wilk) to select the appropriate parametric or non-parametric test.
- Regulatory-Grade Precision: Automatically falls back to Fisher's Exact test if any cell count in the contingency table drops below 5.
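As a rough illustration of the normality check mentioned above, the sketch below uses scipy.stats.shapiro. The helper name looks_normal, the default alpha, and the handling of very small samples are assumptions made for this example; they are not taken from clinipub's internals.

```python
import numpy as np
from scipy import stats


def looks_normal(values, alpha=0.05):
    """Return True when Shapiro-Wilk does not reject normality at level alpha.

    Hypothetical helper for illustration only, not part of clinipub.
    """
    sample = np.asarray(values, dtype=float)
    sample = sample[~np.isnan(sample)]  # drop missing values before testing
    if sample.size < 3:  # scipy.stats.shapiro needs at least 3 observations
        return False
    _, p_value = stats.shapiro(sample)
    return bool(p_value > alpha)


rng = np.random.default_rng(0)
symmetric = rng.normal(loc=50, scale=10, size=200)  # roughly bell-shaped
skewed = rng.exponential(scale=10, size=200)        # strongly right-skewed

print("symmetric sample passes:", looks_normal(symmetric))
print("skewed sample passes:", looks_normal(skewed))
```

Treating samples with fewer than three observations as non-normal is a conservative choice for this sketch: it routes them to the non-parametric branch instead of raising an error.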
🛠️ Statistical Test Selection Matrix
The framework selects and runs each bivariate comparison according to the following rules:
| Variable Type | Distribution | Group Count | Test Executed |
|---|---|---|---|
| Continuous | Normal (Parametric) | 2 Groups | Independent Welch's $t$-test |
| Continuous | Normal (Parametric) | 3+ Groups | One-way ANOVA |
| Continuous | Skewed (Non-Parametric) | 2 Groups | Mann-Whitney U test |
| Continuous | Skewed (Non-Parametric) | 3+ Groups | Kruskal-Wallis test |
| Categorical | Any | 2x2 with Cell Count < 5 | Fisher's Exact test |
| Categorical | Any | Standard Configurations | Chi-Square ($\chi^2$) Contingency |
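The matrix above can be approximated with plain SciPy calls. The following is a sketch under stated assumptions: the function names pick_continuous_test and pick_categorical_test and the returned test labels are hypothetical, and the Fisher branch is limited to 2x2 tables as in the matrix row. It shows the decision logic only and is not clinipub's own code.

```python
import numpy as np
from scipy import stats


def pick_continuous_test(groups, is_normal):
    """Choose a test from the group count and the normality flag."""
    if len(groups) < 2:
        raise ValueError("need at least two groups to compare")
    if is_normal and len(groups) == 2:
        # equal_var=False gives Welch's t-test (no equal-variance assumption)
        result, name = stats.ttest_ind(*groups, equal_var=False), "Welch's t-test"
    elif is_normal:
        result, name = stats.f_oneway(*groups), "One-way ANOVA"
    elif len(groups) == 2:
        result = stats.mannwhitneyu(*groups, alternative="two-sided")
        name = "Mann-Whitney U"
    else:
        result, name = stats.kruskal(*groups), "Kruskal-Wallis"
    return {"p_value": float(result.pvalue), "test": name}


def pick_categorical_test(table):
    """Fisher's exact test for a 2x2 table with a cell below 5, else chi-square."""
    counts = np.asarray(table)
    if counts.shape == (2, 2) and (counts < 5).any():
        p_value = stats.fisher_exact(counts)[1]
        return {"p_value": float(p_value), "test": "Fisher's exact"}
    p_value = stats.chi2_contingency(counts)[1]
    return {"p_value": float(p_value), "test": "Chi-square"}


# Two small illustrative samples and two contingency tables (made-up numbers)
arm_a = [5.1, 4.9, 6.2, 5.8, 5.5]
arm_b = [6.9, 7.1, 6.5, 7.4, 7.0]
print(pick_continuous_test([arm_a, arm_b], is_normal=True))
print(pick_continuous_test([arm_a, arm_b], is_normal=False))
print(pick_categorical_test([[3, 9], [8, 2]]))
print(pick_categorical_test([[20, 30], [25, 25]]))
```

Both helpers return a dictionary with p_value and test keys, mirroring the return shape documented for BivariateTestSelector.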
Developer Setup & Installation
This project is managed with the uv package manager.
Prerequisites
Ensure you have uv installed:
```shell
pip install uv
```
Installation & Environment Setup
Clone the repository and sync the project's isolated development environment:
```shell
git clone https://github.com/arsalananwar11/clinipub.git
cd clinipub
uv sync
```
Running Unit Tests
We maintain high test coverage for clinical accuracy using pytest:
```shell
uv run pytest
```
Pipeline Flow
- MissingDataAuditor computes raw missingness and validates data completeness.
- ClinicalDataAuditor classifies variables and checks continuous normality.
- BivariateTestSelector chooses tests automatically based on group count and distribution.
- TableOneAssembler builds a stratified Table 1 with descriptive statistics and p-values.
Quick Usage
```python
import pandas as pd
from clinipub import (
    MissingDataAuditor,
    ClinicalDataAuditor,
    BivariateTestSelector,
    TableOneAssembler,
)

# Load your clinical dataset
df = pd.read_csv("baseline.csv")

# 1. Audit missing data and generate an HTML report
missing_auditor = MissingDataAuditor(df)
missing_df = missing_auditor.calculate_missingness()
html_report = missing_auditor.to_html_report(
    audit_df=missing_df,
    thresholds={"low": 1.0, "mid": 20.0},
)

# 2. Audit variable types and normality
auditor = ClinicalDataAuditor(df)
var_types = auditor.detect_variable_types()
normality = auditor.test_normality(var_types["continuous"])

# 3. Run autonomous bivariate tests
selector = BivariateTestSelector(df, stratify_by="treatment")
continuous_result = selector.test_continuous("age", is_normal=normality["age"])
cat_result = selector.test_categorical("smoker_status")

# 4. Build publication-ready Table 1
assembler = TableOneAssembler(df, stratify_by="treatment")
table1 = assembler.build()
```
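To make step 1 of the example more concrete, here is a small pandas-only sketch of what a missingness audit might compute. The column names missing_count and missing_percentage follow the API description; the helper names, the toy data, and the way the low and mid thresholds are interpreted as percentage cut-offs are assumptions for illustration and may differ from clinipub's behaviour.

```python
import numpy as np
import pandas as pd


def missingness_table(df):
    """Per-column missing counts and percentages, worst columns first."""
    audit = pd.DataFrame({
        "missing_count": df.isna().sum(),
        "missing_percentage": (df.isna().mean() * 100).round(2),
    })
    return audit.sort_values("missing_percentage", ascending=False)


def severity(pct, thresholds):
    """Bucket a percentage using assumed 'low' and 'mid' cut-offs."""
    if pct <= thresholds["low"]:
        return "low"
    if pct <= thresholds["mid"]:
        return "mid"
    return "high"


# Toy baseline data with a few gaps (made-up values)
df = pd.DataFrame({
    "age": [34, np.nan, 51, 47],
    "smoker_status": ["yes", "no", None, None],
    "treatment": ["A", "B", "A", "B"],
})

cutoffs = {"low": 1.0, "mid": 20.0}
audit = missingness_table(df)
audit["severity"] = [severity(p, cutoffs) for p in audit["missing_percentage"]]
print(audit)
```

Sorting by missing_percentage puts the most incomplete variables at the top, which is usually where a reviewer looks first.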
Core API and Arguments
- MissingDataAuditor(data: pd.DataFrame)
  - calculate_missingness()
    - returns a raw DataFrame with missing_count and missing_percentage
  - to_html_report(audit_df: pd.DataFrame = None, thresholds: dict = {'low': 5.0, 'mid': 30.0})
    - returns styled HTML for publication-ready missingness reporting
- ClinicalDataAuditor(data: pd.DataFrame)
  - detect_variable_types(max_categorical_threshold: int = 10)
    - returns {'categorical': [...], 'continuous': [...]}
  - test_normality(continuous_cols: list, alpha: float = 0.05)
    - returns {column_name: bool}
- BivariateTestSelector(data: pd.DataFrame, stratify_by: str)
  - test_continuous(col: str, is_normal: bool)
    - returns {'p_value': float, 'test': str}
  - test_categorical(col: str)
    - returns {'p_value': float, 'test': str}
- TableOneAssembler(data: pd.DataFrame, stratify_by: str, columns: list = None)
  - build()
    - returns a styled pandas.DataFrame with stratified descriptive statistics and p-values
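The max_categorical_threshold argument suggests a cardinality-based rule for telling categorical columns from continuous ones. One plausible reading is sketched below: a numeric column counts as continuous only if it has more distinct values than the threshold. The function name split_variable_types and the exact rule are assumptions for illustration; the package may classify columns differently.

```python
import pandas as pd
from pandas.api import types as ptypes


def split_variable_types(df, max_categorical_threshold=10):
    """Classify columns by dtype and number of distinct non-missing values."""
    result = {"categorical": [], "continuous": []}
    for col in df.columns:
        series = df[col]
        # Booleans count as numeric in pandas, so exclude them explicitly
        numeric = ptypes.is_numeric_dtype(series) and not ptypes.is_bool_dtype(series)
        if numeric and series.nunique(dropna=True) > max_categorical_threshold:
            result["continuous"].append(col)
        else:
            result["categorical"].append(col)
    return result


# Toy data: one wide-ranging numeric column, one numeric score, one label
df = pd.DataFrame({
    "age": list(range(20, 60)),        # 40 distinct numeric values
    "ecog_score": [0, 1, 2, 3] * 10,   # numeric, but only 4 levels
    "treatment": ["A", "B"] * 20,      # text labels
})

print(split_variable_types(df))
```

Under this rule a low-cardinality numeric score such as ecog_score is summarised as counts and percentages, which is how ordinal scales are commonly reported in a Table 1.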
Contributing
Contributions are welcome. Please read CONTRIBUTING.md for guidelines on issues, pull requests, testing, and repository workflow.