Transform clinical dataframes into publication-ready, beautifully styled tables for medical journals and manuscripts.

clinipub

clinipub is a Python package for data scientists in the life sciences and clinical research. It bridges the gap between raw data analysis (pandas.DataFrame) and the reporting standards of medical writing (NEJM, JAMA, The Lancet, and the CDISC/STROBE guidelines).

Unlike existing packages, clinipub selects statistical tests automatically, applies journal-style typography, and exports natively to formats that medical writers can use without manual reformatting.


Key Advantages Over Other Packages

  • Autonomous Statistics: Automatically detects variable types and evaluates normality (Shapiro-Wilk) to map the correct parametric or non-parametric test.
  • Regulatory-Grade Precision: Automatically falls back to Fisher's Exact test when a cell count in a 2x2 contingency table drops below 5.

Statistical Test Selection Matrix

The framework selects and runs each bivariate comparison using the following logic:

| Variable Type | Distribution | Group Count | Test Executed |
|---|---|---|---|
| Continuous | Normal (Parametric) | 2 Groups | Independent Welch's $t$-test |
| Continuous | Normal (Parametric) | 3+ Groups | One-way ANOVA |
| Continuous | Skewed (Non-Parametric) | 2 Groups | Mann-Whitney U test |
| Continuous | Skewed (Non-Parametric) | 3+ Groups | Kruskal-Wallis test |
| Categorical | Any | 2x2 with Cell Count < 5 | Fisher's Exact test |
| Categorical | Any | Standard Configurations | Chi-Square ($\chi^2$) Contingency |
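
As a rough illustration, the matrix above can be written as a small decision function. This is a sketch, not clinipub's implementation: `select_test`, its parameters, and the returned labels are hypothetical names chosen for this example, and the function only returns the name of a test rather than running it.

```python
from typing import Optional


def select_test(
    var_type: str,
    n_groups: int,
    is_normal: Optional[bool] = None,
    is_2x2: bool = False,
    min_cell_count: Optional[int] = None,
) -> str:
    """Return the name of the test that the selection matrix would pick.

    Hypothetical helper for illustration; not part of the clinipub API.
    """
    if n_groups < 2:
        raise ValueError("a bivariate comparison needs at least 2 groups")

    if var_type == "continuous":
        if is_normal is None:
            raise ValueError("is_normal is required for continuous variables")
        if is_normal:
            return "Welch's t-test" if n_groups == 2 else "One-way ANOVA"
        return "Mann-Whitney U test" if n_groups == 2 else "Kruskal-Wallis test"

    if var_type == "categorical":
        # Fisher's exact test only for a 2x2 table with a sparse cell.
        if is_2x2 and min_cell_count is not None and min_cell_count < 5:
            return "Fisher's Exact test"
        return "Chi-Square test"

    raise ValueError(f"unknown variable type: {var_type!r}")


# Example: a skewed continuous variable compared across three arms.
print(select_test("continuous", n_groups=3, is_normal=False))
```

Keeping the decision separate from the execution makes each row of the matrix easy to unit test on its own.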

Developer Setup & Installation

This project is managed with the uv package manager.

Prerequisites

Ensure you have uv installed:

pip install uv

Installation & Environment Setup

Clone the repository and sync the project's isolated virtual environment:

git clone https://github.com/arsalananwar11/clinipub.git

cd clinipub
uv sync

Running Unit Tests

We maintain high test coverage for clinical accuracy using pytest:

uv run pytest

Pipeline Flow

  1. MissingDataAuditor computes raw missingness and validates data completeness.
  2. ClinicalDataAuditor classifies variables and checks continuous normality.
  3. BivariateTestSelector chooses tests automatically based on group count and distribution.
  4. TableOneAssembler builds a stratified Table 1 with descriptive statistics and p-values.

Quick Usage

import pandas as pd
from clinipub import (
    MissingDataAuditor,
    ClinicalDataAuditor,
    BivariateTestSelector,
    TableOneAssembler,
)

# Load your clinical dataset
df = pd.read_csv("baseline.csv")

# 1. Audit missing data and generate an HTML report
missing_auditor = MissingDataAuditor(df)
missing_df = missing_auditor.calculate_missingness()
html_report = missing_auditor.to_html_report(
    audit_df=missing_df,
    thresholds={"low": 1.0, "mid": 20.0},
)

# 2. Audit variable types and normality
auditor = ClinicalDataAuditor(df)
var_types = auditor.detect_variable_types()
normality = auditor.test_normality(var_types["continuous"])

# 3. Run autonomous bivariate tests
selector = BivariateTestSelector(df, stratify_by="treatment")
continuous_result = selector.test_continuous("age", is_normal=normality["age"])
cat_result = selector.test_categorical("smoker_status")

# 4. Build publication-ready Table 1
assembler = TableOneAssembler(df, stratify_by="treatment")
table1 = assembler.build()
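
To make the `thresholds` argument in step 1 more concrete, here is a minimal sketch of how missingness percentages could be bucketed into tiers. It is an assumption about the semantics, not clinipub's code: the function `missingness_summary`, the tier names, and the inclusive boundaries are hypothetical, and plain Python lists stand in for DataFrame columns.

```python
import math
from typing import Dict, List, Optional


def _is_missing(value: Optional[float]) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missingness_summary(
    columns: Dict[str, List[Optional[float]]],
    thresholds: Dict[str, float],
) -> Dict[str, Dict[str, object]]:
    """Count missing values per column and assign a tier (hypothetical)."""
    summary: Dict[str, Dict[str, object]] = {}
    for name, values in columns.items():
        missing_count = sum(_is_missing(v) for v in values)
        missing_percentage = 100.0 * missing_count / len(values) if values else 0.0
        # Assumed semantics: at or below "low" is low, at or below "mid" is mid.
        if missing_percentage <= thresholds["low"]:
            tier = "low"
        elif missing_percentage <= thresholds["mid"]:
            tier = "mid"
        else:
            tier = "high"
        summary[name] = {
            "missing_count": missing_count,
            "missing_percentage": missing_percentage,
            "tier": tier,
        }
    return summary


columns = {
    "age": [34.0, 51.0, None, 47.0],
    "bmi": [22.1, 27.5, 30.2, 24.8],
}
report = missingness_summary(columns, thresholds={"low": 1.0, "mid": 20.0})
print(report["age"])  # 1 of 4 values missing: 25.0 percent, tier "high"
print(report["bmi"])  # nothing missing: 0.0 percent, tier "low"
```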

Core API and Arguments

  • MissingDataAuditor(data: pd.DataFrame)

    • calculate_missingness()
      • returns a raw DataFrame with missing_count and missing_percentage
    • to_html_report(audit_df: pd.DataFrame = None, thresholds: dict = {'low': 5.0, 'mid': 30.0})
      • returns styled HTML for publication-ready missingness reporting
  • ClinicalDataAuditor(data: pd.DataFrame)

    • detect_variable_types(max_categorical_threshold: int = 10)
      • returns {'categorical': [...], 'continuous': [...]}
    • test_normality(continuous_cols: list, alpha: float = 0.05)
      • returns {column_name: bool}
  • BivariateTestSelector(data: pd.DataFrame, stratify_by: str)

    • test_continuous(col: str, is_normal: bool)
      • returns {'p_value': float, 'test': str}
    • test_categorical(col: str)
      • returns {'p_value': float, 'test': str}
  • TableOneAssembler(data: pd.DataFrame, stratify_by: str, columns: list = None)

    • build()
      • returns a styled pandas.DataFrame with stratified descriptive statistics and p-values
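
The `max_categorical_threshold` argument of `detect_variable_types` suggests a cardinality-based rule. The sketch below shows one plausible reading, under loud assumptions: `classify_columns` is a hypothetical stand-in, the rule "numeric with more distinct values than the threshold is continuous, everything else is categorical" is a guess at the behaviour, and plain lists replace DataFrame columns.

```python
from numbers import Number
from typing import Dict, List, Sequence


def classify_columns(
    columns: Dict[str, Sequence[object]],
    max_categorical_threshold: int = 10,
) -> Dict[str, List[str]]:
    """Split column names into categorical and continuous (assumed rule)."""
    result: Dict[str, List[str]] = {"categorical": [], "continuous": []}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        # bool is a Number subclass in Python, so exclude it explicitly.
        all_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in observed
        )
        n_unique = len(set(observed))
        if all_numeric and n_unique > max_categorical_threshold:
            result["continuous"].append(name)
        else:
            result["categorical"].append(name)
    return result


columns = {
    "age": list(range(20, 60)),                         # 40 distinct numbers
    "smoker_status": ["never", "former", "current"] * 5,  # text labels
    "n_children": [0, 1, 2, 3] * 5,                      # numeric, 4 distinct
}
var_types = classify_columns(columns)
print(var_types)
```

Under this rule a low-cardinality numeric column such as `n_children` is treated as categorical, which is why the threshold is worth tuning per dataset.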

Contributing

Contributions are welcome. Please read CONTRIBUTING.md for guidelines on issues, pull requests, testing, and repository workflow.
