
Transform clinical dataframes into publication-ready, beautifully styled tables for medical journals and manuscripts.

Project description

clinipub



clinipub is a modern Python package designed for data scientists in the life sciences and clinical research. It bridges the gap between raw data analysis in pandas (pandas.DataFrame) and publication-ready tables that follow medical-writing reporting conventions (NEJM, JAMA, and The Lancet styles, and CDISC/STROBE guidelines).

Unlike many existing packages, clinipub selects statistical tests automatically, applies journal-style typography, and exports tables in formats that medical writers can use without manual reformatting.


Key Advantages Over Other Packages

  • Autonomous Statistics: Automatically detects variable types and evaluates normality (Shapiro-Wilk) to select the appropriate parametric or non-parametric test.
  • Regulatory-Grade Precision: Automatically falls back to Fisher's exact test when any cell of a 2×2 contingency table has a count below 5.
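The normality check can be sketched with `scipy.stats.shapiro`. This is a minimal illustration, not clinipub's implementation: the function name `shapiro_normality` is hypothetical, it tests each column as a whole rather than within each treatment arm, and treating columns with fewer than three non-missing values as non-normal is an assumption. It returns the same `{column_name: bool}` shape documented for `ClinicalDataAuditor.test_normality`.

```python
import numpy as np
import pandas as pd
from scipy import stats


def shapiro_normality(
    df: pd.DataFrame, continuous_cols: list[str], alpha: float = 0.05
) -> dict[str, bool]:
    """Map each column to True when Shapiro-Wilk does not reject normality."""
    results: dict[str, bool] = {}
    for col in continuous_cols:
        values = df[col].dropna()
        if len(values) < 3:
            # Assumption: too few observations to test, so treat as non-normal
            results[col] = False
            continue
        _, p_value = stats.shapiro(values)
        # p > alpha: no evidence against normality at this significance level
        results[col] = bool(p_value > alpha)
    return results


# Example: near-ideal normal scores (Blom positions) versus a right-skewed variable
ranks = np.arange(1, 101)
example = pd.DataFrame({
    "bmi": 25 + 4 * stats.norm.ppf((ranks - 0.375) / 100.25),
    "crp": np.exp(np.linspace(0, 5, 100)),
})
normality = shapiro_normality(example, ["bmi", "crp"])
```

Here `bmi` should be flagged as normal and the heavily skewed `crp` as non-normal, so `crp` would be routed to a rank-based test.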

🛠️ Statistical Test Selection Matrix

clinipub automatically selects and runs each bivariate comparison using the following rules:

| Variable Type | Distribution | Groups / Table | Test Executed |
|---|---|---|---|
| Continuous | Normal (parametric) | 2 groups | Welch's independent-samples $t$-test |
| Continuous | Normal (parametric) | 3+ groups | One-way ANOVA |
| Continuous | Skewed (non-parametric) | 2 groups | Mann-Whitney U test |
| Continuous | Skewed (non-parametric) | 3+ groups | Kruskal-Wallis test |
| Categorical | N/A | 2×2 table with any cell count < 5 | Fisher's exact test |
| Categorical | N/A | All other tables | Chi-square ($\chi^2$) contingency test |
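The matrix above can be expressed as a small decision function on top of SciPy. This is a hedged sketch rather than clinipub's code: `compare_continuous` and `compare_categorical` are hypothetical names, and the Fisher rule is read literally as an observed cell count below 5 (many analysts apply the threshold to expected counts instead). The `{'p_value': ..., 'test': ...}` return shape follows the documented `BivariateTestSelector` methods.

```python
import pandas as pd
from scipy import stats


def compare_continuous(df: pd.DataFrame, col: str, group_col: str, is_normal: bool) -> dict:
    """Pick Welch/ANOVA (normal) or Mann-Whitney/Kruskal-Wallis (skewed) by group count."""
    groups = [g[col].dropna().to_numpy() for _, g in df.groupby(group_col)]
    if len(groups) < 2:
        raise ValueError("Need at least two groups to compare")
    if len(groups) == 2:
        if is_normal:
            # equal_var=False gives Welch's t-test (no equal-variance assumption)
            res = stats.ttest_ind(groups[0], groups[1], equal_var=False)
            return {"p_value": float(res.pvalue), "test": "Welch's t-test"}
        res = stats.mannwhitneyu(groups[0], groups[1], alternative="two-sided")
        return {"p_value": float(res.pvalue), "test": "Mann-Whitney U test"}
    if is_normal:
        res = stats.f_oneway(*groups)
        return {"p_value": float(res.pvalue), "test": "One-way ANOVA"}
    res = stats.kruskal(*groups)
    return {"p_value": float(res.pvalue), "test": "Kruskal-Wallis test"}


def compare_categorical(df: pd.DataFrame, col: str, group_col: str) -> dict:
    """Use Fisher's exact test for sparse 2x2 tables, chi-square otherwise."""
    table = pd.crosstab(df[col], df[group_col]).to_numpy()
    if table.shape == (2, 2) and (table < 5).any():
        _, p_value = stats.fisher_exact(table)
        return {"p_value": float(p_value), "test": "Fisher's exact test"}
    # For 2x2 tables SciPy applies Yates' continuity correction by default
    _, p_value, _, _ = stats.chi2_contingency(table)
    return {"p_value": float(p_value), "test": "Chi-square test"}
```

Keeping the routing in two small functions makes each branch of the matrix easy to unit-test on its own.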

Developer Setup & Installation

This project is managed with the uv package manager.

Prerequisites

Ensure you have uv installed:

pip install uv

Installation & Environment Setup

Clone the repository and sync the project's isolated virtual environment:

git clone https://github.com/arsalananwar11/clinipub.git
cd clinipub
uv sync

Running Unit Tests

Unit tests for the statistical logic are written with pytest. Run them with:

uv run pytest

Pipeline Flow

  1. MissingDataAuditor computes raw missingness and validates data completeness.
  2. ClinicalDataAuditor classifies variables as categorical or continuous and tests continuous variables for normality.
  3. BivariateTestSelector chooses tests automatically based on group count and distribution.
  4. TableOneAssembler builds a stratified Table 1 with descriptive statistics and p-values.
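Step 1 reduces to a couple of pandas operations. The sketch below is an assumption about what `calculate_missingness()` computes (one row per column, with the documented `missing_count` and `missing_percentage` columns, rounded here to one decimal place); the standalone function `missingness_table` is hypothetical.

```python
import pandas as pd


def missingness_table(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages (one decimal place)."""
    missing_count = df.isna().sum()  # NaN/None per column
    missing_percentage = (missing_count / len(df) * 100).round(1)
    return pd.DataFrame(
        {"missing_count": missing_count, "missing_percentage": missing_percentage}
    )
```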

Quick Usage

import pandas as pd
from clinipub import (
    MissingDataAuditor,
    ClinicalDataAuditor,
    BivariateTestSelector,
    TableOneAssembler,
)

# Load your clinical dataset (this example assumes "treatment", "age", and "smoker_status" columns)
df = pd.read_csv("baseline.csv")

# 1. Audit missing data and generate an HTML report
missing_auditor = MissingDataAuditor(df)
missing_df = missing_auditor.calculate_missingness()
html_report = missing_auditor.to_html_report(
    audit_df=missing_df,
    thresholds={"low": 1.0, "mid": 20.0},
)

# 2. Audit variable types and normality
auditor = ClinicalDataAuditor(df)
var_types = auditor.detect_variable_types()
normality = auditor.test_normality(var_types["continuous"])

# 3. Run autonomous bivariate tests
selector = BivariateTestSelector(df, stratify_by="treatment")
continuous_result = selector.test_continuous("age", is_normal=normality["age"])
cat_result = selector.test_categorical("smoker_status")

# 4. Build publication-ready Table 1
assembler = TableOneAssembler(df, stratify_by="treatment")
table1 = assembler.build()
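For readers curious what step 4 involves, here is a minimal, hypothetical sketch of a stratified Table 1 in plain pandas. The cell formats (mean (SD) for normal variables, median [IQR] for skewed ones, n (%) for categories, with percentages over non-missing values in each arm) are common journal conventions assumed here; they are not necessarily what `TableOneAssembler.build()` produces, and p-values are omitted.

```python
import pandas as pd


def summarize_continuous(values: pd.Series, is_normal: bool) -> str:
    """Format as mean (SD) if normal, else median [Q1, Q3]."""
    s = values.dropna()
    if is_normal:
        return f"{s.mean():.1f} ({s.std():.1f})"  # sample SD (ddof=1)
    q1, q3 = s.quantile([0.25, 0.75])
    return f"{s.median():.1f} [{q1:.1f}, {q3:.1f}]"


def summarize_level(values: pd.Series, level) -> str:
    """Format one category level as n (%) of non-missing values."""
    s = values.dropna()
    if len(s) == 0:
        return "0 (0.0%)"
    n = int((s == level).sum())
    return f"{n} ({n / len(s) * 100:.1f}%)"


def simple_table_one(
    df: pd.DataFrame, stratify_by: str, continuous: dict[str, bool], categorical: list[str]
) -> pd.DataFrame:
    """One row per variable (or category level), one column per treatment arm."""
    arms = {arm: sub for arm, sub in df.groupby(stratify_by)}
    rows: dict[str, dict] = {}
    for col, is_normal in continuous.items():
        label = f"{col}, mean (SD)" if is_normal else f"{col}, median [IQR]"
        rows[label] = {arm: summarize_continuous(sub[col], is_normal) for arm, sub in arms.items()}
    for col in categorical:
        for level in sorted(df[col].dropna().unique()):
            rows[f"{col} = {level}, n (%)"] = {
                arm: summarize_level(sub[col], level) for arm, sub in arms.items()
            }
    return pd.DataFrame.from_dict(rows, orient="index")
```

Passing the normality flags in explicitly mirrors how the Quick Usage example feeds `normality["age"]` into the test selector.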

Core API and Arguments

  • MissingDataAuditor(data: pd.DataFrame)

    • calculate_missingness()
      • returns a raw DataFrame with missing_count and missing_percentage
    • to_html_report(audit_df: pd.DataFrame = None, thresholds: dict = {'low': 5.0, 'mid': 30.0})
      • returns styled HTML for publication-ready missingness reporting
  • ClinicalDataAuditor(data: pd.DataFrame)

    • detect_variable_types(max_categorical_threshold: int = 10)
      • returns {'categorical': [...], 'continuous': [...]}
    • test_normality(continuous_cols: list, alpha: float = 0.05)
      • returns {column_name: bool}
  • BivariateTestSelector(data: pd.DataFrame, stratify_by: str)

    • test_continuous(col: str, is_normal: bool)
      • returns {'p_value': float, 'test': str}
    • test_categorical(col: str)
      • returns {'p_value': float, 'test': str}
  • TableOneAssembler(data: pd.DataFrame, stratify_by: str, columns: list = None)

    • build()
      • returns a styled pandas.DataFrame with stratified descriptive statistics and p-values
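One plausible rule behind `detect_variable_types(max_categorical_threshold=10)` is sketched below: numeric, non-boolean columns with more distinct values than the threshold are treated as continuous and everything else as categorical. This is an assumption for illustration; the function name `classify_columns` is hypothetical, and clinipub's actual heuristic (for example, whether the comparison is strict) may differ.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def classify_columns(df: pd.DataFrame, max_categorical_threshold: int = 10) -> dict[str, list[str]]:
    """Split columns into categorical and continuous using a distinct-value cutoff."""
    result: dict[str, list[str]] = {"categorical": [], "continuous": []}
    for col in df.columns:
        series = df[col]
        n_unique = series.nunique(dropna=True)
        # Assumption: only numeric, non-boolean columns with many distinct values are continuous
        if is_numeric_dtype(series) and not is_bool_dtype(series) and n_unique > max_categorical_threshold:
            result["continuous"].append(col)
        else:
            result["categorical"].append(col)
    return result
```

Under this rule an integer-coded variable such as tumour stage (1 to 4) stays categorical, which is usually what a Table 1 needs.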

Contributing

Contributions are welcome. Please read CONTRIBUTING.md for guidelines on issues, pull requests, testing, and repository workflow.



Download files


Source Distribution

clinipub-0.1.0.tar.gz (72.8 kB)

Uploaded Source

Built Distribution


clinipub-0.1.0-py3-none-any.whl (10.2 kB)

Uploaded Python 3

File details

Details for the file clinipub-0.1.0.tar.gz.

File metadata

  • Download URL: clinipub-0.1.0.tar.gz
  • Upload date:
  • Size: 72.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for clinipub-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | db1e57166beea84e52c37d7a53d4d4e7712d7c15c82153b16585122e562d0bf3 |
| MD5 | a61fe92e09ee0f70428ea84492aed8de |
| BLAKE2b-256 | fd09e3b2f3d06e8959c9bbd57c0200d0839e62c537fa259ea9d8da6aa037d1c8 |


Provenance

The following attestation bundles were made for clinipub-0.1.0.tar.gz:

Publisher: pypi-publish.yml on arsalananwar11/clinipub


File details

Details for the file clinipub-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: clinipub-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for clinipub-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7d8ba37366fc24bfa7af0314a7af5efe959ac054b684b087cc4ce614d37108d |
| MD5 | e0d93e74dc72c9c69dab5ef64e514fcf |
| BLAKE2b-256 | 4cbd44d893263c59aa4b214a026048a78abd50816723b54c3aebe1b72a1c04a2 |


Provenance

The following attestation bundles were made for clinipub-0.1.0-py3-none-any.whl:

Publisher: pypi-publish.yml on arsalananwar11/clinipub

