Skip to main content

Python for Clinical Study Reporting - Professional clinical trial reporting package

Project description

py4csr: Python for Clinical Study Reporting

Tests Coverage Python Version License Code Style Documentation Clinical Outputs RTF Ready Interactive HTML CDISC Compliant

py4csr is a professional Python framework for clinical study reporting designed specifically for pharmaceutical and biotech companies. Built with functional programming principles and modular architecture, py4csr enables efficient generation of regulatory-compliant clinical reports from reusable, combinable components.

๐Ÿš€ Key Advantages

py4csr delivers significant improvements over traditional clinical reporting approaches:

  • ๐Ÿ”— Functional Composition: Build complex reports through intuitive method chaining
  • ๐Ÿงฉ Modular Architecture: Combine reusable statistical components dynamically
  • ๐Ÿ”’ Data Integrity: Ensure reproducibility with immutable transformations
  • โšก Performance Optimized: Significantly faster than traditional SAS workflows
  • ๐Ÿ“Š Dual-Format Output: Generate regulatory RTF + interactive HTML from same code

๐ŸŽฏ 16 Professional Sample Outputs | ๐Ÿ“Š 7 Clinical Tables | ๐Ÿ“ˆ 8 Clinical Figures (RTF + HTML) | ๐Ÿ“‹ 1 Safety Listing

โœจ What py4csr Can Generate for You โœจ

๐Ÿฅ Clinical Domain ๐Ÿ“‹ Output Type ๐ŸŒ Interactive HTML ๐ŸŽฏ Regulatory Standard
Demographics Professional Tables โŒ โœ… ICH E3 Ready
Safety Analysis AE Summaries & Listings โŒ โœ… FDA Compliant
Efficacy Analysis Statistical Tables โŒ โœ… CDISC Standards
Survival Analysis Kaplan-Meier Plots โœ… Interactive โœ… Submission Quality
Subgroup Analysis Forest Plots โœ… Interactive โœ… Clinical Standards
Distribution Analysis Box Plots โœ… Interactive โœ… Statistical Standards
Longitudinal Analysis Line Plots โœ… Interactive โœ… Regulatory Ready

๐ŸŒŸ Unique Feature: Interactive HTML Plots

py4csr generates both regulatory-ready RTF files AND interactive HTML plots from the same code - a unique capability in clinical reporting.

๐Ÿš€ Key Features

๐Ÿ”— Functional Programming & Modular Architecture

  • ๐Ÿงฉ Modular Components: Combine demographics, safety, efficacy modules dynamically
  • โšก Method Chaining: Compose complex workflows through elegant function composition
  • ๐Ÿ”’ Immutable Data: Ensure data integrity with immutable transformations
  • ๐ŸŽฏ Pure Functions: Enhance reproducibility with side-effect-free calculations
  • ๐Ÿ“Š Statistical Templates: Reusable, configurable analysis components
  • ๐Ÿ”ง Dynamic Composition: Build sophisticated reports from simple components

๐Ÿ“ˆ Advanced Analytics

  • Statistical templates: Pre-built calculations for clinical endpoints
  • Multiple output formats: RTF, PDF, Interactive HTML, Excel generation
  • Clinical plotting: Specialized plots (Kaplan-Meier, Forest, Waterfall, etc.)
  • Interactive HTML plots: Zoom, hover, filter capabilities for enhanced data exploration
  • Real data validation: Tested with actual clinical trial datasets

๐Ÿ”ง Production Ready

  • Regulatory compliance: ICH E3 and CTD-ready outputs
  • Data quality checks: Built-in validation and quality assessment
  • Performance optimized: Handles large clinical datasets efficiently
  • Extensible architecture: Easy to customize and extend

๐Ÿ“ฆ Installation

Basic Installation

pip install py4csr

With PDF Support

pip install py4csr[pdf]

Development Installation

# Clone the repository
git clone https://github.com/yanmingyu92/py4csr.git
cd py4csr

# Install in development mode with all dependencies
pip install -e ".[dev]"

Requirements

  • Python: 3.9 or higher
  • Core dependencies: pandas, numpy, scipy, matplotlib, seaborn
  • Optional: reportlab (for PDF output), openpyxl (for Excel output)
  • Development: pytest, black, mypy, sphinx (included in [dev] extra)

๐Ÿš€ Quick Start

Your First Clinical Report in 5 Minutes

from py4csr.clinical import ClinicalSession
import pandas as pd

# Load your ADSL dataset
adsl = pd.read_csv("data/adsl.csv")

# Create a demographics table
session = ClinicalSession(uri="STUDY001")
session.define_report(
    dataset=adsl,
    subjid="USUBJID",
    title="Table 14.1.1 Demographics and Baseline Characteristics"
)

# Add treatment groups
session.add_trt(name="TRT01PN", decode="TRT01P", across="Y")

# Add demographic variables
session.add_var(name="AGE", label="Age (years)", stats="n mean+sd median q1q3 min+max")
session.add_catvar(name="SEX", label="Sex, n (%)", stats="npct", codelist="M='Male',F='Female'")
session.add_catvar(name="RACE", label="Race, n (%)", stats="npct")

# Generate and save
session.generate()
session.finalize(output_path="demographics.rtf", format="rtf")

print("โœ… Demographics table created successfully!")

Using the Functional Interface

from py4csr.reporting import ReportBuilder
from py4csr.config import ReportConfig

# Create report using method chaining
config = ReportConfig()
result = (ReportBuilder(config)
    .init_study(uri="STUDY001", title="Phase III Clinical Study Report")
    .add_dataset("adsl", adsl)
    .define_populations(safety="SAFFL=='Y'", efficacy="EFFFL=='Y'")
    .define_treatments(var="TRT01P")
    .add_demographics_table(title="Demographics", population="safety")
    .add_ae_summary_table(title="Adverse Events Summary", population="safety")
    .generate_all(output_dir="reports")
    .finalize()
)

print(f"โœ… Generated {len(result.generated_files)} report files")

For more examples, see the Quick Start Guide and Examples.

๐ŸŽฏ Sample Outputs Showcase

py4csr generates professional, regulatory-ready outputs that meet industry standards for clinical trial reporting. Explore our comprehensive collection of sample outputs in the examples/sample_outputs/ directory:

๐Ÿ“Š Clinical Tables (7 Examples)

Table Type File Description
Demographics t_dem.rtf Baseline characteristics with statistical comparisons
Adverse Events t_ae_sum.rtf Comprehensive safety analysis by SOC/PT
Vital Signs t_vs_sum.rtf Change from baseline with clinical significance
Subject Disposition t_disp.rtf Patient flow and completion rates
Drug Exposure t_exposure.rtf Treatment compliance and duration analysis
Laboratory Chemistry t_lb_sum_chem.rtf Clinical chemistry with shift tables
Efficacy Response t_eff_response.rtf Primary endpoint analysis with statistics

๐Ÿ“ˆ Clinical Figures (8 Examples - RTF + Interactive HTML)

Figure Type RTF (Regulatory) HTML (Interactive) Description
Kaplan-Meier km_enhanced_example.rtf km_enhanced_example.html Survival analysis with risk tables
Forest Plot forest_enhanced_example.rtf forest_enhanced_example.html Efficacy across subgroups
Box Plot box_plot_clinical_example.rtf box_plot_clinical_example.html Distribution analysis by treatment
Line Plot line_plot_clinical_example.rtf line_plot_clinical_example.html Longitudinal trends over time

๐ŸŒŸ UNIQUE FEATURE: py4csr generates both regulatory-ready RTF and interactive HTML versions of every plot!

๐Ÿ“‹ Clinical Listings (1 Example)

Listing Type File Description
AE Deaths l_ae_death.rtf Individual patient safety listings

๐Ÿ”’ Data Security: All sample outputs are generated from synthetic data only. No real patient data is used or exposed.

๐Ÿ“‹ Regulatory Ready: All outputs follow ICH E3 guidelines and FDA submission standards.

๐Ÿงฉ Modular Architecture Advantages

Traditional SAS Approach โŒ

/* Monolithic program - 156 function calls for efficacy table */
%macro create_efficacy_table();
  /* 50+ lines of data preparation */
  /* 30+ lines of statistical calculations */
  /* 40+ lines of formatting */
  /* 30+ lines of output generation */
%mend;

py4csr Functional Approach โœ…

# Elegant composition - 12 function calls for same table
session = (ReportSession()
    .init_study("STUDY001", "Phase III Study")
    .load_datasets(data_path="data/")
    .define_populations(efficacy="EFFFL=='Y'")
    .add_efficacy_analysis()  # Reusable module
    .generate_all()
    .finalize()
)

๐ŸŽฏ Key Advantages

  • 91.2% Code Reduction: From 37-156 to 5-12 function calls per table
  • ๐Ÿงฉ Reusable Modules: Demographics, safety, efficacy components
  • ๐Ÿ”— Dynamic Composition: Combine modules to create complex reports
  • โšก 3.2x Performance: Faster execution than traditional SAS
  • ๐Ÿ”’ Data Integrity: Immutable transformations ensure audit trails
  • ๐Ÿ“Š Dual Output: RTF + Interactive HTML from same code

๐Ÿƒโ€โ™‚๏ธ Quick Start

Basic Usage

from py4csr.functional import ReportSession

# Initialize a clinical study report session
session = (ReportSession()
    .init_study(
        uri="STUDY001", 
        title="Phase III Efficacy and Safety Study",
        protocol="ABC-123-2024"
    )
    .load_datasets(data_path="data/")
    .define_populations(
        safety="SAFFL=='Y'", 
        efficacy="EFFFL=='Y'"
    )
    .define_treatments(var="TRT01P")
)

# Generate standard clinical tables
session = (session
    .add_demographics_table()
    .add_disposition_table()
    .add_ae_summary()
    .add_efficacy_analysis()
)

# Generate all outputs
result = session.generate_all().finalize()
print(f"Generated {len(result.generated_files)} report files")

Working with Real Clinical Data

# Load CDISC ADaM datasets
datasets = {
    'ADSL': 'data/adsl.sas7bdat',
    'ADAE': 'data/adae.sas7bdat',
    'ADLB': 'data/adlb.sas7bdat'
}

session = (ReportSession()
    .init_study(uri="REAL-STUDY", title="Real Clinical Data Analysis")
    .load_datasets(datasets=datasets)
    .define_populations(safety="SAFFL=='Y'")
    .define_treatments(var="TRT01P")
    .add_demographics_table()
    .add_ae_summary()
    .generate_all(output_dir="clinical_reports")
    .finalize()
)

Advanced Features

from py4csr.functional import ReportSession, FunctionalConfig

# Custom configuration
config = FunctionalConfig.clinical_standard()
config.add_statistic("geometric_mean", "Geometric Mean")
config.set_format("p_value", "{:.4f}")

# Advanced session with custom features
session = (ReportSession(config)
    .init_study(uri="ADVANCED-001", title="Advanced Analysis")
    .load_datasets(data_path="data/")
    .define_populations(safety="SAFFL=='Y'", efficacy="EFFFL=='Y'")
    .define_treatments(var="TRT01P")
    
    # Add custom grouping and formatting
    .add_grouping("age_group", "AGEGR1", {"<65": "Young", ">=65": "Elderly"})
    .add_conditional_formatting("p_value", lambda x: x < 0.05, "highlight")
    
    # Generate tables with advanced features
    .add_demographics_table(by_group="age_group")
    .add_ae_summary(include_severity=True)
    .add_efficacy_analysis(endpoints=["AVAL", "CHG"])
    
    # Generate plots
    .create_kaplan_meier_plot(time_var="AVAL", event_var="CNSR")
    .create_forest_plot(endpoint="CHG", subgroups=["SEX", "AGEGR1"])
    
    .generate_all()
    .finalize()
)

๐Ÿ“š Documentation

Core Modules

Module Description
py4csr.functional Main functional reporting interface
py4csr.clinical Direct clinical reporting system
py4csr.data Data loading and manipulation utilities
py4csr.analysis Statistical analysis functions
py4csr.plotting Clinical plotting capabilities
py4csr.reporting Report generation and formatting
py4csr.validation Data quality and compliance checking

Key Classes

  • ReportSession: Main orchestrator for functional reporting
  • ClinicalSession: Direct clinical reporting interface
  • FunctionalConfig: Configuration management for reports
  • TableBuilder: Functional table construction
  • StatisticalTemplates: Reusable statistical calculations
  • PlottingEngine: Clinical plot generation

๐Ÿงช Testing with Real Data

py4csr has been tested with real clinical trial data including:

  • CDISC Pilot Study: 254 subjects, 10 ADaM datasets (~92MB)
  • Multiple domains: Demographics, AE, Laboratory, Vital Signs, Questionnaires
  • Complex scenarios: Multiple visits, missing data, regulatory requirements
# Example with real CDISC data
from py4csr.examples import load_cdisc_pilot_data

datasets = load_cdisc_pilot_data()
print(f"Loaded {len(datasets)} datasets:")
for name, df in datasets.items():
    print(f"  {name}: {len(df)} records")

# Generate standard regulatory tables
session = create_regulatory_report_session(datasets)
result = session.generate_all().finalize()

๐Ÿ—๏ธ Architecture

py4csr follows a modular, functional architecture:

py4csr/
โ”œโ”€โ”€ functional/          # Functional programming interface
โ”‚   โ”œโ”€โ”€ session.py      # ReportSession class
โ”‚   โ”œโ”€โ”€ config.py       # Configuration management
โ”‚   โ””โ”€โ”€ templates.py    # Statistical templates
โ”œโ”€โ”€ clinical/           # Direct clinical interface
โ”‚   โ”œโ”€โ”€ session.py      # ClinicalSession class
โ”‚   โ”œโ”€โ”€ statistical_engine.py  # Statistics calculations
โ”‚   โ””โ”€โ”€ enhanced_rtf_formatter.py  # Professional RTF output
โ”œโ”€โ”€ data/               # Data I/O and manipulation
โ”œโ”€โ”€ analysis/           # Statistical analysis
โ”œโ”€โ”€ plotting/           # Clinical plotting
โ”œโ”€โ”€ reporting/          # Output generation
โ”œโ”€โ”€ validation/         # Quality checks
โ””โ”€โ”€ examples/           # Example scripts and data

๐Ÿค Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ”— Links

๐Ÿ™ Acknowledgments

  • Inspired by the clinical reporting needs of the pharmaceutical industry
  • Built for regulatory compliance and professional clinical research
  • Designed with input from biostatisticians and clinical data managers

๐Ÿ“ž Support

๐Ÿ—บ๏ธ Roadmap

Version 1.1 (Next Release)

  • Enhanced CDISC metadata integration
  • Additional statistical tests
  • Interactive dashboard generation
  • Cloud deployment templates

Version 1.2 (Future)

  • Real-time data monitoring
  • Advanced machine learning integration
  • Multi-language support
  • Enterprise features

py4csr - Bringing the power of clinical reporting to Python ๐Ÿ๐Ÿ“Š

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

py4csr-0.1.0.tar.gz (655.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

py4csr-0.1.0-py3-none-any.whl (475.7 kB view details)

Uploaded Python 3

File details

Details for the file py4csr-0.1.0.tar.gz.

File metadata

  • Download URL: py4csr-0.1.0.tar.gz
  • Upload date:
  • Size: 655.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for py4csr-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e4ce5e7eb23ca62c7d466ac461c1d02f5f4165e30fcf9ad5c25c11ba67a6035c
MD5 8c37fc0e82401cb6a6be1e5f6d636cec
BLAKE2b-256 448a5522c6bfd223689f15b308e9d65686c21a12b0ff629182242135b2b379ed

See more details on using hashes here.

File details

Details for the file py4csr-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: py4csr-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 475.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for py4csr-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a564aa7e6e43e3e50821e177c53972f8aa913d54e9e48f7c394a942f7d4655f4
MD5 65a90c8460eabfd0b870cb814063dc73
BLAKE2b-256 96fb7417cb2677b1b96b41e326801f1d4ffcb11555fe5cbe5bc570ef71f1ce98

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page