
ClinicalPLAN is a Python package for predicting postoperative risks from clinical notes using language models. It provides training and inference workflows for fine-tuned models, semi-supervised methods, and multi-task prediction of multiple clinical outcomes. The package is intended for clinical research and educational use, notably for the American College of Surgeons.

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

Overview

ClinicalPLAN (Clinical Postoperative Risk Prediction with Language Models Adapting to Clinical Notes) is a Python package for predicting postoperative risks from clinical notes using language models. It provides flexible and clinically oriented workflows that support a range of perioperative use cases, enabling clinicians, researchers, and healthcare institutions to train and fine-tune models using preoperative or intraoperative clinical text.

The package is designed to be accessible to a broad range of users, including clinicians, surgeons, and researchers with limited programming experience. It minimizes the need to interact with lower-level machine learning frameworks such as PyTorch. With just a few lines of high-level functions, users can begin training and fine-tuning their own models.

ClinicalPLAN supports multiple modeling strategies, including:

  1. Direct inference with fine-tuned language models
  2. (Joint) Semi-supervised learning approaches for leveraging partially labeled data
  3. A multi-task learning framework that enables simultaneous prediction of multiple postoperative outcomes

The package was developed for the American College of Surgeons (ACS) workshop, AI for Clinicians and Surgeons: A Hands-On Introduction Across the Care Continuum.

The accompanying work is:
The foundational capabilities of large language models in predicting postoperative risks using clinical notes
Alba, Xue, Abraham, Kannampallil, and Lu (2025), npj Digital Medicine


Installation

pip install clinicalplan

PyTorch wheels built for a specific CUDA version are served from PyTorch's own package index rather than PyPI, so install PyTorch first, matching your GPU's CUDA version, and then install this package. For example, on a machine with CUDA 11.8 drivers:

pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip install clinicalplan

Python version: 3.9–3.12 (tested on 3.12).
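Before installing, it can help to confirm that the interpreter falls in the supported range and whether PyTorch is already present. The following is a minimal sketch using only the standard library; the helper names `python_supported` and `torch_installed` are illustrative and not part of ClinicalPLAN.

```python
import importlib.util
import sys

# Supported range stated in this README: Python 3.9 through 3.12.
MIN_PY = (3, 9)
MAX_PY = (3, 12)


def python_supported(version=None):
    """Return True if (major, minor) falls inside the documented range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PY <= major_minor <= MAX_PY


def torch_installed():
    """Return True if a 'torch' package can be located, without importing it."""
    return importlib.util.find_spec("torch") is not None


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("PyTorch found:", torch_installed())
```

If PyTorch is not found, install it first with the index URL that matches your CUDA version, as shown above.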


Quick example

import pandas as pd
from MultiTaskLearningPrediction import mtl_finetune, get_postoperative_outcome_scores

df = pd.read_csv("my_clinical_data.csv")
# df columns: "text", "Outcome_1", "Outcome_2", "Outcome_3", "Outcome_4"

# 1. Fine-tune
mtl_finetune(
    df,
    text_col="text",
    outcome_cols=["Outcome_1", "Outcome_2", "Outcome_3", "Outcome_4"],
    output_dir="my_finetuned_model",
)

# 2. Score a new scenario

note_1 = (
    "83-year-old male, ASA 4, scheduled for coronary artery bypass graft (emergent three-vessel). "
    "Indication: severe CAD with LAD stenosis, presenting with unstable angina. "
    "PMH: COPD, type 2 diabetes mellitus, coronary artery disease, prior MI, chronic kidney disease stage 3. "
    "Social: current smoker, 1 pack per day. "
    "BMI 34 (obese). "
    "Home medications: metoprolol, aspirin 81 mg, atorvastatin, insulin glargine, furosemide. "
    "Allergies: NKDA. "
    "Preop labs within acceptable limits. Consent obtained, plan to proceed."
)


scores = get_postoperative_outcome_scores(
    "my_finetuned_model",
    note_1
)
# {'Outcome_1': 0.12, 'Outcome_2': 0.28, 'Outcome_3': 0.04, 'Outcome_4': 0.39}

API reference

Joint or semi-supervised finetuning

Joint single-outcome fine-tuning trains a separate model for each postoperative outcome of interest. The model jointly learns the structure of your clinical notes while learning to predict the outcome, so it captures both the linguistic patterns of your institution's documentation style and the clinical features that drive your specific outcomes. Unlike MultiTaskLearningPrediction, described below, this workflow targets a single outcome rather than several.


JointFinetuning

Perform Joint (or semi-supervised) finetuning.

Example

joint_finetune(
    df,
    text_col="clinical_notes",
    outcome_col="DVT",
    output_dir="DVT_model",
    training_configs={
        "num_train_epochs": 3,
        "per_device_train_batch_size": 16,
        "evaluation_strategy": "steps",
        "eval_steps": 100,
        "logging_steps": 100,
        "learning_rate": 2e-5,
    },
)

Fine-tune Bio+ClinicalBERT with a masked language modeling (MLM) objective jointly with a single binary classification head for one outcome.

Parameters

  • df (pandas.DataFrame, required): Must contain text_col and outcome_col.
  • text_col (str, required): Name of the free-text column.
  • outcome_col (str, required): Name of a single binary (0/1) outcome column. Rows with NaN in this column are dropped before training.
  • output_dir (str, default "joint_finetuned"): Directory to save the fine-tuned model, tokenizer, and metadata. Also used as the HuggingFace Trainer output_dir for checkpoints and logs.
  • base_model (str, default "emilyalsentzer/Bio_ClinicalBERT"): HuggingFace model id to start from. Any BERT-architecture model should work.
  • hf_token (str | None, default None): Optional HuggingFace token for gated/private base models. If None, uses the cached CLI login when present.
  • max_length (int, default 512): Token sequence length for tokenization.
  • lambda_constant (float, default 2): Weight on the auxiliary (BCE) loss relative to MLM loss. Total loss = MLM + λ · BCE.
  • mlm_probability (float, default 0.15): Token masking probability for MLM.
  • val_fraction (float, default 1/8): Fraction of df held out for validation during training.
  • weight (torch.Tensor | None, default None): Optional pos_weight for BCEWithLogitsLoss to handle class imbalance. Useful for rare outcomes (e.g., torch.tensor([20.0]) for ~5% positive prevalence).
  • training_configs (dict | None, default None): Any keyword arguments accepted by transformers.TrainingArguments. User-provided values override the defaults below. Default training_configs is {"num_train_epochs": 5, "per_device_train_batch_size": 24, "per_device_eval_batch_size": 24, "learning_rate": 1e-5, "warmup_ratio": 0.06, "weight_decay": 1e-3, "logging_steps": 1000, "save_strategy": "epoch", "seed": 42, "report_to": "none"}.
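The `lambda_constant` and `weight` parameters above can be illustrated with plain arithmetic. The sketch below is not the package's implementation: it assumes `weight` follows the `pos_weight` semantics of PyTorch's `BCEWithLogitsLoss` (the positive term is multiplied by the weight) and that a negatives-to-positives ratio is a reasonable starting value. All function names are hypothetical.

```python
import math


def pos_weight_from_labels(labels):
    """Ratio of negatives to positives, a common starting point for pos_weight."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive labels; pos_weight is undefined")
    return negatives / positives


def bce_with_logits(logit, label, pos_weight=1.0):
    """Binary cross-entropy on a raw logit, with the positive term up-weighted."""
    # log(sigmoid(x)), computed in a numerically stable way for either sign of x
    if logit >= 0:
        log_p = -math.log1p(math.exp(-logit))
    else:
        log_p = logit - math.log1p(math.exp(logit))
    # log(1 - sigmoid(x)) = log(sigmoid(x)) - x
    log_not_p = log_p - logit
    return -(pos_weight * label * log_p + (1 - label) * log_not_p)


def joint_loss(mlm_loss, bce_loss, lambda_constant=2.0):
    """Total loss = MLM + lambda * BCE, as stated in the parameter list."""
    return mlm_loss + lambda_constant * bce_loss


# 5 positives out of 100 rows (5% prevalence) gives a ratio of 19.0,
# close to the 20.0 suggested in the parameter description.
labels = [1] * 5 + [0] * 95
print(pos_weight_from_labels(labels))
```

To pass such a value to `joint_finetune`, wrap it in a tensor, for example `torch.tensor([19.0])`.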

Returns

str — the output_dir path. After training, this directory contains:

  • pytorch_model.bin (or model.safetensors) — model weights
  • config.json — model architecture config
  • tokenizer.json, vocab.txt, tokenizer_config.json, special_tokens_map.json — tokenizer
  • joint_metadata.json — records outcome_col, text_col, max_length, base_model, lambda_constant, num_tasks (always 1), and workflow so inference can recover them automatically
  • checkpoint-* — per-epoch training checkpoints (can be deleted after training)
  • logs/ — TensorBoard-compatible training logs

get_outcome_score

Score a text scenario (or list of scenarios) against the single auxiliary head of a joint-finetuned model.

Example

get_outcome_score(
    model_name="DVT_model",
    text="83-year-old male, ASA 4, scheduled for CABG. PMH: COPD, diabetes.",
)

Parameters

  • model_name (str, required): Path to a directory saved by joint_finetune.
  • text (str | list[str], required): One scenario string, or a list of them. Determines the shape of the return value.
  • max_length (int | None, default: None): Token sequence length. Defaults to the value used during fine-tuning, recovered from joint_metadata.json, otherwise 512.
  • device (str | None, default: None): "cuda", "cpu", or None to auto-detect.
  • hf_token (str | None, default: None): Optional HuggingFace token for gated/private models.
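The fallback order described for `max_length` (explicit argument, then the value stored in `joint_metadata.json`, then 512) can be sketched with the standard library. This is an illustration of that order only, under the assumption that the metadata file is plain JSON with a `max_length` key; `resolve_max_length` is a hypothetical helper, not a ClinicalPLAN function.

```python
import json
from pathlib import Path


def resolve_max_length(model_dir, max_length=None, default=512):
    """Explicit argument first, then joint_metadata.json, then the fallback."""
    if max_length is not None:
        return max_length
    metadata_path = Path(model_dir) / "joint_metadata.json"
    if metadata_path.is_file():
        with metadata_path.open(encoding="utf-8") as fh:
            metadata = json.load(fh)
        if metadata.get("max_length") is not None:
            return int(metadata["max_length"])
    return default
```

The same helper is also a convenient way to inspect which sequence length a saved model directory was trained with.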

Returns

  • float when text is a string — the predicted probability for the trained outcome, in [0, 1].
  • list[float] when text is a list — one probability per input, in the same order.
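The return-shape contract above (a string in gives a float out, a list in gives a list out) is easy to mirror in calling code. The sketch below shows that dispatch with a stand-in scorer: `logit_fn` is a hypothetical placeholder for the model's forward pass, and `score_texts` is not the package's implementation.

```python
import math
from typing import Callable, List, Union


def sigmoid(logit: float) -> float:
    """Map a raw logit to a probability in [0, 1], stable for large magnitudes."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def score_texts(
    text: Union[str, List[str]],
    logit_fn: Callable[[str], float],
) -> Union[float, List[float]]:
    """Return one probability for a string, or a same-order list for a list."""
    if isinstance(text, str):
        return sigmoid(logit_fn(text))
    return [sigmoid(logit_fn(t)) for t in text]


# Stand-in scorer: every note gets a logit of 0.0, i.e. probability 0.5.
print(score_texts("note A", lambda t: 0.0))
print(score_texts(["note A", "note B"], lambda t: 0.0))
```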

Multi-task finetuning

Multi-task learning (MTL) trains a single model that predicts multiple postoperative outcomes from the same clinical notes. With traditional fine-tuning you would train a separate model for each outcome; MTL instead produces one model that predicts several risks simultaneously, analogous to foundation models.


MultiTaskLearningPrediction

Performs MTL finetuning.

Example

mtl_finetune(
    df,
    text_col="clinical_notes",
    outcome_cols=["death_30d", "dvt", "pneumonia", "aki", "AUR", "PE"],
    output_dir="my_run",
    training_configs={
        "num_train_epochs": 3,
        "per_device_train_batch_size": 16,
        "evaluation_strategy": "steps",
        "eval_steps": 100,
        "logging_steps": 100,
        "learning_rate": 2e-5,
    },
)

Fine-tune Bio+ClinicalBERT with a masked language modeling (MLM) objective jointly with one binary classification head per outcome.
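The parameter list below gives the objective as Total loss = MLM + λ · mean(per-task BCE) and notes that a row with NaN for one outcome is skipped for that task only. The sketch below works through that arithmetic in plain Python. It assumes "mean" means the average across tasks of each task's mean BCE over its labeled rows; the package may aggregate differently, and `mtl_loss` is a hypothetical name.

```python
import math


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mtl_loss(mlm_loss, logits, labels, lambda_constant=2.0):
    """logits/labels: one row per note, one column per outcome; NaN = missing."""
    num_tasks = len(logits[0])
    task_losses = []
    for t in range(num_tasks):
        # Keep only the rows that have a label for this task.
        per_row = [
            bce_with_logits(row_logits[t], row_labels[t])
            for row_logits, row_labels in zip(logits, labels)
            if not math.isnan(row_labels[t])
        ]
        if per_row:
            task_losses.append(sum(per_row) / len(per_row))
    aux = sum(task_losses) / len(task_losses) if task_losses else 0.0
    return mlm_loss + lambda_constant * aux


nan = float("nan")
logits = [[0.0, 0.0], [0.0, 0.0]]
labels = [[1.0, nan], [0.0, 1.0]]  # first note has no label for the second outcome
print(mtl_loss(1.0, logits, labels))
```

In this toy input the first note still contributes to the first outcome's loss even though its second label is missing.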

Parameters

  • df (pandas.DataFrame, required): Must contain text_col and all outcome_cols.
  • text_col (str, required): Name of the free-text column.
  • outcome_cols (list[str], required): Names of binary (0/1) outcome columns. One auxiliary head is trained per outcome. Rows with NaN in a given outcome are dropped for that outcome's task but used for the others.
  • output_dir (str, default "mtl_finetuned"): Directory to save the fine-tuned model, tokenizer, and metadata. Also used as the HuggingFace Trainer output_dir for checkpoints and logs.
  • base_model (str, default "emilyalsentzer/Bio_ClinicalBERT"): HuggingFace model id to start from. Any BERT-architecture model should work.
  • max_length (int, default 512): Token sequence length for tokenization.
  • lambda_constant (float, default 2): Weight on the auxiliary (per-outcome BCE) loss relative to MLM loss. Total loss = MLM + λ · mean(per-task BCE).
  • val_fraction (float, default 1/8): Fraction of df held out for validation during training.
  • training_configs (dict | None, default None): Any keyword arguments accepted by transformers.TrainingArguments. User-provided values override the defaults below. Default training_configs is {"num_train_epochs": 5, "per_device_train_batch_size": 24, "per_device_eval_batch_size": 24, "learning_rate": 1e-5, "warmup_ratio": 0.06, "weight_decay": 1e-3, "logging_steps": 1000, "save_strategy": "epoch", "seed": 42}.
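To see which settings are in effect after passing `training_configs`, the override rule can be sketched as a dictionary merge. This assumes a shallow merge in which user keys win, which is how the description reads; the constant and helper names are illustrative, and the default values are copied from the list above.

```python
DEFAULT_TRAINING_CONFIGS = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 24,
    "per_device_eval_batch_size": 24,
    "learning_rate": 1e-5,
    "warmup_ratio": 0.06,
    "weight_decay": 1e-3,
    "logging_steps": 1000,
    "save_strategy": "epoch",
    "seed": 42,
}


def effective_training_configs(user_configs=None):
    """Return a new dict: documented defaults, overridden by user values."""
    return {**DEFAULT_TRAINING_CONFIGS, **(user_configs or {})}


# The overrides used in the mtl_finetune example in this section.
overrides = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "evaluation_strategy": "steps",
    "eval_steps": 100,
    "logging_steps": 100,
    "learning_rate": 2e-5,
}
for key, value in sorted(effective_training_configs(overrides).items()):
    print(f"{key}: {value}")
```

Keys that are not overridden, such as `per_device_eval_batch_size` and `seed`, keep their default values.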

Returns

str — the output_dir path. After training, this directory contains:

  • pytorch_model.bin (or model.safetensors) — model weights
  • config.json — model architecture config
  • tokenizer.json, vocab.txt, tokenizer_config.json, special_tokens_map.json — tokenizer
  • mtl_metadata.json — records outcome_cols, text_col, max_length, base_model, lambda_constant, num_tasks so inference can recover them automatically
  • checkpoint-* — per-epoch training checkpoints (can be deleted after training)
  • logs/ — TensorBoard-compatible training logs

get_postoperative_outcome_scores

Score a text scenario (or list of scenarios) against each auxiliary head of a fine-tuned MTL model.

Example

get_postoperative_outcome_scores(
    model_name="my_run",
    text="83-year-old male, ASA 4, scheduled for CABG. PMH: COPD, diabetes.",
    outcomes=["death_30d", "dvt", "pneumonia", "aki", "AUR", "PE"],
)

Parameters

  • model_name (str, required): Path to a directory saved by mtl_finetune.
  • text (str | list[str], required): One scenario string, or a list of them. Determines the shape of the return value.
  • outcomes (list[str] | None, default: None): Which outcomes to score. Defaults to all outcomes the model was trained on, recovered from mtl_metadata.json. Pass a subset to score only some. Names must match those used in mtl_finetune.
  • max_length (int | None, default: None): Token sequence length. Defaults to the value used during fine-tuning, recovered from metadata, otherwise 512.
  • device (str | None, default: None): "cuda", "cpu", or None to auto-detect.
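Because outcome names must match those used in `mtl_finetune`, it can be worth validating a requested subset before scoring. The sketch below assumes exact, case-sensitive matching and raises on unknown names; whether the package itself raises, warns, or ignores them is not stated here, and `resolve_outcomes` is a hypothetical helper.

```python
def resolve_outcomes(trained_outcomes, requested=None):
    """Default to every trained outcome; otherwise validate the subset."""
    if requested is None:
        return list(trained_outcomes)
    unknown = [name for name in requested if name not in trained_outcomes]
    if unknown:
        raise ValueError(
            f"unknown outcomes {unknown}; model was trained on {list(trained_outcomes)}"
        )
    return list(requested)


trained = ["death_30d", "dvt", "pneumonia", "aki", "AUR", "PE"]
print(resolve_outcomes(trained))                 # all six outcomes
print(resolve_outcomes(trained, ["dvt", "PE"]))  # ['dvt', 'PE']
```

Under this assumption, "aur" would be rejected because the trained name is "AUR".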

Returns

  • dict[str, float] when text is a string — maps each outcome name to a probability in [0, 1].
  • list[dict[str, float]] when text is a list — one dict per input, in the same order.
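The returned dictionaries are plain Python objects, so post-processing needs no extra libraries. As a minimal sketch, the helpers below list the outcomes at or above a cutoff, highest first. The cutoff is a hypothetical, caller-chosen value with no clinical validation, and the helper names are illustrative.

```python
def flag_high_risk(scores, threshold):
    """Return (outcome, probability) pairs at or above threshold, highest first."""
    flagged = [(name, p) for name, p in scores.items() if p >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def flag_high_risk_batch(batch_scores, threshold):
    """Apply flag_high_risk to each dict in a list, preserving input order."""
    return [flag_high_risk(scores, threshold) for scores in batch_scores]


# Scores taken from the quick example earlier in this README.
scores = {"Outcome_1": 0.12, "Outcome_2": 0.28, "Outcome_3": 0.04, "Outcome_4": 0.39}
print(flag_high_risk(scores, threshold=0.25))
# [('Outcome_4', 0.39), ('Outcome_2', 0.28)]
```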

get_pseudo_data

Generate a small synthetic dataset of preoperative clinical notes with binary outcomes for testing and demonstration. Outcomes are not random — each is driven by realistic feature combinations in the note (procedure type, age, ASA class, comorbidities), so a fine-tuned model is expected to learn meaningful associations.

Example

df = get_pseudo_data()
print(df.shape)                  # (1000, 5)
print(df.columns.tolist())       # ['text', 'Outcome_1', 'Outcome_2', 'Outcome_3', 'Outcome_4']

Parameters

None.

Returns

pandas.DataFrame with 1000 rows and 5 columns:

  • text (str) — synthetic preoperative note.
  • Outcome_1 to Outcome_4 (int, 0/1) — binary outcomes driven by clinical features in the note.
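With 1000 synthetic rows and the default `val_fraction` of 1/8 documented for both fine-tuning functions, 125 rows would be held out for validation and 875 used for training. The sketch below reproduces that arithmetic with a seeded shuffle. It is only an illustration: how the package actually shuffles or rounds is not documented here, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_rows, val_fraction=1 / 8, seed=42):
    """Shuffle row indices reproducibly and hold out a validation fraction."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_rows * val_fraction))
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(1000)
print(len(train_idx), len(val_idx))  # 875 125
```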
