A package for clinical AI pipeline development

medpipe

Table of contents

  1. Overview
  2. Installation
  3. Usage
    1. Preprocessing operations
    2. Models
    3. Recalibration
    4. Metrics
    5. Plots
  4. Example

Overview

The medpipe package provides a layer for building AI models for clinical applications from tabular data. It covers data loading and preprocessing, model creation and training, recalibration, and visualisation.


Installation

To install medpipe, clone the GitHub repository and install the package with pip:

$ git clone git@github.com:Surgical-Recovery-and-Safety-Lab/medpipe.git
$ cd medpipe
$ pip install .

NOTE: It is recommended to use a virtual environment (venv) to install this package.

Check that the installation was successful and that all tests pass by running the following command in the medpipe directory:

$ pytest 

Usage

This package was tested on Linux (Ubuntu 24.04) with Python 3.12.3. Most of the code is built on scikit-learn.

A Pipeline contains the preprocessing operations, one model per prediction label, and, if specified, one recalibration model per label. Several models can therefore be created from the same data and fitted with only a few lines of code.
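The structure described above (shared preprocessing, one model per label, an optional calibrator per label) can be pictured with the minimal, stdlib-only sketch below. It is not medpipe's implementation: `SketchPipeline`, `PositiveRateModel`, and the callables passed to them are hypothetical stand-ins chosen only to show how the pieces relate.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Row = dict[str, float]  # one patient's features (hypothetical representation)
Calibrator = Callable[[float], float]  # maps a raw probability to a recalibrated one


class PositiveRateModel:
    """Hypothetical toy model: predicts the positive rate seen during training."""

    def fit(self, X: list[Row], y: list[int]) -> "PositiveRateModel":
        self.rate_ = sum(y) / len(y)
        return self

    def predict_proba(self, X: list[Row]) -> list[float]:
        return [self.rate_ for _ in X]


@dataclass
class SketchPipeline:
    """Hypothetical illustration: shared preprocessing, one model (and calibrator) per label."""

    labels: list[str]
    preprocess: Callable[[list[Row]], list[Row]]
    make_model: Callable[[], PositiveRateModel]
    make_calibrator: Optional[Callable[[list[float], list[int]], Calibrator]] = None
    models: dict[str, PositiveRateModel] = field(default_factory=dict)
    calibrators: dict[str, Calibrator] = field(default_factory=dict)

    def fit(self, X: list[Row], y: dict[str, list[int]]) -> "SketchPipeline":
        Xp = self.preprocess(X)  # the same preprocessed data feeds every label
        for label in self.labels:
            model = self.make_model().fit(Xp, y[label])
            self.models[label] = model
            if self.make_calibrator is not None:
                # For brevity the calibrator is fitted on the training data here;
                # in practice it would normally be fitted on held-out data.
                self.calibrators[label] = self.make_calibrator(model.predict_proba(Xp), y[label])
        return self

    def predict_proba(self, X: list[Row]) -> dict[str, list[float]]:
        Xp = self.preprocess(X)
        out: dict[str, list[float]] = {}
        for label, model in self.models.items():
            proba = model.predict_proba(Xp)
            if label in self.calibrators:
                proba = [self.calibrators[label](v) for v in proba]
            out[label] = proba
        return out


X = [{"age": 70.0}, {"age": 55.0}, {"age": 81.0}, {"age": 62.0}]
y = {"any_comp": [1, 0, 1, 0], "90d_mortality": [0, 0, 1, 0]}
pipe = SketchPipeline(["any_comp", "90d_mortality"], preprocess=lambda rows: rows, make_model=PositiveRateModel)
print(pipe.fit(X, y).predict_proba(X[:1]))  # {'any_comp': [0.5], '90d_mortality': [0.25]}
```

Keeping one model per label in a dictionary means adding an outcome only adds an entry, while the preprocessing is written once.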

Preprocessing operations

Currently four preprocessing operations are available:

  • standardise: removes the mean and scales each input feature to unit variance;
  • ordinal encoding: converts non-numerical categorical input features into ordinal (integer) codes;
  • power transform: applies a power transform to make the data more Gaussian-like;
  • binning: converts a continuous input feature into bins and caps its values.
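The four operations can be sketched in plain Python as below. These are illustrative, stdlib-only stand-ins, not medpipe's code, and the function names are hypothetical. Two assumptions are worth flagging: the power transform uses the Yeo-Johnson formula with a fixed λ (scikit-learn's `PowerTransformer` instead estimates λ by maximum likelihood), and the binning clips values to a range `[low, high]` before assigning equal-width bins, which is only one possible reading of "caps its values".

```python
import math
from statistics import fmean, pstdev


def standardise(values: list[float]) -> list[float]:
    """Remove the mean and scale to unit variance (population standard deviation)."""
    mu = fmean(values)
    sigma = pstdev(values) or 1.0  # constant feature: leave its scale unchanged
    return [(v - mu) / sigma for v in values]


def ordinal_encode(values: list[str]) -> tuple[list[int], dict[str, int]]:
    """Map each category to an integer code, categories sorted alphabetically."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def yeo_johnson(x: float, lmbda: float) -> float:
    """Yeo-Johnson power transform of one value for a fixed (not estimated) lambda."""
    if x >= 0:
        return math.log1p(x) if lmbda == 0 else ((x + 1) ** lmbda - 1) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -((1 - x) ** (2 - lmbda) - 1) / (2 - lmbda)


def bin_capped(x: float, low: float, high: float, n_bins: int) -> int:
    """Cap x to [low, high], then return its equal-width bin index (0 .. n_bins - 1)."""
    x = min(max(x, low), high)
    width = (high - low) / n_bins
    return min(int((x - low) / width), n_bins - 1)


print(standardise([1.0, 2.0, 3.0]))  # mean 0, unit variance
print(ordinal_encode(["low", "high", "low", "medium"]))  # ([1, 0, 1, 2], {'high': 0, 'low': 1, 'medium': 2})
print(yeo_johnson(3.0, 0.5))  # 2.0
print(bin_capped(150.0, low=0.0, high=100.0, n_bins=4))  # 3: capped to 100, last bin
```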

Models

Only one classifier is currently available: the histogram-based gradient boosting classifier.

NOTE: Adding a new model only requires editing the create_model function in models/core. The new model must implement fit and predict methods.
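The requirement that a model expose `fit` and `predict` can be written down as a structural type. The sketch below is hypothetical: `ClassifierLike`, `MajorityClassModel`, `MODEL_FACTORIES`, and `create_model_sketch` are illustrative names, not the actual contents of `models/core`, and the real `create_model` signature may differ. It shows one way a name-to-factory registry keeps adding a model to a one-line change.

```python
from collections import Counter
from typing import Callable, Protocol, Sequence


class ClassifierLike(Protocol):
    """Any object with fit and predict methods satisfies this protocol."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "ClassifierLike": ...

    def predict(self, X: Sequence[Sequence[float]]) -> list[int]: ...


class MajorityClassModel:
    """Hypothetical toy classifier: always predicts the most frequent training class."""

    def fit(self, X, y):
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


# Hypothetical registry: configuration name -> factory returning a fresh model.
# A real entry might be sklearn.ensemble.HistGradientBoostingClassifier (not imported here).
MODEL_FACTORIES: dict[str, Callable[[], ClassifierLike]] = {
    "majority": MajorityClassModel,
}


def create_model_sketch(name: str) -> ClassifierLike:
    """Look up and instantiate a model by name, failing loudly on unknown names."""
    try:
        return MODEL_FACTORIES[name]()
    except KeyError:
        raise ValueError(f"unknown model {name!r}; available: {sorted(MODEL_FACTORIES)}") from None


model = create_model_sketch("majority").fit([[0.1], [0.4], [0.9]], [0, 1, 1])
print(model.predict([[0.5], [0.2]]))  # [1, 1]
```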

Recalibration

Two recalibration models are available: logistic regression and isotonic regression.
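Both methods learn a mapping from a model's raw predicted probability to a recalibrated one. The stdlib-only sketch below illustrates them under stated assumptions; it is not medpipe's code, and all names in it are hypothetical. Logistic recalibration is fitted here as `sigmoid(a * logit(p) + b)` by plain gradient descent on the log loss; medpipe may instead fit a logistic regression on the probabilities in another form. Isotonic regression uses the pool-adjacent-violators algorithm and returns a non-decreasing step function (scikit-learn's `IsotonicRegression` interpolates linearly between fitted points instead).

```python
import bisect
import math


def _logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1 - eps)  # keep away from 0 and 1 to avoid infinite logits
    return math.log(p / (1 - p))


def fit_logistic_recalibration(p, y, lr=0.5, n_iter=2000):
    """Fit q = sigmoid(a * logit(p) + b) by gradient descent on the mean log loss."""
    z = [_logit(v) for v in p]
    a, b, n = 1.0, 0.0, len(y)  # start from the identity mapping
    for _ in range(n_iter):
        q = [1 / (1 + math.exp(-(a * zi + b))) for zi in z]
        grad_a = sum((qi - yi) * zi for qi, yi, zi in zip(q, y, z)) / n
        grad_b = sum(qi - yi for qi, yi in zip(q, y)) / n
        a, b = a - lr * grad_a, b - lr * grad_b
    return lambda v: 1 / (1 + math.exp(-(a * _logit(v) + b)))


def fit_isotonic_recalibration(p, y):
    """Pool-adjacent-violators: non-decreasing step function from p to observed rate."""
    pooled = {}  # tied probabilities are pooled first: p -> [sum of y, count]
    for pi, yi in zip(p, y):
        acc = pooled.setdefault(pi, [0.0, 0])
        acc[0] += yi
        acc[1] += 1
    blocks = []  # each block: [fitted value, weight, smallest p in the block]
    for pi in sorted(pooled):
        total, count = pooled[pi]
        blocks.append([total / count, count, pi])
        # Merge backwards while the fitted values would decrease.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, _ = blocks.pop()
            v1, w1, start = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, start])
    starts = [blk[2] for blk in blocks]
    values = [blk[0] for blk in blocks]

    def recalibrate(v: float) -> float:
        # Use the last block starting at or below v (the first block for smaller v).
        return values[max(bisect.bisect_right(starts, v) - 1, 0)]

    return recalibrate


p_raw = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y_obs = [0, 0, 1, 0, 1, 0, 1, 1]
iso = fit_isotonic_recalibration(p_raw, y_obs)
print([iso(v) for v in (0.15, 0.35, 0.85)])  # [0.0, 0.5, 1.0]
logistic = fit_logistic_recalibration(p_raw, y_obs)
print(logistic(0.2) < logistic(0.8))  # True: the mapping stays increasing
```

Isotonic regression only assumes the mapping is monotone, so it is more flexible but needs more calibration data than the two-parameter logistic fit.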

Metrics

The available metrics are divided into prediction metrics, computed from the predicted classes, and score metrics, computed from the predicted probabilities:

| Metric | Type | Description |
|---|---|---|
| Accuracy | Prediction | Proportion of all classifications that were correct. |
| Recall | Prediction | Proportion of all actual positives that were classified correctly (true positive rate). |
| Precision | Prediction | Proportion of all positive classifications that are actually positive. |
| F1 score | Prediction | Harmonic mean of precision and recall. |
| AUROC | Score | Area under the receiver operating characteristic (ROC) curve. |
| AP | Score | Average precision, a summary of the precision-recall curve. |
| Log loss | Score | Logarithmic loss (cross-entropy) of the predicted probabilities. |
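The definitions in the table can be made concrete with the stdlib-only sketch below, for binary labels `y` (0/1), thresholded predictions `y_hat`, and positive-class probabilities `p`. The function names are illustrative, not medpipe's API. AUROC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half), and AP uses the step-wise definition of scikit-learn's `average_precision_score`, the sum over thresholds of (R_n - R_{n-1}) * P_n, rather than a trapezoidal area.

```python
import math


def confusion(y, y_hat):
    tp = sum(1 for t, h in zip(y, y_hat) if t == 1 and h == 1)
    fp = sum(1 for t, h in zip(y, y_hat) if t == 0 and h == 1)
    fn = sum(1 for t, h in zip(y, y_hat) if t == 1 and h == 0)
    return tp, fp, fn, len(y) - tp - fp - fn


def prediction_metrics(y, y_hat):
    """Accuracy, recall, precision and F1 from thresholded (0/1) predictions."""
    tp, fp, fn, tn = confusion(y, y_hat)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y), "recall": recall, "precision": precision, "f1": f1}


def log_loss(y, p, eps=1e-15):
    """Mean negative log-likelihood of the labels under the probabilities p."""
    total = 0.0
    for t, q in zip(y, p):
        q = min(max(q, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(q) + (1 - t) * math.log(1 - q))
    return total / len(y)


def auroc(y, p):
    """P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [q for t, q in zip(y, p) if t == 1]
    neg = [q for t, q in zip(y, p) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y, p):
    """AP = sum over distinct thresholds of (R_n - R_{n-1}) * P_n (needs >= 1 positive)."""
    n_pos = sum(y)
    pairs = sorted(zip(p, y), reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this threshold before scoring it.
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(prediction_metrics(y, [int(q >= 0.5) for q in p]))
print(auroc(y, p))  # 0.75
print(average_precision(y, p))  # about 0.833
```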

Plots

Three types of plots are available: bar graphs for the metrics, predicted probability distributions, and calibration curves.

The following graphs are from one pipeline with two models, one to predict complications and the other to predict 90-day mortality. The predictor and calibrator results are plotted on the same graphs to compare the effect of recalibration.

Plots of the AUROC and log loss metric values with confidence intervals for each outcome:

[Figures: AUROC and log loss bar graphs for any complication and 90-day mortality]
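This README does not state how the confidence intervals are computed. A common choice is the percentile bootstrap, sketched below as an assumption rather than as medpipe's method: resample the test set with replacement, recompute the metric each time, and read off the empirical percentiles. `bootstrap_ci` and its parameters are hypothetical, and the log loss helper is repeated so the block runs on its own.

```python
import math
import random


def log_loss(y, p, eps=1e-15):
    """Mean negative log-likelihood of binary labels y under probabilities p."""
    p = [min(max(q, eps), 1 - eps) for q in p]
    return -sum(t * math.log(q) + (1 - t) * math.log(1 - q) for t, q in zip(y, p)) / len(y)


def bootstrap_ci(metric, y, p, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap (1 - alpha) interval for metric(y, p)."""
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        stats.append(metric([y[i] for i in idx], [p[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return metric(y, p), (lo, hi)


y = [0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
p = [0.2, 0.7, 0.4, 0.9, 0.6, 0.3, 0.8, 0.5, 0.35, 0.65]
point, (lo, hi) = bootstrap_ci(log_loss, y, p)
print(f"log loss {point:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Metrics such as AUROC are undefined on a resample containing a single class, so a version for those metrics would need to skip or redraw such resamples.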

Predicted probability distributions for each outcome:

[Figures: predicted probability distributions for any complication and 90-day mortality]

Calibration curves for each outcome:

[Figures: calibration curves for any complication and 90-day mortality]
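A calibration (reliability) curve groups the predictions into bins and plots, for each bin, the mean predicted probability against the observed fraction of positives; a perfectly calibrated model lies on the diagonal. The `n_bins` and `strategy="quantile"` options in the example below match the parameters of scikit-learn's calibration utilities. The stdlib-only sketch here approximates the quantile strategy with equal-count bins over the sorted predictions; `calibration_points` is a hypothetical name, not medpipe's API.

```python
def calibration_points(y, p, n_bins=10):
    """Equal-count bins: (mean predicted probability, observed positive rate) per bin."""
    pairs = sorted(zip(p, y))  # sort samples by predicted probability
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than samples
        mean_pred = sum(q for q, _ in chunk) / len(chunk)
        frac_pos = sum(t for _, t in chunk) / len(chunk)
        points.append((mean_pred, frac_pos))
    return points


y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
for mean_pred, frac_pos in calibration_points(y, p, n_bins=3):
    print(f"predicted {mean_pred:.2f} -> observed {frac_pos:.2f}")
```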

Example

Here is a short example that shows how to load data, train the models, and plot the calibration curves:

from medpipe import (
    Pipeline,
    read_toml_configuration,
    load_data_from_csv,
    get_positive_proba,
    extract_labels,
    plot_reliability_diagrams,
)

# Load configuration and data
config = read_toml_configuration("config_file.toml")
data = load_data_from_csv("data.csv")

# Create the pipeline from the loaded configuration
pipeline = Pipeline(config)

# Split data into training and test sets, then train the models
X_train, X_test = pipeline.get_test_data(data)
pipeline.run(X_train)

# Separate the labels from the test features and predict probabilities
X_test, y_test = extract_labels(X_test, pipeline.label_list)
y_pred_proba = pipeline.predict_proba(X_test)

# Plot calibration curves using 10 quantile bins
plot_reliability_diagrams(
    y_test,
    get_positive_proba(y_pred_proba),
    display_kwargs={"n_bins": 10, "strategy": "quantile"},
)

Download files

Source Distribution

medpipe-0.0.1.tar.gz (41.8 kB)

Built Distribution

medpipe-0.0.1-py3-none-any.whl (38.9 kB)

File details

Details for the file medpipe-0.0.1.tar.gz.

File metadata

  • Download URL: medpipe-0.0.1.tar.gz
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for medpipe-0.0.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea408a1abbd4f2a0cdc4af7769163f5aba26d6cce3644ad7255feca214d8c34c |
| MD5 | 1c804ba1386192108a55e292f3ed5d61 |
| BLAKE2b-256 | 0332f5d4d4ce6e2c11e990c62ed66bd73318809a63e924a4662ca4906419cd51 |


File details

Details for the file medpipe-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: medpipe-0.0.1-py3-none-any.whl
  • Size: 38.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for medpipe-0.0.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 37e4bca257132676e2fe4df73cba644885762dcea1f2449c95632dcddaaeb3e8 |
| MD5 | bab156dc3d8041b5c0b44e0c914eb1ce |
| BLAKE2b-256 | d4066c989a7510773ca1be262aea84c8ec15c5835ba3ecf8c1a2495e39ec14b7 |

