A Python library for healthcare AI

Citing PyHealth :handshake:

Yang, Chaoqi, Zhenbang Wu, Patrick Jiang, Zhen Lin, Junyi Gao, Benjamin P. Danek, and Jimeng Sun. 2023. “PyHealth: A Deep Learning Toolkit for Healthcare Applications.” In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5788–89. KDD ’23. New York, NY, USA: Association for Computing Machinery.

@inproceedings{pyhealth2023yang,
    author = {Yang, Chaoqi and Wu, Zhenbang and Jiang, Patrick and Lin, Zhen and Gao, Junyi and Danek, Benjamin and Sun, Jimeng},
    title = {{PyHealth}: A Deep Learning Toolkit for Healthcare Applications},
    url = {https://github.com/sunlabuiuc/PyHealth},
    booktitle = {Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2023},
    year = {2023}
}

PyHealth is a comprehensive deep learning toolkit for clinical predictive modeling, designed for both ML researchers and medical practitioners. It makes healthcare AI applications easier to develop, test, and deploy, and keeps them flexible and customizable. [Tutorials]

Key Features

  • Modular 5-stage pipeline for healthcare ML

  • Healthcare-first: medical codes and clinical datasets (MIMIC, eICU, OMOP)

  • 33+ pre-built models and production-ready trainer/metrics

  • 10+ supported healthcare tasks and datasets

  • Fast (~3x faster than pandas) data processing for quick experimentation

[News!] We are continuously adding papers and benchmarks to PyHealth; check out the [Planned List]. You are welcome to pick one from the list and send us a PR, or to propose other influential or recent papers for the list.

figure/poster.png

1. Installation :rocket:

Python Version Requirement

PyHealth 2.0 requires Python 3.12 or 3.13 (>=3.12,<3.14). This version requirement enables optimal parallel processing, memory management, and compatibility with our modern dependencies.
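
A quick way to confirm that an interpreter falls inside this range before installing is sketched below. The helper name `python_supported` is introduced here for illustration and is not part of PyHealth; the bounds are the ones stated above.

```python
import sys


def python_supported(version=None, minimum=(3, 12), maximum_exclusive=(3, 14)):
    """Return True if (major, minor) lies in [minimum, maximum_exclusive)."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return minimum <= major_minor < maximum_exclusive


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_supported())
```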

Recommended Installation (Latest Release)

Install the latest PyHealth 2.0 release from PyPI:

pip install pyhealth

This version includes significant performance improvements, dynamic memory support, parallelized processing, multimodal dataloaders, and many new features.

Legacy Version

The older stable version (1.16) is still available for backward compatibility and supports Python 3.9+:

pip install pyhealth==1.16

For Contributors and Developers

If you are contributing to PyHealth or need the latest development features, install from GitHub source:

git clone https://github.com/sunlabuiuc/PyHealth.git
cd PyHealth
pip install -e .

Note: PyHealth 2.0 automatically installs PyTorch and other deep learning dependencies. The alpha version includes all required libraries for neural network-based models.
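
To check what actually got installed, the standard library can report package versions. This is a small sketch; `installed_version` is a hypothetical helper name, and the distribution names `pyhealth` and `torch` are assumed to match what pip installed in your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pyhealth:", installed_version("pyhealth"))
    print("torch:", installed_version("torch"))
```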

2. Introduction :book:

pyhealth provides these functionalities (we are still enriching some modules):

figure/overview.png

You can use the following functions independently:

  • Dataset: MIMIC-III, MIMIC-IV, eICU, OMOP-CDM, EHRShot, COVID19-CXR, SleepEDF, SHHS, ISRUC, customized EHR datasets, etc.

  • Tasks: diagnosis-based drug recommendation, patient hospitalization and mortality prediction, readmission prediction, length of stay forecasting, sleep staging, etc.

  • ML models: RNN, LSTM, GRU, Transformer, RETAIN, SafeDrug, GAMENet, MoleRec, AdaCare, ConCare, StageNet, GRASP, SparcNet, ContraWR, Deepr, TCN, Dr. Agent, etc.

Building a healthcare AI pipeline can be as short as 10 lines of code in PyHealth.

3. Build ML Pipelines :trophy:

All healthcare tasks in our package follow a five-stage pipeline:

figure/five-stage-pipeline.png

We keep the stages as independent as possible, so that you can build a custom pipeline that uses only our data processing steps or only our ML models.

Module 1: <pyhealth.datasets>

pyhealth.datasets provides a clean structure for the dataset, independent from the tasks. We support MIMIC-III, MIMIC-IV, eICU, OMOP-CDM, and more. The output (mimic3base) is a multi-level dictionary structure (see illustration below).

from pyhealth.datasets import MIMIC3Dataset

mimic3base = MIMIC3Dataset(
    # root directory of the dataset
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    # raw CSV table name
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    # map all NDC codes to CCS codes in these tables
    code_mapping={"NDC": "CCSCM"},
)

figure/structured-dataset.png
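
To give a feel for what a patient, visit, and table hierarchy looks like, here is a toy nested dictionary. It is only an illustration of the idea: the IDs, the placeholder codes, and the exact layout are assumptions and do not reflect the internal representation PyHealth uses.

```python
# Toy stand-in for a patient -> visit -> table -> codes hierarchy.
# All IDs and placeholder codes (DX_B, PROC_A) are hypothetical.
toy_dataset = {
    "patient_1": {
        "visit_1": {
            "DIAGNOSES_ICD": ["428.0", "DX_B"],
            "PROCEDURES_ICD": ["PROC_A"],
            "PRESCRIPTIONS": ["50580049698"],
        },
        "visit_2": {
            "DIAGNOSES_ICD": ["428.0"],
            "PROCEDURES_ICD": [],
            "PRESCRIPTIONS": [],
        },
    },
}


def count_codes(dataset, table):
    """Count codes stored under `table` across all patients and visits."""
    return sum(
        len(visit.get(table, []))
        for patient in dataset.values()
        for visit in patient.values()
    )


print(count_codes(toy_dataset, "DIAGNOSES_ICD"))  # 3
```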

Module 2: <pyhealth.tasks>

pyhealth.tasks defines how to process each patient’s data into a set of samples for the tasks. In the package, we provide several task examples, such as drug recommendation, mortality prediction, and readmission prediction. It is easy to customize your own tasks following our template.

from pyhealth.tasks import ReadmissionPredictionMIMIC3

mimic3sample = mimic3base.set_task(ReadmissionPredictionMIMIC3())
mimic3sample[0] # show the information of the first sample
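
Conceptually, a task turns one patient's visit history into labeled samples. The sketch below shows that idea in plain Python for a readmission-style label. It is not the implementation of `ReadmissionPredictionMIMIC3`: the function name, the dictionary keys, and the 15-day window are all assumptions made for illustration.

```python
from datetime import date


def readmission_samples(patient_id, visits, window_days=15):
    """Build one sample per visit that has a following visit.

    `visits` is a list of dicts with hypothetical keys `admit`, `discharge`
    (datetime.date) and `conditions` (list of codes), sorted by admission.
    The label is 1 if the next admission starts within `window_days`.
    """
    samples = []
    for current, following in zip(visits, visits[1:]):
        gap = (following["admit"] - current["discharge"]).days
        samples.append({
            "patient_id": patient_id,
            "conditions": current["conditions"],
            "label": int(gap <= window_days),
        })
    return samples


visits = [
    {"admit": date(2020, 1, 1), "discharge": date(2020, 1, 5), "conditions": ["428.0"]},
    {"admit": date(2020, 1, 12), "discharge": date(2020, 1, 14), "conditions": ["428.0"]},
    {"admit": date(2020, 6, 1), "discharge": date(2020, 6, 3), "conditions": ["428.0"]},
]
print([s["label"] for s in readmission_samples("patient_1", visits)])  # [1, 0]
```

The last visit produces no sample because there is no later visit to derive a label from.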

from pyhealth.datasets import split_by_patient, get_dataloader

train_ds, val_ds, test_ds = split_by_patient(mimic3sample, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
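
Splitting by patient matters because one patient can contribute several samples, and letting them fall into both train and test sets leaks information. The following is a plain-Python sketch of that idea, not PyHealth's `split_by_patient` itself; the function name and the `patient_id` key are assumptions.

```python
import random


def split_by_patient_ids(samples, ratios, seed=0):
    """Split samples so that each patient appears in exactly one subset."""
    patient_ids = sorted({s["patient_id"] for s in samples})
    random.Random(seed).shuffle(patient_ids)
    n = len(patient_ids)
    train_end = int(n * ratios[0])
    val_end = train_end + int(n * ratios[1])
    groups = (
        set(patient_ids[:train_end]),
        set(patient_ids[train_end:val_end]),
        set(patient_ids[val_end:]),  # remainder goes to the test set
    )
    return tuple([s for s in samples if s["patient_id"] in g] for g in groups)


# Ten hypothetical patients with two samples each.
samples = [{"patient_id": f"p{i}", "visit": v} for i in range(10) for v in range(2)]
train, val, test = split_by_patient_ids(samples, [0.8, 0.1, 0.1])
print(len(train), len(val), len(test))  # 16 2 2
```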

Module 3: <pyhealth.models>

pyhealth.models provides different ML models with very similar argument configs.

from pyhealth.models import Transformer

model = Transformer(
    dataset=mimic3sample,
)

Module 4: <pyhealth.trainer>

pyhealth.trainer lets you specify training arguments such as the number of epochs, the optimizer, and the learning rate. The trainer automatically saves the best model and prints its path at the end.

from pyhealth.trainer import Trainer

trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    # readmission prediction is a binary task, so monitor a binary metric
    monitor="pr_auc",
)
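
The idea behind monitoring a metric is simply to remember the epoch whose validation score was best. Below is a minimal sketch of that selection logic. It is not the Trainer's code; `best_epoch` is a hypothetical helper and the scores are made-up numbers for illustration.

```python
def best_epoch(history, monitor, mode="max"):
    """Return (epoch_index, score) of the best value of `monitor`."""
    pick = max if mode == "max" else min
    return pick(
        ((epoch, scores[monitor]) for epoch, scores in enumerate(history)),
        key=lambda item: item[1],
    )


# Made-up validation scores, one dict per epoch.
history = [
    {"pr_auc": 0.61, "loss": 0.70},
    {"pr_auc": 0.68, "loss": 0.64},
    {"pr_auc": 0.66, "loss": 0.66},
]
print(best_epoch(history, "pr_auc"))       # (1, 0.68)
print(best_epoch(history, "loss", "min"))  # (1, 0.64)
```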

Module 5: <pyhealth.metrics>

pyhealth.metrics provides several common evaluation metrics (see the documentation for the full list).

# method 1
trainer.evaluate(test_loader)

# method 2
from pyhealth.metrics.binary import binary_metrics_fn

y_true, y_prob, loss = trainer.inference(test_loader)
binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc"])
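
For readers who want to see what a metric such as ROC AUC measures, here is a dependency-free sketch based on its pairwise-ranking definition. It is an illustration only and not how `binary_metrics_fn` computes its scores; the function name `roc_auc` is introduced here.

```python
def roc_auc(y_true, y_prob):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise form is quadratic in the number of samples, so it is suited to small examples rather than full test sets.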

4. Medical Code Map :hospital:

pyhealth.medcode provides two core functionalities. This module can be used independently.

  • For code ontology lookup within one medical coding system (e.g., name, category, sub-concept);

from pyhealth.medcode import InnerMap

icd9cm = InnerMap.load("ICD9CM")
icd9cm.lookup("428.0")
# `Congestive heart failure, unspecified`
icd9cm.get_ancestors("428.0")
# ['428', '420-429.99', '390-459.99', '001-999.99']

atc = InnerMap.load("ATC")
atc.lookup("M01AE51")
# `ibuprofen, combinations`
atc.lookup("M01AE51", "drugbank_id")
# `DB01050`
atc.lookup("M01AE51", "description")
# Ibuprofen is a non-steroidal anti-inflammatory drug (NSAID) derived ...
atc.lookup("M01AE51", "indication")
# Ibuprofen is the most commonly used and prescribed NSAID. It is very common over the ...

  • For code mapping between two coding systems (e.g., ICD9CM to CCSCM).

from pyhealth.medcode import CrossMap

codemap = CrossMap.load("ICD9CM", "CCSCM")
codemap.map("428.0")
# ['108']

codemap = CrossMap.load("NDC", "RxNorm")
codemap.map("50580049698")
# ['209387']

codemap = CrossMap.load("NDC", "ATC")
codemap.map("50090539100")
# ['A10AC04', 'A10AD04', 'A10AB04']
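
Both functionalities boil down to table lookups: walking parent links inside one ontology, and following a one-to-many table between two coding systems. The sketch below mimics this with tiny hand-written tables built from the example outputs above. It is not the medcode implementation; the table names, `map_code`, and the empty-list result for unknown codes are choices made for this illustration.

```python
# Toy parent table mirroring the ICD9CM ancestor chain shown above.
PARENT = {
    "428.0": "428",
    "428": "420-429.99",
    "420-429.99": "390-459.99",
    "390-459.99": "001-999.99",
}


def get_ancestors(code, parent=PARENT):
    """Walk parent links from `code` up to the root, nearest ancestor first."""
    ancestors = []
    while code in parent:
        code = parent[code]
        ancestors.append(code)
    return ancestors


# Toy one-to-many cross map; values are lists because one source code
# may map to several target codes.
NDC_TO_ATC = {"50090539100": ["A10AC04", "A10AD04", "A10AB04"]}


def map_code(code, table):
    """Return mapped codes, or an empty list for an unknown code."""
    return table.get(code, [])


print(get_ancestors("428.0"))
# ['428', '420-429.99', '390-459.99', '001-999.99']
print(map_code("50090539100", NDC_TO_ATC))
# ['A10AC04', 'A10AD04', 'A10AB04']
```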

5. Medical Code Tokenizer :speech_balloon:

pyhealth.tokenizer is used for transformations between string-based tokens and integer-based indices, based on the overall token space. We provide flexible functions to tokenize 1D, 2D and 3D lists. This module can be used independently.

from pyhealth.tokenizer import Tokenizer

# Example: use a list of ATC3 codes as the token space
token_space = ['A01A', 'A02A', 'A02B', 'A02X', 'A03A', 'A03B', 'A03C', 'A03D', \
        'A03E', 'A03F', 'A04A', 'A05A', 'A05B', 'A05C', 'A06A', 'A07A', 'A07B', \
        'A07C', 'A07D', 'A07E', 'A07F', 'A07X', 'A08A', 'A09A', 'A12B', 'A12C', \
        'A13A', 'A14A', 'A14B', 'A16A']
tokenizer = Tokenizer(tokens=token_space, special_tokens=["<pad>", "<unk>"])

# 2d encode
tokens = [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', 'B035', 'C129']]
indices = tokenizer.batch_encode_2d(tokens)
# [[8, 9, 10, 11], [12, 1, 1, 0]]

# 2d decode
indices = [[8, 9, 10, 11], [12, 1, 1, 0]]
tokens = tokenizer.batch_decode_2d(indices)
# [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', '<unk>', '<unk>']]

# 3d encode
tokens = [[['A03C', 'A03D', 'A03E', 'A03F'], ['A08A', 'A09A']], \
    [['A04A', 'B035', 'C129']]]
indices = tokenizer.batch_encode_3d(tokens)
# [[[8, 9, 10, 11], [24, 25, 0, 0]], [[12, 1, 1, 0], [0, 0, 0, 0]]]

# 3d decode
indices = [[[8, 9, 10, 11], [24, 25, 0, 0]], \
    [[12, 1, 1, 0], [0, 0, 0, 0]]]
tokens = tokenizer.batch_decode_3d(indices)
# [[['A03C', 'A03D', 'A03E', 'A03F'], ['A08A', 'A09A']], [['A04A', '<unk>', '<unk>']]]
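
The encoding scheme can be reproduced in a few lines: special tokens take the first indices, unknown tokens map to `<unk>`, and shorter rows are right-padded with `<pad>`. The class below is a simplified sketch of that behavior with its own small vocabulary. `ToyTokenizer` is a hypothetical name and this is not the `Tokenizer` implementation.

```python
class ToyTokenizer:
    """Minimal vocabulary: special tokens first, then the token space."""

    def __init__(self, tokens, special_tokens=("<pad>", "<unk>")):
        self.idx2token = list(special_tokens) + list(tokens)
        self.token2idx = {t: i for i, t in enumerate(self.idx2token)}
        self.pad = self.token2idx["<pad>"]
        self.unk = self.token2idx["<unk>"]

    def batch_encode_2d(self, batch):
        """Map tokens to indices and right-pad rows to the longest row."""
        width = max(len(row) for row in batch)
        return [
            [self.token2idx.get(t, self.unk) for t in row]
            + [self.pad] * (width - len(row))
            for row in batch
        ]

    def batch_decode_2d(self, batch):
        """Map indices back to tokens, dropping padding."""
        return [
            [self.idx2token[i] for i in row if i != self.pad]
            for row in batch
        ]


tok = ToyTokenizer(["A01A", "A02A", "A03C", "A04A"])
encoded = tok.batch_encode_2d([["A03C", "A01A", "A04A"], ["A02A", "B035"]])
print(encoded)                       # [[4, 2, 5], [3, 1, 0]]
print(tok.batch_decode_2d(encoded))  # [['A03C', 'A01A', 'A04A'], ['A02A', '<unk>']]
```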

6. Tutorials :teacher:

We provide the following tutorials to help users get started with PyHealth. Please bear with us as we update the documentation for PyHealth 2.0.

Tutorial 0: Introduction to pyhealth.data [Video]

Tutorial 1: Introduction to pyhealth.datasets [Video (PyHealth 1.16)]

Tutorial 2: Introduction to pyhealth.tasks [Video (PyHealth 1.16)]

Tutorial 3: Introduction to pyhealth.models [Video]

Tutorial 4: Introduction to pyhealth.trainer [Video]

Tutorial 5: Introduction to pyhealth.metrics [Video]

Tutorial 6: Introduction to pyhealth.tokenizer [Video]

Tutorial 7: Introduction to pyhealth.medcode [Video]

The following tutorials will help users build their own task pipelines.

Pipeline 1: Chest Xray Classification [Video]

Pipeline 2: Medical Coding

Pipeline 3: Medical Transcription Classification

Pipeline 4: Mortality Prediction

Pipeline 5: Readmission Prediction

We provide advanced tutorials for supporting various needs.

Advanced Tutorial 1: Fit your dataset into our pipeline [Video]

Advanced Tutorial 2: Define your own healthcare task

Advanced Tutorial 3: Adopt customized model into pyhealth [Video]

Advanced Tutorial 4: Load your own processed data into pyhealth and try out our ML models [Video]

7. Datasets :mountain_snow:

We provide the processing files for the following open EHR datasets:

===========  ===================================  ====  ====================================
Dataset      Module                               Year  Information
===========  ===================================  ====  ====================================
MIMIC-III    pyhealth.datasets.MIMIC3Dataset      2016  MIMIC-III Clinical Database
MIMIC-IV     pyhealth.datasets.MIMIC4Dataset      2020  MIMIC-IV Clinical Database
eICU         pyhealth.datasets.eICUDataset        2018  eICU Collaborative Research Database
OMOP         pyhealth.datasets.OMOPDataset        -     OMOP-CDM schema based dataset
EHRShot      pyhealth.datasets.EHRShotDataset     2023  Few-shot EHR benchmarking dataset
COVID19-CXR  pyhealth.datasets.COVID19CXRDataset  2020  COVID-19 chest X-ray image dataset
SleepEDF     pyhealth.datasets.SleepEDFDataset    2018  Sleep-EDF dataset
SHHS         pyhealth.datasets.SHHSDataset        2016  Sleep Heart Health Study dataset
ISRUC        pyhealth.datasets.ISRUCDataset       2016  ISRUC-SLEEP dataset
===========  ===================================  ====  ====================================

8. Machine/Deep Learning Models :airplane:

Deep Learning Models

========  ====  =============================================================
Model     Year  Key Innovation
========  ====  =============================================================
RETAIN    2016  Interpretable attention for clinical decisions
GAMENet   2019  Memory networks for drug recommendation
SafeDrug  2021  Molecular graphs for safe drug combinations
MoleRec   2023  Substructure-aware drug recommendation
AdaCare   2020  Scale-adaptive feature extraction
ConCare   2020  Transformer-based patient modeling
StageNet  2020  Disease progression stage modeling
GRASP     2021  Graph neural networks for patient clustering
MICRON    2021  Medication change prediction with recurrent residual networks
========  ====  =============================================================

Foundation Models

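============  ====  ==========================================
Model         Year  Description
============  ====  ==========================================
Transformer   2017  Attention-based sequence modeling
RNN/LSTM/GRU  2011  Recurrent neural networks for sequences
CNN           1989  Convolutional networks for structured data
TCN           2018  Temporal convolutional networks
MLP           1986  Multi-layer perceptrons for tabular data
============  ====  ==========================================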

Specialized Models

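=========  ====  =============================================
Model      Year  Specialization
=========  ====  =============================================
ContraWR   2021  Biosignal analysis (EEG, ECG)
SparcNet   2023  Seizure detection and sleep staging
Deepr      2017  Electronic health records
Dr. Agent  2020  Reinforcement learning for clinical decisions
=========  ====  =============================================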

9. Research Initiative :microscope:

The PyHealth Research Initiative is a year-round, open research program that brings together talented individuals from diverse backgrounds to conduct cutting-edge research in healthcare AI.

How to participate:

  1. Join our Discord server

  2. Submit a high-quality PR to the PyHealth repository

  3. Check the documentation for more details

Recent research from the initiative has been published at venues including ML4H 2025 and other top conferences.

10. About Us :busts_in_silhouette:

We are the SunLab healthcare research team at UIUC.

Current Maintainers:

Get in Touch:
