A Python library for healthcare AI
Citing PyHealth :handshake:
Yang, Chaoqi, Zhenbang Wu, Patrick Jiang, Zhen Lin, Junyi Gao, Benjamin P. Danek, and Jimeng Sun. 2023. “PyHealth: A Deep Learning Toolkit for Healthcare Applications.” In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5788–89. KDD ’23. New York, NY, USA: Association for Computing Machinery.
@inproceedings{pyhealth2023yang,
    author = {Yang, Chaoqi and Wu, Zhenbang and Jiang, Patrick and Lin, Zhen and Gao, Junyi and Danek, Benjamin and Sun, Jimeng},
    title = {{PyHealth}: A Deep Learning Toolkit for Healthcare Applications},
    url = {https://github.com/sunlabuiuc/PyHealth},
    booktitle = {Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2023},
    year = {2023}
}
PyHealth is a comprehensive deep learning toolkit for clinical predictive modeling, designed for both ML researchers and medical practitioners. It aims to make healthcare AI applications easier to develop, test, and deploy, with more flexibility and customization. [Tutorials]
Key Features
Modular 5-stage pipeline for healthcare ML
Healthcare-first: medical codes and clinical datasets (MIMIC, eICU, OMOP)
33+ pre-built models and production-ready trainer/metrics
10+ supported healthcare tasks and datasets
Fast (~3x faster than pandas) data processing for quick experimentation
[News!] We are continuously implementing good papers and benchmarks in PyHealth; check out the [Planned List]. You are welcome to pick one from the list and send us a PR, or to add more influential and new papers to the list.
1. Installation :rocket:
Python Version Requirement
PyHealth 2.0 requires Python 3.12 or 3.13 (>=3.12,<3.14). This version requirement enables optimal parallel processing, memory management, and compatibility with our modern dependencies.
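A quick way to confirm that an interpreter falls inside this range before installing is sketched below. The helper name `supports_pyhealth2` is hypothetical and not part of PyHealth; it only encodes the `>=3.12,<3.14` bound stated above.

```python
import sys


def supports_pyhealth2(version=None):
    """Return True if `version` (major, minor, ...) satisfies >=3.12,<3.14."""
    # Default to the running interpreter when no version tuple is given.
    if version is None:
        version = sys.version_info
    major, minor = version[0], version[1]
    return (3, 12) <= (major, minor) < (3, 14)


# Report whether the current interpreter is in the supported range.
print(supports_pyhealth2())
```

If this prints `False`, the legacy 1.16 release described below may be the better fit.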
Recommended Installation (Latest Release)
Install the latest PyHealth 2.0 release from PyPI:
pip install pyhealth
This version includes significant performance improvements, dynamic memory support, parallelized processing, multimodal dataloaders, and many new features.
Legacy Version
The older stable version (1.16) is still available for backward compatibility and supports Python 3.9+:
pip install pyhealth==1.16
For Contributors and Developers
If you are contributing to PyHealth or need the latest development features, install from GitHub source:
git clone https://github.com/sunlabuiuc/PyHealth.git
cd PyHealth
pip install -e .
Note: PyHealth 2.0 automatically installs PyTorch and other deep learning dependencies. The alpha version includes all required libraries for neural network-based models.
2. Introduction :book:
pyhealth provides these functionalities (we are still enriching some modules):
You can use the following functions independently:
Dataset: MIMIC-III, MIMIC-IV, eICU, OMOP-CDM, EHRShot, COVID19-CXR, SleepEDF, SHHS, ISRUC, customized EHR datasets, etc.
Tasks: diagnosis-based drug recommendation, patient hospitalization and mortality prediction, readmission prediction, length of stay forecasting, sleep staging, etc.
ML models: RNN, LSTM, GRU, Transformer, RETAIN, SafeDrug, GAMENet, MoleRec, AdaCare, ConCare, StageNet, GRASP, SparcNet, ContraWR, Deepr, TCN, Dr. Agent, etc.
Building a healthcare AI pipeline can be as short as 10 lines of code in PyHealth.
3. Build ML Pipelines :trophy:
All healthcare tasks in our package follow a five-stage pipeline:
We try hard to make sure each stage is as separate as possible, so that people can customize their own pipeline by only using our data processing steps or the ML models.
Module 1: <pyhealth.datasets>
pyhealth.datasets provides a clean structure for the dataset, independent from the tasks. We support MIMIC-III, MIMIC-IV, eICU, OMOP-CDM, and more. The output (mimic3base) is a multi-level dictionary structure (see illustration below).
from pyhealth.datasets import MIMIC3Dataset

mimic3base = MIMIC3Dataset(
    # root directory of the dataset
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    # raw CSV table name
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    # map all NDC codes to CCS codes in these tables
    code_mapping={"NDC": "CCSCM"},
)
Module 2: <pyhealth.tasks>
pyhealth.tasks defines how to process each patient’s data into a set of samples for the tasks. In the package, we provide several task examples, such as drug recommendation, mortality prediction, and readmission prediction. It is easy to customize your own tasks following our template.
from pyhealth.tasks import ReadmissionPredictionMIMIC3
mimic3sample = mimic3base.set_task(ReadmissionPredictionMIMIC3())
mimic3sample[0] # show the information of the first sample
from pyhealth.datasets import split_by_patient, get_dataloader
train_ds, val_ds, test_ds = split_by_patient(mimic3sample, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
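The idea behind a patient-level split can be illustrated without the library. The sketch below is not PyHealth's implementation of `split_by_patient`; it assumes each sample is a dict with a `"patient_id"` key (a hypothetical layout) and shows how splitting on patients rather than on samples keeps one patient's visits from leaking across train, validation, and test.

```python
import random
from collections import defaultdict


def split_indices_by_patient(samples, ratios, seed=0):
    """Split sample indices into groups so that no patient spans two groups."""
    # Group sample indices by the (assumed) "patient_id" key.
    by_patient = defaultdict(list)
    for idx, sample in enumerate(samples):
        by_patient[sample["patient_id"]].append(idx)

    # Shuffle patients, not samples, with a fixed seed for reproducibility.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Cut the shuffled patient list at the cumulative ratio boundaries.
    n = len(patient_ids)
    splits, start, cumulative = [], 0, 0.0
    for i, ratio in enumerate(ratios):
        cumulative += ratio
        end = n if i == len(ratios) - 1 else int(round(cumulative * n))
        chosen = patient_ids[start:end]
        splits.append([idx for pid in chosen for idx in by_patient[pid]])
        start = end
    return splits


# Toy data: 10 hypothetical patients with 2 samples each.
samples = [{"patient_id": f"p{i // 2}", "visit": i} for i in range(20)]
train_idx, val_idx, test_idx = split_indices_by_patient(samples, [0.8, 0.1, 0.1])
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the cut is made on patients, the split sizes in samples only match the ratios approximately when patients have different numbers of visits.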
Module 3: <pyhealth.models>
pyhealth.models provides different ML models with very similar argument configs.
from pyhealth.models import Transformer

model = Transformer(
    dataset=mimic3sample,
)
Module 4: <pyhealth.trainer>
pyhealth.trainer lets you specify training arguments such as epochs, optimizer, and learning rate. The trainer automatically saves the best model and outputs its path at the end.
from pyhealth.trainer import Trainer

trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc_samples",
)
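What "best model" means depends on the monitored metric. The following is a minimal sketch of that selection logic under stated assumptions, not the Trainer's actual code: `select_best_epoch` and its `mode` argument are hypothetical names, and the validation scores are made up for illustration.

```python
def select_best_epoch(history, monitor, mode="max"):
    """Return (epoch, score) for the best value of `monitor` in `history`.

    `history` is a list of per-epoch dicts of validation metrics.
    """
    best_epoch, best_score = None, None
    for epoch, scores in enumerate(history):
        score = scores[monitor]
        improved = (
            best_score is None
            or (mode == "max" and score > best_score)
            or (mode == "min" and score < best_score)
        )
        if improved:
            # A real trainer would write a checkpoint at this point.
            best_epoch, best_score = epoch, score
    return best_epoch, best_score


# Made-up validation scores for three epochs.
history = [
    {"pr_auc_samples": 0.41, "loss": 0.90},
    {"pr_auc_samples": 0.47, "loss": 0.75},
    {"pr_auc_samples": 0.45, "loss": 0.70},
]
print(select_best_epoch(history, "pr_auc_samples"))  # (1, 0.47)
print(select_best_epoch(history, "loss", mode="min"))  # (2, 0.7)
```

Note that the two metrics disagree on the best epoch here, which is why the choice of monitored metric matters.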
Module 5: <pyhealth.metrics>
pyhealth.metrics provides several common evaluation metrics (refer to the Doc to see which are available).
# method 1
trainer.evaluate(test_loader)
# method 2
from pyhealth.metrics.binary import binary_metrics_fn
y_true, y_prob, loss = trainer.inference(test_loader)
binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc"])
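For readers unfamiliar with `roc_auc`, the quantity can be computed by hand on a small example. This is a plain-Python sketch of the pairwise-ranking definition of ROC AUC, written for illustration only; it is not how `binary_metrics_fn` computes it, and the labels and probabilities are toy values.

```python
def roc_auc(y_true, y_prob):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    # Count pairwise wins; ties contribute half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_prob))  # 0.75
```

Three of the four positive-negative pairs are ranked correctly, hence 0.75. The quadratic loop is fine for a toy example but too slow for real evaluation sets.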
4. Medical Code Map :hospital:
pyhealth.medcode provides two core functionalities. This module can be used independently.
For code ontology lookup within one medical coding system (e.g., name, category, sub-concept);
from pyhealth.medcode import InnerMap
icd9cm = InnerMap.load("ICD9CM")
icd9cm.lookup("428.0")
# `Congestive heart failure, unspecified`
icd9cm.get_ancestors("428.0")
# ['428', '420-429.99', '390-459.99', '001-999.99']
atc = InnerMap.load("ATC")
atc.lookup("M01AE51")
# `ibuprofen, combinations`
atc.lookup("M01AE51", "drugbank_id")
# `DB01050`
atc.lookup("M01AE51", "description")
# Ibuprofen is a non-steroidal anti-inflammatory drug (NSAID) derived ...
atc.lookup("M01AE51", "indication")
# Ibuprofen is the most commonly used and prescribed NSAID. It is very common over the ...
For code mapping between two coding systems (e.g., ICD9CM to CCSCM).
from pyhealth.medcode import CrossMap
codemap = CrossMap.load("ICD9CM", "CCSCM")
codemap.map("428.0")
# ['108']
codemap = CrossMap.load("NDC", "RxNorm")
codemap.map("50580049698")
# ['209387']
codemap = CrossMap.load("NDC", "ATC")
codemap.map("50090539100")
# ['A10AC04', 'A10AD04', 'A10AB04']
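Both lookups reduce to simple data structures: a parent-pointer hierarchy for ancestors and a one-to-many table for cross-system mapping. The sketch below illustrates this with toy data copied from the example outputs above. It is not the `InnerMap` or `CrossMap` implementation, and returning an empty list for an unmapped code is an assumption of this sketch.

```python
# Toy data copied from the example outputs above; not the real ontology files.
PARENT = {
    "428.0": "428",
    "428": "420-429.99",
    "420-429.99": "390-459.99",
    "390-459.99": "001-999.99",
}
ICD9CM_TO_CCSCM = {"428.0": ["108"]}


def get_ancestors(code, parent=PARENT):
    """Walk parent links from `code` up to the root, nearest ancestor first."""
    ancestors = []
    while code in parent:
        code = parent[code]
        ancestors.append(code)
    return ancestors


def map_code(code, table=ICD9CM_TO_CCSCM):
    """Return all target codes for `code`, or an empty list if unmapped."""
    return list(table.get(code, []))


print(get_ancestors("428.0"))  # ['428', '420-429.99', '390-459.99', '001-999.99']
print(map_code("428.0"))       # ['108']
print(map_code("999.9"))       # []
```

Returning a list from `map_code` reflects that a source code may map to several target codes, as in the NDC to ATC example above.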
5. Medical Code Tokenizer :speech_balloon:
pyhealth.tokenizer is used for transformations between string-based tokens and integer-based indices, based on the overall token space. We provide flexible functions to tokenize 1D, 2D and 3D lists. This module can be used independently.
from pyhealth.tokenizer import Tokenizer
# Example: we use a list of ATC3 codes as the tokens
# (the outputs shown below were produced with a longer token list than the one printed here)
token_space = ['A01A', 'A02A', 'A02B', 'A02X', 'A03A', 'A03B', 'A03C', 'A03D',
               'A03F', 'A04A', 'A05A', 'A05B', 'A05C', 'A06A', 'A07A', 'A07B', 'A07C',
               'A12B', 'A12C', 'A13A', 'A14A', 'A14B', 'A16A']
tokenizer = Tokenizer(tokens=token_space, special_tokens=["<pad>", "<unk>"])
# 2d encode
tokens = [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', 'B035', 'C129']]
indices = tokenizer.batch_encode_2d(tokens)
# [[8, 9, 10, 11], [12, 1, 1, 0]]
# 2d decode
indices = [[8, 9, 10, 11], [12, 1, 1, 0]]
tokens = tokenizer.batch_decode_2d(indices)
# [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', '<unk>', '<unk>']]
# 3d encode
tokens = [[['A03C', 'A03D', 'A03E', 'A03F'], ['A08A', 'A09A']], \
[['A04A', 'B035', 'C129']]]
indices = tokenizer.batch_encode_3d(tokens)
# [[[8, 9, 10, 11], [24, 25, 0, 0]], [[12, 1, 1, 0], [0, 0, 0, 0]]]
# 3d decode
indices = [[[8, 9, 10, 11], [24, 25, 0, 0]], \
[[12, 1, 1, 0], [0, 0, 0, 0]]]
tokens = tokenizer.batch_decode_3d(indices)
# [[['A03C', 'A03D', 'A03E', 'A03F'], ['A08A', 'A09A']], [['A04A', '<unk>', '<unk>']]]
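The padding and unknown-token behavior in the outputs above can be reproduced with a few lines of plain Python. `ToyTokenizer` below is a hypothetical stand-in written for illustration, not the `pyhealth.tokenizer.Tokenizer` implementation. It assumes the special tokens include `"<pad>"` and `"<unk>"` and that they take the lowest indices.

```python
class ToyTokenizer:
    """Minimal token <-> index mapping with padding for 2D batches."""

    def __init__(self, tokens, special_tokens=("<pad>", "<unk>")):
        # Special tokens take the lowest indices, so "<pad>" is 0 and "<unk>" is 1.
        self.vocab = list(special_tokens) + list(tokens)
        self.token2idx = {tok: i for i, tok in enumerate(self.vocab)}
        self.pad_idx = self.token2idx["<pad>"]
        self.unk_idx = self.token2idx["<unk>"]

    def batch_encode_2d(self, batch):
        # Unknown tokens map to "<unk>"; short rows are right-padded with "<pad>".
        width = max(len(row) for row in batch)
        encoded = []
        for row in batch:
            ids = [self.token2idx.get(tok, self.unk_idx) for tok in row]
            encoded.append(ids + [self.pad_idx] * (width - len(ids)))
        return encoded

    def batch_decode_2d(self, batch):
        # Padding is dropped on the way back; "<unk>" stays visible.
        return [
            [self.vocab[i] for i in row if i != self.pad_idx]
            for row in batch
        ]


tok = ToyTokenizer(["A03C", "A03D", "A04A"])
indices = tok.batch_encode_2d([["A03C", "A03D"], ["B035"]])
print(indices)                       # [[2, 3], [1, 0]]
print(tok.batch_decode_2d(indices))  # [['A03C', 'A03D'], ['<unk>']]
```

Decoding is lossy for out-of-vocabulary tokens: `'B035'` comes back as `'<unk>'`, matching the behavior shown in the 2D decode example above.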
6. Tutorials :teacher:
We provide the following tutorials to help users get started with pyhealth. Please bear with us as we update the documentation on how to use PyHealth 2.0.
Tutorial 0: Introduction to pyhealth.data [Video]
Tutorial 1: Introduction to pyhealth.datasets [Video (PyHealth 1.16)]
Tutorial 2: Introduction to pyhealth.tasks [Video (PyHealth 1.16)]
Tutorial 3: Introduction to pyhealth.models [Video]
Tutorial 4: Introduction to pyhealth.trainer [Video]
Tutorial 5: Introduction to pyhealth.metrics [Video]
Tutorial 6: Introduction to pyhealth.tokenizer [Video]
Tutorial 7: Introduction to pyhealth.medcode [Video]
The following tutorials will help users build their own task pipelines.
Pipeline 1: Chest Xray Classification [Video]
Pipeline 3: Medical Transcription Classification
Pipeline 4: Mortality Prediction
Pipeline 5: Readmission Prediction
We provide advanced tutorials for supporting various needs.
Advanced Tutorial 1: Fit your dataset into our pipeline [Video]
Advanced Tutorial 2: Define your own healthcare task
Advanced Tutorial 3: Adopt customized model into pyhealth [Video]
Advanced Tutorial 4: Load your own processed data into pyhealth and try out our ML models [Video]
7. Datasets :mountain_snow:
We provide the processing files for the following open EHR datasets:
| Dataset | Module | Year | Information |
|---|---|---|---|
| MIMIC-III | pyhealth.datasets.MIMIC3Dataset | 2016 | |
| MIMIC-IV | pyhealth.datasets.MIMIC4Dataset | 2020 | |
| eICU | pyhealth.datasets.eICUDataset | 2018 | |
| OMOP | pyhealth.datasets.OMOPDataset | | |
| EHRShot | pyhealth.datasets.EHRShotDataset | 2023 | |
| COVID19-CXR | pyhealth.datasets.COVID19CXRDataset | 2020 | COVID-19 chest X-ray image dataset |
| SleepEDF | pyhealth.datasets.SleepEDFDataset | 2018 | |
| SHHS | pyhealth.datasets.SHHSDataset | 2016 | |
| ISRUC | pyhealth.datasets.ISRUCDataset | 2016 | |
8. Machine/Deep Learning Models :airplane:
Deep Learning Models
| Model | Year | Key Innovation |
|---|---|---|
| RETAIN | 2016 | Interpretable attention for clinical decisions |
| GAMENet | 2019 | Memory networks for drug recommendation |
| SafeDrug | 2021 | Molecular graphs for safe drug combinations |
| MoleRec | 2023 | Substructure-aware drug recommendation |
| AdaCare | 2020 | Scale-adaptive feature extraction |
| ConCare | 2020 | Transformer-based patient modeling |
| StageNet | 2020 | Disease progression stage modeling |
| GRASP | 2021 | Graph neural networks for patient clustering |
| MICRON | 2021 | Medication change prediction with recurrent residual networks |
Foundation Models
| Model | Year | Description |
|---|---|---|
| Transformer | 2017 | Attention-based sequence modeling |
| RNN/LSTM/GRU | 2011 | Recurrent neural networks for sequences |
| CNN | 1989 | Convolutional networks for structured data |
| TCN | 2018 | Temporal convolutional networks |
| MLP | 1986 | Multi-layer perceptrons for tabular data |
Specialized Models
| Model | Year | Specialization |
|---|---|---|
| ContraWR | 2021 | Biosignal analysis (EEG, ECG) |
| SparcNet | 2023 | Seizure detection and sleep staging |
| Deepr | 2017 | Electronic health records |
| Dr. Agent | 2020 | Reinforcement learning for clinical decisions |
9. Research Initiative :microscope:
The PyHealth Research Initiative is a year-round, open research program that brings together talented individuals from diverse backgrounds to conduct cutting-edge research in healthcare AI.
How to participate:
Join our Discord server
Submit a high-quality PR to the PyHealth repository
Check the documentation for more details
Recent research from the initiative has been published at venues including ML4H 2025 and other top conferences.
10. About Us :busts_in_silhouette:
We are the SunLab healthcare research team at UIUC.
Current Maintainers:
Zhenbang Wu (Ph.D. Student @ UIUC)
John Wu (Ph.D. Student @ UIUC)
Junyi Gao (Ph.D. Student @ University of Edinburgh)
Jimeng Sun (Professor @ UIUC)
Get in Touch:
Discord Community (fastest response)