A MEDS PyTorch Dataset, leveraging an on-the-fly retrieval strategy for flexible, efficient data loading.
# MEDS-torch: Advanced Machine Learning for Electronic Health Records
## 🚀 Quick Start

### Installation

```bash
pip install meds-torch
```
### Set up environment variables

```bash
# Define data paths
PATHS_KWARGS="paths.data_dir=/CACHED/NESTED/RAGGED/TENSORS/DIR paths.meds_cohort_dir=/PATH/TO/MEDS/DATA/ paths.output_dir=/OUTPUT/RESULTS/DIRECTORY"

# Define task parameters (for supervised learning)
TASK_KWARGS="data.task_name=NAME_OF_TASK data.task_root_dir=/PATH/TO/TASK/LABELS/"
```
### Basic Usage

- Train a supervised model (GPU):

  ```bash
  meds-torch-train trainer=gpu $PATHS_KWARGS $TASK_KWARGS
  ```

- Pretrain an autoregressive forecasting model (GPU):

  ```bash
  meds-torch-train trainer=gpu $PATHS_KWARGS model=eic_forecasting
  ```

- Train with a specific experiment configuration:

  ```bash
  meds-torch-train experiment=experiment.yaml $PATHS_KWARGS $TASK_KWARGS hydra.searchpath=[pkg://meds_torch.configs,/PATH/TO/CUSTOM/CONFIGS]
  ```

- Override parameters:

  ```bash
  meds-torch-train trainer.max_epochs=20 data.batch_size=64 $PATHS_KWARGS $TASK_KWARGS
  ```

- Hyperparameter search:

  ```bash
  meds-torch-tune experiment=experiment.yaml callbacks=tune_default $PATHS_KWARGS $TASK_KWARGS hparams_search=ray_tune hydra.searchpath=[pkg://meds_torch.configs,/PATH/TO/CUSTOM/CONFIGS]
  ```
## Example Experiment Configuration

Here's a sample `experiment.yaml`:

```yaml
# @package _global_
defaults:
  - override /data: pytorch_dataset
  - override /logger: wandb
  - override /model/backbone: triplet_transformer_encoder
  - override /model/input_encoder: triplet_encoder
  - override /model: supervised
  - override /trainer: gpu

tags: [mimiciv, triplet, transformer_encoder]
seed: 0

trainer:
  min_epochs: 1
  max_epochs: 10
  gradient_clip_val: 1.0

data:
  dataloader:
    batch_size: 64
    num_workers: 6
  max_seq_len: 128
  collate_type: triplet
  subsequence_sampling_strategy: to_end

model:
  token_dim: 128
  optimizer:
    lr: 0.001
  backbone:
    n_layers: 2
    nheads: 4
    dropout: 0

logger:
  wandb:
    tags: ${tags}
    group: mimiciv_tokenization
```
This configuration sets up a supervised learning experiment using a triplet transformer encoder on MIMIC-IV data. Modify this file to suit your specific needs.
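To make `collate_type: triplet`, `max_seq_len`, and `subsequence_sampling_strategy: to_end` concrete, here is a minimal sketch of what a triplet-style batch might look like. The field names, padding scheme, and collate function are illustrative assumptions, not meds-torch's actual API:

```python
import torch

# A minimal sketch (not meds-torch's actual implementation) of a "triplet"
# batch: each observation is a (time, code, numeric_value) triple, and
# per-subject sequences are padded/truncated to max_seq_len.

def collate_triplets(subjects: list[dict], max_seq_len: int) -> dict:
    """Pad per-subject triplet sequences into dense (batch, max_seq_len) tensors."""
    batch = {
        "time": torch.zeros(len(subjects), max_seq_len),
        "code": torch.zeros(len(subjects), max_seq_len, dtype=torch.long),
        "numeric_value": torch.zeros(len(subjects), max_seq_len),
        "mask": torch.zeros(len(subjects), max_seq_len, dtype=torch.bool),
    }
    for i, subj in enumerate(subjects):
        # "to_end" subsequence sampling keeps the most recent events.
        n = min(len(subj["code"]), max_seq_len)
        for key in ("time", "code", "numeric_value"):
            batch[key][i, :n] = torch.as_tensor(subj[key][-n:], dtype=batch[key].dtype)
        batch["mask"][i, :n] = True
    return batch

subjects = [
    {"time": [0.0, 1.5, 3.0], "code": [101, 7, 42], "numeric_value": [0.0, 98.6, 1.2]},
    {"time": [0.0, 2.0], "code": [7, 42], "numeric_value": [99.1, 0.9]},
]
batch = collate_triplets(subjects, max_seq_len=4)
print(batch["code"].shape)  # torch.Size([2, 4]), zero-padded at the end
```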
## 🌟 Key Features

- **Flexible ML Pipeline**: Uses Hydra for dynamic configuration and PyTorch Lightning for scalable training.
- **Advanced Tokenization**: Supports multiple strategies for embedding EHR data (Triplet, Text Code, Everything In Code); see the sketch after this list.
- **Supervised Learning**: Train models on arbitrary tasks defined in MEDS-format data.
- **Transfer Learning**: Pretrain models using contrastive learning, forecasting, and other methods, then finetune them for specific tasks.
- **Multiple Pretraining Methods**: Supports EBCL, OCP, STraTS Value Forecasting, and Autoregressive Observation Forecasting.
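As a rough illustration of how the tokenization strategies above differ, consider a single observation such as a heart rate of 98.6. The token layouts, IDs, and binning scheme below are assumptions for intuition, not meds-torch's actual vocabularies or encodings:

```python
# Illustrative sketch only: token layouts below are hypothetical.

observation = {"code": "HEART_RATE", "numeric_value": 98.6}

# Triplet: the code maps to one learned embedding, while the numeric value
# stays continuous and gets its own value embedding; the model combines
# the two (plus a time embedding) per observation.
triplet_tokens = {"code_id": 42, "numeric_value": 98.6}

# Text Code: the code string itself is tokenized as text, so related
# codes can share subword structure.
text_code_tokens = ["HEART", "_", "RATE"]  # hypothetical subword split

# Everything In Code (EIC): the numeric value is discretized (e.g. into
# quantile bins) and folded into the code, yielding a single token from
# an enlarged vocabulary.
eic_token = "HEART_RATE//_Q_7"  # -> one integer ID after vocabulary lookup
```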
## 🛠 Installation

### PyPI

```bash
pip install meds-torch
```

### From Source

```bash
git clone git@github.com:Oufattole/meds-torch.git
cd meds-torch
pip install -e .
```
## 📚 Documentation

For detailed usage instructions, API reference, and examples, visit our documentation.

For a comprehensive demo of the pipeline and results from a suite of inductive experiments comparing different tokenization methods and learning approaches, see the `MIMICIV_INDUCTIVE_EXPERIMENTS/README.MD` file, which provides detailed scripts and performance metrics.
## 🧪 Running Experiments

### Supervised Learning

```bash
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_supervised.sh $MIMICIV_ROOT_DIR meds-torch
```

### Transfer Learning

```bash
# Pretraining
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_multi_window_pretrain.sh $MIMICIV_ROOT_DIR meds-torch [METHOD]
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_ar_pretrain.sh $MIMICIV_ROOT_DIR meds-torch [AR_METHOD]

# Finetuning
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_finetune.sh $MIMICIV_ROOT_DIR meds-torch [METHOD]
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_ar_finetune.sh $MIMICIV_ROOT_DIR meds-torch [AR_METHOD]
```
Replace `[METHOD]` with one of the following:

- `ocp` (Observation Contrastive Pretraining)
- `ebcl` (Event-Based Contrastive Learning)
- `value_forecasting` (STraTS Value Forecasting)

Replace `[AR_METHOD]` with one of the following:

- `eic_forecasting` (Everything In Code Forecasting)
- `triplet_forecasting` (Triplet Forecasting)
These scripts allow you to run various experiments, including supervised learning, different pretraining methods, and finetuning for both standard and autoregressive models.
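For intuition on what the autoregressive pretraining objectives (`eic_forecasting`, `triplet_forecasting`) optimize, here is a minimal sketch of next-token prediction over a tokenized event stream. The shapes and the stand-in for a causal transformer's output are assumptions for illustration, not meds-torch's internals:

```python
import torch
import torch.nn.functional as F

# Sketch of the autoregressive objective: predict the next event token
# from all previous tokens. `logits` stands in for the output of a
# causal transformer over the tokenized EHR sequence (assumed shapes).
batch_size, seq_len, vocab_size = 4, 128, 1024
tokens = torch.randint(0, vocab_size, (batch_size, seq_len))  # tokenized EHR events
logits = torch.randn(batch_size, seq_len, vocab_size)         # stand-in model output

# Compare predictions at steps 0..T-2 against targets at steps 1..T-1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss.item())
```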
## 📞 Support
For questions, issues, or feature requests, please open an issue on our GitHub repository.
MEDS-torch: Advancing healthcare machine learning through flexible, robust, and scalable sequence modeling tools.