A MEDS PyTorch Dataset, leveraging an on-the-fly retrieval strategy for flexible, efficient data loading.
MEDS-torch: Advanced Machine Learning for Electronic Health Records
🚀 Quick Start
Installation
pip install meds-torch
Set up environment variables
# Define data paths
PATHS_KWARGS="paths.data_dir=/CACHED/NESTED/RAGGED/TENSORS/DIR paths.meds_cohort_dir=/PATH/TO/MEDS/DATA/ paths.output_dir=/OUTPUT/RESULTS/DIRECTORY"
# Define task parameters (for supervised learning)
TASK_KWARGS="data.task_name=NAME_OF_TASK data.task_root_dir=/PATH/TO/TASK/LABELS/"
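These are ordinary shell variables holding space-separated Hydra overrides; leave them unquoted when passing them to the commands below so the shell word-splits them into individual overrides. A minimal sketch with hypothetical local paths and a hypothetical task name (substitute your own):
# Hypothetical example values; replace the directories and task name with your own
export PATHS_KWARGS="paths.data_dir=$HOME/meds/tensors paths.meds_cohort_dir=$HOME/meds/cohort paths.output_dir=$HOME/meds/results"
export TASK_KWARGS="data.task_name=in_hospital_mortality data.task_root_dir=$HOME/meds/tasks"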
Basic Usage
- Train a supervised model (GPU)
meds-torch-train trainer=gpu $PATHS_KWARGS $TASK_KWARGS
- Pretrain an autoregressive forecasting model (GPU)
meds-torch-train trainer=gpu $PATHS_KWARGS model=eic_forecasting
- Train with a specific experiment configuration
meds-torch-train experiment=experiment.yaml $PATHS_KWARGS $TASK_KWARGS hydra.searchpath=[pkg://meds_torch.configs,/PATH/TO/CUSTOM/CONFIGS]
- Override parameters
meds-torch-train trainer.max_epochs=20 data.batch_size=64 $PATHS_KWARGS $TASK_KWARGS
- Hyperparameter search
meds-torch-tune experiment=experiment.yaml callbacks=tune_default $PATHS_KWARGS $TASK_KWARGS hparams_search=ray_tune hydra.searchpath=[pkg://meds_torch.configs,/PATH/TO/CUSTOM/CONFIGS]
Example Experiment Configuration
Here's a sample experiment.yaml:
# @package _global_
defaults:
  - override /data: pytorch_dataset
  - override /logger: wandb
  - override /model/backbone: triplet_transformer_encoder
  - override /model/input_encoder: triplet_encoder
  - override /model: supervised
  - override /trainer: gpu

tags: [mimiciv, triplet, transformer_encoder]
seed: 0

trainer:
  min_epochs: 1
  max_epochs: 10
  gradient_clip_val: 1.0

data:
  dataloader:
    batch_size: 64
    num_workers: 6
  max_seq_len: 128
  collate_type: triplet
  subsequence_sampling_strategy: to_end

model:
  token_dim: 128
  optimizer:
    lr: 0.001
  backbone:
    n_layers: 2
    nheads: 4
    dropout: 0

logger:
  wandb:
    tags: ${tags}
    group: mimiciv_tokenization
This configuration sets up a supervised learning experiment using a triplet transformer encoder on MIMIC-IV data. Modify this file to suit your specific needs.
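Any field in the file can still be overridden from the command line at launch time. For example, assuming experiment.yaml is saved in /PATH/TO/CUSTOM/CONFIGS (the override values below are arbitrary):
# Launch the experiment, overriding two of its fields on the command line
meds-torch-train experiment=experiment.yaml trainer.max_epochs=20 model.optimizer.lr=0.0005 $PATHS_KWARGS $TASK_KWARGS hydra.searchpath=[pkg://meds_torch.configs,/PATH/TO/CUSTOM/CONFIGS]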
🌟 Key Features
- Flexible ML Pipeline: Utilizes Hydra for dynamic configuration and PyTorch Lightning for scalable training.
- Advanced Tokenization: Supports multiple strategies for embedding EHR data (Triplet, Text Code, Everything In Code); the sketch after this list illustrates the triplet idea.
- Supervised Learning: Train models on arbitrary tasks defined in MEDS format data.
- Transfer Learning: Pretrain models using contrastive learning, forecasting, and other methods, then finetune for specific tasks.
- Multiple Pretraining Methods: Supports EBCL, OCP, STraTS Value Forecasting, and Autoregressive Observation Forecasting.
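To make the tokenization strategies more concrete, here is a minimal, illustrative Python sketch of the triplet idea, in which each observation is reduced to a (time, code, value) triple whose three parts are embedded and summed into one token. This is not meds-torch's API; every name in it is hypothetical:
# Illustrative only: a minimal triplet embedder in the spirit of the
# "Triplet" strategy. All module and field names here are hypothetical.
import torch
import torch.nn as nn

class TripletEmbedder(nn.Module):
    def __init__(self, n_codes: int, token_dim: int = 128):
        super().__init__()
        self.code_emb = nn.Embedding(n_codes, token_dim)  # categorical code
        self.value_proj = nn.Linear(1, token_dim)         # numeric value
        self.time_proj = nn.Linear(1, token_dim)          # observation time

    def forward(self, time, code, value):
        # Each (time, code, value) triple becomes one token of size token_dim.
        return (
            self.code_emb(code)
            + self.value_proj(value.unsqueeze(-1))
            + self.time_proj(time.unsqueeze(-1))
        )

# One patient with three observations: (hours since admission, code id, value).
emb = TripletEmbedder(n_codes=1000)
time = torch.tensor([0.0, 1.5, 2.0])
code = torch.tensor([42, 7, 42])
value = torch.tensor([98.6, 120.0, 99.1])
print(emb(time, code, value).shape)  # torch.Size([3, 128])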
🛠 Installation
PyPI
pip install meds-torch
From Source
git clone git@github.com:Oufattole/meds-torch.git
cd meds-torch
pip install -e .
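Either installation route can be sanity-checked with a bare import (the module name meds_torch matches the distribution's wheel):
python -c "import meds_torch"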
📚 Documentation
For detailed usage instructions, API reference, and examples, visit our documentation.
For a comprehensive demo of our pipeline and to see results from a suite of inductive experiments comparing different tokenization methods and learning approaches, please refer to the MIMICIV_INDUCTIVE_EXPERIMENTS/README.MD file, which provides detailed scripts and performance metrics.
🧪 Running Experiments
Supervised Learning
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_supervised.sh $MIMICIV_ROOT_DIR meds-torch
Transfer Learning
# Pretraining
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_multi_window_pretrain.sh $MIMICIV_ROOT_DIR meds-torch [METHOD]
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_ar_pretrain.sh $MIMICIV_ROOT_DIR meds-torch [AR_METHOD]
# Finetuning
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_finetune.sh $MIMICIV_ROOT_DIR meds-torch [METHOD]
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_ar_finetune.sh $MIMICIV_ROOT_DIR meds-torch [AR_METHOD]
Replace [METHOD] with one of the following:
- ocp (Observation Contrastive Pretraining)
- ebcl (Event-Based Contrastive Learning)
- value_forecasting (STraTS Value Forecasting)
Replace [AR_METHOD] with one of the following:
- eic_forecasting (Everything In Code Forecasting)
- triplet_forecasting (Triplet Forecasting)
These scripts allow you to run various experiments, including supervised learning, different pretraining methods, and finetuning for both standard and autoregressive models.
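For example, a complete OCP pretrain-then-finetune run substitutes ocp for [METHOD] in the corresponding pretrain and finetune scripts:
# Pretrain with OCP, then finetune the pretrained model
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_multi_window_pretrain.sh $MIMICIV_ROOT_DIR meds-torch ocp
bash MIMICIV_INDUCTIVE_EXPERIMENTS/launch_finetune.sh $MIMICIV_ROOT_DIR meds-torch ocp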
📞 Support
For questions, issues, or feature requests, please open an issue on our GitHub repository.
MEDS-torch: Advancing healthcare machine learning through flexible, robust, and scalable sequence modeling tools.