A MEDS PyTorch Dataset, leveraging an on-the-fly retrieval strategy for flexible, efficient data loading.

MEDS-torch

Description

This repository provides a flexible suite for advanced machine learning over Electronic Health Records (EHR) using PyTorch, PyTorch Lightning, and Hydra for configuration management. The project ingests tensorized data from the MEDS_transforms repository, a robust system for transforming EHR data into ML-ready sequence data. By employing a variety of tokenization strategies and sequence model architectures, this framework facilitates the development and testing of sequence models on EHR data.

Key features include:

  • Configurable ML Pipeline: Utilize Hydra to dynamically adjust configurations and seamlessly integrate with PyTorch Lightning for scalable training across multiple environments.
  • Advanced Tokenization Techniques: Explore different approaches to embedding EHR data as tokens that sequence models can reason over.
  • Supervised Models: Support for supervised training on arbitrary tasks defined on MEDS format data.
  • Transfer Learning: Pretrain via contrastive learning, forecasting, and other pretraining methods, then fine-tune on supervised tasks.

The goal of this project is to push the boundaries of what's possible in healthcare machine learning by providing flexible, robust, and scalable sequence modeling tools that accommodate a wide range of research and operational needs. Whether you're conducting academic research or developing clinical applications with MEDS format EHR data, this repository offers the tools and flexibility to develop deep sequence models.

Installation

Pip

PyPI

pip install meds-torch

git

# clone project
git clone git@github.com:Oufattole/meds-torch.git
cd meds-torch

# [OPTIONAL] create conda environment
conda create -n meds-torch python=3.12
conda activate meds-torch

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -e .

How to run

Train model with default configuration

# train on CPU
python -m meds_torch.train trainer=cpu

# train on GPU
python -m meds_torch.train trainer=gpu

Train model with chosen experiment configuration from configs/experiment/

python -m meds_torch.train experiment=experiment_name.yaml

You can override any parameter from the command line like this:

python -m meds_torch.train trainer.max_epochs=20 data.batch_size=64
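To make the override syntax above concrete, here is a minimal pure-Python sketch of how dotted `key.path=value` arguments can be merged into a nested configuration. This illustrates the concept only and is not Hydra's implementation; the helper names `apply_overrides` and `parse_value` and the default values are hypothetical.

```python
from copy import deepcopy


def parse_value(raw):
    """Convert a raw string to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config, overrides):
    """Return a copy of `config` with `key.path=value` overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        # Walk down to the parent dictionary, creating levels if missing.
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical defaults; the real values live in the repository's config files.
defaults = {"trainer": {"max_epochs": 10}, "data": {"batch_size": 32}}
merged = apply_overrides(defaults, ["trainer.max_epochs=20", "data.batch_size=64"])
print(merged)
```

Each override names a path into the nested config, so `trainer.max_epochs=20` touches only that one key and leaves the rest of the `trainer` group at its defaults.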

📌  Introduction

Why you might want to use it:

✅ Support different tokenization methods for EHR data

  • Triplet
  • Everything Is Text
  • Everything Is a Code
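As a rough illustration of the first option, the sketch below shows the general idea behind triplet-style tokenization, assuming each measurement is described by a time, a code, and an optional numeric value. The event data, the code strings, the `to_triplets` helper, and the value mask are all hypothetical and are not taken from the meds-torch implementation.

```python
from datetime import datetime

# Hypothetical measurements for one subject: (time, code, numeric_value).
# A value of None marks a code that has no numeric value attached.
events = [
    (datetime(2024, 1, 1, 8, 0), "HR", 82.0),
    (datetime(2024, 1, 1, 8, 0), "DX//I10", None),
    (datetime(2024, 1, 1, 9, 30), "HR", 90.0),
]


def to_triplets(events):
    """Encode events as (hours since first event, code index, value, has_value)."""
    vocab = {}
    start = events[0][0]
    triplets = []
    for time, code, value in events:
        # Assign each new code the next free integer index.
        code_idx = vocab.setdefault(code, len(vocab))
        hours = (time - start).total_seconds() / 3600
        has_value = value is not None
        # Missing values are stored as 0.0 and flagged by the mask.
        triplets.append((hours, code_idx, value if has_value else 0.0, has_value))
    return triplets, vocab


triplets, vocab = to_triplets(events)
for row in triplets:
    print(row)
```

In a model, each of the three components would typically be embedded separately and combined into one token embedding; how meds-torch does this is not shown here.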

✅ Supervised Learning and Transfer Learning Support for MEDS Data

  • Randomly initialize a model and train it in a supervised manner on your MEDS format medical data.
  • General Contrastive Window Pretraining
  • Random EBCL Example
  • OCP Example
  • STraTS Value Forecasting

✅ Ease of Use and Reusability
A collection of useful EHR sequence modeling tools, configs, and code snippets. You can use this repo as a reference for developing your own models. Additionally, you can easily add new models, datasets, tasks, and experiments, and train on different accelerators, such as multi-GPU setups.

Loggers

By default, the wandb logger is installed with the repo. To use a different logger, install it with the matching command below:

pip install neptune-client
pip install mlflow
pip install comet-ml
pip install "aim>=3.16.2"  # quoted so the shell does not treat > as a redirect; no lower than 3.16.2, see https://github.com/aimhubio/aim/issues/2550

Development Help

To run the tests on 8 parallel workers (the -n option is provided by the pytest-xdist plugin), run:

pytest -n 8
