Framework for Electronic Medical Records. A python package for building models using EHR data.

FEMR

Framework for Electronic Medical Records

FEMR is a Python package for manipulating longitudinal EHR data for machine learning, with a focus on supporting the creation of foundation models and verifying their presumed benefits in healthcare. Such a framework is needed given the current state of large language models in healthcare and the need for better evaluation frameworks.

The currently supported foundation models are CLMBR and MOTOR.

FEMR by default supports the OMOP Common Data Model developed by the OHDSI community, but can also be used with other forms of EHR / claims data with minimal processing. FEMR has been used to process data from a variety of sources, including MIMIC-IV, Optum, Truven, STARR-OMOP, and SickKids-OMOP.

FEMR helps users:

Manipulate events in the EHR data comprising a patient's timeline
Algorithmically label patient records based on structured data
Generate tabular features from patient timelines for use with traditional gradient boosted tree models
Train and fine-tune CLMBR-derived models for binary classification and prediction tasks
Train and fine-tune MOTOR-derived models for making time-to-event predictions
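
To make "timeline" and "algorithmic labeling" concrete, here is a minimal standard-library sketch. It does not use FEMR's API: the Event class, the label_code_within function, and the code strings are all hypothetical names chosen for illustration. It treats a patient as a time-sorted list of events and assigns a binary label when a target code appears within a fixed horizon after an index time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """One timestamped entry in a patient's record (hypothetical type, not FEMR's)."""

    start: datetime
    code: str
    value: Optional[float] = None


def label_code_within(
    events: List[Event], index_time: datetime, target_code: str, horizon_days: int
) -> bool:
    """Return True if target_code occurs after index_time and within horizon_days."""
    horizon_seconds = horizon_days * 86400
    for event in events:
        elapsed = (event.start - index_time).total_seconds()
        if event.code == target_code and 0 < elapsed <= horizon_seconds:
            return True
    return False


# Illustrative events only; a real timeline would come from your EHR extract.
timeline = sorted(
    [
        Event(datetime(2020, 3, 1), "ICD10CM/E11.9"),
        Event(datetime(2020, 1, 5), "LOINC/4548-4", 7.2),
        Event(datetime(2020, 1, 1), "Visit/IP"),
    ],
    key=lambda e: e.start,
)

# Label: does the diagnosis code appear within 90 days of the index visit?
has_dx_90d = label_code_within(timeline, datetime(2020, 1, 1), "ICD10CM/E11.9", 90)
has_dx_30d = label_code_within(timeline, datetime(2020, 1, 1), "ICD10CM/E11.9", 30)
print(has_dx_90d, has_dx_30d)
```

The same pattern (index time, target code, horizon) underlies many structured-data labels; FEMR's own labelers are covered in the tutorials.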

We recommend that new users start with our tutorial folder.

Installation

There are two variants of the FEMR package: a CPU-only version and a CUDA-enabled version.

How to install FEMR without CUDA

pip install femr

If you have a particularly old CPU, we offer a variant of femr without CPU optimizations.

pip install femr_oldcpu

How to install FEMR with CUDA support

Note that CUDA-enabled FEMR requires jax in order to function.

pip install --upgrade "jax[cuda11_pip]==0.4.8" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install "femr_cuda[models]"
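
Because the variants are published under different distribution names, it can be useful to confirm which one ended up in your environment. The following is a small sketch that uses only the standard library; the helper name installed_femr_variants is hypothetical, and the distribution names are taken from the pip commands above.

```python
from importlib import metadata
from typing import Dict, Optional

# Distribution names as they appear in the pip commands above.
FEMR_VARIANTS = ("femr", "femr_oldcpu", "femr_cuda")


def installed_femr_variants() -> Dict[str, Optional[str]]:
    """Map each variant name to its installed version, or None if it is absent."""
    found: Dict[str, Optional[str]] = {}
    for name in FEMR_VARIANTS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


variants = installed_femr_variants()
print(variants)
```

If more than one entry has a version, you likely have several variants installed side by side, which is worth cleaning up before debugging anything else.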

Getting Started

The first step of using FEMR is to convert your patient data into a femr.datasets.PatientDatabase, a file format that allows you to easily query patient timelines.

There are three options for doing this (in order from most to least recommended):

a) Convert your data to OMOP form and run the etl_generic_omop program to convert OMOP datasets to PatientDatabases. See our MIMIC OMOP ETL tutorial.

b) Convert your data to FEMR's custom simple csv format and run the etl_simple_femr program to convert that format into a PatientDatabase. See our simple format ETL tutorial.

c) Write a custom ETL script to handle special cases. See both the Stanford and SickKids ETL scripts.
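
For option (b), the sketch below shows how event rows might be written out as CSV before running etl_simple_femr. The column names (patient_id, start, code, value) and the code strings are assumptions made for illustration and are not confirmed by this README; check the simple format ETL tutorial for the exact schema before relying on them.

```python
import csv
import io

# ASSUMPTION: these column names are illustrative; verify them in the ETL tutorial.
ASSUMED_COLUMNS = ["patient_id", "start", "code", "value"]

# Hypothetical events for two patients; one row per event.
events = [
    {"patient_id": 1, "start": "2020-01-01", "code": "Visit/IP", "value": ""},
    {"patient_id": 1, "start": "2020-01-05", "code": "LOINC/4548-4", "value": "7.2"},
    {"patient_id": 2, "start": "2021-06-30", "code": "ICD10CM/E11.9", "value": ""},
]

# Write to an in-memory buffer here; use open(path, "w", newline="") for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=ASSUMED_COLUMNS)
writer.writeheader()
writer.writerows(events)
csv_text = buffer.getvalue()
print(csv_text)
```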

Development

The following guides are for developers who want to contribute to FEMR.

Building from source

In some scenarios (such as contributing to FEMR), you might want to compile the package from source.

To do so, run the following commands.

conda create -n FEMR_ENV python=3.10 bazel=6 -c conda-forge -y
conda activate FEMR_ENV

export BAZEL_USE_CPP_ONLY_TOOLCHAIN=1

git clone https://github.com/som-shahlab/femr.git
cd femr
pip install -e .

Special note for Nero users

As Nero does not have internet access, you must run the following before running the code above.

export DISTDIR=/local-scratch/nigam/distdir

(Optional) Installing CUDA on Nero / Carina

As a side note for Nero/Carina users: do not save the femr repo and installation files in your home directory, because its storage is limited. We recommend using the shared project folder instead, e.g., on Nero, use '/local-scratch/nigam/project/...'

If you are using Nero, you will need to install CUDA manually until the CUDA version on Nero is updated. To do so, follow these steps:

1. Download version 11.8 of CUDA onto your local machine from NVIDIA's CUDA Toolkit archive.
2. Copy the CUDA installer from your local machine onto Nero, into whatever folder you'd like. We'll refer to the path to the installer file as <PATH_TO_CUDA_INSTALLER> from now on.
   - Note: Nero doesn't work with scp. You can use an alternative like pscp, which functions basically identically to scp. You can install pscp on a Mac by using brew install putty.
3. ssh into Nero using ssh <username>@nero-nigam.compute.stanford.edu
4. On Nero, run the CUDA installer as a bash command as follows: bash <PATH_TO_CUDA_INSTALLER> --installpath=<INSTALL_PATH>, where <PATH_TO_CUDA_INSTALLER> is the path to the file you transferred in step 2, and <INSTALL_PATH> is where you'd like to save your CUDA installation files. We recommend using ~ or something similar.
5. The CUDA installer will pop up a window during installation. Uncheck all of the boxes it presents except for the box labeled "cuda toolkit".
6. After the installation completes, the installer will print two paths to your console. Take note of these paths; you will add them to your .bashrc file in step 8.
7. Install cuDNN v8.7.0 (November 28th, 2022) for CUDA 11.x. On your local computer, download the "Local Installer for Linux x86_64 (Tar)" listed under "Download cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x" on the cuDNN download page, then transfer it to your folder on Nero. Next, follow section 1.3 of the cuDNN installation instructions. Note that you need to copy the cuDNN files into your own CUDA installation. For example:

cp cudnn-*-archive/include/cudnn*.h <path_to_your_cuda>/include
cp -P cudnn-*-archive/lib/libcudnn* <path_to_your_cuda>/lib64
chmod a+r <path_to_your_cuda>/include/cudnn*.h <path_to_your_cuda>/lib64/libcudnn*

8. Add the following to your .bashrc file. You may need to restart your terminal for the changes to take effect.

export PATH="<INSTALL_PATH>/bin:$PATH"
export LD_LIBRARY_PATH="<INSTALL_PATH>/lib64:$LD_LIBRARY_PATH"

To edit your .bashrc file, use

nano ~/.bashrc

9. Run rm /tmp/cuda-installer.log to remove the installer log (if you don't do this, it will cause a segmentation fault for other users when they try to install CUDA).
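
After editing .bashrc and restarting your terminal, you may want to confirm that both exports took effect. The sketch below is one way to check, using only the standard library; the function name cuda_paths_configured and the /opt/cuda example path are hypothetical, and the check assumes a Linux-style ":" separator.

```python
import os
from typing import Dict, Mapping


def cuda_paths_configured(
    install_path: str, env: Mapping[str, str] = os.environ
) -> Dict[str, bool]:
    """Report whether <install_path>/bin and <install_path>/lib64 are on the search paths."""
    bin_dir = f"{install_path.rstrip('/')}/bin"
    lib_dir = f"{install_path.rstrip('/')}/lib64"
    path_entries = env.get("PATH", "").split(":")
    lib_entries = env.get("LD_LIBRARY_PATH", "").split(":")
    return {
        "PATH": bin_dir in path_entries,
        "LD_LIBRARY_PATH": lib_dir in lib_entries,
    }


# Demonstration with a made-up environment; pass no env argument to inspect your own shell.
example_env = {"PATH": "/opt/cuda/bin:/usr/bin", "LD_LIBRARY_PATH": "/opt/cuda/lib64:"}
status = cuda_paths_configured("/opt/cuda", example_env)
print(status)
```

In a real session, call cuda_paths_configured with the <INSTALL_PATH> you chose in step 4; a False entry means the corresponding export has not been picked up yet.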

Precommit checks

Before committing, please run the following commands to ensure that your code is formatted correctly and passes all tests.

Installation

conda install pre-commit pytest -y
pre-commit install

Running

Test Functions

pytest tests

Formatting Checks

pre-commit run --all-files

Miscellaneous

GZIP decompression commands

export OMOP_SOURCE=/share/pi/nigam...
gunzip $OMOP_SOURCE/**/*.csv.gz

Zstandard compression commands

export OMOP_SOURCE=/share/pi/nigam...
zstd -1 --rm $OMOP_SOURCE/**/*.csv
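
Note that in bash the ** pattern only recurses into nested directories when the globstar option is enabled (shopt -s globstar); otherwise it behaves like a single *. If you would rather not depend on shell globbing, the decompression step can be sketched in Python with the standard library. The function name gunzip_tree is hypothetical, and this covers only the gzip half, since Zstandard support is not assumed to be available in the standard library.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_tree(root: Path) -> int:
    """Decompress every *.csv.gz under root in place; return the number of files handled."""
    count = 0
    # Materialize the listing first so deleting files does not disturb the traversal.
    for gz_path in sorted(root.rglob("*.csv.gz")):
        csv_path = gz_path.with_suffix("")  # strips only the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(csv_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()  # mirror gunzip, which removes the compressed file
        count += 1
    return count


# Self-contained demonstration on a temporary directory with one nested file.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "person"
    nested.mkdir()
    with gzip.open(nested / "part0.csv.gz", "wt") as f:
        f.write("person_id\n1\n")
    handled = gunzip_tree(Path(tmp))
    restored = (nested / "part0.csv").read_text()
    leftover = list(Path(tmp).rglob("*.gz"))

print(handled, repr(restored))
```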

Generating extract

# Set up environment variables
#   Path to a folder containing your raw STARR-OMOP download, generated via `tools.stanford.download_bigquery.py`
export OMOP_SOURCE=/path/to/omop/folder...
#   Path to any arbitrary folder where you want to store your FEMR extract
export EXTRACT_DESTINATION=/path/to/femr/extract/folder...
#   Path to any arbitrary folder where you want to store your FEMR extract logs
export EXTRACT_LOGS=/path/to/femr/extract/logs...

# Do some data preprocessing with Stanford-specific helper scripts
#   Extract data from flowsheets
python tools/stanford/flowsheet_cleaner.py --num_threads 5 $OMOP_SOURCE "${EXTRACT_DESTINATION}_flowsheets"
#   Normalize visits
python tools/omop/normalize_visit_detail.py --num_threads 5 "${EXTRACT_DESTINATION}_flowsheets" "${EXTRACT_DESTINATION}_flowsheets_detail"

# Run actual FEMR extraction
etl_stanford_omop "${EXTRACT_DESTINATION}_flowsheets_detail" $EXTRACT_DESTINATION $EXTRACT_LOGS --num_threads 10

Example usage (Note: This should take ~10 minutes on a 1% extract of STARR-OMOP)

export OMOP_SOURCE=/local-scratch/nigam/projects/ethanid/som-rit-phi-starr-prod.starr_omop_cdm5_deid_1pcent_2022_11_09
export EXTRACT_DESTINATION=/local-scratch/nigam/projects/mwornow/femr_starr_omop_cdm5_deid_1pcent_2022_11_09
export EXTRACT_LOGS=/local-scratch/nigam/projects/mwornow/femr_starr_omop_cdm5_deid_1pcent_2022_11_09_logs

python tools/stanford/flowsheet_cleaner.py --num_threads 5 $OMOP_SOURCE "${EXTRACT_DESTINATION}_flowsheets"
python tools/omop/normalize_visit_detail.py --num_threads 5 "${EXTRACT_DESTINATION}_flowsheets" "${EXTRACT_DESTINATION}_flowsheets_detail"

etl_stanford_omop "${EXTRACT_DESTINATION}_flowsheets_detail" $EXTRACT_DESTINATION $EXTRACT_LOGS --num_threads 10
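
The three stages above pass data along through derived folder names: the flowsheet cleaner writes to "<destination>_flowsheets", the visit normalizer reads that and writes to "<destination>_flowsheets_detail", and the ETL reads the latter. The following sketch assembles the same commands as argument lists so the chaining is explicit. The function name build_extract_commands and the /data/... paths are hypothetical; the scripts and flags are copied from the commands above.

```python
import shlex
from typing import List


def build_extract_commands(omop_source: str, destination: str, logs: str) -> List[List[str]]:
    """Return the three extraction commands, in order, as argv lists."""
    flowsheets = f"{destination}_flowsheets"
    detail = f"{destination}_flowsheets_detail"
    return [
        ["python", "tools/stanford/flowsheet_cleaner.py",
         "--num_threads", "5", omop_source, flowsheets],
        ["python", "tools/omop/normalize_visit_detail.py",
         "--num_threads", "5", flowsheets, detail],
        ["etl_stanford_omop", detail, destination, logs, "--num_threads", "10"],
    ]


# Placeholder paths for illustration; substitute your own folders.
commands = build_extract_commands("/data/omop", "/data/extract", "/data/logs")
for cmd in commands:
    print(shlex.join(cmd))
```

To actually run the stages from Python, each list could be passed to subprocess.run(cmd, check=True) from the root of the femr repository, so that a failure in one stage stops the later ones.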

(Optional) Installing PyTorch

If you are on Nero, you need to install PyTorch using:

conda install numpy -y
pip install torch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu111

If you are on Carina, you need to install PyTorch using:

conda install numpy pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia -y
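
After either install, a quick sanity check is to ask PyTorch whether it can see a GPU. This is a minimal sketch; the wrapper name torch_cuda_available is hypothetical, while torch.cuda.is_available() is PyTorch's own call.

```python
from typing import Optional


def torch_cuda_available() -> Optional[bool]:
    """Return torch.cuda.is_available(), or None when PyTorch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


cuda_status = torch_cuda_available()
print(cuda_status)
```

A result of False with PyTorch installed usually points to a mismatch between the installed wheel and the CUDA setup on the machine.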

femr-oldcpu 0.1.9
