Framework for Electronic Medical Records. A Python package for building models using EHR data.

FEMR

Framework for Electronic Medical Records

FEMR is a Python package for manipulating longitudinal EHR data for machine learning, with a focus on supporting the creation of foundation models and verifying their presumed benefits in healthcare. Such a framework is needed given the current state of large language models in healthcare and the need for better evaluation frameworks.

The currently supported foundation models are CLMBR and MOTOR.

FEMR by default supports the OMOP Common Data Model developed by the OHDSI community, but can also be used with other forms of EHR / claims data with minimal processing. FEMR has been used to process data from a variety of sources, including MIMIC-IV, Optum, Truven, STARR-OMOP, and SickKids-OMOP.

FEMR helps users:

  1. Manipulate events in the EHR data comprising a patient's timeline
  2. Algorithmically label patient records based on structured data
  3. Generate tabular features from patient timelines for use with traditional gradient boosted tree models
  4. Train and finetune CLMBR-derived models for binary classification and prediction tasks
  5. Train and finetune MOTOR-derived models for making time-to-event predictions

We recommend that users start with our tutorial folder.

Installation

There are two variants of the FEMR package: a CPU-only version and a CUDA-enabled version.

How to install FEMR without CUDA

pip install femr

If you have a particularly old CPU, we offer a variant of femr without CPU optimizations.

pip install femr_oldcpu

How to install FEMR with CUDA support

Note that CUDA-enabled FEMR requires jax in order to function.

pip install --upgrade "jax[cuda11_pip]==0.4.8" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install "femr_cuda[models]"
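
After installing, it can help to confirm which pieces are importable before going further. The following is a minimal sketch using only the standard library. It assumes that every FEMR variant is imported as `femr` (the `femr.datasets.PatientDatabase` name in Getting Started suggests this, but it is not confirmed here), and `check_install` is a hypothetical helper name, not part of FEMR.

```python
import importlib.util


def check_install(modules=("femr", "jax")):
    """Report which of the given top-level modules can be imported.

    Assumption: all FEMR variants install a module named `femr`.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

On a CUDA-enabled install, both entries should be reported as found; a missing `jax` entry would point at the first of the two pip commands above.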

Getting Started

The first step of using FEMR is to convert your patient data into a femr.datasets.PatientDatabase, the standard file format used by the FEMR codebase to hold and query patient timelines.

There are two recommended paths for doing this, each with a corresponding tutorial:

  1. Convert your data to OMOP format, and then run the OMOP converter
  2. Convert your data to FEMR's simple csv format, and then run the simple FEMR converter

The simple csv route has the advantage of being an easier ETL, but it comes with some downsides. The table below shows which features are supported with each ETL.

| Capability | OMOP -> FEMR | Simple CSV -> FEMR |
| --- | --- | --- |
| Core Labeling Tools | Yes | Yes |
| OMOP Specific Labeling | Yes | No |
| Tabular Feature Generation | Yes | Yes |
| Foundation Model Training | Yes | Yes |
| Shared Vocabulary Enabling Cross-Site Foundation Model Transfer | Yes | No |
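
To make the trade-off concrete, the capability table can be written down as data and queried. This is only an illustrative sketch: the dictionary keys and the `routes_supporting` helper are hypothetical names chosen here, not part of FEMR.

```python
# Capability table from above, keyed by ETL route (names are illustrative).
SUPPORT = {
    "omop": {
        "core_labeling": True,
        "omop_labeling": True,
        "tabular_features": True,
        "foundation_model_training": True,
        "shared_vocabulary": True,
    },
    "simple_csv": {
        "core_labeling": True,
        "omop_labeling": False,
        "tabular_features": True,
        "foundation_model_training": True,
        "shared_vocabulary": False,
    },
}


def routes_supporting(required):
    """Return the ETL routes that support every capability in `required`."""
    return [route for route, caps in SUPPORT.items() if all(caps[c] for c in required)]


print(routes_supporting(["core_labeling", "tabular_features"]))  # ['omop', 'simple_csv']
print(routes_supporting(["shared_vocabulary"]))  # ['omop']
```

In other words, the simple csv route is sufficient unless you need OMOP-specific labeling or cross-site model transfer through a shared vocabulary.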

Development

The following guides are for developers who want to contribute to FEMR.

Building from source

In some scenarios (such as contributing to FEMR), you might want to compile the package from source.

To do so, follow the instructions below.

conda create -n FEMR_ENV python=3.10 bazel=6 -c conda-forge -y
conda activate FEMR_ENV

export BAZEL_USE_CPP_ONLY_TOOLCHAIN=1

git clone https://github.com/som-shahlab/femr.git
cd femr
pip install -e .

Special note for Nero users

As Nero does not have internet access, you must run the following before running the code above.

export DISTDIR=/local-scratch/nigam/distdir

(Optional) Installing CUDA on Nero / Carina

Note for Nero/Carina users: because home-directory storage is limited, do not save the femr repo or installation files there. We recommend using the shared project folder instead; on Nero, for example, use '/local-scratch/nigam/project/...'

If you are using Nero, you will need to install CUDA manually until the CUDA version on Nero is updated. To do so, follow these steps:

  1. Download version 11.8 of CUDA onto your local machine from here

  2. Copy your CUDA download from your local machine onto Nero, into whatever folder you'd like. We'll refer to the path to this folder as <PATH_TO_CUDA_INSTALLER> from now on.

    • Note: Nero doesn't work with scp. You can use an alternative like pscp, which functions basically identically to scp. You can install pscp on a Mac by using brew install putty.
  3. ssh into Nero using ssh <username>@nero-nigam.compute.stanford.edu

  4. On Nero, run the CUDA installer as a bash command as follows: bash <PATH_TO_CUDA_INSTALLER> --installpath=<INSTALL_PATH>, where <PATH_TO_CUDA_INSTALLER> is the path to the file you downloaded/transferred in Step #2, and <INSTALL_PATH> is where you'd like to save your CUDA installation files. We recommend using ~ or something similar.

  5. The CUDA installer will pop up a window during installation. Uncheck all of the boxes it presents except for the box labeled "cuda toolkit".

  6. After the installation completes, the installer will print two paths to your console. Take note of these paths; you will add them to your .bashrc file in a later step.

  7. Install cuDNN v8.7.0 (November 28th, 2022) for CUDA 11.x. Go to this link and download "Local Installer for Linux x86_64 (Tar)" to your local computer, then transfer it to your folder on Nero. Then follow section 1.3 of the instructions here. Note that you need to copy the cuDNN files into your own CUDA installation. For example:

  • cp cudnn-*-archive/include/cudnn*.h <path_to_your_cuda>/include
  • cp -P cudnn-*-archive/lib/libcudnn* <path_to_your_cuda>/lib64
  • chmod a+r <path_to_your_cuda>/include/cudnn*.h <path_to_your_cuda>/lib64/libcudnn*
  8. Add the following to your .bashrc file. You may need to restart your terminal for the changes to be reflected.
export PATH="<INSTALL_PATH>/bin:$PATH"
export LD_LIBRARY_PATH="<INSTALL_PATH>/lib64:$LD_LIBRARY_PATH"

To edit your .bashrc file, use

nano ~/.bashrc
  9. Run rm /tmp/cuda-installer.log to remove the installer log (if you don't do this, it will cause a segmentation fault for other users when they try to install CUDA).
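
Once the steps above are done, a quick way to confirm that the .bashrc changes took effect is to check the two environment variables from a fresh shell. The sketch below assumes the `bin` and `lib64` subdirectories named in the export lines; `missing_cuda_paths` is a hypothetical helper, not part of FEMR or CUDA.

```python
import os


def missing_cuda_paths(install_path, environ=None):
    """Return the variables that do not yet contain the expected CUDA directory."""
    environ = os.environ if environ is None else environ
    # These subdirectories mirror the export lines added to .bashrc.
    expected = {
        "PATH": os.path.join(install_path, "bin"),
        "LD_LIBRARY_PATH": os.path.join(install_path, "lib64"),
    }
    missing = []
    for var, directory in expected.items():
        entries = environ.get(var, "").split(os.pathsep)
        if directory not in entries:
            missing.append(var)
    return missing
```

Calling `missing_cuda_paths("<INSTALL_PATH>")` with your actual install path should return an empty list; any variable it names still needs its export line.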

Precommit checks

Before committing, please run the following commands to ensure that your code is formatted correctly and passes all tests.

Installation

conda install pre-commit pytest -y
pre-commit install

Running

Test Functions

pytest tests

Formatting Checks

pre-commit run --all-files

Miscellaneous

GZIP decompression commands

export OMOP_SOURCE=/share/pi/nigam...
gunzip $OMOP_SOURCE/**/*.csv.gz
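
If `gunzip` is unavailable, the same decompression can be sketched with the Python standard library. This is an assumed equivalent rather than part of FEMR: `gunzip_tree` is a hypothetical name, and it recurses through all subdirectories, which may be broader than the shell glob depending on your shell settings.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_tree(root):
    """Decompress every *.csv.gz under `root`, deleting each archive afterwards."""
    written = []
    for archive in sorted(Path(root).rglob("*.csv.gz")):
        target = archive.with_suffix("")  # strips only the trailing ".gz"
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        archive.unlink()  # mirror gunzip, which removes the .gz after success
        written.append(target)
    return written
```

The function returns the paths of the decompressed files, so the result can be inspected before moving on to the extraction.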

Zstandard compression commands

export OMOP_SOURCE=/share/pi/nigam...
zstd -1 --rm $OMOP_SOURCE/**/*.csv

Generating extract

# Set up environment variables
#   Path to a folder containing your raw STARR-OMOP download, generated via `tools.stanford.download_bigquery.py`
export OMOP_SOURCE=/path/to/omop/folder...
#   Path to any arbitrary folder where you want to store your FEMR extract
export EXTRACT_DESTINATION=/path/to/femr/extract/folder...
#   Path to any arbitrary folder where you want to store your FEMR extract logs
export EXTRACT_LOGS=/path/to/femr/extract/logs...

# Do some data preprocessing with Stanford-specific helper scripts
#   Extract data from flowsheets
python tools/stanford/flowsheet_cleaner.py --num_threads 5 $OMOP_SOURCE "${EXTRACT_DESTINATION}_flowsheets"
#   Normalize visits
python tools/omop/normalize_visit_detail.py --num_threads 5 "${EXTRACT_DESTINATION}_flowsheets" "${EXTRACT_DESTINATION}_flowsheets_detail"

# Run actual FEMR extraction
etl_stanford_omop "${EXTRACT_DESTINATION}_flowsheets_detail" $EXTRACT_DESTINATION $EXTRACT_LOGS --num_threads 10
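
The three steps above pass their outputs along by path suffix (`_flowsheets`, then `_flowsheets_detail`), which is easy to get wrong when typed by hand. A minimal sketch of the same pipeline driven from Python follows; the function names are hypothetical, while the script paths and flags are copied from the commands above.

```python
import subprocess


def build_extract_commands(omop_source, extract_destination, extract_logs):
    """Return the three extraction commands, in order, as argument lists."""
    flowsheets = f"{extract_destination}_flowsheets"
    detail = f"{extract_destination}_flowsheets_detail"
    return [
        # Step 1: extract data from flowsheets.
        ["python", "tools/stanford/flowsheet_cleaner.py", "--num_threads", "5",
         omop_source, flowsheets],
        # Step 2: normalize visits, reading the output of step 1.
        ["python", "tools/omop/normalize_visit_detail.py", "--num_threads", "5",
         flowsheets, detail],
        # Step 3: run the FEMR extraction on the output of step 2.
        ["etl_stanford_omop", detail, extract_destination, extract_logs,
         "--num_threads", "10"],
    ]


def run_extract(omop_source, extract_destination, extract_logs):
    """Run each step in turn, stopping at the first failure."""
    for command in build_extract_commands(omop_source, extract_destination, extract_logs):
        subprocess.run(command, check=True)
```

Like the shell version, `run_extract` is meant to be run from the root of the femr repository, since the script paths are relative.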

Example usage (Note: This should take ~10 minutes on a 1% extract of STARR-OMOP)

export OMOP_SOURCE=/local-scratch/nigam/projects/ethanid/som-rit-phi-starr-prod.starr_omop_cdm5_deid_1pcent_2022_11_09
export EXTRACT_DESTINATION=/local-scratch/nigam/projects/mwornow/femr_starr_omop_cdm5_deid_1pcent_2022_11_09
export EXTRACT_LOGS=/local-scratch/nigam/projects/mwornow/femr_starr_omop_cdm5_deid_1pcent_2022_11_09_logs

python tools/stanford/flowsheet_cleaner.py --num_threads 5 $OMOP_SOURCE "${EXTRACT_DESTINATION}_flowsheets"
python tools/omop/normalize_visit_detail.py --num_threads 5 "${EXTRACT_DESTINATION}_flowsheets" "${EXTRACT_DESTINATION}_flowsheets_detail"

etl_stanford_omop "${EXTRACT_DESTINATION}_flowsheets_detail" $EXTRACT_DESTINATION $EXTRACT_LOGS --num_threads 10

(Optional) Installing PyTorch

If you are on Nero, you need to install PyTorch using:

conda install numpy -y
pip install torch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu111

If you are on Carina, you need to install PyTorch using:

conda install numpy pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia -y
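
After either install, you may want to confirm that PyTorch can actually see a GPU. The check below is a small sketch, not part of FEMR; `torch_cuda_status` is a hypothetical helper that degrades gracefully when PyTorch is absent.

```python
def torch_cuda_status():
    """Describe whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        return f"torch {torch.__version__} sees CUDA {torch.version.cuda}"
    return f"torch {torch.__version__} is installed, but no CUDA device is visible"


print(torch_cuda_status())
```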


