An ETL pipeline to extract MIMIC-IV data into the MEDS format.

MIMIC-IV MEDS Extraction ETL

This pipeline extracts the MIMIC-IV dataset (from PhysioNet) into the MEDS format.

Usage:

pip install MIMIC_IV_MEDS
export DATASET_DOWNLOAD_USERNAME=$PHYSIONET_USERNAME
export DATASET_DOWNLOAD_PASSWORD=$PHYSIONET_PASSWORD
MEDS_extract-MIMIC_IV root_output_dir=$ROOT_OUTPUT_DIR

When you run this, the program will:

  1. Download the needed raw MIMIC files for the currently supported version into $ROOT_OUTPUT_DIR/raw_input.
  2. Perform initial, pre-MEDS processing on the raw MIMIC files, saving the results in $ROOT_OUTPUT_DIR/pre_MEDS.
  3. Construct the final MEDS cohort, and save it to $ROOT_OUTPUT_DIR/MEDS_cohort.
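The three steps above each write to their own subdirectory, so a quick way to see how far a run got is to check which of those directories exist. The following is a minimal sketch, not part of MIMIC_IV_MEDS: the function name check_meds_outputs is hypothetical, and only the three directory names are taken from the steps above.

```shell
# Report which of the three stage directories exist under a given root.
# Hypothetical helper; only the directory names come from the steps above.
check_meds_outputs() {
    root="$1"
    missing=0
    for sub in raw_input pre_MEDS MEDS_cohort; do
        if [ -d "$root/$sub" ]; then
            echo "found:   $root/$sub"
        else
            echo "missing: $root/$sub"
            missing=1
        fi
    done
    # Exit status 0 only if all three directories are present.
    return "$missing"
}

# Example, after a pipeline run:
#   check_meds_outputs "$ROOT_OUTPUT_DIR"
```

A non-zero exit status only says that a directory is absent; it does not verify that the contents of a stage are complete.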

You can also specify the target directories more directly, with

export DATASET_DOWNLOAD_USERNAME=$PHYSIONET_USERNAME
export DATASET_DOWNLOAD_PASSWORD=$PHYSIONET_PASSWORD
MEDS_extract-MIMIC_IV raw_input_dir=$RAW_INPUT_DIR pre_MEDS_dir=$PRE_MEDS_DIR MEDS_cohort_dir=$MEDS_COHORT_DIR
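Both invocations above rely on the two DATASET_DOWNLOAD_* variables being exported before the command runs. As a sketch, a small guard like the one below could be called first to fail early when either is missing. The function name require_download_credentials is hypothetical and is not provided by the package; only the two variable names come from the commands above.

```shell
# Fail early if either credential variable used above is unset or empty.
# Hypothetical helper; it is not part of MIMIC_IV_MEDS.
require_download_credentials() {
    status=0
    if [ -z "${DATASET_DOWNLOAD_USERNAME:-}" ]; then
        echo "DATASET_DOWNLOAD_USERNAME is not set" >&2
        status=1
    fi
    if [ -z "${DATASET_DOWNLOAD_PASSWORD:-}" ]; then
        echo "DATASET_DOWNLOAD_PASSWORD is not set" >&2
        status=1
    fi
    return "$status"
}

# Example:
#   require_download_credentials && \
#     MEDS_extract-MIMIC_IV raw_input_dir=$RAW_INPUT_DIR pre_MEDS_dir=$PRE_MEDS_DIR MEDS_cohort_dir=$MEDS_COHORT_DIR
```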

Examples and More Info:

You can run MEDS_extract-MIMIC_IV --help for more information on the arguments and options. You can also run

MEDS_extract-MIMIC_IV root_output_dir=$ROOT_OUTPUT_DIR do_demo=True

to run the entire pipeline over the publicly available, fully open MIMIC-IV demo dataset.

Expected runtime and compute needs

This pipeline has been run successfully over the full MIMIC-IV dataset on a 5-core machine, using around 165 GB of memory, in approximately 7 hours. That time includes downloading all of the MIMIC-IV files, and the test was run on a machine with poor network transfer speeds and without any parallelization of the transformation steps, so the runtime can likely be reduced substantially. The output folder is 9.8 GB. This can also be reduced significantly, because intermediate files that are not necessary for the final MEDS dataset are retained in additional folders. See this GitHub issue for tracking on ensuring these directories are automatically cleaned up in the future.
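To see how the output size is split between stages on your own run, something like the following sketch may help. It assumes the default layout under a single root directory (raw_input, pre_MEDS, MEDS_cohort); the function name report_stage_sizes is hypothetical, and which of these folders count as "intermediate" is an assumption rather than something stated above. It relies on the -h flag of du, which is available on GNU and BSD systems.

```shell
# Print the on-disk size of each stage directory that exists under a root.
# Hypothetical helper; it only reads sizes and never deletes anything.
report_stage_sizes() {
    root="$1"
    for sub in raw_input pre_MEDS MEDS_cohort; do
        if [ -d "$root/$sub" ]; then
            du -sh "$root/$sub"
        fi
    done
}

# Example:
#   report_stage_sizes "$ROOT_OUTPUT_DIR"
```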

📚 Citing this work

If you use this software in your research, please cite it! You can use the "Cite this repository" button on GitHub.

The citation information is maintained in the CITATION.cff file in this repository.

🔧 Common Issues / FAQ

❓ Issue: FileNotFoundError or pipeline errors during the pre_MEDS step on Ubuntu (symlinks not recognized)

Problem:

Some users running the pipeline encounter errors during the pre_MEDS step, where the scripts attempt to create symlinks but later fail to recognize or access them, even though the symlinks appear to exist in the file system.

Solution:

A do_copy=True option is available in the CLI that allows the pipeline to copy files instead of symlinking, avoiding this issue entirely (at the cost of additional disk usage). You can enable this by adding do_copy=True to your command:

MEDS_extract-MIMIC_IV root_output_dir=$ROOT_OUTPUT_DIR do_copy=True
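Before switching to do_copy=True, it can be useful to check whether any of the symlinks are dangling. The sketch below covers only that one possible cause; the root cause of the issue is not stated here, so a clean result does not rule the problem out. The function name list_broken_symlinks is hypothetical. It relies on find -L, under which -type l matches only links whose target cannot be resolved.

```shell
# List symlinks under a directory whose targets cannot be resolved.
# Hypothetical diagnostic helper; it is not part of MIMIC_IV_MEDS.
list_broken_symlinks() {
    # With -L, find follows symlinks, so only unresolved links remain type "l".
    find -L "$1" -type l
}

# Example:
#   list_broken_symlinks "$ROOT_OUTPUT_DIR/pre_MEDS"
```

If this prints nothing, every symlink under the directory resolves, and the do_copy=True workaround above is still the documented fix.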

