
An ETL pipeline to extract INSPIRE data into the MEDS format.

INSPIRE-MEDS

The INSPIRE dataset is a publicly available research dataset in perioperative medicine. It includes approximately 130,000 cases (50% of all surgical cases) of patients who underwent anesthesia for surgery at an academic institution in South Korea between 2011 and 2020. The dataset covers patient characteristics such as age, sex, American Society of Anesthesiologists (ASA) physical status classification, diagnosis, surgical procedure code, department, and type of anesthesia. It also includes vital signs from the operating theatre, general wards, and intensive care units (ICUs); laboratory results from six months before admission to six months after discharge; and medication during hospitalization. Recorded outcomes include total hospital and ICU length of stay and in-hospital death.

This pipeline extracts the INSPIRE dataset (from PhysioNet, https://physionet.org/content/inspire/) into the MEDS format.

Usage:

pip install INSPIRE_MEDS
export DATASET_DOWNLOAD_USERNAME=$PHYSIONET_USERNAME
export DATASET_DOWNLOAD_PASSWORD=$PHYSIONET_PASSWORD
MEDS_extract-INSPIRE root_output_dir=$ROOT_OUTPUT_DIR

When you run this, the program will:

  1. Download the needed raw INSPIRE files for the currently supported version into $ROOT_OUTPUT_DIR/raw_input.
  2. Perform initial, pre-MEDS processing on the raw INSPIRE files, saving the results in $ROOT_OUTPUT_DIR/pre_MEDS.
  3. Construct the final MEDS cohort, and save it to $ROOT_OUTPUT_DIR/MEDS_cohort.
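
As a quick sanity check after a run, the three stage directories listed above can be inspected with a small helper. This is only a sketch: the function name `check_stage_dirs` is hypothetical and not part of the package, and it checks only that the directories exist, not that their contents are complete.

```shell
# Hypothetical helper (not part of INSPIRE_MEDS): report which of the three
# stage directories named in the steps above exist under a root output directory.
check_stage_dirs() {
  for stage in raw_input pre_MEDS MEDS_cohort; do
    if [ -d "$1/$stage" ]; then
      echo "$stage: present"
    else
      echo "$stage: missing"
    fi
  done
}

# Inspect a previous run; falls back to the current directory if
# ROOT_OUTPUT_DIR is not set.
check_stage_dirs "${ROOT_OUTPUT_DIR:-.}"
```

A directory reported as missing suggests that the corresponding step did not run or did not finish.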

You can also specify the target directories directly:

export DATASET_DOWNLOAD_USERNAME=$PHYSIONET_USERNAME
export DATASET_DOWNLOAD_PASSWORD=$PHYSIONET_PASSWORD
MEDS_extract-INSPIRE raw_input_dir=$RAW_INPUT_DIR pre_MEDS_dir=$PRE_MEDS_DIR MEDS_cohort_dir=$MEDS_COHORT_DIR
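
Both invocations above export the PhysioNet credentials before starting the pipeline. A pre-flight check along the following lines can catch a missing variable before any download is attempted. This is a minimal sketch: `require_credentials` is a hypothetical helper, not something the package provides, and it only checks that the variables are non-empty in the current shell.

```shell
# Hypothetical pre-flight check (not part of INSPIRE_MEDS): return non-zero
# if either credential variable is unset or empty.
require_credentials() {
  if [ -z "${DATASET_DOWNLOAD_USERNAME:-}" ]; then
    echo "missing environment variable: DATASET_DOWNLOAD_USERNAME" >&2
    return 1
  fi
  if [ -z "${DATASET_DOWNLOAD_PASSWORD:-}" ]; then
    echo "missing environment variable: DATASET_DOWNLOAD_PASSWORD" >&2
    return 1
  fi
  return 0
}

# Possible usage (commented out so the sketch has no side effects):
# require_credentials && MEDS_extract-INSPIRE root_output_dir=$ROOT_OUTPUT_DIR
```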

Examples and More Info:

Run MEDS_extract-INSPIRE --help for more information on the arguments and options. To run the entire pipeline in one step, use:

MEDS_extract-INSPIRE root_output_dir=$ROOT_OUTPUT_DIR

MEDS-transforms settings

When converting a large dataset, you can parallelize the MEDS-transforms stage, which is the step of the pipeline that takes the longest.

Local parallelization uses the hydra-joblib-launcher package. Install or upgrade it with:

pip install hydra-joblib-launcher --upgrade

Then set the number of workers as an environment variable:

export N_WORKERS=8

You can also set the number of subjects per shard to balance the parallelization overhead against the number of subjects in your dataset:

export N_SUBJECTS_PER_SHARD=100000
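
To get a feel for how the shard size interacts with the worker count, the number of shards can be roughly estimated as the subject count divided by the shard size, rounded up. The sketch below assumes that simple relationship and ignores any per-split sharding the pipeline may apply. The function name `estimate_shards` is hypothetical, and the subject count of 130000 is purely illustrative (it borrows the approximate number of cases from the dataset description, which is not the same as the number of subjects).

```shell
# Hypothetical helper: rough shard count, ceil(n_subjects / n_subjects_per_shard).
# $1 = number of subjects, $2 = subjects per shard.
estimate_shards() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

estimate_shards 130000 100000   # prints 2
estimate_shards 130000 10000    # prints 13
```

Under this estimate, a shard size of 100000 would yield only about two shards for a dataset of this scale, so a run with N_WORKERS=8 may leave some workers with nothing to do. A smaller shard size gives more units of work at the cost of more per-shard overhead.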

The MIMIC-IV OMOP Dataset

We use the demo dataset for MIMIC-IV in the OMOP format, which is a subset of the MIMIC-IV dataset. This dataset, downloaded from PhysioNet, does not include the standard dictionary linking definitions but should otherwise be functional.

Particularities

  • Care site is added to the visit as text
  • Add support for care_site table (visit_detail)

