An ETL pipeline to extract INSPIRE data into the MEDS format.
INSPIRE-MEDS
The INSPIRE dataset is a publicly available research dataset in perioperative medicine. It includes approximately 130,000 cases (50% of all surgical cases) of patients who underwent anesthesia for surgery at an academic institution in South Korea between 2011 and 2020. The dataset includes patient characteristics such as age, sex, American Society of Anesthesiologists physical status classification, diagnosis, surgical procedure code, department, and type of anesthesia. It also includes vital signs in the operating theatre, general wards, and intensive care units (ICUs), laboratory results from six months before admission to six months after discharge, and medication during hospitalization. Complications include total hospital and ICU length of stay and in-hospital death. This pipeline extracts the INSPIRE dataset (from PhysioNet, https://physionet.org/content/inspire/) into the MEDS format.
Usage:
pip install INSPIRE_MEDS
export DATASET_DOWNLOAD_USERNAME=$PHYSIONET_USERNAME
export DATASET_DOWNLOAD_PASSWORD=$PHYSIONET_PASSWORD
MEDS_extract-INSPIRE root_output_dir=$ROOT_OUTPUT_DIR
When you run this, the program will:
- Download the needed raw INSPIRE files for the currently supported version into $ROOT_OUTPUT_DIR/raw_input.
- Perform initial, pre-MEDS processing on the raw INSPIRE files, saving the results in $ROOT_OUTPUT_DIR/pre_MEDS.
- Construct the final MEDS cohort, and save it to $ROOT_OUTPUT_DIR/MEDS_cohort.
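The directory layout described above can be sketched in Python as follows. This is only an illustration of how the three stage directories relate to a single root; `stage_dirs` is a hypothetical helper and not part of the INSPIRE_MEDS package.

```python
from pathlib import Path


def stage_dirs(root_output_dir: str) -> dict[str, Path]:
    """Return the three stage directories derived from one root directory.

    Hypothetical helper for illustration; the names mirror the
    sub-directories listed in this README.
    """
    root = Path(root_output_dir)
    return {
        "raw_input": root / "raw_input",      # downloaded raw INSPIRE files
        "pre_MEDS": root / "pre_MEDS",        # results of pre-MEDS processing
        "MEDS_cohort": root / "MEDS_cohort",  # final MEDS cohort
    }
```

For example, `stage_dirs("/data/inspire")["MEDS_cohort"]` gives the path where the final cohort would be expected after a run with `root_output_dir=/data/inspire`.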
You can also specify the target directories directly:
export DATASET_DOWNLOAD_USERNAME=$PHYSIONET_USERNAME
export DATASET_DOWNLOAD_PASSWORD=$PHYSIONET_PASSWORD
MEDS_extract-INSPIRE raw_input_dir=$RAW_INPUT_DIR pre_MEDS_dir=$PRE_MEDS_DIR MEDS_cohort_dir=$MEDS_COHORT_DIR
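If you drive the pipeline from a Python script, the invocation above could be assembled roughly as follows. This is a minimal sketch: `build_extract_command` and `missing_credentials` are hypothetical helpers, and the check assumes the download step needs both environment variables to be set.

```python
import os


def build_extract_command(raw_input_dir, pre_meds_dir, meds_cohort_dir):
    """Build the argument list for the CLI call shown in this README."""
    return [
        "MEDS_extract-INSPIRE",
        f"raw_input_dir={raw_input_dir}",
        f"pre_MEDS_dir={pre_meds_dir}",
        f"MEDS_cohort_dir={meds_cohort_dir}",
    ]


def missing_credentials(env=None):
    """List the credential variables that are unset or empty.

    Assumption: both variables are required for the PhysioNet download.
    """
    env = os.environ if env is None else env
    required = ("DATASET_DOWNLOAD_USERNAME", "DATASET_DOWNLOAD_PASSWORD")
    return [name for name in required if not env.get(name)]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `missing_credentials()` returns an empty list.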
Examples and More Info:
You can run MEDS_extract-INSPIRE --help for more information on the arguments and options. You can also run
MEDS_extract-INSPIRE root_output_dir=$ROOT_OUTPUT_DIR
to run the entire pipeline.
MEDS-transforms settings
If you want to convert a large dataset, you can parallelize the MEDS-transforms stage, which is the step of the pipeline that takes the longest.
For local parallelization, install the hydra-joblib-launcher package:
pip install hydra-joblib-launcher --upgrade
Then set the number of workers as an environment variable:
export N_WORKERS=8
Moreover, you can set the number of subjects per shard to balance the parallelization overhead based on how many subjects you have in your dataset:
export N_SUBJECTS_PER_SHARD=100000
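To see how the two settings interact, the following sketch estimates how many shards a dataset splits into and how many workers can be kept busy. It assumes shards are formed by simple even division of subjects, which may differ from what MEDS-transforms actually does; `plan_shards` is a hypothetical helper, and the 130,000 figure is the approximate case count from the description above, used only as a stand-in for the number of subjects.

```python
import math


def plan_shards(n_subjects: int, n_subjects_per_shard: int, n_workers: int):
    """Estimate the shard count and the number of workers that get work.

    Assumption: one worker processes one shard at a time, and shards are
    filled up to n_subjects_per_shard subjects each.
    """
    n_shards = math.ceil(n_subjects / n_subjects_per_shard)
    busy_workers = min(n_workers, n_shards)
    return n_shards, busy_workers


# With the example values above, 130,000 subjects yield only 2 shards,
# so at most 2 of the 8 workers would have a shard to process.
print(plan_shards(130_000, 100_000, 8))  # (2, 2)

# A smaller shard size yields 13 shards and keeps all 8 workers busy.
print(plan_shards(130_000, 10_000, 8))   # (13, 8)
```

Under these assumptions, choosing `N_SUBJECTS_PER_SHARD` so that the shard count is at least `N_WORKERS` avoids idle workers, while very small shards add per-shard overhead.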
The MIMIC-IV OMOP Dataset
We use the demo dataset for MIMIC-IV in the OMOP format, which is a subset of the MIMIC-IV dataset. This dataset, downloaded from PhysioNet, does not include the standard dictionary linking definitions but should otherwise be functional.
Particularities
- Care site is added to the visit as text
- Add support for care_site table (visit_detail)