An ETL to convert OMOP data to the MEDS format.
MEDS OMOP ETL with MEDS-Transforms
An ETL pipeline for transforming OMOP datasets into the MEDS format using the MEDS-Transforms library. It is inspired by the first OMOP MEDS ETL (https://github.com/Medical-Event-Data-Standard/meds_etl), and we thank its developers. OMOP 5.3 and 5.4 datasets are currently supported.
pip install OMOP_MEDS
OMOP_MEDS root_output_dir=$ROOT_OUTPUT_DIR
To try with the MIMIC-IV OMOP demo dataset, you can run:
OMOP_MEDS root_output_dir=/path/to/your/output do_download=True ++do_demo=True
Example config for an OMOP dataset:
dataset_name: MIMIC_IV_OMOP
raw_dataset_version: 1.0
omop_version: 5.3
urls:
  dataset:
    - https://physionet.org/content/mimic-iv-demo-omop/0.9/
    - url: EXAMPLE_CONTROLLED_URL
      username: ${oc.env:DATASET_DOWNLOAD_USERNAME}
      password: ${oc.env:DATASET_DOWNLOAD_PASSWORD}
  demo:
    - https://physionet.org/content/mimic-iv-demo-omop/0.9/
  common:
    - EXAMPLE_SHARED_URL # Often used for shared metadata files
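The ${oc.env:...} entries are read from environment variables of the same name, so for a controlled-access URL both variables need to be set in the shell before the pipeline starts. A minimal sketch of a pre-flight check follows; the helper name check_download_credentials is hypothetical and not part of OMOP_MEDS, and the credential values are placeholders.

```shell
# Hypothetical helper: succeeds only if both credential variables are non-empty.
check_download_credentials() {
  for name in DATASET_DOWNLOAD_USERNAME DATASET_DOWNLOAD_PASSWORD; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      return 1
    fi
  done
  return 0
}

# Placeholder values; substitute your own credentials.
export DATASET_DOWNLOAD_USERNAME="your-username"
export DATASET_DOWNLOAD_PASSWORD="your-password"

if check_download_credentials; then
  echo "Credentials set; ready to run OMOP_MEDS with do_download=True."
fi
```

Running the check first gives a clear message about which variable is missing instead of a failure later during config resolution.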
Pre-MEDS settings
The following settings can be used to configure the pre-MEDS steps.
OMOP_MEDS \
root_output_dir=/sc/arion/projects/hpims-hpi/projects/foundation_models_ehr/cohorts/meds_debug/small_demo \
raw_input_dir=/sc/arion/projects/hpims-hpi/projects/foundation_models_ehr/cohorts/full_omop \
do_download=False ++do_overwrite=True ++limit_subjects=50
- root_output_dir: Set the root output directory.
- raw_input_dir: Path to the raw input directory.
- do_download: Set to False to skip downloading the dataset.
- ++do_overwrite: Set to True to overwrite existing files.
- ++limit_subjects: Limit the number of subjects to process.
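For repeated debugging runs it can help to keep the paths in shell variables and print the assembled command before executing it. The sketch below uses only the overrides listed above; the variable names, the placeholder paths, and the build_omop_meds_cmd helper are illustrative assumptions, not part of the package.

```shell
# Placeholder paths; replace with your own locations.
RAW_INPUT_DIR="/path/to/raw_omop"
ROOT_OUTPUT_DIR="/path/to/output/small_demo"
LIMIT_SUBJECTS=50

# Hypothetical helper: prints the command line instead of running it.
build_omop_meds_cmd() {
  printf '%s ' \
    OMOP_MEDS \
    "root_output_dir=$ROOT_OUTPUT_DIR" \
    "raw_input_dir=$RAW_INPUT_DIR" \
    do_download=False \
    ++do_overwrite=True
  printf '%s\n' "++limit_subjects=$LIMIT_SUBJECTS"
}

CMD="$(build_omop_meds_cmd)"
echo "$CMD"
```

Once the printed command looks right, it can be copied into the terminal and run as-is. Starting with a small ++limit_subjects value keeps the first run short before converting the full dataset.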
MEDS-Transforms settings
If you want to convert a large dataset, you can parallelize the MEDS-Transforms stage, which is the step that takes the longest.
Local parallelization relies on the hydra-joblib-launcher package, which you can install or upgrade with:
pip install hydra-joblib-launcher --upgrade
Then, you can set the number of workers as an environment variable:
export N_WORKERS=16
Moreover, you can set the number of subjects per shard to balance the parallelization overhead based on how many subjects you have in your dataset:
export N_SUBJECTS_PER_SHARD=1000
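As a rough guide for choosing these values, the number of shards is approximately the subject count divided by N_SUBJECTS_PER_SHARD, rounded up, and it is usually sensible to have at least as many shards as workers so that no worker sits idle. The sketch below assumes a hypothetical cohort of 50,000 subjects (N_SUBJECTS is an illustrative variable, not one the pipeline reads) and takes the worker count from the number of online CPUs.

```shell
# Hypothetical cohort size; N_SUBJECTS is used for this calculation only.
N_SUBJECTS=50000

# Use the number of online CPUs as the worker count, falling back to 1.
N_WORKERS="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
export N_WORKERS

export N_SUBJECTS_PER_SHARD=1000

# Ceiling division: shards needed to hold all subjects.
N_SHARDS=$(( (N_SUBJECTS + N_SUBJECTS_PER_SHARD - 1) / N_SUBJECTS_PER_SHARD ))
echo "workers=$N_WORKERS shards=$N_SHARDS"

# Warn when there are not enough shards to keep every worker busy.
if [ "$N_SHARDS" -lt "$N_WORKERS" ]; then
  echo "Fewer shards than workers; consider lowering N_SUBJECTS_PER_SHARD." >&2
fi
```

With these example numbers the calculation yields 50 shards. Smaller shards spread work more evenly but add per-shard overhead, so the right value depends on the dataset and the machine.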
Citation
If you use this project, please cite it using the citation link in the GitHub repository.