
An ETL to convert OMOP data to the MEDS format.


MEDS OMOP ETL with MEDS-Transforms


An ETL pipeline for transforming OMOP datasets into the MEDS format using the MEDS-Transforms library. It takes inspiration from the first OMOP MEDS ETL (https://github.com/Medical-Event-Data-Standard/meds_etl), and we thank its developers. OMOP 5.3 and 5.4 datasets are currently supported.
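The sketch below shows the general shape of the conversion for a single OMOP `measurement` row. It is a hypothetical illustration, not this package's implementation. The MEDS column names (`subject_id`, `time`, `code`, `numeric_value`) follow the MEDS core schema. The following are all assumptions made for this example:

- the `vocabulary_id/concept_code` code format;
- the concept lookup dictionary;
- the concept ID 1001;
- the helper name `measurement_to_meds`.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple, TypedDict


class MedsEvent(TypedDict):
    """One row in the MEDS core schema."""
    subject_id: int
    time: Optional[datetime]
    code: str
    numeric_value: Optional[float]


def measurement_to_meds(row: dict, concepts: Dict[int, Tuple[str, str]]) -> MedsEvent:
    """Map one OMOP measurement row to a MEDS event (hypothetical mapping)."""
    # Look up (vocabulary_id, concept_code) for the row's concept, as the OMOP concept table would.
    vocabulary_id, concept_code = concepts[row["measurement_concept_id"]]
    return MedsEvent(
        subject_id=row["person_id"],  # OMOP person_id becomes the MEDS subject_id
        time=row["measurement_datetime"],
        code=f"{vocabulary_id}/{concept_code}",  # assumed code format
        numeric_value=row.get("value_as_number"),  # None when there is no numeric result
    )


row = {
    "person_id": 42,
    "measurement_concept_id": 1001,  # hypothetical concept_id
    "measurement_datetime": datetime(2020, 1, 1, 8, 30),
    "value_as_number": 1.1,
}
event = measurement_to_meds(row, {1001: ("LOINC", "2160-0")})
print(event["code"])  # LOINC/2160-0
```

Each OMOP clinical table (measurements, conditions, drugs, and so on) would need its own column mapping. The flat event row is what they all share once in MEDS.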

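Install the package from PyPI, then run the ETL, setting root_output_dir to the directory where output should be written:

pip install OMOP_MEDS
OMOP_MEDS root_output_dir=$ROOT_OUTPUT_DIR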

To try the pipeline on the MIMIC-IV OMOP demo dataset, run:

OMOP_MEDS root_output_dir=/path/to/your/output do_download=True ++do_demo=True

Example config for an OMOP dataset:

dataset_name: MIMIC_IV_OMOP
raw_dataset_version: 1.0
omop_version: 5.3

urls:
  dataset:
    - https://physionet.org/content/mimic-iv-demo-omop/0.9/
    - url: EXAMPLE_CONTROLLED_URL
      username: ${oc.env:DATASET_DOWNLOAD_USERNAME}
      password: ${oc.env:DATASET_DOWNLOAD_PASSWORD}
  demo:
    - https://physionet.org/content/mimic-iv-demo-omop/0.9/
  common:
    - EXAMPLE_SHARED_URL # Often used for shared metadata files
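Entries under `urls` take two forms:

- a plain URL string;
- a mapping with `url`, `username`, and `password`, where `${oc.env:...}` is OmegaConf's resolver for reading an environment variable.

The sketch below normalizes both forms into one record type and picks a URL group. It is hypothetical. These parts are assumptions for illustration, not the package's documented behavior:

- the selection rule: use `demo` in demo mode, otherwise `dataset`, and always add `common`;
- the names `DownloadTarget` and `resolve_urls`.

```python
import os
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class DownloadTarget:
    url: str
    username: Optional[str] = None
    password: Optional[str] = None


def _normalize(entry: Union[str, dict]) -> DownloadTarget:
    # Plain strings are public URLs; mappings also carry credentials.
    if isinstance(entry, str):
        return DownloadTarget(url=entry)
    return DownloadTarget(
        url=entry["url"],
        username=entry.get("username"),
        password=entry.get("password"),
    )


def resolve_urls(urls: dict, do_demo: bool) -> List[DownloadTarget]:
    """Hypothetical rule: demo URLs in demo mode, else the full dataset, plus shared files."""
    group = "demo" if do_demo else "dataset"
    entries = list(urls.get(group, [])) + list(urls.get("common", []))
    return [_normalize(e) for e in entries]


# Mirrors the YAML above, with ${oc.env:...} resolved manually via os.environ.
urls = {
    "dataset": [
        "https://physionet.org/content/mimic-iv-demo-omop/0.9/",
        {
            "url": "EXAMPLE_CONTROLLED_URL",
            "username": os.environ.get("DATASET_DOWNLOAD_USERNAME"),
            "password": os.environ.get("DATASET_DOWNLOAD_PASSWORD"),
        },
    ],
    "demo": ["https://physionet.org/content/mimic-iv-demo-omop/0.9/"],
    "common": ["EXAMPLE_SHARED_URL"],
}

for target in resolve_urls(urls, do_demo=True):
    print(target.url)
```

Keeping credentials in environment variables, as the YAML does, keeps them out of the config file itself.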

Pre-MEDS settings

The following settings can be used to configure the pre-MEDS steps.

OMOP_MEDS \
	root_output_dir=/path/to/output/small_demo \
	raw_input_dir=/path/to/full_omop \
	do_download=False ++do_overwrite=True ++limit_subjects=50
  • root_output_dir: Set the root output directory.
  • raw_input_dir: Path to the raw input directory.
  • do_download: Set to False to skip downloading the dataset.
  • ++do_overwrite: Set to True to overwrite existing files.
  • ++limit_subjects: Limit the number of subjects to process.
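These settings are Hydra-style command-line overrides. In Hydra's override grammar, `key=value` changes a key already present in the config, while `++key=value` sets the key whether or not it already exists. Below is a hypothetical helper that assembles the command from Python. Two things in it are assumptions based only on the example above:

- the name `build_omop_meds_command`;
- the split between plain keys and `++` keys.

```python
import shlex
from typing import Dict, List, Union

Value = Union[str, int, bool]


def build_omop_meds_command(settings: Dict[str, Value], extra: Dict[str, Value]) -> List[str]:
    """Hypothetical helper: `settings` are existing config keys, `extra` get the ++ prefix."""
    args = ["OMOP_MEDS"]
    # str(True) renders as "True", matching the capitalization in the examples above.
    args += [f"{key}={value}" for key, value in settings.items()]
    args += [f"++{key}={value}" for key, value in extra.items()]
    return args


cmd = build_omop_meds_command(
    {"root_output_dir": "/path/to/output", "raw_input_dir": "/path/to/omop", "do_download": False},
    {"do_overwrite": True, "limit_subjects": 50},
)
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting issues. The list can be passed directly to `subprocess.run(cmd, check=True)`.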

MEDS-Transforms settings

When converting a large dataset, you can parallelize the MEDS-Transforms stage, which is the longest-running part of the conversion.

For local parallelization, install the hydra-joblib-launcher package:

pip install hydra-joblib-launcher --upgrade

Then set the number of workers with an environment variable:

export N_WORKERS=16
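Conceptually, parallelization here means several workers processing independent pieces of data (shards) at the same time. The sketch below imitates that pattern with Python's standard `concurrent.futures`, using threads as a stand-in for worker processes. It is not how hydra-joblib-launcher is implemented. The function `process_shard`, the toy shards, and the fallback of 1 worker are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List


def get_n_workers(default: int = 1) -> int:
    """Read N_WORKERS from the environment, falling back to a hypothetical default."""
    value = os.environ.get("N_WORKERS", "")
    return int(value) if value.strip() else default


def process_shard(shard: List[int]) -> int:
    # Hypothetical per-shard work; here it just counts subjects.
    return len(shard)


shards = [[1, 2, 3], [4, 5], [6]]
with ThreadPoolExecutor(max_workers=get_n_workers()) as pool:
    # map() preserves input order, so results line up with shards.
    counts = list(pool.map(process_shard, shards))
print(counts)
```

More workers only help when there are at least as many shards as workers. This is why the shard size matters as well.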

You can also set the number of subjects per shard. Choose a value based on how many subjects your dataset contains, so that per-shard overhead stays small while there are still enough shards to keep all workers busy:

export N_SUBJECTS_PER_SHARD=1000
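Sharding splits the subject population into groups of at most N_SUBJECTS_PER_SHARD subjects, so N subjects yield ceil(N / N_SUBJECTS_PER_SHARD) shards. Below is a minimal sketch. It assumes consecutive chunking of subject IDs, although the actual pipeline may assign subjects to shards differently, for example randomly. The default of 1000 and the helper names are hypothetical.

```python
import os
from typing import List, Sequence


def get_subjects_per_shard(default: int = 1000) -> int:
    # Hypothetical default; N_SUBJECTS_PER_SHARD overrides it when set.
    return int(os.environ.get("N_SUBJECTS_PER_SHARD", default))


def shard_subjects(subject_ids: Sequence[int], per_shard: int) -> List[List[int]]:
    """Split subject IDs into consecutive shards of at most `per_shard` subjects."""
    if per_shard < 1:
        raise ValueError("per_shard must be >= 1")
    return [list(subject_ids[i : i + per_shard]) for i in range(0, len(subject_ids), per_shard)]


subjects = list(range(2500))
shards = shard_subjects(subjects, per_shard=1000)
print(len(shards), [len(s) for s in shards])  # 3 [1000, 1000, 500]
```

With 16 workers, as in the export above, 2500 subjects at 1000 per shard would leave most workers idle. A smaller shard size would spread the work further.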

Citation

If you use this ETL, please cite it using the citation link on GitHub.
