
A template ETL pipeline to extract arbitrary data into the MEDS format.


SICdb_MEDS ETL


The SICdb dataset offers insights into over 27 thousand intensive care admissions, including therapies and data on preceding surgeries. Data were collected between 2013 and 2021 from four different intensive care units at the University Hospital Salzburg, having more than 3 thousand intensive care admissions per year on 41 beds. The dataset is deidentified and contains, amongst others, case information, vital signs, laboratory results and medication data. SICdb provides both aggregated once-per-hour and highly granular once-per-minute data, making it suitable for computational and machine learning-based research. (source: https://www.sicdb.com/Documentation/Main_Page)

Usage

pip install SICdb_MEDS # you can do this locally or via PyPI
# Download your data or set download credentials
MEDS_extract-SICdb root_output_dir=$ROOT_OUTPUT_DIR

# or, if you have the data already downloaded
MEDS_extract-SICdb root_output_dir=$ROOT_OUTPUT_DIR do_download=False

# or, if you want to enable waveform extraction and processing (takes significantly longer and up to 100GB of RAM)
MEDS_extract-SICdb root_output_dir=$ROOT_OUTPUT_DIR do_process_waveform=True
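When the extraction is launched from a script rather than typed by hand, the three invocations above can be assembled programmatically. The following is a minimal sketch, not part of the package: `build_extract_command` is a hypothetical helper, the path `/data/sicdb_meds` is a placeholder, and the defaults (`do_download=True`, `do_process_waveform=False`) are assumptions inferred from the commands above.

```python
import shlex


def build_extract_command(root_output_dir, do_download=True, do_process_waveform=False):
    """Assemble a MEDS_extract-SICdb invocation as an argument list.

    Hypothetical helper: the defaults mirror what the usage commands suggest
    and are an assumption, not a documented API.
    """
    cmd = ["MEDS_extract-SICdb", f"root_output_dir={root_output_dir}"]
    # Only append overrides that differ from the assumed defaults.
    if not do_download:
        cmd.append("do_download=False")
    if do_process_waveform:
        cmd.append("do_process_waveform=True")
    return cmd


# Placeholder output directory; the data is assumed to be downloaded already.
cmd = build_extract_command("/data/sicdb_meds", do_download=False)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would then start the extraction, provided the package is installed and the command is on your `PATH`.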

MEDS-transforms settings

If you want to convert a large dataset, you can parallelize the MEDS-transforms stage, which is the step of the conversion that takes the longest.

Local parallelization relies on the hydra-joblib-launcher package, which you can install or upgrade with:

pip install hydra-joblib-launcher --upgrade

Then, you can set the number of workers as an environment variable:

export N_WORKERS=8

Moreover, you can set the number of subjects per shard to balance the parallelization overhead based on how many subjects you have in your dataset:

export N_SUBJECTS_PER_SHARD=100000
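To see how the two settings interact, it can help to estimate the number of shards before starting a run. The sketch below assumes that each worker processes whole shards, so that no more workers can be busy than there are shards; `plan_shards` is a hypothetical helper, and the subject count of 27,000 is only an illustrative figure based on the number of admissions quoted above, not the true number of subjects.

```python
import math
import os


def plan_shards(n_subjects: int, subjects_per_shard: int, n_workers: int) -> dict:
    """Estimate the shard count and how many workers can be busy at once.

    Assumes work is distributed one shard at a time (an assumption, see text).
    """
    if n_subjects <= 0 or subjects_per_shard <= 0 or n_workers <= 0:
        raise ValueError("all arguments must be positive")
    n_shards = math.ceil(n_subjects / subjects_per_shard)
    return {"n_shards": n_shards, "busy_workers": min(n_shards, n_workers)}


# Read the environment variables, falling back to the example values.
n_workers = int(os.environ.get("N_WORKERS", "8"))
subjects_per_shard = int(os.environ.get("N_SUBJECTS_PER_SHARD", "100000"))

# 27_000 is an illustrative subject count, not a measured one.
print(plan_shards(27_000, subjects_per_shard, n_workers))
```

Under these assumptions, a shard size larger than the number of subjects yields a single shard, so a smaller `N_SUBJECTS_PER_SHARD` may be needed before additional workers have anything to do.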

The MIMIC-IV OMOP Dataset

We use the demo dataset for MIMIC-IV in the OMOP format, which is a subset of the MIMIC-IV dataset. This dataset, downloaded from PhysioNet, does not include the standard dictionary linking definitions but should otherwise be functional.

Particularities

  • Care site is added to the visit as text
  • Add support for care_site table (visit_detail)

Citation

If you use this dataset, please cite the original publication below and the ETL (see "Cite this repository"):


@article{rodemundHarnessingBigData2024,
title = {Harnessing {Big} {Data} in {Critical} {Care}: {Exploring} a new {European} {Dataset}},
volume = {11},
copyright = {2024 The Author(s)},
issn = {2052-4463},
shorttitle = {Harnessing {Big} {Data} in {Critical} {Care}},
url = {https://www.nature.com/articles/s41597-024-03164-9},
doi = {10.1038/s41597-024-03164-9},
language = {en},
number = {1},
urldate = {2024-04-04},
journal = {Scientific Data},
author = {Rodemund, Niklas and Wernly, Bernhard and Jung, Christian and Cozowicz, Crispiana and Koköfer, Andreas},
month = mar,
year = {2024},
note = {Publisher: Nature Publishing Group},
keywords = {Clinical trial design, Experimental models of disease},
pages = {320},
}

