An ETL pipeline to extract the eICU dataset into the MEDS format.

eICU MEDS Extraction ETL

A MEDS-Transforms powered extraction pipeline for the eICU dataset. After installing the package, you can run the extraction pipeline with a few simple command-line commands, such as:

pip install -e . # install in editable mode
export DATASET_DOWNLOAD_USERNAME=$PHYSIONET_USERNAME
export DATASET_DOWNLOAD_PASSWORD=$PHYSIONET_PASSWORD
MEDS_extract-eICU root_output_dir=data/eicu_meds do_download=False

MEDS-transforms settings

To convert a large dataset, you can parallelize the MEDS-Transforms step, which is the longest-running part of the pipeline.

For local parallelization, install the hydra-joblib-launcher package:

pip install hydra-joblib-launcher --upgrade

Then set the number of workers as an environment variable:

export N_WORKERS=8

You can also set the number of subjects per shard to balance parallelization overhead against the number of subjects in your dataset:

export N_SUBJECTS_PER_SHARD=100000
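
As a rough way to reason about these two settings together, the sketch below estimates how many shards a given shard size produces. It assumes that each worker processes one shard at a time, so fewer shards than workers would leave some workers idle. The `n_shards` helper and the subject count of 250000 are hypothetical and are not part of the package.

```shell
# Hypothetical helper (not part of eicu_meds): estimate the shard count
# as ceil(n_subjects / subjects_per_shard) using integer arithmetic.
n_shards() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

export N_WORKERS=8
export N_SUBJECTS_PER_SHARD=100000

# Illustrative subject count only; substitute the size of your own dataset.
n_shards 250000 "$N_SUBJECTS_PER_SHARD"   # fewer shards than N_WORKERS
n_shards 250000 10000                     # more shards than N_WORKERS
```

Under that assumption, a smaller N_SUBJECTS_PER_SHARD gives the workers more shards to share, at the cost of more per-shard overhead.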
