An ETL pipeline to extract the eICU dataset into the MEDS format.
eICU MEDS Extraction ETL
This repository provides a MEDS-Transforms powered extraction pipeline for the eICU dataset. Once the package is installed, you can run the extraction pipeline with a few simple command-line commands, such as:
pip install -e . # install in editable mode
export DATASET_DOWNLOAD_USERNAME=$PHYSIONET_USERNAME
export DATASET_DOWNLOAD_PASSWORD=$PHYSIONET_PASSWORD
MEDS_extract-eICU root_output_dir=data/eicu_meds do_download=False
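If you prefer to launch the pipeline from a script, the commands above can be assembled programmatically. The following is a minimal sketch, not part of the package: the helper names `build_extract_command` and `build_env` are hypothetical, and only the entry point, the two overrides, and the two environment variable names are taken from the commands above.

```python
import os
import shlex


def build_extract_command(root_output_dir: str, do_download: bool) -> list:
    """Build the argument list for the MEDS_extract-eICU entry point (hypothetical helper)."""
    return [
        "MEDS_extract-eICU",
        f"root_output_dir={root_output_dir}",
        f"do_download={do_download}",  # a Python bool renders as True / False
    ]


def build_env(username: str, password: str) -> dict:
    """Copy the current environment and add the download credentials (hypothetical helper)."""
    env = dict(os.environ)
    env["DATASET_DOWNLOAD_USERNAME"] = username
    env["DATASET_DOWNLOAD_PASSWORD"] = password
    return env


cmd = build_extract_command("data/eicu_meds", do_download=False)
print(shlex.join(cmd))  # shows the command line that would be executed

# To actually launch it (requires the package to be installed):
# import subprocess
# subprocess.run(cmd, env=build_env("my-user", "my-password"), check=True)
```

Passing the credentials through `env` rather than writing them into the command line keeps them out of the printed command.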
MEDS-Transforms settings
When converting a large dataset, you can parallelize the MEDS-Transforms stage, which is the step of the pipeline that takes the longest.
For local parallelization, first install the hydra-joblib-launcher package:
pip install hydra-joblib-launcher --upgrade
Then set the number of workers as an environment variable:
export N_WORKERS=8
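The value 8 above is only an example. One way to pick a value for your machine is to start from the detected CPU count and leave a core free for the rest of the system. This is a rough heuristic sketch, not a recommendation from the package; `suggest_n_workers` and its `reserve` parameter are hypothetical names.

```python
import os
from typing import Optional


def suggest_n_workers(reserve: int = 1, cpu_count: Optional[int] = None) -> int:
    """Suggest a worker count: detected CPUs minus `reserve`, but never below 1."""
    # os.cpu_count() may return None when the count cannot be determined.
    detected = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, detected - reserve)


# Print a line that can be pasted into the shell.
print(f"export N_WORKERS={suggest_n_workers()}")
```

Memory, not CPU count, may be the real limit on a large dataset, so treat the result as a starting point.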
You can also set the number of subjects per shard to balance the parallelization overhead against the number of subjects in your dataset:
export N_SUBJECTS_PER_SHARD=100000
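To see how the shard size interacts with the worker count, the sketch below estimates how many shards a dataset produces and how many workers can be busy at once. It assumes plain ceiling division and one shard per worker at a time, which may differ from how MEDS-Transforms actually splits the data; the subject count of 250,000 is a made-up figure, and the function names are hypothetical.

```python
def n_shards(n_subjects: int, n_subjects_per_shard: int) -> int:
    """Number of shards under simple ceiling division (an assumption, see above)."""
    if n_subjects_per_shard <= 0:
        raise ValueError("n_subjects_per_shard must be positive")
    return -(-n_subjects // n_subjects_per_shard)  # integer ceiling division


def busy_workers(n_subjects: int, n_subjects_per_shard: int, n_workers: int) -> int:
    """Workers that can be kept busy if each worker processes one shard at a time."""
    return min(n_workers, n_shards(n_subjects, n_subjects_per_shard))


# Hypothetical dataset of 250,000 subjects with 8 workers.
for per_shard in (100_000, 25_000):
    shards = n_shards(250_000, per_shard)
    busy = busy_workers(250_000, per_shard, n_workers=8)
    print(f"{per_shard} subjects/shard: {shards} shards, {busy} of 8 workers busy")
```

Under these assumptions, a shard size that yields fewer shards than workers leaves some workers idle, while a very small shard size creates many shards and more per-shard overhead.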