A repo of clinical benchmarks for MIMIC-IV
A comprehensive set of clinical benchmarks
- (Recommended) Create the conda environment:
conda env create -f environment.yml
- Install via pip:
pip install clinical_benchmarks
Regenerate data files
If you'd like to regenerate the data files from the source datasets, you'll need to have a Google Cloud Platform (GCP) account, BigQuery dataset, and GCS storage bucket prepared. These are used to (1) create intermediate tables, (2) copy those tables out of BigQuery to GCS, and finally (3) download the tables locally.
As described in the Python Client for Google BigQuery documentation, setup requires you to:
- Select or create a Cloud Platform project.
- Enable billing for your project.
- Enable the Google Cloud BigQuery API.
- Set up authentication.
Once you have a GCP project, a BigQuery dataset, and a GCS storage bucket, you can use environment variables to specify them when running the download script:
export GCP_PROJECT='MY-GCP-PROJECT'
export BQ_DATASET='MY-BIGQUERY-DATASET'
export GCS_BUCKET='MY-STORAGE-BUCKET'
export MODEL_DIR='MY-SAVE-DIR'
clinical_benchmarks download
Alternatively (for example, if you are not a Linux/macOS user), you can specify the values on the command line:
clinical_benchmarks --csv_dir MY-SAVE-DIR download --project MY-GCP-PROJECT --dataset MY-BIGQUERY-DATASET --bucket MY-STORAGE-BUCKET
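The three steps behind the download command (create intermediate tables, copy them to GCS, download locally) can be sketched with the official BigQuery and Cloud Storage Python clients. The `gcs_uri` helper, the `export_and_download` function, and the table name used below are illustrative assumptions, not the package's actual implementation:

```python
from pathlib import Path


def gcs_uri(bucket: str, table: str) -> str:
    """Build the GCS destination URI for an extracted table (illustrative helper)."""
    return f"gs://{bucket}/{table}.csv.gz"


def export_and_download(project: str, dataset: str, bucket: str,
                        save_dir: Path, table: str) -> None:
    """Extract a BigQuery table to GCS, then download it locally (untested sketch)."""
    # Imported here so the URI helper above stays usable without GCP libraries installed.
    from google.cloud import bigquery, storage

    bq_client = bigquery.Client(project=project)

    # (2) Copy the intermediate table out of BigQuery into the GCS bucket,
    #     gzip-compressed so it matches the .csv.gz naming.
    extract_job = bq_client.extract_table(
        f"{project}.{dataset}.{table}",
        gcs_uri(bucket, table),
        job_config=bigquery.ExtractJobConfig(compression="GZIP"),
    )
    extract_job.result()  # wait for the export job to finish

    # (3) Download the exported file to the local save directory.
    gcs_client = storage.Client(project=project)
    blob = gcs_client.bucket(bucket).blob(f"{table}.csv.gz")
    blob.download_to_filename(str(save_dir / f"{table}.csv.gz"))
```

Running it requires the authentication set up above; billing is incurred by the extract job and the GCS egress.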
If you prefer to regenerate the data files manually, check data_pipeline.ipynb, which provides runnable cells that download the data files.
Create task-dependent datasets
- Vancomycin Dosing Prediction (Reinforcement Learning)
- Heparin Dosing Prediction (Reinforcement Learning)
All tasks are implemented as data processing classes that inherit from the BaseDataProcessor class. Each task class has two methods, create_task_df and save_task_df (see task.py for details).
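The pattern can be illustrated with a minimal, hypothetical processor. The real classes in task.py build cohorts from MIMIC-IV; the toy event table, column names, and class names below are made up, but the two-method interface is the same:

```python
from pathlib import Path

import pandas as pd


class BaseDataProcessor:
    """Minimal sketch of the shared interface (not the real base class)."""

    def create_task_df(self, time_step: int, agg: str) -> None:
        raise NotImplementedError

    def save_task_df(self, csv_dir: Path, filename: str) -> None:
        # The resulting task_df is written as a gzip-compressed CSV;
        # pandas infers gzip compression from the .csv.gz extension.
        self.featured_cohort_with_time.to_csv(csv_dir / filename)


class ToyDosingDataProcessor(BaseDataProcessor):
    """A toy task: aggregate a hard-coded event table into time_step-hour bins."""

    def create_task_df(self, time_step: int, agg: str) -> None:
        events = pd.DataFrame(
            {
                "charttime": pd.to_datetime(
                    ["2130-01-01 00:00", "2130-01-01 01:00", "2130-01-01 05:00"]
                ),
                "dose": [500.0, 750.0, 1000.0],
            }
        )
        # One row (state) per time_step-hour window, aggregated with agg.
        self.featured_cohort_with_time = (
            events.set_index("charttime").resample(f"{time_step}h").agg(agg).dropna()
        )
```

With time_step=4 and agg='last', the three events collapse to two states (one per 4-hour window), mirroring how the real processors produce featured_cohort_with_time.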
Procedure of creating task datasets
(see data_pipeline.ipynb for a runnable example)
- Modify the environment_config.env file with your own environment variables
- Import necessary packages and use dotenv to load environment_config.env
import dotenv

env_file = 'path_to_environment.env'
dotenv.load_dotenv(env_file, override=True)
- Choose a task, such as Vancomycin dosing prediction.
- Create the task object.
vanco = VancomycinDosingDataProcessor()
- Create the task dataframe by calling the create_task_df method. This method requires two arguments, time_step and agg.
time_step determines the time interval (in hours) between consecutive states, and agg determines the aggregation method used when merging dataframes, such as "last".
time_step = 4
agg = 'last'
vanco.create_task_df(time_step, agg)
# featured_cohort_with_time is the resulting task_df
display(vanco.featured_cohort_with_time)
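As a sketch of what time_step and agg do, consider hourly vitals for one stay (the table and column name here are made-up toy data, not the package's output):

```python
import pandas as pd

# Hypothetical hourly observations for a single stay.
vitals = pd.DataFrame(
    {
        "charttime": pd.to_datetime(
            ["2130-01-01 00:00", "2130-01-01 01:00",
             "2130-01-01 03:00", "2130-01-01 05:00"]
        ),
        "heart_rate": [80, 82, 85, 90],
    }
).set_index("charttime")

time_step = 4  # hours between consecutive states

# agg='last' keeps the most recent value in each 4-hour window ...
last_states = vitals.resample(f"{time_step}h").agg("last")
# ... while agg='mean' would average each window instead.
mean_states = vitals.resample(f"{time_step}h").agg("mean")
```

Four raw observations become two states: the 00:00 window resolves to 85 under 'last' (its latest reading) but to about 82.3 under 'mean', which is why the choice of agg matters.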
- Save the task dataframe created in the previous step to a given directory by calling the save_task_df method. This method requires two arguments, csv_dir and filename.
csv_dir must be a Path object specifying the directory in which to save the task_df, and filename is a str giving the output file name (the filename must have a .csv.gz extension).
csv_dir = Path('YOUR_SAVE_DIR')
filename = 'vancomycin_dosing_task.csv.gz'
# saves vanco.featured_cohort_with_time as a .csv.gz file
vanco.save_task_df(csv_dir, filename)
- Check your save directory, and try the dataset with your model!