A repo of clinical benchmarks for MIMIC-IV

Project description


A comprehensive set of clinical benchmarks


  • (Recommended) Create the benchmark environment: conda env create -f environment.yml
  • Pip install: pip install clinical_benchmarks


Regenerate data files

If you'd like to regenerate the data files from the source datasets, you'll need a Google Cloud Platform (GCP) account, a BigQuery dataset, and a Google Cloud Storage (GCS) bucket. These are used to (1) create intermediate tables, (2) copy those tables out of BigQuery to GCS, and finally (3) download the tables locally.

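As described by the Python Client for Google BigQuery, setup requires you to:

  1. Select or create a Cloud Platform project.
  2. Enable billing for your project.
  3. Enable the Google Cloud BigQuery API.
  4. Set up authentication.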

Once you have a GCP project, a BigQuery dataset, and a GCS bucket, you can specify them through environment variables when running the download script:

clinical_benchmarks download

Alternatively, if you are not a Linux/macOS user, you can specify the values at the command line:

clinical_benchmarks --csv_dir MY-SAVE-DIR download --project MY-GCP-PROJECT --dataset MY-BIGQUERY-DATASET --bucket MY-STORAGE-BUCKET
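
If you launch the download from a Python script rather than a shell, the command above can be assembled as an argument list. The following is a minimal sketch, not part of the package: the helper name build_download_command is hypothetical, and only the flags shown in the command above are used.

```python
import shlex


def build_download_command(csv_dir, project, dataset, bucket):
    """Return the argv list for the download command shown above.

    Hypothetical helper for illustration; it only arranges the flags
    documented in this README and does not contact GCP itself.
    """
    return [
        "clinical_benchmarks",
        "--csv_dir", str(csv_dir),
        "download",
        "--project", project,
        "--dataset", dataset,
        "--bucket", bucket,
    ]


cmd = build_download_command(
    "MY-SAVE-DIR", "MY-GCP-PROJECT", "MY-BIGQUERY-DATASET", "MY-STORAGE-BUCKET"
)
# Print the shell-quoted form so it can be checked before running.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would then run the download without relying on shell-specific environment variable syntax.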

If you prefer to regenerate the data files manually, see data_pipeline.ipynb, which provides runnable cells for downloading the data files.

Create task dependent datasets

Available Tasks

- Vancomycin Dosing Prediction (Reinforcement Learning)
- Heparin Dosing Prediction (Reinforcement Learning)

All tasks are implemented as data processing classes that inherit from the BaseDataProcessor class. Each task class has two methods, create_task_df and save_task_df. (detail see
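
To make the two-method design concrete, here is a toy sketch of the pattern. It is an assumption-laden illustration, not the package's code: ToyBaseProcessor and ToyDosingProcessor are hypothetical names, rows are plain dictionaries rather than dataframes, and the real BaseDataProcessor may be organised differently.

```python
import csv
import gzip
from pathlib import Path


class ToyBaseProcessor:
    """Hypothetical stand-in for BaseDataProcessor (illustration only)."""

    def __init__(self):
        self.task_rows = None  # populated by create_task_df

    def create_task_df(self, time_step, agg):
        # Each task subclass supplies its own dataset construction.
        raise NotImplementedError

    def save_task_df(self, csv_dir, filename):
        # Mirror the documented requirement of a .csv.gz file name.
        if not filename.endswith(".csv.gz"):
            raise ValueError("filename must end with .csv.gz")
        if not self.task_rows:
            raise RuntimeError("call create_task_df before save_task_df")
        csv_dir = Path(csv_dir)
        csv_dir.mkdir(parents=True, exist_ok=True)
        out_path = csv_dir / filename
        # Write the rows as gzip-compressed CSV text.
        with gzip.open(out_path, "wt", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(self.task_rows[0]))
            writer.writeheader()
            writer.writerows(self.task_rows)
        return out_path


class ToyDosingProcessor(ToyBaseProcessor):
    """Hypothetical task class; a real task would query and merge cohort data."""

    def create_task_df(self, time_step, agg):
        # Placeholder rows standing in for the merged task table.
        self.task_rows = [
            {"step": 0, "hours": 0, "agg": agg},
            {"step": 1, "hours": time_step, "agg": agg},
        ]
        return self.task_rows
```

Keeping construction and saving in separate methods lets every task share one save path while overriding only the dataset-building step.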

Procedure of creating task datasets

(See data_pipeline.ipynb for a runnable example.)

  1. Modify the environment_config.env file with your own environment variables
  2. Import necessary packages and use dotenv to load environment_config.env
    import dotenv
    env_file = 'path/to/environment_config.env'
    dotenv.load_dotenv(env_file, override=True)
  3. Choose a task, such as Vancomycin dosing prediction.
  4. Create the task object.
    vanco = VancomycinDosingDataProcessor()
  5. Create the task dataframe by calling the create_task_df method. This method requires two arguments, time_step and agg. time_step sets the time interval, in hours, between consecutive states, and agg sets the aggregation method used when merging dataframes, such as "last".
    time_step = 4
    agg = 'last'
    vanco.create_task_df(time_step, agg)
    # the resulting task_df is stored in vanco.featured_cohort_with_time
  6. Save the task dataframe created in step 5 by calling the save_task_df method. This method requires two arguments, csv_dir and filename. csv_dir must be a Path object specifying the directory in which to save task_df, and filename is a str giving the output file name (it must have a .csv.gz extension).
    from pathlib import Path

    csv_dir = Path('YOUR_SAVE_DIR')
    filename = 'vancomycin_dosing_task.csv.gz'
    # saves vanco.featured_cohort_with_time as a .csv.gz file
    vanco.save_task_df(csv_dir, filename)
  7. Check your save directory, and try the dataset with your model!
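
To illustrate what time_step and agg control in step 5, the sketch below bins timestamped measurements into fixed-width windows and keeps one value per window. It is a simplified assumption about the behaviour, not the package's implementation: the function aggregate_by_time_step and the sample values are hypothetical, and the "first" and "mean" options are included only for contrast with the documented "last".

```python
from collections import defaultdict


def aggregate_by_time_step(measurements, time_step, agg="last"):
    """Group (hour, value) pairs into time_step-hour bins and reduce each bin.

    Hypothetical helper for illustration; only "last" is named in the README.
    """
    reducers = {
        "last": lambda values: values[-1],
        "first": lambda values: values[0],
        "mean": lambda values: sum(values) / len(values),
    }
    if agg not in reducers:
        raise ValueError(f"unsupported agg: {agg!r}")
    bins = defaultdict(list)
    # Sort by time so that "first" and "last" refer to chronological order.
    for hour, value in sorted(measurements):
        bins[int(hour // time_step)].append(value)
    return {step: reducers[agg](values) for step, values in sorted(bins.items())}


# Made-up (hours since admission, lab value) pairs.
labs = [(0.5, 12.0), (3.0, 15.0), (5.0, 18.0), (9.5, 20.0)]
print(aggregate_by_time_step(labs, time_step=4, agg="last"))
# {0: 15.0, 1: 18.0, 2: 20.0}
```

With time_step = 4, the readings at hours 0.5 and 3.0 fall in the same state, and "last" keeps the later one.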

Download files

Source Distribution

clinical_benchmarks-1.0.0.tar.gz (24.2 kB)

Uploaded source

Built Distribution

clinical_benchmarks-1.0.0-py3-none-any.whl (32.7 kB)

Uploaded py3
