GlucoseDao's fork of GlucoBench by IrinaStatsLab
Project description
GlucoBench
The official implementation of the paper "GlucoBench: Curated List of Continuous Glucose Monitoring Datasets with Prediction Benchmarks." If you found our work interesting and plan to re-use the code, please cite us as:
@article{sergazinov2023glucobench,
  author = {Renat Sergazinov and Valeriya Rogovchenko and Elizabeth Chun and Nathaniel Fernandes and Irina Gaynanova},
  title = {GlucoBench: Curated List of Continuous Glucose Monitoring Datasets with Prediction Benchmarks},
  journal = {arXiv},
  year = {2023},
}
Dependencies
We recommend setting up a clean Python environment with conda by running conda create -n glucobench python=3.10. Then install all dependencies by running pip install -r requirements.txt.
To run the Latent ODE model, additionally install torchdiffeq.
Code organization
The code is organized as follows:
- bin/: training commands for all models
- config/: configuration files for all datasets
- data_formatter/
  - base.py: performs all pre-processing for all CGM datasets
- exploratory_analysis/: notebooks with processing steps for pulling the data and converting it to .csv files
- lib/
  - gluformer/: model implementation
  - latent_ode/: model implementation
  - *.py: hyper-parameter tuning, training, validation, and testing scripts
- output/: hyper-parameter optimization and testing logs
- paper_results/: code for producing the tables and plots found in the paper
- utils/: helper functions for model training and testing
- raw_data.zip: web-pulled CGM data (processed using exploratory_analysis/)
- environment.yml: conda environment file
Data
The datasets are distributed under the licenses listed in the table below.
| Dataset | License | Number of patients | CGM Frequency |
|---|---|---|---|
| Colas | Creative Commons 4.0 | 208 | 5 minutes |
| Dubosson | Creative Commons 4.0 | 9 | 5 minutes |
| Hall | Creative Commons 4.0 | 57 | 5 minutes |
| Broll | GPL-2 | 5 | 5 minutes |
| Weinstock | Creative Commons 4.0 | 200 | 5 minutes |
To process the data, follow the instructions in the exploratory_analysis/ folder. Processed datasets should be saved in the raw_data/ folder. We provide examples in the raw_data.zip file.
How to reproduce results?
Setting up the environment
We recommend setting up a clean Python environment using Conda. Follow these steps:
1. Create a new environment named glucobench with Python 3.10:
   conda create -n glucobench python=3.10
2. Activate the environment:
   conda activate glucobench
3. Install all required dependencies:
   pip install -r requirements.txt
4. (Optional) To confirm that you are installing into the correct environment, run:
   which pip
   This should display the path to the pip executable within the glucobench environment.
Changing the configs
The config/ folder stores the best hyper-parameters (selected by Optuna) for each dataset and model, as well as the dataset-specific parameters for interpolation, dropping, splitting, and scaling. To train and evaluate a model with these defaults, we can simply run:
python ./lib/model.py --dataset dataset --use_covs False --optuna False
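Here model.py and dataset are placeholders: substitute one of the training scripts in lib/ and one of the dataset names from config/. To check which defaults a run will pick up, the stored configuration can be inspected directly; a minimal sketch, assuming the config/{dataset}.yaml naming used in the snippets below and a dataset name such as weinstock:

import yaml

dataset = 'weinstock'  # illustrative; must match a YAML file in config/
with open(f'./config/{dataset}.yaml', 'r') as f:
    config = yaml.safe_load(f)

# per the description above, the file holds the pre-processing settings
# (interpolation, dropping, splitting, scaling) and the tuned hyper-parameters
print(list(config.keys()))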
Changing the hyper-parameters
To change the search grid for hyper-parameters, we need to modify the ./lib/model.py file. Specifically, we look at the objective() function and modify the trial.suggest_* parameters to set the desired ranges. Once we are done, we can run the following command to re-run the hyper-parameter optimization:
python ./lib/model.py --dataset dataset --use_covs False --optuna True
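For reference, the pattern being edited looks roughly like the sketch below. This is an illustrative Optuna objective, not the repository's actual objective() function; the parameter names, ranges, and the stand-in loss are placeholders.

import optuna

def objective(trial: optuna.Trial) -> float:
    # hypothetical search space: replace with the trial.suggest_* calls and
    # ranges you want to explore for your model
    hidden_size = trial.suggest_int('hidden_size', 32, 256)
    lr = trial.suggest_float('lr', 1e-4, 1e-2, log=True)
    dropout = trial.suggest_float('dropout', 0.0, 0.3)

    # stand-in for building the model, training it, and scoring it on the
    # validation split; Optuna minimizes the returned value
    val_loss = (lr - 1e-3) ** 2 + dropout / hidden_size
    return val_loss

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)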
How to work with the repository?
We provide a detailed example of the workflow in the example.ipynb notebook. For clarification, we provide some general suggestions below in order of increasing complexity.
Just the data
To start experimenting with the data, we can run the following snippet:
import yaml
from data_formatter.base import DataFormatter

dataset = 'weinstock'  # any dataset name with a matching YAML file in config/
with open(f'./config/{dataset}.yaml', 'r') as f:
    config = yaml.safe_load(f)

formatter = DataFormatter(config)
This creates an object of class DataFormatter, which automatically pre-processes the data upon initialization. The pre-processing steps can be controlled via the config/ files. The DataFormatter object exposes the following attributes:
- formatter.train_data: training data (as pandas.DataFrame)
- formatter.val_data: validation data
- formatter.test_data: testing (in-distribution and out-of-distribution) data
  i. formatter.test_data.loc[~formatter.test_data.index.isin(formatter.test_idx_ood)]: in-distribution testing data
  ii. formatter.test_data.loc[formatter.test_data.index.isin(formatter.test_idx_ood)]: out-of-distribution testing data
- formatter.data: unscaled full data
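As a quick sanity check, the splits can be pulled out and compared; a small sketch that builds on the formatter object created above:

# in-distribution vs. out-of-distribution test split, using the index-based
# convention listed above
ood_mask = formatter.test_data.index.isin(formatter.test_idx_ood)
test_id = formatter.test_data.loc[~ood_mask]
test_ood = formatter.test_data.loc[ood_mask]

print(len(formatter.train_data), len(formatter.val_data), len(test_id), len(test_ood))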
Integration with PyTorch
Training models with PyTorch typically boils down to (1) defining a Dataset class with __getitem__() method, (2) wrapping it into a DataLoader, (3) defining a torch.nn.Module class with forward() method that implements the model, and (4) optimizing the model with torch.optim in a training loop.
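Before turning to the repository-specific pieces, here is a minimal generic sketch of that workflow on toy data; it is unrelated to the CGM datasets and only illustrates steps (1)-(4):

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

# (1) a Dataset whose __getitem__() yields (input window, target window) pairs
class ToyWindows(Dataset):
    def __init__(self, series, in_len, out_len):
        self.series, self.in_len, self.out_len = series, in_len, out_len

    def __len__(self):
        return len(self.series) - self.in_len - self.out_len + 1

    def __getitem__(self, i):
        x = self.series[i : i + self.in_len]
        y = self.series[i + self.in_len : i + self.in_len + self.out_len]
        return x, y

series = torch.sin(torch.linspace(0, 20, 500))  # toy univariate trace
train_set = ToyWindows(series, in_len=96, out_len=12)

# (2) wrap it into a DataLoader
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# (3) a torch.nn.Module implementing forward()
model = nn.Sequential(nn.Linear(96, 64), nn.ReLU(), nn.Linear(64, 12))

# (4) optimize with torch.optim in a training loop
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()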
Parts (1) and (2) crucially depend on the definition of the Dataset class. Essentially, given the data in table format (e.g. formatter.train_data), how do we sample input-output pairs and pass the covariate information? The Dataset classes adapted from the Darts library (see utils/darts_dataset.py) offer one way to wrap the data. The classes differ in what information is provided to the model:
- SamplingDatasetPast: supports only past covariates
- SamplingDatasetDual: supports only future-known covariates
- SamplingDatasetMixed: supports both past and future-known covariates
Below we give an example of loading the data and wrapping it into a Dataset:
from utils.darts_processing import load_data
from utils.darts_dataset import SamplingDatasetDual

# illustrative lengths; set these to your desired input/output horizons,
# and reuse the dataset name from the DataFormatter example above
in_len, out_len, max_samples_per_ts = 96, 12, 100

formatter, series, scalers = load_data(seed=0,
                                       dataset=dataset,
                                       use_covs=True,
                                       cov_type='dual',
                                       use_static_covs=True)
dataset_train = SamplingDatasetDual(series['train']['target'],
                                    series['train']['future'],
                                    output_chunk_length=out_len,
                                    input_chunk_length=in_len,
                                    use_static_covariates=True,
                                    max_samples_per_ts=max_samples_per_ts)
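The resulting dataset_train can then be handed to a standard DataLoader. This is only a sketch: the exact tuple layout returned by SamplingDatasetDual should be checked on a single sample before writing the forward pass.

from torch.utils.data import DataLoader

# inspect one sample to see which arrays (past target, future covariates,
# static covariates, future target, ...) are returned and in what order
sample = dataset_train[0]
print([getattr(x, 'shape', None) for x in sample])

loader = DataLoader(dataset_train, batch_size=32, shuffle=True, drop_last=True)
for batch in loader:
    # each element now carries a leading batch dimension; pass the relevant
    # pieces to your torch.nn.Module and step the optimizer here
    break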
Parts (3) and (4) are model-specific, so we omit their discussion. For inspiration, we suggest taking a look at the lib/gluformer/model.py and lib/latent_ode/trainer_glunet.py files.
Project details
Download files
Source Distribution: glucosedao_glucobench-0.3.1.tar.gz
Built Distribution: glucosedao_glucobench-0.3.1-py3-none-any.whl
File details
Details for the file glucosedao_glucobench-0.3.1.tar.gz.
File metadata
- Download URL: glucosedao_glucobench-0.3.1.tar.gz
- Upload date:
- Size: 537.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2fe6530abe37ea5392ea30146138e01b2b6f5d1466eae0e14364969f908a199 |
| MD5 | 9041cc5b0537f9c4dcdf5d52bbe7dcd6 |
| BLAKE2b-256 | e5e3fc4b2e8725d12d95fff4b837e85dd83fd67baf920182cc133222f5c5d27b |
File details
Details for the file glucosedao_glucobench-0.3.1-py3-none-any.whl.
File metadata
- Download URL: glucosedao_glucobench-0.3.1-py3-none-any.whl
- Upload date:
- Size: 100.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 321d4346d38ef81db67918f849101219185b16d1d947d5ba232bb5be685ea6e7 |
| MD5 | a3ba231854a6eb7d42b31641eb03fa36 |
| BLAKE2b-256 | 2abbb6548e04d8756bfce4b5917e44ad2fc57eda70c1a2ca5f1565b670b07707 |