Extract and process photoplethysmography and arterial blood pressure data from mimic3-waveforms and vitaldb.

MIMIC-III Database Tools

For extracting and cleaning PPG and ABP data from the MIMIC-III Waveforms Database.

Table of Contents
  1. Introduction
  2. Getting Started
  3. Usage
  4. License

Introduction

This repo contains a set of tools for extracting and cleaning photoplethysmography (PPG) and arterial blood pressure (ABP) waveforms from the MIMIC-III Waveforms Database for the purpose of blood pressure estimation via deep learning.

(back to top)

Getting Started

This section details the requirements for using this library. The commands below are for Ubuntu.

Prerequisites

  1. Python

# Install Python 3.8 with headers and the venv module
sudo apt install python3.8 -y
sudo apt install python3.8-dev python3.8-venv -y

# Make user-installed scripts available on PATH
echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
source ~/.bashrc

# Install pip, then create and activate a base virtual environment
curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
python3.8 -m pip install virtualenv
python3.8 -m venv ~/.venv/base-env
echo 'alias base-env="source ~/.venv/base-env/bin/activate"' >> ~/.bashrc
source ~/.bashrc
base-env

python3.8 -m pip install --upgrade pip
  2. Poetry
curl -sSL https://install.python-poetry.org | python3 -
echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
source ~/.bashrc

# Verify installation
poetry --version

(back to top)

Usage

Poetry

The commands below can be used to install the Poetry environment, build the project, and activate the environment.

cd database-tools
poetry lock
poetry install
poetry build
poetry shell

Create Data Directory

The functions in this library rely on a data folder named with the convention data-YYYY-MM-DD. This directory contains two additional folders, mimic3/ and figures/. The mimic3/lines/ folder is intended to hold the jsonlines files the data will initially be saved to. The mimic3/records/ folder will hold the TFRecords files generated from these jsonlines files. This is discussed in greater depth in the Generate Records section.
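
The snippet below is a minimal sketch for creating this layout; the date suffix is only an example.

# Create the expected data directory layout (the date suffix is an example)
from pathlib import Path

data_dir = Path('data-2023-01-01')  # data-YYYY-MM-DD
for sub in ('mimic3/lines', 'mimic3/records', 'figures'):
    (data_dir / sub).mkdir(parents=True, exist_ok=True)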

Get Valid Records

The class DataLocator (located in database_tools/tools/) is specifically written to find valid data files in the MIMIC-III Waveforms subset and create a csv of the html links for these data files. Performing this task prior to downloading improves runtime and the usability of this workflow. Valid records are data files that contain both PPG and ABP recordings and are at least 10 minutes in length. Currently this code is only intended for the MIMIC-III Waveforms subset but will likely be adapted to allow for valid segments to be identified in the MIMIC-III Matched Subset (records are linked to clinical data). To perform an extraction, run the file scripts/get-valid-segs.py (the data directory and repository path must be configured manually). This script outputs a csv called valid-segments.csv to the data directory provided. The figure below shows how these signals are located.

Add mimic3 valid segs logic figure.
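
For reference, a hypothetical invocation is sketched below; the constructor argument and method name are assumptions, so check scripts/get-valid-segs.py for the actual entry point.

# Hypothetical sketch: the argument and method names are assumptions,
# see scripts/get-valid-segs.py for the actual entry point.
from database_tools.tools import DataLocator

locator = DataLocator(data_dir='data-2023-01-01')  # assumed parameter name
locator.run()  # assumed method; writes valid-segments.csv to the data directory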

Build Database

The class BuildDatabase (located in database_tools/tools/) downloads data from valid-segments.csv, extracts PPG and ABP data, and then processes it by leveraging the SignalProcessor class (located in database_tools/preprocessing/). A database can be built by running scripts/build_database.py (be sure to configure the paths). BuildDatabase takes a few important parameters which modify how signals are excluded and how the signals are divided prior to processing. The win_len parameter controls the length of each window, fs is the sampling rate of the data (125 Hz in the case of MIMIC-III), while samples_per_file, samples_per_patient, and max_samples control the size of the dataset (how many files the data is spread across, how many samples a patient can contribute, and the total number of samples in the dataset). The final parameter config controls the various constants of the SignalProcessor that determine the quality threshold for accepting signals. The SignalProcessor filters signals according to the flow chart below. The functions used for this filtering can be found in database_tools/preprocessing/. Data extracted with this script is saved directly to the mimic3/lines/ folder in the data directory. A file named mimic3_stats.csv containing the stats of every processed waveform (not just the valid ones) will also be saved to the data directory.

Add data preprocessing figure.
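
As a rough illustration, a hypothetical call is sketched below; the parameter names follow the description above, but the exact signature and the values shown are assumptions, so refer to scripts/build_database.py.

# Hypothetical sketch: the signature and values are assumptions,
# see scripts/build_database.py for the actual configuration.
from database_tools.tools import BuildDatabase

builder = BuildDatabase(
    data_dir='data-2023-01-01',
    win_len=1024,             # window length in samples (example value)
    fs=125,                   # MIMIC-III sampling rate in Hz
    samples_per_file=2500,    # example dataset-size settings
    samples_per_patient=500,
    max_samples=300000,
)   # the config parameter for SignalProcessor constants is omitted here
builder.run()  # assumed method; writes jsonlines to mimic3/lines/ and mimic3_stats.csv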

Evaluate Dataset

The class DataEvaluator (located in database_tools/tools/) reads the mimic3_stats.csv file from the provided data directory and outputs figures to visualize the statistics. These figures are saved directly to the figures/ folder in the data directory in addition to being output such that they can be viewed in a Jupyter notebook. The 3D histograms are generated using the function histogram3d located in database_tools/plotting/.
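
A hypothetical invocation is sketched below; the constructor argument and method name are assumptions.

# Hypothetical sketch: the argument and method names are assumptions.
from database_tools.tools import DataEvaluator

evaluator = DataEvaluator(data_dir='data-2023-01-01')
evaluator.run()  # assumed method; saves figures to figures/ and returns them for notebook display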

Generate Records

Once data has been extracted, TFRecords can be generated for training a TensorFlow model. The class RecordsHandler contains the method GenerateRecords which is used to create the TFRecords. This can be done using scripts/generate_records.py (paths must be configured). When calling GenerateRecords, the size of the train, validation, and test splits, the max number of samples per file, and a boolean controlling whether or not the data is standardized (using sklearn.preprocessing.StandardScaler()) must be specified.
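
A hypothetical call is sketched below; the argument names are assumptions based on the description above, so refer to scripts/generate_records.py for the actual call.

# Hypothetical sketch: the import path and argument names are assumptions,
# see scripts/generate_records.py for the actual call.
from database_tools.tools import RecordsHandler

handler = RecordsHandler(data_dir='data-2023-01-01')
handler.GenerateRecords(
    splits=(0.7, 0.15, 0.15),    # train/validation/test fractions (assumed form)
    max_samples_per_file=10000,  # example value
    standardize=True,            # apply sklearn.preprocessing.StandardScaler
)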

Read Records

The class RecordsHandler also contains the method ReadRecords which can be used to read the TFRecords into a TensorFlow TFRecordDataset object. This function can be used to inspect the integrity of the dataset or for loading the dataset for model training. The number of cores and a TensorFlow AUTOTUNE object must be provided.
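
For illustration, a hypothetical call is sketched below; the argument names are assumptions.

# Hypothetical sketch: the argument names are assumptions.
import tensorflow as tf
from database_tools.tools import RecordsHandler

handler = RecordsHandler(data_dir='data-2023-01-01')
dataset = handler.ReadRecords(n_cores=4, AUTOTUNE=tf.data.AUTOTUNE)  # assumed signature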

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

