
Extract and process photoplethysmography and arterial blood pressure data from mimic3-waveforms and vitaldb.



MIMIC-III Database Tools

For extracting and cleaning PPG and ABP data from the MIMIC-III Waveforms Database.

Table of Contents
  1. Introduction
  2. Getting Started
  3. Usage
  4. License

Introduction

This repo contains a set of tools for extracting and cleaning photoplethysmography (PPG) and arterial blood pressure (ABP) waveforms from the MIMIC-III Waveforms Database for the purpose of blood pressure estimation via deep learning.

(back to top)

Getting Started

This section details the requirements to start using this library. Links are for Ubuntu installation.

Prerequisites

  1. Python
sudo apt install python3.8 -y
sudo apt install python3.8-dev python3.8-venv -y

echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
source ~/.bashrc

curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
python3.8 -m pip install virtualenv
python3.8 -m venv ~/.venv/base-env
echo 'alias base-env="source ~/.venv/base-env/bin/activate"' >> ~/.bashrc
source ~/.bashrc
base-env

python3.8 -m pip install --upgrade pip
  2. Poetry
curl -sSL https://install.python-poetry.org | python3 -
echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
source ~/.bashrc

# Verify installation
poetry --version

(back to top)

Usage

Poetry

The commands below can be used to install the Poetry environment, build the project, and activate the environment.

cd database-tools
poetry lock
poetry install
poetry build
poetry shell

Create Data Directory

The functions in this library rely on a data folder named with the convention data-YYYY-MM-DD. This directory contains two additional folders, mimic3/ and figures/. The mimic3/lines/ folder is intended to hold the jsonlines files the data is initially saved to, and the mimic3/records/ folder will hold the TFRecords files generated from these jsonlines files. This is discussed in greater depth in the Generate Records section.
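
As a quick reference, this structure can be created with a few lines of standard-library Python (a minimal sketch; the date in the folder name is simply the day you run the extraction):

from datetime import date
from pathlib import Path

# Folder named with the data-YYYY-MM-DD convention described above.
data_dir = Path(f"data-{date.today().isoformat()}")

# mimic3/lines/ holds the jsonlines files, mimic3/records/ the TFRecords
# generated from them, and figures/ the evaluation plots.
for sub in ("mimic3/lines", "mimic3/records", "figures"):
    (data_dir / sub).mkdir(parents=True, exist_ok=True)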

Get Valid Records

The class DataLocator (located in database_tools/tools/) is specifically written to find valid data files in the MIMIC-III Waveforms subset and create a csv of the html links to these data files. Performing this task prior to downloading improves runtime and the usability of this workflow. Valid records are data files that contain both PPG and ABP recordings and are at least 10 minutes in length. Currently this code is only intended for the MIMIC-III Waveforms subset but will likely be adapted to allow valid segments to be identified in the MIMIC-III Matched Subset (where records are linked to clinical data). To perform an extraction, run the file scripts/get-valid-segs.py (the data directory and repository path must be configured manually). This will output a csv called valid-segments.csv to the data directory provided. The figure below shows how these signals are located, and a hypothetical usage sketch follows it.

Add mimic3 valid segs logic figure.
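
For orientation, a hypothetical invocation is sketched below; the constructor argument and method name are assumptions, so check the class definition in database_tools/tools/ (or simply run scripts/get-valid-segs.py) for the actual interface.

from database_tools.tools import DataLocator

# Hypothetical interface -- argument and method names are assumptions.
locator = DataLocator(data_dir='data-2023-01-01')
locator.run()  # writes valid-segments.csv to the data directory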

Build Database

The class BuildDatabase (located in database_tools/tools/) downloads data from valid-segments.csv, extracts PPG and ABP data, and then processes it by leveraging the SignalProcessor class (located in database_tools/preprocessing/). A database can be built by running scripts/build_database.py (be sure to configure the paths). BuildDatabase takes a few important parameters which modify how signals are excluded and how the signals are divided prior to processing. The win_len parameter controls the length of each window and fs is the sampling rate of the data (125 Hz in the case of MIMIC-III), while samples_per_file, samples_per_patient, and max_samples control the size of the dataset (how many files the data is spread across, how many samples a patient can contribute, and the total number of samples in the dataset, respectively). The final parameter, config, controls the various constants of the SignalProcessor that determine the quality threshold for accepting signals. The SignalProcessor filters signals according to the chart below, and an invocation sketch follows it. The functions used for this filtering can be found in database_tools/preprocessing/. Data extracted with this script is saved directly to the mimic3/lines/ folder in the data directory. A file named mimic3_stats.csv containing the stats of every processed waveform (not just the valid ones) is also saved to the data directory.

Add data preprocessing figure.
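
The sketch below shows how these parameters might be passed. The keyword names for win_len, fs, samples_per_file, samples_per_patient, max_samples, and config come from the description above, but the constructor signature, example values, and entry-point method are assumptions; defer to scripts/build_database.py.

from database_tools.tools import BuildDatabase

builder = BuildDatabase(
    data_dir='data-2023-01-01',  # hypothetical path argument
    win_len=1024,                # window length in samples (example value)
    fs=125,                      # MIMIC-III sampling rate (Hz)
    samples_per_file=2500,       # example value
    samples_per_patient=500,     # example value
    max_samples=300000,          # example value
    config=None,                 # SignalProcessor quality-threshold constants
)
builder.run()  # hypothetical entry point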

Evaluate Dataset

The class DataEvaluator (located in database_tools/tools/) reads the mimic3_stats.csv file from the provided data directory and outputs figures to visualize the statistics. These figures are saved directly to the figures/ folder in the data directory, in addition to being returned as output so they can be viewed in a Jupyter notebook. The 3D histograms are generated using the function histogram3d located in database_tools/plotting/.
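
A hypothetical usage sketch (argument and method names are assumptions; see database_tools/tools/ for the real interface):

from database_tools.tools import DataEvaluator

evaluator = DataEvaluator(data_dir='data-2023-01-01')  # hypothetical argument
figures = evaluator.run()  # saves plots to figures/ and returns them for notebook use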

Generate Records

Once data has been extracted, TFRecords can be generated for training a TensorFlow model. The class RecordsHandler contains the method GenerateRecords, which is used to create the TFRecords. This can be done using scripts/generate_records.py (paths must be configured). When calling GenerateRecords, the size of the train, validation, and test splits, the maximum number of samples per file, and a boolean controlling whether or not the data is standardized (using sklearn.preprocessing.StandardScaler()) must be specified.
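
A sketch of what such a call might look like; the parameter names below are assumptions standing in for the split sizes, per-file sample cap, and standardization flag described above, so defer to scripts/generate_records.py for the actual signature.

from database_tools.tools import RecordsHandler  # import path is an assumption

handler = RecordsHandler(data_dir='data-2023-01-01')  # hypothetical argument
handler.GenerateRecords(
    split_sizes=(0.7, 0.15, 0.15),  # train / validation / test (example values)
    max_samples_per_file=10000,     # example value
    standardize=True,               # applies sklearn.preprocessing.StandardScaler
)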

Read Records

The class RecordsHandler also contains the function ReadRecords, which can be used to read the TFRecords into a TensorFlow TFRecordDataset object. This function can be used to inspect the integrity of the dataset or to load the dataset for model training. The number of cores and a TensorFlow AUTOTUNE object must be provided.
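
A hypothetical read-back sketch mirroring the description above (parameter names are assumptions):

import multiprocessing
import tensorflow as tf
from database_tools.tools import RecordsHandler  # import path is an assumption

handler = RecordsHandler(data_dir='data-2023-01-01')  # hypothetical argument
dataset = handler.ReadRecords(
    n_cores=multiprocessing.cpu_count(),  # hypothetical parameter name
    AUTOTUNE=tf.data.AUTOTUNE,            # hypothetical parameter name
)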

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)
