A framework for creating deep learning pipelines for sleep data

Project description

SleePyPhases

License: MIT Python 3.8+ PyPI

SleePyPhases is an open-source Python workflow framework that provides unified, FAIR-compliant access to multiple sleep data repositories through a configuration-driven harmonization approach.

Overview

Sleep research relies on polysomnography (PSG) data from diverse public repositories and vendor systems, yet the lack of standardized access methods and semantic harmonization creates substantial barriers to data reuse and reproducibility. SleePyPhases addresses these challenges by:

  • Standardizing channel naming across different datasets and vendors
  • Harmonizing annotation semantics for sleep stages, arousals, respiratory events, and leg movements
  • Unifying data formats to enable seamless multi-dataset studies
  • Providing configuration-driven preprocessing with efficient storage mechanisms
  • Ensuring reproducibility through configuration-based provenance tracking
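As a minimal sketch of what channel-name standardization means in practice, the following maps dataset-specific labels onto one shared name. The alias table and the function name harmonize_channels are hypothetical illustrations and not the SleepHarmonizer API.

```python
# Hypothetical alias table: each standard name lists labels seen in different sources.
CHANNEL_ALIASES = {
    "EEG": ["EEG", "EEG(sec)", "C4-A1", "EEG C4-A1"],
    "EOG(L)": ["EOG(L)", "E1-M2", "LOC"],
    "EMG": ["EMG", "Chin", "EMG Chin"],
}


def harmonize_channels(source_names):
    """Map source channel labels to standard names; unknown labels are skipped."""
    lookup = {
        alias.lower(): standard
        for standard, aliases in CHANNEL_ALIASES.items()
        for alias in aliases
    }
    return {
        name: lookup[name.lower()]
        for name in source_names
        if name.lower() in lookup
    }


print(harmonize_channels(["C4-A1", "LOC", "SaO2"]))
# {'C4-A1': 'EEG', 'LOC': 'EOG(L)'}
```

Keeping the aliases in data rather than in code is what makes a configuration-driven approach possible: supporting a new dataset means adding entries, not writing a new loader.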

Features

  • 🔌 Unified Data Access: Load data from 10+ public repositories and 2 commercial vendors through a single interface
  • ⚙️ Configuration-Driven: Define preprocessing, data manipulation, and training pipelines through YAML configuration
  • 🔄 Automatic Synchronization: Generated artifacts stay synchronized with configuration changes
  • 🧩 Modular Architecture: Extend functionality through plugins for datasets, preprocessing, and ML frameworks
  • 📊 ML Pipeline Integration: Built-in support for PyTorch and TensorFlow training workflows
  • 📈 Comprehensive Evaluation: Segment-wise and event-wise evaluation with clinical metrics
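One common way to keep generated artifacts in step with the configuration is to derive each artifact's identifier from a hash of the configuration values it depends on. The sketch below illustrates that idea only; the function artifact_id and the choice of relevant keys are assumptions, not the pyPhases implementation.

```python
import hashlib
import json


def artifact_id(name, config, relevant_keys):
    """Derive an identifier from the config values an artifact depends on."""
    subset = {key: config[key] for key in sorted(relevant_keys)}
    payload = json.dumps(subset, sort_keys=True).encode("utf-8")
    return f"{name}-{hashlib.sha256(payload).hexdigest()[:8]}"


config = {"useLoader": "shhs", "targetFrequency": 64, "batchSize": 32}
keys = ["useLoader", "targetFrequency"]
before = artifact_id("preprocessed", config, keys)

config["batchSize"] = 64         # unrelated to preprocessing: the ID is unchanged
assert artifact_id("preprocessed", config, keys) == before

config["targetFrequency"] = 128  # relevant change: a new artifact is needed
assert artifact_id("preprocessed", config, keys) != before
```

With such a scheme, a changed preprocessing value produces a new identifier, so stale artifacts are never picked up by mistake.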

Supported Datasets & Formats

Public Repositories

  • Sleep Heart Health Study (SHHS)
  • Multi-Ethnic Study of Atherosclerosis (MESA)
  • MrOS Sleep Study
  • PhysioNet 2018 Challenge
  • SleepEDF Database Expanded
  • Cleveland Family Study (CFS) - WIP
  • PhysioNet 2023 Challenge - WIP
  • Human Sleep Project (HSP) - WIP
  • Dreem Open Dataset - WIP
  • CAP Sleep Database - WIP
  • ISRUC-Sleep - WIP

Vendor Formats

  • Philips Alice®
  • Somnomedics Domino®
  • Nox Medical® - WIP
  • Profusion Sleep Software® - WIP
  • Sonata® - WIP

Requirements

SleePyPhases requires Python 3.8+ or Docker.

Quick Start

1. Clone Example Project

  • Clone the example project: git clone https://gitlab.com/sleep-is-all-you-need/pyphases/spp-boilderplate.git SPP-MyProject
  • Move to project: cd SPP-MyProject

The example project can be customized using:

  • src/SignalPreprocessing.py: signal preprocessing whose results are stored on the filesystem
  • src/DataManipulation.py: data manipulation applied before the data is passed to the ML model
  • src/models/SimpleCRNN/SimpleCRNN.py: a very basic CNN/LSTM PyTorch model
  • config/config.yml: workflow configuration

The example project also provides:

  • basic pyPhases structure
  • Dockerfile defining the docker image
  • docker-compose.yml to build and run the docker container

The example project can be extended by:

  • Init-Phase: injects custom data manipulation and preprocessing code into the project
  • project.yaml: basic project configuration, where additional phases can be added

Setup (docker compose)

  • Update the volumes in docker-compose.yml.
  • Remove the NVIDIA GPU deploy section in docker-compose.yml if no NVIDIA GPU is available.
  • The data, logs, and eval folders are created automatically and require write access.
  • Run a phase with: docker compose run phases run Training

Setup (Python)

  • Install the requirements: pip install -r requirements.txt
  • The data, logs, and eval folders are created automatically and require write access.
  • Run a phase with either:
    • phases run Training (if the phases command is installed in the environment)
    • python -m phases run Training (if Python is installed)

Change Configuration

Changes can be made in configs/config.yml, or by creating a new config file and loading it with the -c parameter: phases run -c myconfig1.yml,myconfig2.yml Training.

useLoader: shhs
shhs-path: /path/to/shhs/dataset

preprocessing:
  targetFrequency: 64
  labelFrequency: 64
  stepsPerType:
    eeg: [filter, resample, standardize]
  targetChannels:
    - [EEG]

labelChannels:
  - SleepStagesAASM

dataversion:
  version: my-experiment
  seed: 2025
  folds: 5
  split:
    test: "0:100"
    trainval: "100:500"

The SHHS dataset must be downloaded to the location given in shhs-path.
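One plausible reading of the dataversion block is that the records are shuffled with the given seed, the split strings are applied as Python-style slices, and the trainval part is divided into folds. The sketch below follows that reading; it is an assumption about the semantics, and parse_slice and split_records are hypothetical names, not framework functions.

```python
import random


def parse_slice(spec):
    """Turn a string such as "0:100" into slice(0, 100)."""
    start, stop = (int(part) for part in spec.split(":"))
    return slice(start, stop)


def split_records(record_ids, split, seed, folds, fold=0):
    """Shuffle deterministically, cut test/trainval, then take one validation fold."""
    shuffled = list(record_ids)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[parse_slice(split["test"])]
    trainval = shuffled[parse_slice(split["trainval"])]
    validation = trainval[fold::folds]  # every folds-th record, offset by the fold index
    held_out = set(validation)
    training = [record for record in trainval if record not in held_out]
    return training, validation, test


records = [f"rec-{i:04d}" for i in range(500)]
split = {"test": "0:100", "trainval": "100:500"}
train, val, test = split_records(records, split, seed=2025, folds=5)
print(len(train), len(val), len(test))  # 320 80 100
```

Because the shuffle depends only on the seed, the same configuration always yields the same partition, which is what makes the split reproducible.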

Train and Evaluate a Model

  • Run the training: phases run Training

  • Evaluate a trained model: phases run EvalReport

Architecture

SleePyPhases is built on three main components:

  1. pyPhases: Core framework for configuration-driven project management
  2. SleepHarmonizer: PSG data harmonization plugin with standardized interfaces
  3. pyPhasesML: Machine learning operations including preprocessing and training
┌─────────────────────────────────────────────────────────┐
│                    SleePyPhases                         │
├─────────────────┬─────────────────┬─────────────────────┤
│   pyPhases      │ SleepHarmonizer │    pyPhasesML       │
│   (Core)        │  (Data Access)  │  (ML Pipeline)      │
├─────────────────┼─────────────────┼─────────────────────┤
│ - Configuration │ - Record Loader │ - Preprocessing     │
│ - Phases        │ - Channel Map   │ - Data Manipulation │
│ - Data Storage  │ - Annotations   │ - Model Training    │
│ - Plugins       │ - Metadata      │ - Evaluation        │
└─────────────────┴─────────────────┴─────────────────────┘

Configuration Examples

Data Filtering

dataversion:
  version: shhs1-ahi15
  filterQuery: recordId.str.startswith("shhs1-") and ahi > 15
  seed: 2025
  folds: 4
  split:
    test: "0:1000"
    trainval: "1000:2056"
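The filterQuery above reads as a row-wise condition on the record metadata. A plain-Python equivalent of that condition is sketched below; the metadata values are made up for illustration, and how the framework evaluates the query internally is not shown here.

```python
# Hypothetical per-record metadata; real values come from the dataset's metadata.
records = [
    {"recordId": "shhs1-200001", "ahi": 22.4},
    {"recordId": "shhs1-200002", "ahi": 3.1},
    {"recordId": "shhs2-200001", "ahi": 31.0},
]


def matches(record):
    """Same condition as the filterQuery: ID starts with "shhs1-" and AHI above 15."""
    return record["recordId"].startswith("shhs1-") and record["ahi"] > 15


selected = [record["recordId"] for record in records if matches(record)]
print(selected)  # ['shhs1-200001']
```

The split ranges in the same block would then index into the filtered records rather than the full dataset, which is why the upper bound of trainval has to fit the number of records that pass the filter.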

Training/Model

modelName: MyModel # needs to be stored in src/models/MyModel.py
trainingParameter:
  learningRate: 0.00025
  learningRateDecay: 0.001
  batchSize: 32
  optimizer: adams
  shuffle: True
  shuffleSeed: 2025

  # test to run longer
  stopAfterNotImproving: 25
  maxEpochs: 1000

  validationMetrics: # for each label channel
    - [f1, kappa]
    - [f1, kappa]
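The interplay of stopAfterNotImproving and maxEpochs can be pictured as a patience-based early-stopping loop. The sketch below is a generic version of that technique, assuming a higher validation score is better; it is not the pyPhasesML training loop, and the per-epoch scores are passed in as a list only to keep the example runnable.

```python
def train_with_early_stopping(validation_scores, stop_after_not_improving, max_epochs):
    """Return (best_epoch, best_score, epochs_run) for a sequence of per-epoch scores."""
    best_score, best_epoch, epochs_run = float("-inf"), -1, 0
    for epoch, score in enumerate(validation_scores[:max_epochs]):
        epochs_run = epoch + 1
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= stop_after_not_improving:
            break  # no improvement for the configured number of epochs
    return best_epoch, best_score, epochs_run


scores = [0.50, 0.61, 0.60, 0.59, 0.62, 0.61, 0.60, 0.58]
print(train_with_early_stopping(scores, stop_after_not_improving=3, max_epochs=1000))
# (4, 0.62, 8)
```

A large maxEpochs combined with a moderate patience, as in the configuration above, lets the validation metric decide when training ends.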

Evaluation

labelChannels:
  - SleepStagesAASM
  - SleepArousals
  - SleepApnea
  - SleepLegMovements

eval:
  batchSize: 1
  metrics:
    - [f1, kappa]      # sleep stages
    - [auprc, f1]      # arousal
    - [f1, kappa]      # respiratory events
    - [auprc, f1]      # leg movements
  clinicalMetrics:
    - tst              # Total Sleep Time
    - waso             # Wake After Sleep Onset
    - ahi              # Apnea-Hypopnea Index
    - arousalIndex
    - indexPLMS
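To make the clinical metrics concrete, the sketch below derives TST, WASO and AHI from a per-epoch hypnogram using common textbook definitions. It assumes 30-second epochs, counts WASO as wake time between sleep onset and the end of the recording, and takes the respiratory event count as given; the framework's own definitions may differ in detail, and clinical_metrics is a hypothetical function name.

```python
EPOCH_SECONDS = 30  # assumed scoring epoch length


def clinical_metrics(hypnogram, respiratory_event_count):
    """Compute TST and WASO in minutes and AHI in events per hour of sleep."""
    is_sleep = [stage != "W" for stage in hypnogram]
    sleep_epochs = sum(is_sleep)
    tst_minutes = sleep_epochs * EPOCH_SECONDS / 60
    if sleep_epochs == 0:
        return {"tst": 0.0, "waso": 0.0, "ahi": float("nan")}
    onset = is_sleep.index(True)  # first epoch scored as sleep
    wake_after_onset = sum(1 for asleep in is_sleep[onset:] if not asleep)
    return {
        "tst": tst_minutes,
        "waso": wake_after_onset * EPOCH_SECONDS / 60,
        "ahi": respiratory_event_count / (tst_minutes / 60),
    }


hypnogram = ["W"] * 20 + ["N1"] * 10 + ["N2"] * 100 + ["W"] * 10 + ["N2"] * 60 + ["R"] * 70
print(clinical_metrics(hypnogram, respiratory_event_count=30))
# {'tst': 120.0, 'waso': 5.0, 'ahi': 15.0}
```

Metrics of this kind depend on both the predicted sleep stages and the predicted events, so they complement the per-channel scores such as F1 and kappa.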

Custom PSG Loader Configuration

This example shows a custom file structure in which each recording is stored by the Alice 6® PSG software as follows:

/recordings/{recordId} with three files: {recordId}.edf, {recordId}.txt, and {recordId}.rml.

loader:
  my-alice:
    dataBase: DSDS
    dataIsFinal: False # more recordings will be stored in the future
    dataset:
      loaderName: RecordLoaderAlice
      dataHandler:
        type: folders
        listFilter: acq
        canReadRemote: True
        basePath: .
        extensions: [.edf, .rml, .txt]
        force: False
        idPattern: .*/(.*).edf
        signal-path: "{recordId}/{recordId}.edf"
        annotation-path: "{recordId}/{recordId}.rml"
        metadata-path: "{recordId}/{recordId}.txt"

    # the channels that should be extracted from the edf files
    sourceChannels:
      - name: EEG F3-A2
        type: eeg
      - name: EEG F4-A1
        type: eeg
      - name: EEG C3-A2
    # ...

useLoader: my-alice
alice-path: /recordings
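The idPattern and the three path templates work together: the pattern extracts a record ID from a discovered file, and the templates expand that ID into the signal, annotation and metadata paths. The sketch below illustrates this with Python's re module and str.format. The regular expression is an assumed reading of the pattern's intent (capture the EDF file stem), and resolve_record is a hypothetical helper, not part of the loader API.

```python
import re

ID_PATTERN = re.compile(r".*/(.*)\.edf")  # assumed intent: capture the EDF file stem
PATH_TEMPLATES = {
    "signal": "{recordId}/{recordId}.edf",
    "annotation": "{recordId}/{recordId}.rml",
    "metadata": "{recordId}/{recordId}.txt",
}


def resolve_record(edf_path, base_path="/recordings"):
    """Extract the record ID from an EDF path and build the three expected paths."""
    match = ID_PATTERN.fullmatch(edf_path)
    if match is None:
        raise ValueError(f"not an EDF recording path: {edf_path}")
    record_id = match.group(1)
    paths = {
        kind: f"{base_path}/{template.format(recordId=record_id)}"
        for kind, template in PATH_TEMPLATES.items()
    }
    return record_id, paths


record_id, paths = resolve_record("/recordings/acq_0001/acq_0001.edf")
print(record_id, paths["annotation"])
# acq_0001 /recordings/acq_0001/acq_0001.rml
```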

Validation & Reproducibility

SleePyPhases has been validated through reproduction of five published sleep analysis studies:

Study              Model        Dataset    Original  SPP    Difference
Pourbabaee et al.  DRCNN        PhysioNet  0.528     0.548  +3.6%
Phan et al.        Transformer  SHHS       0.828     0.828   0.0%
Kotzen et al.      CNN          MESA       0.74      0.733  -1.0%
Zahid et al.       CNN          MrOS       0.704     0.679  -3.7%
Lee et al.         Transformer  SleepEDF   0.682     0.662  -3.0%
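The text does not define the Difference column. The listed values are consistent with the change expressed relative to the SPP score, (SPP - Original) / SPP, which is an inference from the numbers rather than a stated definition. The short check below recomputes the column under that assumption.

```python
# (original score, SPP score) as listed in the table above
results = {
    "Pourbabaee et al.": (0.528, 0.548),
    "Phan et al.": (0.828, 0.828),
    "Kotzen et al.": (0.74, 0.733),
    "Zahid et al.": (0.704, 0.679),
    "Lee et al.": (0.682, 0.662),
}


def relative_difference(original, reproduced):
    """Percentage change relative to the reproduced (SPP) score, one decimal place."""
    return round((reproduced - original) / reproduced * 100, 1)


for study, (original, reproduced) in results.items():
    print(f"{study:<18} {relative_difference(original, reproduced):+.1f}%")
```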

Included Projects

FAIR Principles

SleePyPhases adheres to FAIR principles:

  • Findable: Public repositories on GitLab and Python Package Index
  • Accessible: Open-source MIT licensing
  • Interoperable: Supports multiple signal formats and vendor formats
  • Reusable: Modular plugin architecture with versioned releases

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

This research was funded by the Federal Ministry of Research, Technology and Space under the funding code 01ZZ2324F.

Computing resources were provided by the NHR Center of TU Dresden, jointly supported by the Federal Ministry of Education and Research and participating state governments.
