A modular PSG preprocessing framework for multi-dataset standardization.

Project description

English Version README (SleepKit Documentation)

SleepKit PSG: A Multi-Dataset PSG Preprocessing Framework

SleepKit PSG is a modular Python framework for standardized preprocessing of multi-source polysomnography (PSG) data. It takes raw recordings from heterogeneous datasets (SHHS, MESA, CFS, Sleep-EDF, etc.) in different file formats (EDF, H5, MAT) and converts them into standardized .npy sequences suitable as input to deep learning models.


🌟 Key Features

◎ Multi-Dataset Support

Ships with built-in rules for 20+ major public sleep datasets (SHHS, MESA, CFS, MASS, DOD, Sleep-EDF, and others).

◎ Intelligent Channel Mapping

Automatically recognizes and unifies inconsistent channel names across datasets (e.g., EEG(sec) is recognized as C3).
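
A minimal sketch of how alias-based channel renaming of this kind could work. The alias table and the function name `map_channels` are hypothetical illustrations, not SleepKit's actual CHANNEL_MAPPING; only the pairing of EEG(sec) with C3 comes from the description above.

```python
import re

# Hypothetical alias table: standard name -> raw names seen in source files.
# Only "EEG(sec)" -> "C3" is taken from the README; the other entries are made up.
CHANNEL_ALIASES = {
    "C3": ["EEG(sec)", "C3-M2", "EEG C3-A2"],
    "E1": ["EOG(L)", "E1-M2"],
}


def _normalize(name):
    """Drop whitespace, underscores and hyphens, then upper-case."""
    return re.sub(r"[\s_\-]+", "", name).upper()


def map_channels(raw_names, aliases=CHANNEL_ALIASES):
    """Return {standard_name: raw_name} for every raw channel that matches an alias."""
    lookup = {_normalize(alias): std
              for std, names in aliases.items() for alias in names}
    mapped = {}
    for raw in raw_names:
        std = lookup.get(_normalize(raw))
        if std is not None and std not in mapped:  # first match wins
            mapped[std] = raw
    return mapped


result = map_channels(["EEG(sec)", "eog(l)", "ECG"])
print(result)  # {'C3': 'EEG(sec)', 'E1': 'eog(l)'}
```

Normalizing both sides before the lookup keeps the table short, since spelling variants such as different case or spacing do not each need their own entry. Unknown channels (here ECG) are simply left out.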

◎ Full Preprocessing Pipeline

Includes:

  • Multi-format reading (EDF / H5 / MAT)
  • Sleep-stage label parsing (XML / TXT / CSV / EANNOT)
  • Signal processing (re-reference, bandpass, notch, resample, Z-score)
  • Epoch slicing and sequence packaging
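
The last steps of the list above (Z-score, epoch slicing, sequence packaging) can be sketched with NumPy alone. This is an illustration under stated assumptions, not SleepKit's implementation: the function names are hypothetical, normalization is assumed to be per channel over the whole recording, and sequences are assumed to be non-overlapping with incomplete trailing data dropped.

```python
import numpy as np


def zscore(sig):
    """Standardize each channel of a (C, N) array to zero mean and unit variance."""
    mean = sig.mean(axis=-1, keepdims=True)
    std = sig.std(axis=-1, keepdims=True)
    return (sig - mean) / np.where(std == 0, 1.0, std)


def slice_epochs(sig, fs, epoch_sec=30):
    """Cut a (C, N) signal into (n_epochs, C, fs * epoch_sec), dropping a partial last epoch."""
    n_channels, n_samples = sig.shape
    samples_per_epoch = fs * epoch_sec
    n_epochs = n_samples // samples_per_epoch
    trimmed = sig[:, : n_epochs * samples_per_epoch]
    return trimmed.reshape(n_channels, n_epochs, samples_per_epoch).transpose(1, 0, 2)


def pack_sequences(epochs, labels, seq_len=20):
    """Group consecutive epochs into (sequence, label) pairs of seq_len epochs each."""
    n_seq = len(epochs) // seq_len
    return [
        (epochs[i * seq_len:(i + 1) * seq_len], labels[i * seq_len:(i + 1) * seq_len])
        for i in range(n_seq)
    ]


# Synthetic stand-in for a recording: 2 channels, 45 epochs at 100 Hz.
rng = np.random.default_rng(0)
fs = 100
signal_2ch = rng.normal(size=(2, fs * 30 * 45))
stages = rng.integers(0, 5, size=45)

epochs = slice_epochs(zscore(signal_2ch), fs)
sequences = pack_sequences(epochs, stages)
print(epochs.shape)           # (45, 2, 3000)
print(len(sequences))         # 2 (the last 5 epochs do not fill a sequence)
print(sequences[0][0].shape)  # (20, 2, 3000)
```

With a 100 Hz sampling rate and 30 s epochs, each epoch holds 3000 samples per channel, so a sequence of 20 epochs has the (Seq, C, T) layout of shape (20, C, 3000).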

◎ CLI + Python API

Supports both batch processing and programmatic use.


🛠️ Installation

Requirements

  • Python ≥ 3.8
  • numpy, mne, h5py, scipy, scikit-learn, matplotlib, tqdm, pyyaml

Method 1: Install directly

cd sleep_kit_project
pip install .

Method 2: Build a wheel (adjust the version in the filename to match the wheel produced in dist/)

pip install build
python -m build
pip install dist/sleep_kit_psg-0.1.0-py3-none-any.whl --force-reinstall

🚀 Quick Start

SleepKit can be used in two ways: from the command line (CLI) or through its Python API.


Method 1: CLI

sleepkit-process --dataset SHHS1 --data-root <raw_data> --out-root <output_dir>

Arguments:

Parameter     Description
--dataset     Dataset name (e.g., SHHS1)
--data-root   Root directory containing the EDF/XML files
--out-root    Output directory

Example:

sleepkit-process \
    --dataset SHHS1 \
    --data-root /public_data/nsrr/shhs/polysomnography \
    --out-root /data/processed/sleep_data

Method 2: Python API

Create run.py:

import sleep_kit

raw_dir = r'/public_data/nsrr/shhs/polysomnography'
out_dir = r'/data/output_test'

sleep_kit.fast_preprocess(
    dataset_name='SHHS1',
    data_root=raw_dir,
    out_root=out_dir,
    channels=['C4', 'E1'],
    fs=100,
    seq_len=20,
    max_subjects=5
)

Run:

python run.py

📂 Output Structure

/output/
└── SHHS1/
    ├── seq/
    │   ├── shhs1-200001-0.npy   # (Seq, C, T)
    │   ├── shhs1-200001-1.npy
    └── label/
        ├── shhs1-200001-0.npy   # (Seq,)
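
A short sketch of reading one sequence file together with its label file from this layout. The helper name `load_pair` is hypothetical, and the demo writes dummy arrays to a temporary directory instead of using real SleepKit output. The shapes assume that (Seq, C, T) means epochs per sequence, channels, and samples per epoch.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_pair(out_root, dataset, stem):
    """Load seq/<stem>.npy and label/<stem>.npy and check that their lengths agree."""
    base = Path(out_root) / dataset
    seq = np.load(base / "seq" / f"{stem}.npy")
    label = np.load(base / "label" / f"{stem}.npy")
    if seq.shape[0] != label.shape[0]:
        raise ValueError(f"{stem}: {seq.shape[0]} epochs but {label.shape[0]} labels")
    return seq, label


# Demo on dummy arrays (not real SleepKit output).
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("seq", "label"):
        (Path(tmp) / "SHHS1" / sub).mkdir(parents=True)
    np.save(Path(tmp) / "SHHS1" / "seq" / "shhs1-200001-0.npy",
            np.zeros((20, 2, 3000), dtype=np.float32))
    np.save(Path(tmp) / "SHHS1" / "label" / "shhs1-200001-0.npy",
            np.zeros(20, dtype=np.int64))
    seq, label = load_pair(tmp, "SHHS1", "shhs1-200001-0")

print(seq.shape, label.shape)  # (20, 2, 3000) (20,)
```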

⚙️ Default Settings (config.py)

  • Sampling rate: 100 Hz
  • Epoch length: 30 s
  • EEG bandpass: 0.3–35 Hz
  • EMG bandpass: 10–49 Hz
  • Power-line notch: 50/60 Hz
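
To make these defaults concrete, here is a hedged SciPy sketch of an EEG filter chain using the values above. It is not SleepKit's code: the raw sampling rate (200 Hz), the Butterworth order, the notch quality factor, zero-phase filtering, and the order of the steps are all assumptions. One point is certain, though: a 50 Hz notch cannot be designed at a 100 Hz sampling rate, because 50 Hz is then the Nyquist frequency, so the sketch filters at the raw rate and resamples afterwards.

```python
import numpy as np
from scipy import signal


def bandpass(x, low_hz, high_hz, fs, order=4):
    """Zero-phase Butterworth bandpass (order is an assumed value)."""
    sos = signal.butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, x, axis=-1)


def notch(x, freq_hz, fs, q=30.0):
    """Zero-phase IIR notch at freq_hz (quality factor q is an assumed value)."""
    b, a = signal.iirnotch(freq_hz, q, fs=fs)
    return signal.filtfilt(b, a, x, axis=-1)


raw_fs = 200     # assumed sampling rate of the raw recording
target_fs = 100  # default sampling rate from the list above

# One minute of synthetic data: a 10 Hz rhythm plus 50 Hz power-line interference.
t = np.arange(60 * raw_fs) / raw_fs
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

x = notch(x, 50.0, raw_fs)                               # power-line notch at the raw rate
x = bandpass(x, 0.3, 35.0, raw_fs)                       # EEG band: 0.3-35 Hz
x = signal.resample_poly(x, target_fs, raw_fs, axis=-1)  # 200 Hz -> 100 Hz
x = (x - x.mean()) / x.std()                             # Z-score

print(x.shape)  # (6000,)
```

For an EMG channel the same `bandpass` helper would be called with 10 and 49 Hz, which is still valid at 100 Hz because 49 Hz lies below the 50 Hz Nyquist limit.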

Supported datasets:

SHHS1, SHHS2, MESA, CFS, CCSHS, MROS1, MROS2, ABC, HMC, MASS13, DOD, etc.


📝 FAQ

❓ Why does it process 0 subjects?

Most likely data_root points too deep into the directory tree. It must point to the parent directory that contains both the EDF files and the annotation (label) files.
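
A small diagnostic sketch for this situation: it counts recording and annotation files per subdirectory of a candidate data_root. The helper `count_by_directory` and the directory names in the demo are hypothetical; how SleepKit itself discovers files is not described here. If only one of the two suffixes shows up under the path you pass, the path is probably too deep.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_by_directory(data_root, suffixes=(".edf", ".xml")):
    """Map each directory under data_root (relative path) to a Counter of file suffixes."""
    root = Path(data_root)
    summary = {}
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            rel_dir = path.parent.relative_to(root).as_posix()
            summary.setdefault(rel_dir, Counter())[path.suffix.lower()] += 1
    return summary


# Demo on a made-up layout (not the real layout of any dataset).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "recordings").mkdir()
    (root / "labels").mkdir()
    (root / "recordings" / "a.edf").touch()
    (root / "recordings" / "b.EDF").touch()
    (root / "labels" / "a.xml").touch()

    good = count_by_directory(root)                    # sees both file types
    too_deep = count_by_directory(root / "recordings") # sees only .edf files

print(good)
print(too_deep)
```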

❓ How do I add a new dataset?

Edit config.py:

  1. Add the dataset's channel names to CHANNEL_MAPPING
  2. Add a rule for the dataset to DATASET_RULES

❓ ImportError: No module named sleep_kit

Install the package from the project root directory:

pip install .

For any issues or inquiries, please contact the author at: jinyang03702@163.com
