
A modular PSG preprocessing framework for multi-dataset standardization.



English Version README (SleepKit Documentation)


SleepKit PSG: A Multi-Dataset PSG Preprocessing Framework

SleepKit PSG is a modular, engineering-oriented Python framework for the standardized preprocessing of multi-source, multi-channel polysomnography (PSG) data. It converts raw recordings from heterogeneous datasets (SHHS, MESA, CFS, Sleep-EDF, etc.) and file formats (EDF, H5, MAT) into standardized .npy sequences ready to be used as deep learning model input.


🌟 Key Features

◎ Multi-Dataset Support

Built-in rules for 20+ major public PSG datasets (SHHS, MESA, CFS, MASS, DOD, Sleep-EDF, …).

◎ Intelligent Channel Mapping

Automatically unifies inconsistent channel names across datasets (e.g., EEG(sec) is recognized as C3).
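A minimal sketch of how an alias-based channel mapping can work, assuming a per-dataset lookup table from raw labels to standard names. CHANNEL_ALIASES, map_channel_names and select_channels are hypothetical names, not SleepKit's API; apart from EEG(sec) → C3, the alias entries below are illustrative assumptions.

```python
# Hypothetical alias table: raw channel label -> standardized name.
# Only "EEG(sec)" -> "C3" comes from this README; the other entries are illustrative.
CHANNEL_ALIASES = {
    "SHHS1": {
        "EEG(sec)": "C3",
        "EEG": "C4",
        "EOG(L)": "E1",
    },
}


def map_channel_names(dataset, raw_names):
    """Return {standard_name: raw_name} for every raw label found in the alias table."""
    aliases = CHANNEL_ALIASES.get(dataset, {})
    mapped = {}
    for raw in raw_names:
        std = aliases.get(raw.strip())
        # Keep the first match if two raw labels map to the same standard name.
        if std is not None and std not in mapped:
            mapped[std] = raw
    return mapped


def select_channels(dataset, raw_names, wanted):
    """Resolve requested standard names (e.g. ["C4", "E1"]) to raw labels, in order."""
    mapped = map_channel_names(dataset, raw_names)
    missing = [ch for ch in wanted if ch not in mapped]
    if missing:
        raise KeyError(f"{dataset}: no raw channel found for {missing}")
    return [mapped[ch] for ch in wanted]


raw = ["SaO2", "EEG(sec)", "ECG", "EEG", "EOG(L)"]
print(select_channels("SHHS1", raw, ["C4", "E1"]))  # ['EEG', 'EOG(L)']
```

Failing loudly on a missing channel, rather than silently skipping it, keeps every output array at the same channel count C.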

◎ Full Preprocessing Pipeline

Includes:

  • Multi-format reading (EDF / H5 / MAT)
  • Sleep-stage label parsing (XML / TXT / CSV / EANNOT)
  • Signal processing (re-reference, bandpass, notch, resample, Z-score)
  • Epoch slicing and sequence packaging
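The signal-processing steps listed above can be sketched for a single channel with SciPy. This is an illustration under assumptions, not SleepKit's implementation: zero-phase filtering, the Butterworth design order (N=4), the notch quality factor (Q=30), integer sampling rates, and z-scoring over the whole recording are choices made here. The band and target rate follow the EEG defaults listed under Default Settings; preprocess_channel is a hypothetical name.

```python
import math

import numpy as np
from scipy import signal


def preprocess_channel(x, fs_in, fs_out=100, band=(0.3, 35.0), notch_hz=50.0, ref=None):
    """Re-reference, band-pass, notch-filter, resample and z-score one channel."""
    x = np.asarray(x, dtype=np.float64)
    if ref is not None:
        # Re-reference: subtract a reference signal (e.g. a mastoid channel).
        x = x - np.asarray(ref, dtype=np.float64)

    # Zero-phase Butterworth band-pass, design order N=4 (an assumption).
    sos = signal.butter(4, band, btype="bandpass", fs=fs_in, output="sos")
    x = signal.sosfiltfilt(sos, x)

    # Power-line notch (50 or 60 Hz), skipped if it lies at or above Nyquist.
    if notch_hz is not None and notch_hz < fs_in / 2:
        b, a = signal.iirnotch(notch_hz, 30.0, fs=fs_in)
        x = signal.filtfilt(b, a, x)

    # Polyphase resampling by the reduced integer ratio fs_out / fs_in.
    up, down = int(fs_out), int(fs_in)
    g = math.gcd(up, down)
    x = signal.resample_poly(x, up // g, down // g)

    # Z-score over the whole recording; the epsilon guards against a flat signal.
    return (x - x.mean()) / (x.std() + 1e-8)


# Synthetic 60 s signal at 125 Hz: a 10 Hz rhythm plus 50 Hz mains hum.
fs_in = 125
t = np.arange(60 * fs_in) / fs_in
raw = 40e-6 * np.sin(2 * np.pi * 10 * t) + 20e-6 * np.sin(2 * np.pi * 50 * t)
clean = preprocess_channel(raw, fs_in)
print(clean.shape)  # (6000,)
```

Filtering before resampling keeps the band edges defined at the original rate; resample_poly then applies its own anti-aliasing filter when lowering the rate.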

◎ CLI + Python API

Supports both batch processing from the command line and direct integration into existing Python projects.


🛠️ Installation

Requirements

  • Python ≥ 3.8
  • numpy, mne, h5py, scipy, scikit-learn (sklearn), matplotlib, tqdm, pyyaml
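A quick pre-install check, as a hedged sketch: it only tests whether each dependency is importable by the current interpreter, not whether its version is compatible. Two import names differ from the pip package names (scikit-learn imports as sklearn, pyyaml as yaml); REQUIRED and missing_requirements are hypothetical names.

```python
import importlib.util
import sys

# pip package name -> import name (two of them differ).
REQUIRED = {
    "numpy": "numpy",
    "mne": "mne",
    "h5py": "h5py",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "tqdm": "tqdm",
    "pyyaml": "yaml",
}


def missing_requirements(required=REQUIRED):
    """Return the pip names of packages the current interpreter cannot import."""
    return [pkg for pkg, mod in required.items() if importlib.util.find_spec(mod) is None]


print("Python >= 3.8:", sys.version_info >= (3, 8))
print("missing:", missing_requirements() or "none")
```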

Method 1: Install directly

```bash
cd sleep_kit_project
pip install .
```

Method 2: Build a wheel

```bash
pip install build
python -m build
# The wheel's version number depends on the source checkout (this release is 0.2.0).
pip install dist/sleep_kit_psg-*.whl --force-reinstall
```

🚀 Quick Start

SleepKit can be used in two ways: from the command line (CLI) or through the Python API.


Method 1: CLI

```bash
sleepkit-process --dataset SHHS1 --data-root <raw_data> --out-root <output_dir>
```

Arguments:

  • --dataset: dataset name (e.g., SHHS1)
  • --data-root: root directory containing the raw EDF/XML files
  • --out-root: output directory

Example:

```bash
sleepkit-process \
    --dataset SHHS1 \
    --data-root /public_data/nsrr/shhs/polysomnography \
    --out-root /data/processed/sleep_data
```
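For batch runs over several datasets, the CLI can be driven from a short Python script. This sketch uses only the three documented flags; build_command and run_batch are hypothetical helpers, and the SHHS2 entry assumes both SHHS visits sit under the same data root, so adjust it to your own layout. With dry_run=True it only prints the commands.

```python
import shutil
import subprocess


def build_command(dataset, data_root, out_root):
    """Assemble a sleepkit-process call using only the documented flags."""
    return [
        "sleepkit-process",
        "--dataset", dataset,
        "--data-root", str(data_root),
        "--out-root", str(out_root),
    ]


def run_batch(jobs, out_root, dry_run=True):
    """Run one CLI call per (dataset, data_root) pair; only print them if dry_run."""
    if not dry_run and shutil.which("sleepkit-process") is None:
        raise RuntimeError("sleepkit-process not found on PATH; run `pip install .` first")
    for dataset, data_root in jobs:
        cmd = build_command(dataset, data_root, out_root)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing dataset


jobs = [
    ("SHHS1", "/public_data/nsrr/shhs/polysomnography"),
    ("SHHS2", "/public_data/nsrr/shhs/polysomnography"),  # assumed shared root
]
run_batch(jobs, "/data/processed/sleep_data")
```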

Method 2: Python API

Create run.py:

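```python
import sleep_kit

raw_dir = '/public_data/nsrr/shhs/polysomnography'  # root containing the raw EDF/XML files
out_dir = '/data/output_test'                        # where seq/ and label/ are written

sleep_kit.fast_preprocess(
    dataset_name='SHHS1',
    data_root=raw_dir,
    out_root=out_dir,
    channels=['C4', 'E1'],  # standardized channel names to extract
    fs=100,                 # target sampling rate in Hz
    seq_len=20,             # epochs per output sequence (the Seq dimension)
    max_subjects=5,         # limit on subjects processed, handy for a quick test
)
```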

Run:

```bash
python run.py
```

📂 Output Structure

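```text
/output/
└── SHHS1/
    ├── seq/
    │   ├── shhs1-200001-0.npy   # (Seq, C, T)
    │   ├── shhs1-200001-1.npy
    │   └── ...
    └── label/
        ├── shhs1-200001-0.npy   # (Seq,)
        └── ...
```

Each seq file holds one array of shape (Seq, C, T): Seq consecutive 30 s epochs, C channels, and T samples per epoch. The label file with the same name holds the Seq matching sleep-stage labels.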
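The epoch slicing and sequence packaging behind this layout can be sketched with NumPy under stated assumptions: one integer stage label per 30 s epoch, non-overlapping sequences, and a trailing partial epoch or sequence simply dropped. SleepKit may handle overlap, leftovers, label encoding and dtypes differently; slice_epochs and save_sequences are hypothetical names.

```python
import tempfile
from pathlib import Path

import numpy as np


def slice_epochs(signals, stages, fs=100, epoch_sec=30):
    """Cut a (C, N) recording into (E, C, T) epochs plus one stage label per epoch."""
    t = fs * epoch_sec
    n_epochs = min(signals.shape[1] // t, len(stages))  # drop a trailing partial epoch
    epochs = signals[:, : n_epochs * t].reshape(signals.shape[0], n_epochs, t)
    return epochs.transpose(1, 0, 2), np.asarray(stages[:n_epochs])


def save_sequences(epochs, labels, out_dir, subject, seq_len=20):
    """Save non-overlapping chunks of seq_len epochs to seq/ and label/."""
    out_dir = Path(out_dir)
    (out_dir / "seq").mkdir(parents=True, exist_ok=True)
    (out_dir / "label").mkdir(parents=True, exist_ok=True)
    n_seq = len(epochs) // seq_len  # leftover epochs (< seq_len) are dropped
    for k in range(n_seq):
        chunk = slice(k * seq_len, (k + 1) * seq_len)
        np.save(out_dir / "seq" / f"{subject}-{k}.npy", epochs[chunk].astype(np.float32))
        np.save(out_dir / "label" / f"{subject}-{k}.npy", labels[chunk])
    return n_seq


# Synthetic recording: 2 channels at 100 Hz, 45 full epochs plus a partial one.
rng = np.random.default_rng(0)
signals = rng.standard_normal((2, 45 * 3000 + 123))
stages = rng.integers(0, 5, size=45)  # illustrative integer stage labels
epochs, labels = slice_epochs(signals, stages)
print(epochs.shape)  # (45, 2, 3000)

with tempfile.TemporaryDirectory() as tmp:
    n = save_sequences(epochs, labels, tmp, "shhs1-200001")
    print(n, np.load(Path(tmp) / "seq" / "shhs1-200001-0.npy").shape)  # 2 (20, 2, 3000)
```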

⚙️ Default Settings (config.py)

  • Sampling rate: 100 Hz
  • Epoch length: 30 s
  • EEG bandpass: 0.3–35 Hz
  • EMG bandpass: 10–49 Hz
  • Power-line notch: 50/60 Hz
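Assuming T in the output shape counts samples per epoch, the default settings fix the size of each sequence file. For the Python API example (channels=['C4', 'E1'], seq_len=20), the arithmetic works out as:

```latex
\begin{aligned}
T &= f_s \times t_{\mathrm{epoch}} = 100\ \mathrm{Hz} \times 30\ \mathrm{s} = 3000\ \text{samples} \\
(\mathrm{Seq}, C, T) &= (20,\ 2,\ 3000) \\
\text{span of one seq file} &= 20 \times 30\ \mathrm{s} = 600\ \mathrm{s} = 10\ \mathrm{min}
\end{aligned}
```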

Supported datasets:

SHHS1, SHHS2, MESA, CFS, CCSHS, MROS1, MROS2, ABC, HMC, MASS13, DOD, etc.


📝 FAQ

❓ Why does it process 0 subjects?

Most likely, data_root points too deep into the directory tree. It must point to the parent directory that contains both the EDF files and the annotation (label) files.
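A hedged diagnostic for this case: count signal and annotation files below a candidate data_root. If either count is zero, the directory is probably one level too deep. The extensions come from the formats listed in this README; inspect_data_root is a hypothetical helper that searches recursively (SleepKit's own file discovery may not), and the NSRR-style directory names in the demo are illustrative.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Extensions taken from the formats listed in this README.
SIGNAL_EXTS = {".edf", ".h5", ".mat"}
LABEL_EXTS = {".xml", ".txt", ".csv", ".eannot"}


def inspect_data_root(data_root):
    """Recursively count signal and annotation files below data_root."""
    counts = Counter(p.suffix.lower() for p in Path(data_root).rglob("*") if p.is_file())
    n_signal = sum(counts[ext] for ext in SIGNAL_EXTS)
    n_label = sum(counts[ext] for ext in LABEL_EXTS)
    if n_signal == 0 or n_label == 0:
        print(f"{data_root}: {n_signal} signal / {n_label} annotation files; "
              "data_root may be too deep, try its parent directory")
    return n_signal, n_label


# Demo on a throwaway layout (directory names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    edf_dir = root / "edfs" / "shhs1"
    xml_dir = root / "annotations-events-nsrr" / "shhs1"
    edf_dir.mkdir(parents=True)
    xml_dir.mkdir(parents=True)
    (edf_dir / "shhs1-200001.edf").write_bytes(b"")
    (xml_dir / "shhs1-200001-nsrr.xml").write_text("<PSGAnnotation/>")
    print(inspect_data_root(root))           # (1, 1)
    print(inspect_data_root(root / "edfs"))  # warning, then (1, 0)
```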

❓ How do I add a new dataset?

Edit config.py:

  1. Add the dataset's channel names to CHANNEL_MAPPING.
  2. Add a rule for the dataset to DATASET_RULES.

❓ ImportError: No module named sleep_kit

Run the following from the project root directory:

```bash
pip install .
```
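If the error persists after installing, pip may have installed into a different interpreter than the one running run.py. This standard-library sketch prints which interpreter is active and where, if anywhere, it would import sleep_kit from; locate_module is a hypothetical helper. Running `python -m pip install .` ties pip to that same interpreter.

```python
import importlib.util
import sys


def locate_module(name="sleep_kit"):
    """Return where the running interpreter would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Namespace packages have no origin file; fall back to their search paths.
    return spec.origin or str(list(spec.submodule_search_locations or []))


print("interpreter:", sys.executable)
print("sleep_kit:", locate_module() or "not importable from this interpreter")
```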

For any issues or inquiries, please contact the author at: jinyang03702@163.com



Download files

Source Distribution

sleep_kit_psg-0.2.0.tar.gz (18.9 kB)

Built Distribution

sleep_kit_psg-0.2.0-py3-none-any.whl (19.0 kB, Python 3)

File details

Details for the file sleep_kit_psg-0.2.0.tar.gz.

File metadata

  • Download URL: sleep_kit_psg-0.2.0.tar.gz
  • Size: 18.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes for sleep_kit_psg-0.2.0.tar.gz

  • SHA256: 967391377502f1b02177784d59b6215dfeac82201c41677995d12ced7e2bf62f
  • MD5: d74159959b00602fd5d98f641d5195c4
  • BLAKE2b-256: dc7847675651ee1e28bd478ff7009a0085d5eb2ed2f09798a3747aa81c892e6c

Details for the file sleep_kit_psg-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: sleep_kit_psg-0.2.0-py3-none-any.whl
  • Size: 19.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes for sleep_kit_psg-0.2.0-py3-none-any.whl

  • SHA256: f85e95d454d3473b6203667411fff80819b5852840d52521758da965c0a9d3af
  • MD5: 6a6abc6c3f2163b0267924e9ab18e6fa
  • BLAKE2b-256: 1f1e6e89c253228eba8c4a5a38bc7f7410cf5a07c44617b07d49b379b627d312
