A modular PSG preprocessing framework for multi-dataset standardization.
SleepKit PSG: A Multi-Dataset PSG Preprocessing Framework
SleepKit PSG is a modular and engineering-oriented Python framework designed for standardized preprocessing of multi-source, multi-channel polysomnography (PSG) data.
It converts heterogeneous datasets (SHHS, MESA, CFS, Sleep-EDF, etc.) and formats (EDF, H5, MAT) into standardized .npy sequences for deep learning models.
🌟 Key Features
◎ Multi-Dataset Support
Ships with built-in rules for 20+ major public sleep datasets (SHHS, MESA, CFS, MASS, DOD, Sleep-EDF, and others).
◎ Intelligent Channel Mapping
Automatically unifies inconsistent channel names across datasets (e.g., EEG(sec) → C3).
◎ Full Preprocessing Pipeline
Includes:
- Multi-format reading (EDF / H5 / MAT)
- Sleep-stage label parsing (XML / TXT / CSV / EANNOT)
- Signal processing (re-reference, bandpass, notch, resample, Z-score)
- Epoch slicing and sequence packaging
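The last step of the pipeline, epoch slicing and sequence packaging, can be sketched with plain NumPy. This is an illustration only, not SleepKit's actual implementation: the function name `epoch_and_pack` is hypothetical, the defaults mirror the documented 100 Hz / 30 s settings, and dropping trailing data that does not fill a whole sequence is an assumption.

```python
import numpy as np


def epoch_and_pack(signal, fs=100, epoch_sec=30, seq_len=20):
    """Cut a (C, N) recording into epochs and group them into sequences.

    Returns an array of shape (n_seq, seq_len, C, T), where T = fs * epoch_sec.
    Samples and epochs that do not fill a complete sequence are dropped
    (an assumption made for this sketch).
    """
    n_channels, n_samples = signal.shape
    t = fs * epoch_sec                      # samples per epoch
    n_epochs = n_samples // t               # whole epochs available

    # (C, n_epochs * T) -> (C, n_epochs, T) -> (n_epochs, C, T)
    epochs = signal[:, : n_epochs * t].reshape(n_channels, n_epochs, t)
    epochs = epochs.transpose(1, 0, 2)

    n_seq = n_epochs // seq_len             # whole sequences available
    return epochs[: n_seq * seq_len].reshape(n_seq, seq_len, n_channels, t)
```

With two channels and 45 epochs of data, `seq_len=20` yields two sequences of shape `(20, 2, 3000)`; the remaining five epochs are discarded in this sketch.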
◎ CLI + Python API
Supports both batch processing from the command line and direct use inside existing Python projects.
🛠️ Installation
Requirements
- Python ≥ 3.8
- numpy, mne, h5py, scipy, scikit-learn, matplotlib, tqdm, pyyaml
Method 1: Install directly
cd sleep_kit_project
pip install .
Method 2: Build a wheel
pip install build
python -m build
pip install dist/sleep_kit_psg-0.1.0-py3-none-any.whl --force-reinstall
🚀 Quick Start
SleepKit can be used in two ways: through the command-line tool (CLI) or through the Python API.
Method 1: CLI
sleepkit-process --dataset SHHS1 --data-root <raw_data> --out-root <output_dir>
Arguments:
| Parameter | Description |
|---|---|
| --dataset | Dataset name (e.g., SHHS1) |
| --data-root | Root directory of the EDF/XML files |
| --out-root | Output directory |
Example:
sleepkit-process \
    --dataset SHHS1 \
    --data-root /public_data/nsrr/shhs/polysomnography \
    --out-root /data/processed/sleep_data
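For batch runs over several datasets, the CLI call can be wrapped in a small Python driver. The sketch below uses only the three flags documented above; the helper names `build_command` and `run_all` are hypothetical and not part of SleepKit.

```python
import subprocess


def build_command(dataset, data_root, out_root):
    """Assemble the argument list for one sleepkit-process run."""
    return [
        "sleepkit-process",
        "--dataset", dataset,
        "--data-root", data_root,
        "--out-root", out_root,
    ]


def run_all(jobs, out_root):
    """Run the CLI once per (dataset name -> raw-data directory) entry."""
    for dataset, data_root in jobs.items():
        # check=True raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(build_command(dataset, data_root, out_root), check=True)
```

Calling `run_all({"SHHS1": "/public_data/nsrr/shhs/polysomnography"}, "/data/processed/sleep_data")` would reproduce the example above; further datasets are additional dictionary entries.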
Method 2: Python API
Create run.py:
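import sleep_kit

# Directory with the raw recordings, and the directory for the .npy output
raw_dir = r'/public_data/nsrr/shhs/polysomnography'
out_dir = r'/data/output_test'

sleep_kit.fast_preprocess(
    dataset_name='SHHS1',
    data_root=raw_dir,
    out_root=out_dir,
    channels=['C4', 'E1'],  # channels to keep
    fs=100,                 # sampling rate in Hz (matches the 100 Hz default)
    seq_len=20,             # sequence length (the Seq dimension of the output)
    max_subjects=5,         # limit the run to 5 subjects
)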
Run:
python run.py
📂 Output Structure
/output/
└── SHHS1/
    ├── seq/
    │   ├── shhs1-200001-0.npy   # (Seq, C, T)
    │   ├── shhs1-200001-1.npy
    └── label/
        ├── shhs1-200001-0.npy   # (Seq,)
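A quick way to sanity-check the output is to load a matching sequence/label pair and compare their leading dimensions. The helper below is a minimal sketch that assumes the directory layout shown above; `load_pair` is a hypothetical name, not a SleepKit function.

```python
from pathlib import Path

import numpy as np


def load_pair(out_root, dataset, stem):
    """Load seq/<stem>.npy and label/<stem>.npy and check that they line up."""
    base = Path(out_root) / dataset
    x = np.load(base / "seq" / f"{stem}.npy")      # expected shape: (Seq, C, T)
    y = np.load(base / "label" / f"{stem}.npy")    # expected shape: (Seq,)
    if x.shape[0] != y.shape[0]:
        raise ValueError(f"{stem}: {x.shape[0]} epochs but {y.shape[0]} labels")
    return x, y
```

For example, `load_pair("/output", "SHHS1", "shhs1-200001-0")` would return the first sequence of subject 200001 together with its labels.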
⚙️ Default Settings (config.py)
- Sampling rate: 100 Hz
- Epoch length: 30 s
- EEG bandpass: 0.3–35 Hz
- EMG bandpass: 10–49 Hz
- Notch: 50/60 Hz
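To make these defaults concrete, here is a rough single-channel sketch built on SciPy. It is not SleepKit's own code: the function name `preprocess_channel`, the filter order (4), the notch quality factor (30), the order of the steps, and the requirement of integer sampling rates are all assumptions. The notch is applied at the original sampling rate because a 50 Hz or 60 Hz component cannot be represented once the signal has been resampled to 100 Hz.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, resample_poly, sosfiltfilt


def preprocess_channel(x, fs_in, band=(0.3, 35.0), notch_hz=50.0, fs_out=100):
    """Notch -> bandpass -> resample -> z-score for a 1-D signal (illustrative)."""
    x = np.asarray(x, dtype=np.float64)

    # Notch only if the line frequency lies below the Nyquist frequency fs_in / 2.
    if notch_hz < fs_in / 2:
        b, a = iirnotch(notch_hz, Q=30.0, fs=fs_in)
        x = filtfilt(b, a, x)

    # Zero-phase Butterworth bandpass (order 4 is an assumption).
    sos = butter(4, band, btype="bandpass", fs=fs_in, output="sos")
    x = sosfiltfilt(sos, x)

    # Rational resampling to the target rate (fs_in and fs_out must be integers).
    ratio = Fraction(fs_out, fs_in)
    x = resample_poly(x, ratio.numerator, ratio.denominator)

    # Z-score normalization over the whole channel.
    return (x - x.mean()) / x.std()
```

For an EMG channel, the same function could be called with `band=(10.0, 49.0)`, provided the input sampling rate is above 98 Hz.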
Supported datasets:
SHHS1, SHHS2, MESA, CFS, CCSHS, MROS1, MROS2, ABC, HMC, MASS13, DOD, etc.
📝 FAQ
❓ Why does it process 0 subjects?
This usually happens when data_root points too deep into the directory tree. It should point to the parent directory that contains both the EDF files and the annotation files.
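One way to check a candidate data_root is to count the recording and annotation files found beneath it. The sketch below is a generic diagnostic, not part of SleepKit; the function name and the default suffixes (`.edf` and `.xml`, as used by SHHS) are assumptions, so adjust them for datasets with other formats.

```python
from collections import Counter
from pathlib import Path


def count_files_by_suffix(data_root, suffixes=(".edf", ".xml")):
    """Count files under data_root (recursively) for each suffix of interest."""
    counts = Counter()
    for path in Path(data_root).rglob("*"):
        suffix = path.suffix.lower()        # match .EDF and .edf alike
        if path.is_file() and suffix in suffixes:
            counts[suffix] += 1
    return counts
```

If one of the counts is zero, data_root most likely points at a subdirectory that holds only the recordings or only the annotations.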
❓ How do I add a new dataset?
In config.py:
- Add the channel mapping to CHANNEL_MAPPING
- Add a rule for the dataset to DATASET_RULES
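The real layout of CHANNEL_MAPPING and DATASET_RULES is defined in config.py and is not reproduced here. As a rough idea of what a channel mapping does, the sketch below resolves raw channel names to canonical ones through an alias table. Every name in it (`EXAMPLE_CHANNEL_ALIASES`, `normalize`, `build_lookup`, `resolve_channel`) is hypothetical, and apart from EEG(sec) → C3 the alias entries are illustrative.

```python
# Hypothetical alias table: canonical name -> raw names seen in source files.
EXAMPLE_CHANNEL_ALIASES = {
    "C3": ["EEG(sec)", "C3-A2"],
    "C4": ["C4-A1"],
}


def normalize(name):
    """Strip all whitespace and upper-case the name so lookups ignore formatting."""
    return "".join(name.split()).upper()


def build_lookup(aliases):
    """Invert the alias table into a normalized raw-name -> canonical-name dict."""
    lookup = {}
    for canonical, raw_names in aliases.items():
        for name in (canonical, *raw_names):
            lookup[normalize(name)] = canonical
    return lookup


def resolve_channel(raw_name, lookup):
    """Return the canonical channel name, or None if the raw name is unknown."""
    return lookup.get(normalize(raw_name))
```

Adding a dataset would then amount to listing its raw channel names under the canonical names they correspond to.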
❓ ImportError: No module named sleep_kit
The package is not installed in the active environment. From the project root directory, run:
pip install .
For any issues or inquiries, please contact the author at: jinyang03702@163.com