Wi-Fi Sensing Data Processing - A Python library for downloading, processing, analyzing and training on Wi-Fi CSI data

SDP: Sensing Data Protocol for Scalable Wireless Sensing


📖 Citation

If you use SDP in your research, please cite:

@misc{zhang2026sdpunifiedprotocolbenchmarking,
      title={SDP: A Unified Protocol and Benchmarking Framework for Reproducible Wireless Sensing}, 
      author={Di Zhang and Jiawei Huang and Yuanhao Cui and Xiaowen Cao and Tony Xiao Han and Xiaojun Jing and Christos Masouros},
      year={2026},
      eprint={2601.08463},
      archivePrefix={arXiv},
      primaryClass={eess.SP},
      url={https://arxiv.org/abs/2601.08463}, 
}


🇬🇧 English

🎯 What is SDP?

SDP is a protocol-level abstraction and unified benchmark for reproducible wireless sensing.

⚠️ SDP is not a new neural network, but a standardized protocol that unifies CSI representations for fair comparison.

The Problem

Wireless sensing research often suffers from:

  • ❌ Hardware-specific CSI formats
  • ❌ Inconsistent preprocessing pipelines
  • ❌ Unstable training results
  • ❌ Large performance variance across random seeds

Result: Models cannot be fairly compared.

The Solution

SDP solves this at the protocol level, not the model level:

| Feature | Raw CSI | Other Tools | SDP |
| --- | --- | --- | --- |
| Standardized Format | ❌ Hardware-specific | ⚠️ Partial | Unified CSIFrame |
| Multi-Dataset Support | ❌ Manual parsing | ⚠️ 2-3 datasets | 5 datasets built-in |
| Preprocessing | ❌ DIY | ⚠️ Basic only | Wavelet + Phase Calib |
| Reproducibility | ❌ Random | ⚠️ Varies | 5-seed standard |
| Deep Learning | ❌ From scratch | ⚠️ Limited | CNN+Transformer |
| CLI Interface | ❌ None | ⚠️ Partial | Full CLI support |

SDP projects raw CSI into a fixed canonical frequency grid (K=30), ensuring cross-hardware comparability.
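
As a rough illustration of what projecting onto a fixed grid can look like, the sketch below resamples a CSI vector with an arbitrary number of subcarriers onto K=30 evenly spaced points. The helper name `project_to_grid` and the use of linear interpolation are assumptions made for this example; the text above does not say which projection SDP actually applies.

```python
def project_to_grid(csi, k=30):
    """Linearly resample a list of complex CSI values onto k evenly spaced points.

    Hypothetical helper: linear interpolation is an assumption,
    not SDP's documented projection.
    """
    n = len(csi)
    if n < 2 or k < 2:
        raise ValueError("need at least two subcarriers and two grid points")
    out = []
    for j in range(k):
        # Position of the j-th grid point on the original subcarrier axis.
        pos = j * (n - 1) / (k - 1)
        lo = min(int(pos), n - 2)  # clamp so lo + 1 stays in range
        frac = pos - lo
        out.append(csi[lo] * (1 - frac) + csi[lo + 1] * frac)
    return out


# Example: a synthetic 512-subcarrier frame reduced to the K=30 grid.
frame = [complex(i, -i) for i in range(512)]
canonical = project_to_grid(frame)
print(len(canonical))
```

The first and last grid points coincide with the first and last original subcarriers, so the band edges are preserved under this scheme.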

Performance Highlights

| Metric | Result |
| --- | --- |
| Accuracy | SOTA on 5 datasets |
| Reproducibility | 5-seed evaluation standard |
| Stability | Low variance across runs |
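
To make the 5-seed convention concrete, here is a minimal sketch of how per-seed accuracies might be summarized as mean and standard deviation. The function name `summarize_runs` is hypothetical, and the accuracy values are placeholders chosen for illustration only; they are not SDP results.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) over per-seed accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate variance")
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Placeholder accuracies for five seeds; NOT measured SDP results.
placeholder_acc = [0.90, 0.92, 0.91, 0.89, 0.93]
mean_acc, std_acc = summarize_runs(placeholder_acc)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f} over {len(placeholder_acc)} seeds")
```

Reporting the spread alongside the mean is what lets two methods be compared without a single lucky seed deciding the outcome.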

Figure 1: Accuracy comparison across datasets

Figure 2: Reproducibility and stability analysis

Figure 3: Ablation study results


🚀 Quick Start (3 Steps, 5 Minutes)

Step 1: Install (30 seconds)

pip install wsdp

Verify installation:

wsdp --version

Step 2: Download Dataset (2 minutes)

Option A: From CLI (Recommended for testing)

# elderAL = smallest dataset, fastest for testing
wsdp download elderAL ./data

# Or download larger datasets:
# wsdp download widar ./data
# wsdp download gait ./data
# wsdp download xrf55 ./data
# wsdp download zte ./data

Option B: From SDP Website

Download manually if you encounter network issues.

Required Dataset Structure:

data/
├── elderAL/                    # Dataset name
│   ├── action0_static_new/     # Activity folder
│   │   ├── user0_position1_activity0/  # Sample folder
│   │   │   ├── sample1.csv
│   │   │   └── ...
│   │   └── ...
│   ├── action1_walk_new/
│   └── ...
├── widar/
├── gait/
├── xrf55/
└── zte/
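
Before training, it can help to confirm that a downloaded dataset matches this layout. The sketch below counts the CSV files under each activity folder; `count_samples` is a hypothetical helper written for this README and is not part of the `wsdp` package.

```python
from pathlib import Path


def count_samples(dataset_root):
    """Map each activity folder name to the number of CSV files beneath it."""
    root = Path(dataset_root)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {root}")
    counts = {}
    for activity_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob descends into the per-sample folders below each activity.
        counts[activity_dir.name] = sum(1 for _ in activity_dir.rglob("*.csv"))
    return counts


if __name__ == "__main__":
    root = Path("./data/elderAL")
    if root.is_dir():
        for name, n in count_samples(root).items():
            print(f"{name}: {n} CSV files")
    else:
        print(f"{root} not found; run the download step first")
```

An activity folder that reports zero files is a likely sign of an interrupted download or a misplaced directory level.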

Step 3: Train & Evaluate (2 minutes)

🐍 Python API (Recommended for research):

Create train.py:

from wsdp import pipeline

# Minimal call - uses default hyperparameters
pipeline("./data/elderAL", "./output", "elderAL")

# Or with custom hyperparameters
pipeline(
    input_path="./data/elderAL",
    output_folder="./output",
    dataset="elderAL",
    learning_rate=1e-3,
    num_epochs=50,
    batch_size=64,
)

Run:

python train.py

💻 CLI (Quick & Simple):

# Basic training
wsdp run ./data/elderAL ./output elderAL

# With hyperparameter override
wsdp run ./data/elderAL ./output elderAL --lr 0.001 --epochs 50 --batch-size 64

# With config file
wsdp run ./data/elderAL ./output elderAL --config my_config.yaml

📊 What You Get:

After training, check ./output/:

output/
├── best_model.pth              # Best model checkpoint
├── confusion_matrix.png        # Evaluation visualization
├── training_curves.png         # Loss & accuracy curves
└── output.log                  # Detailed training logs

If you see these files, SDP is working correctly!
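
The same check can be scripted. This is a small sketch that reports which of the four files listed above are absent; `missing_outputs` is a hypothetical helper, not something `wsdp` provides.

```python
from pathlib import Path

# File names taken from the output listing above.
EXPECTED_FILES = (
    "best_model.pth",
    "confusion_matrix.png",
    "training_curves.png",
    "output.log",
)


def missing_outputs(output_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in output_dir."""
    out = Path(output_dir)
    return [name for name in expected if not (out / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs("./output")
    print("all expected files present" if not missing else f"missing: {missing}")
```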


📊 Supported Datasets

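| Dataset | Format | Subcarriers | Complex | Scenarios | Size |
| --- | --- | --- | --- | --- | --- |
| Widar | .dat (bfee) | 30 | | Gesture recognition | ~2 GB |
| Gait | .dat (bfee) | 30 | | Gait recognition | ~1 GB |
| XRF55 | .npy | 30 | | Human activity | ~3 GB |
| ElderAL | .csv | varies | | Elderly activity | ~500 MB |
| ZTE | .csv | 512 | | CSI with I/Q | ~4 GB |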

More datasets coming soon! See Roadmap.


🔬 Research & Customization

🧠 Plug in Your Own Model

Step 1: Create custom_model.py:

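import torch
import torch.nn as nn

class YourCustomModel(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        # Your architecture here.
        # Input shape: (Batch, Timestamp, Frequency, Antenna)
        # Placeholder head so the template runs as-is; replace it with your layers.
        self.classifier = nn.LazyLinear(num_classes)

    def forward(self, x):
        # Your forward pass.
        # Placeholder: assumes x is a real-valued float tensor. Average over
        # the time axis, then flatten (Frequency, Antenna) into one vector.
        features = x.mean(dim=1).flatten(start_dim=1)
        return self.classifier(features)

# Required: expose the model class (not an instance) under the name `model`
model = YourCustomModel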

Step 2: Run with your model:

wsdp run ./data/elderAL ./output elderAL -m custom_model.py
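
A quick pre-flight check can catch a missing `model` attribute before a training run starts. The sketch below imports a Python file by path and returns its module-level `model`; how `wsdp` itself loads the file is not described here, so `load_model_class` is a hypothetical stand-in built on the standard `importlib` machinery.

```python
import importlib.util
from pathlib import Path


def load_model_class(path):
    """Import a Python file by path and return its module-level `model` attribute."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    if not hasattr(module, "model"):
        raise AttributeError(f"{path} must define a module-level `model`")
    return module.model


# Usage (assumes custom_model.py exists in the current directory):
#     model_cls = load_model_class("custom_model.py")
#     print(model_cls.__name__)
```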

📁 Use Your Own Dataset

Organize your data:

data/
└── my_dataset/
    ├── user0_pos0_action0/
    │   ├── sample1.csv
    │   └── ...
    └── user0_pos0_action1/
        └── ...

Run:

wsdp run ./data/my_dataset ./output my_dataset
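
The folder names in the layout above appear to encode user, position, and action indices. Assuming that convention, a name can be split into integer fields as sketched below. Both the pattern and the helper `parse_sample_dir` are illustrative assumptions; the text does not state how `wsdp` derives labels from folder names.

```python
import re

# Assumed naming convention: user<N>_pos<N>_action<N>
_SAMPLE_DIR = re.compile(r"^user(?P<user>\d+)_pos(?P<pos>\d+)_action(?P<action>\d+)$")


def parse_sample_dir(name):
    """Split a sample folder name into integer user, pos, and action fields."""
    match = _SAMPLE_DIR.match(name)
    if match is None:
        raise ValueError(f"unexpected sample folder name: {name!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


print(parse_sample_dir("user0_pos0_action1"))
```

Raising on an unexpected name, rather than skipping it silently, makes layout mistakes visible before any training time is spent.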

🗺️ Codebase Map

Want to go deeper? Here's where to modify:

| Directory | Purpose | What to Modify |
| --- | --- | --- |
| models/ | Architectures | Define or compare model architectures |
| algorithms/ | Signal Processing | Denoising, calibration, etc. |
| datasets/ | Dataset Wrappers | Add new dataset loaders |
| readers/ | File Readers | Add new format parsers |
| structure/ | Data Structures | Modify CSIFrame format |
| processors/ | Protocol Logic | Adjust canonical projection |

🧪 Understanding SDP (10-Min Deep Dive)

The SDP Pipeline

Raw CSI
  ↓
[Deterministic Sanitization]
  - Phase calibration
  - Wavelet denoising
  ↓
[Canonical Tensor Construction]
  - K=30 frequency grid
  - Standardized shape
  ↓
[Deep Learning Model]
  ↓
Prediction
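
To give a feel for the phase calibration step, the sketch below removes a linear phase trend across subcarriers by unwrapping the phase and subtracting a least-squares line. This is a widely used CSI sanitization technique, shown here as an assumption; the pipeline above does not specify which calibration SDP implements, and `unwrap` and `calibrate_phase` are hypothetical names.

```python
import cmath
import math


def unwrap(phases):
    """Remove 2*pi jumps between consecutive phase samples."""
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        # Wrap the step into [-pi, pi) so each increment is the shortest one.
        delta = (delta + math.pi) % (2 * math.pi) - math.pi
        out.append(out[-1] + delta)
    return out


def calibrate_phase(csi):
    """Subtract the best-fit line (slope and offset) from the unwrapped phase."""
    n = len(csi)
    if n < 2:
        raise ValueError("need at least two subcarriers to fit a line")
    phase = unwrap([cmath.phase(h) for h in csi])
    mean_i = (n - 1) / 2
    mean_p = sum(phase) / n
    var_i = sum((i - mean_i) ** 2 for i in range(n))
    slope = sum((i - mean_i) * (p - mean_p) for i, p in enumerate(phase)) / var_i
    # Keep each amplitude; replace the phase with its residual after the fit.
    return [
        cmath.rect(abs(h), p - mean_p - slope * (i - mean_i))
        for i, (h, p) in enumerate(zip(csi, phase))
    ]


# Synthetic frame: unit amplitude with a purely linear phase ramp.
ramp = [cmath.rect(1.0, 0.4 * i + 0.3) for i in range(30)]
flat = calibrate_phase(ramp)
print(max(abs(cmath.phase(h)) for h in flat) < 1e-9)
```

Because the procedure has no random component, the same input frame always yields the same calibrated output.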

Canonical Tensor Format

After sanitization, SDP constructs a Canonical CSI Tensor:

$$X \in \mathbb{C}^{A \times K \times T}$$

Where:

  • $A$ = Number of antennas
  • $K$ = 30 (fixed frequency grid)
  • $T$ = Time samples

This ensures cross-hardware comparability.
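
The index order can be sketched with plain nested lists. The example below stacks T per-packet frames into a tensor indexed as X[a][k][t], then reorders it to (T, K, A), which matches the (Timestamp, Frequency, Antenna) order named in the custom-model template. The helper names are hypothetical, and whether SDP performs this exact reordering internally is an assumption.

```python
def stack_frames(frames):
    """Stack T frames, each indexed [antenna][subcarrier], into X[a][k][t]."""
    num_t = len(frames)
    num_a = len(frames[0])
    num_k = len(frames[0][0])
    return [
        [[frames[t][a][k] for t in range(num_t)] for k in range(num_k)]
        for a in range(num_a)
    ]


def shape(x):
    """Return the three dimension sizes of a regular nested list."""
    return (len(x), len(x[0]), len(x[0][0]))


def to_model_layout(x):
    """Reorder X[a][k][t] into M[t][k][a]."""
    num_a, num_k, num_t = shape(x)
    return [
        [[x[a][k][t] for a in range(num_a)] for k in range(num_k)]
        for t in range(num_t)
    ]


# Synthetic data: T=4 packets, A=3 antennas, K=30 subcarriers.
frames = [
    [[complex(t, 100 * a + k) for k in range(30)] for a in range(3)]
    for t in range(4)
]
X = stack_frames(frames)
print(shape(X), shape(to_model_layout(X)))
```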

Why Deterministic?

Raw CSI contains hardware distortions:

  • Phase offsets
  • Sampling time offsets
  • Noise fluctuations

SDP enforces deterministic calibration and denoising, guaranteeing:

  • ✅ Same raw CSI → Same cleaned tensor
  • ✅ Reproducibility is enforced, not optional
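
The "same input, same output" property can be checked by hashing the cleaned signal from two independent runs. In the sketch below, a single-level Haar transform with soft thresholding stands in for the wavelet denoising step; the wavelet, the threshold, and the helper names are assumptions for illustration, not SDP's actual configuration.

```python
import hashlib
import math
import struct


def haar_denoise(signal, threshold):
    """One-level Haar transform, soft-threshold the details, then invert."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Soft thresholding: shrink each detail coefficient toward zero.
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / s, (a - d) / s))
    return out


def digest(values):
    """SHA-256 over the exact float64 bytes of the values."""
    return hashlib.sha256(struct.pack(f"<{len(values)}d", *values)).hexdigest()


# Synthetic real-valued trace with a small alternating disturbance.
raw = [math.sin(0.3 * i) + (0.05 if i % 2 else -0.05) for i in range(64)]
first = digest(haar_denoise(raw, threshold=0.1))
second = digest(haar_denoise(raw, threshold=0.1))
print(first == second)
```

Comparing digests of the raw bytes is stricter than comparing with a tolerance: any nondeterminism, however small, changes the hash.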


🗺️ Roadmap

  • v0.1 - Initial protocol design
  • v0.2 - 5 datasets support, CLI tool
  • v0.3 - More datasets (WiFi-HAR, CSI-HAR, etc.)
  • v0.4 - Online demo platform
  • v0.5 - PyPI official release
  • v1.0 - Full protocol standardization

Want a specific dataset? Open an issue and let us know!


🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for:

  • Development setup
  • Coding guidelines
  • Pull request process

📄 License

MIT License - see LICENSE file.


Made with ❤️ by the WSDP Team
