Reproducible ECG benchmark data from open-access datasets
ECGBench
Reproducible ECG benchmark data from open-access datasets with PyTorch integration.
ECGBench provides a curated catalogue of 64 publicly available ECG datasets and a PyTorch Dataset class for loading ECG signals with standardised fold-based train/val/test splits.
Website: vlbthambawita.github.io/ECGBench
Installation
Using uv (Recommended)
uv pip install ecgbench
Using pip
pip install ecgbench
From source (development)
git clone https://github.com/vlbthambawita/ECGBench.git
cd ECGBench
uv pip install -e ".[dev]"
Dataset Catalogue
Query the curated index of 64 ECG datasets directly from Python:
import ecgbench
# List all 64 datasets
datasets = ecgbench.list_datasets()
print(f"{len(datasets)} datasets available")
# Search by name, origin, format, or paper
ecgbench.search("PTB-XL")
# Filter by category and access type
ecgbench.search(category="12-Lead (PhysioNet)", access="Open")
# Look up a single dataset by name
ecgbench.get_dataset("MIMIC-IV-ECG")
# List available categories
ecgbench.categories()
# ['12-Lead (PhysioNet)', '12-Lead (Other)', '1-Lead', '2-Lead', '3-Lead', 'BSPM/ECGI']
# Get as a pandas DataFrame
df = ecgbench.to_dataframe()
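To make the two search styles above concrete, here is a minimal sketch of how a free-text query and field filters could be combined over a list of records. It is not ECGBench's implementation: the `search_catalogue` helper, the record fields, and the two placeholder entries are all assumptions made for illustration.

```python
# Placeholder records; the real catalogue's fields and values may differ.
CATALOGUE = [
    {"name": "Example-A", "category": "12-Lead (PhysioNet)", "access": "Open"},
    {"name": "Example-B", "category": "1-Lead", "access": "Restricted"},
]


def search_catalogue(records, query=None, **filters):
    """Return records matching a free-text query and exact-match field filters."""
    results = []
    for record in records:
        # Free-text match: case-insensitive substring over all field values.
        if query is not None:
            haystack = " ".join(str(v) for v in record.values()).lower()
            if query.lower() not in haystack:
                continue
        # Field filters: every given key must match exactly.
        if any(record.get(key) != value for key, value in filters.items()):
            continue
        results.append(record)
    return results


open_12_lead = search_catalogue(CATALOGUE, category="12-Lead (PhysioNet)", access="Open")
by_name = search_catalogue(CATALOGUE, query="example-b")
```

With the placeholder data, the first call keeps only `Example-A` and the second keeps only `Example-B`.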
Loading ECG Signals (PyTorch)
Load ECG benchmark data as a PyTorch Dataset with reproducible fold splits:
from ecgbench import ECGDataset, ecg_collate_fn
from torch.utils.data import DataLoader
# Load PTB-XL training data (100 Hz, all folds)
dataset = ECGDataset(
    physionet_path="/path/to/physionet.org/files/ptb-xl/1.0.3/",
    dataset_name="ptbxl",
    split="train",
    frequency="100",
)
# Use custom collate to handle mixed metadata types (dicts, strings)
loader = DataLoader(dataset, batch_size=32, collate_fn=ecg_collate_fn)
for batch in loader:
    signals = batch["signal"]  # (B, channels, samples)
    ecg_ids = batch["ecg_id"]  # list of IDs
    break
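The idea behind a custom collate function can be sketched without PyTorch. The snippet below is a conceptual illustration only, not the code behind `ecg_collate_fn`: the `collate_samples` helper and the toy samples are hypothetical, and signals are plain nested lists rather than tensors.

```python
def collate_samples(samples):
    """Group a list of per-sample dicts into one dict of per-key lists."""
    if not samples:
        return {}
    batch = {}
    for key in samples[0]:
        # Keep each field as a plain list so dicts and strings pass through
        # untouched; a real collate function would stack "signal" into a tensor.
        batch[key] = [sample[key] for sample in samples]
    return batch


# Two toy samples (2 channels x 3 samples each) with made-up metadata.
samples = [
    {"signal": [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]], "ecg_id": 1,
     "scp_codes": {"NORM": 100.0}},
    {"signal": [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]], "ecg_id": 2,
     "scp_codes": {"IMI": 50.0, "SR": 0.0}},
]
batch = collate_samples(samples)

# Nested-list equivalent of the (B, channels, samples) layout.
batch_shape = (
    len(batch["signal"]),
    len(batch["signal"][0]),
    len(batch["signal"][0][0]),
)
```

Keeping `scp_codes` as a list of dicts sidesteps the problem that each record can carry a different set of diagnostic codes.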
ECGDataset Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `physionet_path` | `str \| Path` | required | Path to PhysioNet dataset root |
| `dataset_name` | `str` | `"ptbxl"` | Dataset name |
| `split` | `str` | `"train"` | `"train"`, `"val"`, or `"test"` |
| `fold_numbers` | `int \| list \| None` | `None` | Fold(s) to load; `None` = all folds |
| `frequency` | `str` | `"100"` | Sampling frequency: `"100"` or `"500"` Hz |
| `ecgbench_root` | `str \| Path \| None` | `None` | Custom metadata CSV path (auto-detected by default) |
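How `split` and `fold_numbers` might interact can be sketched in plain Python. This is an illustration under stated assumptions, not ECGBench's code: it assumes the split follows the PTB-XL authors' recommended 10-fold scheme (folds 1-8 for training, 9 for validation, 10 for testing) stored in a `strat_fold` column, and the `select_folds` and `filter_records` helpers are hypothetical.

```python
# Assumed mapping, following the PTB-XL authors' recommended 10-fold split.
SPLIT_FOLDS = {"train": list(range(1, 9)), "val": [9], "test": [10]}


def select_folds(split, fold_numbers=None):
    """Return the folds to load for a split, optionally narrowed by fold_numbers."""
    if split not in SPLIT_FOLDS:
        raise ValueError(f"split must be one of {sorted(SPLIT_FOLDS)}, got {split!r}")
    allowed = SPLIT_FOLDS[split]
    if fold_numbers is None:
        return allowed  # None means every fold belonging to the split
    if isinstance(fold_numbers, int):
        fold_numbers = [fold_numbers]  # accept a single fold given as an int
    unknown = [f for f in fold_numbers if f not in allowed]
    if unknown:
        raise ValueError(f"folds {unknown} are not part of split {split!r}")
    return list(fold_numbers)


def filter_records(records, split, fold_numbers=None):
    """Keep records whose 'strat_fold' value is among the selected folds."""
    folds = set(select_folds(split, fold_numbers))
    return [r for r in records if r["strat_fold"] in folds]


# Toy metadata: 20 records spread evenly over folds 1..10.
records = [{"ecg_id": i, "strat_fold": (i % 10) + 1} for i in range(20)]
train = filter_records(records, "train")
val = filter_records(records, "val")
first_two = filter_records(records, "train", fold_numbers=[1, 2])
```

Because fold membership is fixed in the metadata, the same arguments always select the same records, which is what makes the splits reproducible.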
Output Format
Each sample is a dict containing:
- `signal`: float tensor with shape `(channels, samples)`
- `ecg_id`: record identifier
- `patient_id`, `split`, `frequency`: metadata fields
- `scp_codes`: diagnostic codes (kept as dict)
- All other CSV columns as tensors (numeric) or raw values
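The output format above can be illustrated by a small sketch that turns one metadata row into a sample dict. It is not ECGBench's implementation: `parse_value` and `row_to_sample` are hypothetical helpers, the row values are made up, and numbers stay plain Python values instead of tensors. The one detail taken from PTB-XL itself is that `scp_codes` is stored in the CSV as a stringified dict.

```python
import ast


def parse_value(value):
    """Convert a CSV string to int or float when possible, else return it unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def row_to_sample(row, signal):
    """Build a sample dict from one metadata row (column -> string) and a signal."""
    sample = {"signal": signal}
    for column, value in row.items():
        if column == "scp_codes":
            # Parse the stringified dict without executing arbitrary code.
            sample[column] = ast.literal_eval(value)
        else:
            sample[column] = parse_value(value)
    return sample


# Made-up row; "note" stands in for any non-numeric column.
row = {
    "ecg_id": "1",
    "patient_id": "42",
    "age": "56.5",
    "scp_codes": "{'NORM': 100.0, 'SR': 0.0}",
    "note": "toy",
}
# 12 leads x 1000 samples, i.e. a 10-second record at 100 Hz.
sample = row_to_sample(row, signal=[[0.0] * 1000 for _ in range(12)])
```

Using `ast.literal_eval` rather than `eval` keeps the parsing restricted to Python literals.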
Development
# Install with dev dependencies
uv pip install -e ".[dev]"
# Lint
ruff check ecgbench/
# Format
black ecgbench/
License
MIT License. See the LICENSE file for details.