Reproducible ECG benchmark data from open-access datasets

ECGBench

Reproducible ECG benchmark data from open-access datasets with PyTorch integration.

ECGBench provides a curated catalogue of 64 publicly available ECG datasets and a PyTorch Dataset class for loading ECG signals with standardised fold-based train/val/test splits.

Website: vlbthambawita.github.io/ECGBench

Installation

Using uv (Recommended)

uv pip install ecgbench

Using pip

pip install ecgbench

From source (development)

git clone https://github.com/vlbthambawita/ECGBench.git
cd ECGBench
uv pip install -e ".[dev]"
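
To confirm that any of the install routes above worked, the installed version can be read back with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it relies only on `importlib.metadata`, not on anything inside ECGBench.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "ecgbench"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"ecgbench {found}" if found else "ecgbench is not installed")
```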

Dataset Catalogue

Query the curated index of 64 ECG datasets directly from Python:

import ecgbench

# List all 64 datasets
datasets = ecgbench.list_datasets()
print(f"{len(datasets)} datasets available")

# Search by name, origin, format, or paper
ecgbench.search("PTB-XL")

# Filter by category and access type
ecgbench.search(category="12-Lead (PhysioNet)", access="Open")

# Look up a single dataset by name
ecgbench.get_dataset("MIMIC-IV-ECG")

# List available categories
ecgbench.categories()
# ['12-Lead (PhysioNet)', '12-Lead (Other)', '1-Lead', '2-Lead', '3-Lead', 'BSPM/ECGI']

# Get as a pandas DataFrame
df = ecgbench.to_dataframe()

Loading ECG Signals (PyTorch)

Load ECG benchmark data as a PyTorch Dataset with reproducible fold splits:

from ecgbench import ECGDataset, ecg_collate_fn
from torch.utils.data import DataLoader

# Load PTB-XL training data (100 Hz, all folds)
dataset = ECGDataset(
    physionet_path="/path/to/physionet.org/files/ptb-xl/1.0.3/",
    dataset_name="ptbxl",
    split="train",
    frequency="100",
)

# Use custom collate to handle mixed metadata types (dicts, strings)
loader = DataLoader(dataset, batch_size=32, collate_fn=ecg_collate_fn)

for batch in loader:
    signals = batch["signal"]   # (B, channels, samples)
    ecg_ids = batch["ecg_id"]   # list of IDs
    break
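
As a sanity check on the `(B, channels, samples)` comment above: PTB-XL records are 12-lead, 10-second recordings, so the number of samples should follow from the chosen frequency. The sketch below assumes the loader returns whole, uncropped records; the helper `expected_signal_shape` is hypothetical and not part of ECGBench.

```python
def expected_signal_shape(frequency: str, batch_size: int,
                          leads: int = 12, duration_s: int = 10):
    """Expected (B, channels, samples) for fixed-length records.

    The defaults describe PTB-XL: 12 leads, 10-second recordings.
    """
    samples = int(frequency) * duration_s
    return (batch_size, leads, samples)


print(expected_signal_shape("100", 32))  # (32, 12, 1000)
print(expected_signal_shape("500", 32))  # (32, 12, 5000)
```

The final batch of an epoch may be smaller than `batch_size` unless the DataLoader is created with `drop_last=True`.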

ECGDataset Parameters

Parameter       Type               Default   Description
physionet_path  str | Path         required  Path to PhysioNet dataset root
dataset_name    str                "ptbxl"   Dataset name
split           str                "train"   "train", "val", or "test"
fold_numbers    int | list | None  None      Fold(s) to load; None = all folds
frequency       str                "100"     Sampling frequency: "100" or "500" Hz
ecgbench_root   str | Path | None  None      Custom metadata CSV path (auto-detected by default)
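
How `split` and `fold_numbers` might interact can be sketched in plain Python. Everything here is an assumption made for illustration: the fold layout follows the PTB-XL authors' recommendation (folds 1-8 for training, 9 for validation, 10 for testing), the `fold` field name is hypothetical, and `fold_numbers` is treated as a further restriction within the chosen split. ECGBench's actual mapping may differ.

```python
# Hypothetical fold layout based on the PTB-XL recommendation;
# not taken from the ECGBench source.
SPLIT_FOLDS = {"train": range(1, 9), "val": [9], "test": [10]}


def select_records(records, split="train", fold_numbers=None):
    """Keep records whose fold belongs to `split`, optionally narrowed further."""
    if split not in SPLIT_FOLDS:
        raise ValueError(f"unknown split: {split!r}")
    allowed = set(SPLIT_FOLDS[split])
    if fold_numbers is not None:
        # Accept a single int or any iterable of ints.
        wanted = {fold_numbers} if isinstance(fold_numbers, int) else set(fold_numbers)
        allowed &= wanted
    return [r for r in records if r["fold"] in allowed]


# Toy metadata: 20 records spread evenly over folds 1..10.
records = [{"ecg_id": i, "fold": (i % 10) + 1} for i in range(20)]

train_subset = select_records(records, "train", fold_numbers=[1, 2])
print([r["ecg_id"] for r in train_subset])  # [0, 1, 10, 11]
```

Because membership depends only on the stored fold number, the same arguments select the same records on every run, which is what makes fold-based splits reproducible.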

Output Format

Each sample is a dict containing:

  • signal — float tensor with shape (channels, samples)
  • ecg_id — record identifier
  • patient_id, split, frequency — metadata fields
  • scp_codes — diagnostic codes (kept as dict)
  • All other CSV columns as tensors (numeric) or raw values
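
Because each sample mixes a signal with dicts, strings, and numbers, batching has to treat keys differently. The framework-free sketch below shows the general idea only: it is not the implementation of `ecg_collate_fn`, and the names `collate_sketch`, `stack_keys`, and `stack` are hypothetical. Nested lists stand in for tensors.

```python
def collate_sketch(samples, stack_keys=("signal",), stack=list):
    """Merge a list of per-sample dicts into one dict of per-key batches.

    Keys named in `stack_keys` are combined with `stack`; every other key
    (IDs, strings, dicts such as scp_codes) is kept as a plain list.
    """
    batch = {}
    for key in samples[0]:
        values = [sample[key] for sample in samples]
        batch[key] = stack(values) if key in stack_keys else values
    return batch


# Toy samples: 2 channels x 3 samples each, with dict-valued metadata.
samples = [
    {"signal": [[0.1, 0.2, 0.3], [0.0, 0.1, 0.0]], "ecg_id": 1,
     "scp_codes": {"NORM": 100.0}},
    {"signal": [[0.4, 0.5, 0.6], [0.2, 0.1, 0.2]], "ecg_id": 2,
     "scp_codes": {"MI": 50.0}},
]

batch = collate_sketch(samples)
print(batch["ecg_id"])       # [1, 2]
print(len(batch["signal"]))  # 2
```

With real tensors, passing `stack=torch.stack` would turn the list of `(channels, samples)` signals into a single `(B, channels, samples)` tensor while leaving the metadata as lists.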

Development

# Install with dev dependencies
uv pip install -e ".[dev]"

# Lint
ruff check ecgbench/

# Format
black ecgbench/

License

MIT License - see LICENSE file for details.
