Multi-format ECG file parsing and signal processing library

ECGDataKit

Requires Python 3.10+.

A Python library for parsing, processing, and visualizing multi-format ECG files.

Developed at UMMISCO / IRD by Ahmad Fall.

ecgdatakit.ummisco.fr — Full documentation, API reference, and getting started guide.


Features

Parsing — 12 ECG formats, one unified data model

| Format | File Types | Detection |
| --- | --- | --- |
| HL7 aECG | .xml | <AnnotatedECG in header |
| Philips Sierra XML | .xml | <restingecgdata in header |
| ISHNE Holter | .ecg, .hol | ISHNE1.0 or ANN 1.0 magic bytes |
| Mortara EL250 | .xml | <ECG + <CHANNEL in header |
| EDF/EDF+ | .edf | "0 " at offset 0 |
| SCP-ECG | .scp | Valid Section 0 pointer table at offset 6 |
| GE MUSE XML | .xml | <RestingECG> in header |
| DICOM Waveform | .dcm | DICM at offset 128 |
| WFDB (PhysioNet) | .hea + .dat | .hea extension + valid header |
| MFER | .mwf, .mfer | Valid MFER tag + BER length |
| Mindray BeneHeart R12 | .xml | <BeneHeartR12> or <MindrayECG> |
| GE MAC 2000 | .xml | <MAC2000> or <GE_MAC> |
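
To make the detection column concrete, here is a minimal sketch of signature-based sniffing for a few of the formats listed above. It is an illustration, not ECGDataKit's detection code: the function name `sniff_ecg_format` and every returned label except "sierra_xml" are assumptions. The EDF check uses the full 8-byte version field (a "0" padded with spaces) defined by the EDF specification.

```python
from __future__ import annotations


def sniff_ecg_format(head: bytes) -> str | None:
    """Guess an ECG container format from the first bytes of a file.

    Hypothetical helper: covers only a few signatures from the table and
    returns None when nothing matches.
    """
    # DICOM: 128-byte preamble followed by the ASCII marker "DICM".
    if head[128:132] == b"DICM":
        return "dicom"
    # ISHNE Holter: the file starts with the magic string "ISHNE1.0".
    if head.startswith(b"ISHNE1.0"):
        return "ishne"
    # EDF/EDF+: 8-byte version field, "0" padded with spaces.
    if head.startswith(b"0       "):
        return "edf"
    # XML-based formats: look for a distinctive tag near the top of the file.
    text = head[:4096].decode("utf-8", errors="ignore").lower()
    xml_markers = {
        "<annotatedecg": "hl7_aecg",
        "<restingecgdata": "sierra_xml",
        "<restingecg>": "muse_xml",
    }
    for marker, name in xml_markers.items():
        if marker in text:
            return name
    return None
```

A caller would typically read the first few kilobytes with `open(path, "rb").read(4096)` and pass them in. Binary signatures are checked before the XML markers because they sit at fixed offsets and are cheap to test.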

Signal Processing

| Category | Capabilities |
| --- | --- |
| Filtering | Butterworth (lowpass, highpass, bandpass, notch), baseline removal, diagnostic & monitoring presets |
| Peak Detection | Pan-Tompkins, Shannon energy |
| Heart Rate | Average HR, RR intervals, instantaneous beat-by-beat HR |
| HRV Analysis | Time-domain (SDNN, RMSSD, pNN50), frequency-domain (VLF/LF/HF), Poincaré (SD1/SD2) |
| Spectral | FFT, Welch PSD, beat segmentation, ensemble averaging |
| Quality | Signal quality index (SQI), SNR estimation |
| Leads | Derive III, aVR/aVL/aVF, full 12-lead assembly |
| Cleaning | Built-in, BioSPPy, NeuroKit2, combined pipelines |
| Deep Denoising | DeepFADE: a DenseNet encoder-decoder denoising autoencoder trained on a large private ECG database (weights bundled) |
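
As background for the "Leads" row, the derived limb leads follow from leads I and II through the standard Einthoven and Goldberger relations. The sketch below shows that arithmetic on plain Python lists; `derive_limb_leads` is a hypothetical name, and the library's own lead-derivation API may look different.

```python
def derive_limb_leads(lead_i, lead_ii):
    """Derive III, aVR, aVL and aVF sample by sample from leads I and II."""
    if len(lead_i) != len(lead_ii):
        raise ValueError("lead I and lead II must have the same length")
    derived = {"III": [], "aVR": [], "aVL": [], "aVF": []}
    for i, ii in zip(lead_i, lead_ii):
        derived["III"].append(ii - i)         # Einthoven: III = II - I
        derived["aVR"].append(-(i + ii) / 2)  # aVR = -(I + II) / 2
        derived["aVL"].append(i - ii / 2)     # aVL = I - II / 2
        derived["aVF"].append(ii - i / 2)     # aVF = II - I / 2
    return derived
```

A quick sanity check on any output is that aVR + aVL + aVF is zero at every sample, which follows directly from the three formulas.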

Visualization

| Type | Plots |
| --- | --- |
| ECG Waveforms | Single lead, multi-lead, standard 12-lead grid with paper background |
| Annotations | R-peak markers, RR intervals, heart rate overlay |
| Beat Analysis | Segmented beats, ensemble-averaged beat with SD shading |
| Spectral | Power spectrum (PSD/FFT), spectrogram |
| HRV | Tachogram, Poincaré plot, frequency bands, metrics dashboard |
| Reports | Signal quality per lead, full ECG report with patient info |
| Interactive | All plots available as interactive Plotly versions (zoom, pan, hover) |

Installation

# Core (parsing only)
pip install ecgdatakit

# With signal processing
pip install "ecgdatakit[processing]"

# With static plots (matplotlib)
pip install "ecgdatakit[plotting]"

# With interactive plots (plotly)
pip install "ecgdatakit[plotting-interactive]"

# With ECG cleaning backends
pip install "ecgdatakit[cleaning]"

# With DeepFADE denoising autoencoder (requires torch)
pip install "ecgdatakit[denoising]"

# Everything (except torch — install separately if needed)
pip install "ecgdatakit[all]"

Optional extras for specific formats:

pip install "ecgdatakit[holter]"   # ISHNE Holter CRC validation
pip install "ecgdatakit[dicom]"    # DICOM waveform support

Quick Start

Parse an ECG file

from ecgdatakit import FileParser

record = FileParser().parse("path/to/ecg_file.xml")

print(record.source_format)            # "sierra_xml"
print(record.patient.first_name)       # "John"
print(record.patient.age)              # 55
print(record.recording.sample_rate)    # 500
print(record.measurements.heart_rate)  # 75
print(record.device.manufacturer)      # "Philips"
print(record.signal.data_encoding)     # "base64"
print(len(record.leads))               # 12

json_str = record.to_json()

Process signals

from ecgdatakit.processing import (
    diagnostic_filter, detect_r_peaks, heart_rate,
    rr_intervals, time_domain, signal_quality_index, clean_ecg,
)

lead = record.leads[1]

filtered = diagnostic_filter(lead)

peaks = detect_r_peaks(filtered)
peaks_se = detect_r_peaks(filtered, method="shannon_energy")

hr = heart_rate(filtered, peaks)
rr = rr_intervals(filtered, peaks)

hrv = time_domain(rr)
print(hrv["sdnn"], hrv["rmssd"], hrv["pnn50"])

sqi = signal_quality_index(lead)

cleaned = clean_ecg(lead)
cleaned = clean_ecg(lead, method="neurokit2")
cleaned = clean_ecg(lead, method="deepfade")
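
For readers unfamiliar with the three time-domain HRV metrics printed above, the following standard-library sketch shows how they are conventionally defined. It assumes RR intervals in milliseconds and uses the sample standard deviation for SDNN; `time_domain_hrv` is a hypothetical name, and the library's `time_domain` may use different units or conventions.

```python
import math
import statistics


def time_domain_hrv(rr_ms):
    """Compute SDNN, RMSSD and pNN50 from RR intervals in milliseconds."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    # Successive differences between adjacent RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    sdnn = statistics.stdev(rr_ms)  # sample standard deviation (n - 1)
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Percentage of successive differences strictly greater than 50 ms.
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    return {"sdnn": sdnn, "rmssd": rmssd, "pnn50": pnn50}
```

SDNN reflects overall variability, while RMSSD and pNN50 focus on beat-to-beat changes, so they can diverge on recordings with slow trends.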

Visualize

from ecgdatakit.plotting import (
    plot_lead, plot_12lead, plot_peaks, plot_hrv_summary,
    iplot_lead, iplot_12lead,
)

# Static plots auto-display by default
plot_12lead(record)
plot_peaks(filtered, peaks)
plot_hrv_summary(rr)

# To get the figure without displaying (e.g. for saving):
fig = plot_12lead(record, show=False)
fig.savefig("ecg_12lead.png", dpi=150)

# Use sample indices instead of time on the x-axis:
plot_lead(filtered, x_axis="samples")

# Interactive plots (plotly) — opens in browser
iplot_lead(filtered, peaks).show()
iplot_12lead(record).show()

Batch processing

from pathlib import Path
from ecgdatakit import parse_batch

files = list(Path("ecg_data/").glob("*.xml"))
for record in parse_batch(files, max_workers=4):
    print(record.patient.patient_id, record.measurements.heart_rate)
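
The README does not describe how `parse_batch` schedules work or handles failures. As a generic illustration of the `max_workers` pattern, here is a sketch built on `concurrent.futures` that reports errors per file instead of aborting the batch. `parse_many` and its `parse_one` argument are hypothetical; nothing here reflects the library's internals.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_many(paths, parse_one, max_workers=4):
    """Yield (path, result, error) for each path, parsing in parallel.

    `parse_one` is any callable that takes a path and returns a record.
    Results arrive in completion order, not input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(parse_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                yield path, future.result(), None
            except Exception as exc:  # keep going when one file is bad
                yield path, None, exc
```

Threads suit I/O-bound parsing; for CPU-heavy decoding a process pool might be a better fit, at the cost of pickling each record.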

Data Model

All parsers produce the same ECGRecord:

ECGRecord
  patient: PatientInfo        # ID, name, birth date, sex, age, weight, height, medications
  recording: RecordingInfo    # date, duration, sample rate, ADC gain, technician, physician
  device: DeviceInfo          # manufacturer, model, name, serial number, software version
  filters: FilterSettings     # highpass, lowpass, notch frequencies
  signal: SignalCharacteristics  # bits/sample, encoding, compression, channel counts
  leads: list[Lead]           # label, samples (float64 array), sample rate, units
  interpretation: Interpretation  # statements, severity, source, interpreter
  measurements: GlobalMeasurements  # HR, PR, QRS, QT, QTc, axes, RR interval
  median_beats: list[Lead]    # median/template beats if available
  annotations: dict[str, str] # additional key-value annotations
  source_format: str          # parser identifier
  raw_metadata: dict          # original format-specific metadata
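
To show how a unified record of this shape can serialize to JSON, here is a heavily reduced stand-in using dataclasses. `MiniLead` and `MiniRecord` are hypothetical names with only a few of the fields listed above, and samples are plain lists rather than float64 arrays, so this is not the library's `ECGRecord`.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MiniLead:
    label: str
    samples: list[float]
    sample_rate: int
    units: str = "uV"


@dataclass
class MiniRecord:
    source_format: str
    leads: list[MiniLead] = field(default_factory=list)
    annotations: dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so leads become dicts.
        return json.dumps(asdict(self))
```

With NumPy arrays as samples, a real implementation would need to convert them (for example with `tolist()`) before calling `json.dumps`.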

Exceptions

All exceptions inherit from ECGDataKitError:

| Exception | When raised |
| --- | --- |
| UnsupportedFormatError | File format not recognized |
| CorruptedFileError | File is truncated or structurally invalid |
| MissingElementError | Required element or field is missing |
| ChecksumError | Checksum validation failed |
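
Because every exception shares one base class, callers can handle a specific case first and fall back to the base class for the rest. The sketch below demonstrates that pattern with local stand-in classes so it runs without the library. The import path of the real classes is not stated here, and `try_parse` is a hypothetical helper.

```python
class ECGDataKitError(Exception):
    """Local stand-in for the library's base exception."""


class UnsupportedFormatError(ECGDataKitError):
    """Local stand-in: file format not recognized."""


class CorruptedFileError(ECGDataKitError):
    """Local stand-in: file is truncated or structurally invalid."""


def try_parse(parse, path):
    """Return (record, reason); reason is None on success."""
    try:
        return parse(path), None
    except UnsupportedFormatError:
        return None, "unsupported"  # e.g. skip files that are not ECGs
    except ECGDataKitError as exc:
        # The base class catches every other library error in one handler.
        return None, f"failed: {exc}"
```

Ordering matters: the subclass handler must come before the base-class handler, or it will never run.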

Testing

pip install -e ".[all,dev,holter,dicom]"
pytest tests/ -v

Author

Ahmad Fall, UMMISCO / IRD

License

Apache 2.0 — see LICENSE for details.
