Skip to main content

Multi-format ECG file parsing and signal processing library

Project description

ECGDataKit

PyPI Version Tests Docs Python 3.10+ License GitLab

A Python library for parsing, processing, and visualizing multi-format ECG files.

Developed at UMMISCO / IRD by Ahmad Fall.

ecgdatakit.ummisco.fr — Full documentation, API reference, and getting started guide.


Features

Parsing — 12 ECG formats, one unified data model

Format File Types Detection
HL7 aECG .xml <AnnotatedECG in header
Philips Sierra XML .xml <restingecgdata in header
ISHNE Holter .ecg, .hol ISHNE1.0 or ANN 1.0 magic bytes
Mortara EL250 .xml <ECG + <CHANNEL in header
EDF/EDF+ .edf "0 " at offset 0
SCP-ECG .scp Valid Section 0 pointer table at offset 6
GE MUSE XML .xml <RestingECG> in header
DICOM Waveform .dcm DICM at offset 128
WFDB (PhysioNet) .hea + .dat .hea extension + valid header
MFER .mwf, .mfer Valid MFER tag + BER length
Mindray BeneHeart R12 .xml <BeneHeartR12> or <MindrayECG>
GE MAC 2000 .xml <MAC2000> or <GE_MAC>

Signal Processing

Category Capabilities
Filtering Butterworth (lowpass, highpass, bandpass, notch), baseline removal, diagnostic & monitoring presets
Peak Detection Pan-Tompkins, Shannon energy
Heart Rate Average HR, RR intervals, instantaneous beat-by-beat HR
HRV Analysis Time-domain (SDNN, RMSSD, pNN50), frequency-domain (VLF/LF/HF), Poincaré (SD1/SD2)
Spectral FFT, Welch PSD, beat segmentation, ensemble averaging
Quality Signal quality index (SQI), SNR estimation
Leads Derive III, aVR/aVL/aVF, full 12-lead assembly
Cleaning Built-in, BioSPPy, NeuroKit2, combined pipelines
Deep Denoising DeepFADE — a DenseNet encoder-decoder denoising autoencoder trained on a large private ECG database (weights bundled)

Visualization

Type Plots
ECG Waveforms Single lead, multi-lead, standard 12-lead grid with paper background
Annotations R-peak markers, RR intervals, heart rate overlay
Beat Analysis Segmented beats, ensemble-averaged beat with SD shading
Spectral Power spectrum (PSD/FFT), spectrogram
HRV Tachogram, Poincaré plot, frequency bands, metrics dashboard
Reports Signal quality per lead, full ECG report with patient info
Interactive All plots available as interactive Plotly versions (zoom, pan, hover)

Installation

# Core (parsing only)
pip install ecgdatakit

# With signal processing
pip install "ecgdatakit[processing]"

# With static plots (matplotlib)
pip install "ecgdatakit[plotting]"

# With interactive plots (plotly)
pip install "ecgdatakit[plotting-interactive]"

# With ECG cleaning backends
pip install "ecgdatakit[cleaning]"

# With DeepFADE denoising autoencoder (requires torch)
pip install "ecgdatakit[denoising]"

# Everything (except torch — install separately if needed)
pip install "ecgdatakit[all]"

Optional extras for specific formats:

pip install "ecgdatakit[holter]"   # ISHNE Holter CRC validation
pip install "ecgdatakit[dicom]"    # DICOM waveform support

Quick Start

Parse an ECG file

from ecgdatakit import FileParser

record = FileParser().parse("path/to/ecg_file.xml")

print(record.source_format)            # "sierra_xml"
print(record.patient.first_name)       # "John"
print(record.patient.age)              # 55
print(record.recording.acquisition.signal.sampling_rate)  # 500
print(record.measurements.heart_rate)  # 75
print(record.device.manufacturer)      # "Philips"
print(record.signal.data_encoding)     # "base64"
print(len(record.leads))               # 12

json_str = record.to_json()

Process signals

from ecgdatakit.processing import (
    diagnostic_filter, detect_r_peaks, heart_rate,
    rr_intervals, time_domain, signal_quality_index, clean_ecg,
)

lead = record.leads[1]

filtered = diagnostic_filter(lead)

peaks = detect_r_peaks(filtered)
peaks_se = detect_r_peaks(filtered, method="shannon_energy")

hr = heart_rate(filtered, peaks)
rr = rr_intervals(filtered, peaks)

hrv = time_domain(rr)
print(hrv["sdnn"], hrv["rmssd"], hrv["pnn50"])

sqi = signal_quality_index(lead)

cleaned = clean_ecg(lead)
cleaned = clean_ecg(lead, method="neurokit2")
cleaned = clean_ecg(lead, method="deepfade")

Visualize

from ecgdatakit.plotting import (
    plot_lead, plot_12lead, plot_peaks, plot_hrv_summary,
    iplot_lead, iplot_12lead,
)

# Static plots auto-display by default
plot_12lead(record)
plot_peaks(filtered, peaks)
plot_hrv_summary(rr)

# To get the figure without displaying (e.g. for saving):
fig = plot_12lead(record, show=False)
fig.savefig("ecg_12lead.png", dpi=150)

# Use sample indices instead of time on the x-axis:
plot_lead(filtered, x_axis="samples")

# Interactive plots (plotly) — opens in browser
iplot_lead(filtered, peaks).show()
iplot_12lead(record).show()

Batch processing

from pathlib import Path
from ecgdatakit import parse_batch

files = list(Path("ecg_data/").glob("*.xml"))
for record in parse_batch(files, max_workers=4):
    print(record.patient.patient_id, record.measurements.heart_rate)

Data Model

All parsers produce the same ECGRecord:

ECGRecord
  patient: PatientInfo        # ID, name, birth date, sex, age, weight, height, medications
  recording: RecordingInfo    # date, duration, sample rate, ADC gain, technician, physician
  device: DeviceInfo          # manufacturer, model, name, serial number, software version
  filters: FilterSettings     # highpass, lowpass, notch frequencies
  signal: SignalCharacteristics  # bits/sample, encoding, compression, channel counts
  leads: list[Lead]           # label, samples (float64 array), sample rate, units
  interpretation: Interpretation  # statements, severity, source, interpreter
  measurements: GlobalMeasurements  # HR, PR, QRS, QT, QTc, axes, RR interval
  median_beats: list[Lead]    # median/template beats if available
  annotations: dict[str, str] # additional key-value annotations
  source_format: str          # parser identifier
  raw_metadata: dict          # original format-specific metadata

Exceptions

All exceptions inherit from ECGDataKitError:

Exception When raised
UnsupportedFormatError File format not recognized
CorruptedFileError File is truncated or structurally invalid
MissingElementError Required element or field is missing
ChecksumError Checksum validation failed

Testing

pip install -e ".[all,dev,holter,dicom]"
pytest tests/ -v

Author

Ahmad FallUMMISCO / IRD

License

Apache 2.0 — see LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ecgdatakit-1.0.0.tar.gz (48.7 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ecgdatakit-1.0.0-py3-none-any.whl (48.7 MB view details)

Uploaded Python 3

File details

Details for the file ecgdatakit-1.0.0.tar.gz.

File metadata

  • Download URL: ecgdatakit-1.0.0.tar.gz
  • Upload date:
  • Size: 48.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ecgdatakit-1.0.0.tar.gz
Algorithm Hash digest
SHA256 516302e683241227f2726bd770733af22fb6829aabfa53c500b24f69b50b9f4b
MD5 57879e6e235f36834880c6bf7cd96ac0
BLAKE2b-256 7af7f2e036cd07b454a71afcac864173a542f678051d2bbe83e6a49e59b2b15c

See more details on using hashes here.

Provenance

The following attestation bundles were made for ecgdatakit-1.0.0.tar.gz:

Publisher: publish.yml on UMMISCO/ECGDataKit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ecgdatakit-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: ecgdatakit-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 48.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ecgdatakit-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 62f35ff3afba405a9c33fe5a65584d0566ce88c34b5cb4b1c92a1e33c6b2bfbc
MD5 c83f9b649d0cb1146744b73e03b1252b
BLAKE2b-256 aefaaad6f39ef3819420386d1327500df95dfa0488162bda1b80d38eb9b910cd

See more details on using hashes here.

Provenance

The following attestation bundles were made for ecgdatakit-1.0.0-py3-none-any.whl:

Publisher: publish.yml on UMMISCO/ECGDataKit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page