Multi-format ECG file parsing and signal processing library
ECGDataKit
A Python library for parsing, processing, and visualizing multi-format ECG files.
Developed at UMMISCO / IRD by Ahmad Fall.
ecgdatakit.ummisco.fr — Full documentation, API reference, and getting started guide.
Features
Parsing — 12 ECG formats, one unified data model
| Format | File Types | Detection |
|---|---|---|
| HL7 aECG | .xml | <AnnotatedECG in header |
| Philips Sierra XML | .xml | <restingecgdata in header |
| ISHNE Holter | .ecg, .hol | ISHNE1.0 or ANN 1.0 magic bytes |
| Mortara EL250 | .xml | <ECG + <CHANNEL in header |
| EDF/EDF+ | .edf | "0 " at offset 0 |
| SCP-ECG | .scp | Valid Section 0 pointer table at offset 6 |
| GE MUSE XML | .xml | <RestingECG> in header |
| DICOM Waveform | .dcm | DICM at offset 128 |
| WFDB (PhysioNet) | .hea + .dat | .hea extension + valid header |
| MFER | .mwf, .mfer | Valid MFER tag + BER length |
| Mindray BeneHeart R12 | .xml | <BeneHeartR12> or <MindrayECG> |
| GE MAC 2000 | .xml | <MAC2000> or <GE_MAC> |
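The detection column describes content sniffing: fixed-offset magic bytes for binary formats and root-element markers for XML. The following is a minimal sketch of that idea, not the library's actual detector. `sniff_format`, `XML_MARKERS`, and every label except "sierra_xml" are hypothetical names chosen for illustration, and the sketch assumes the ISHNE signature sits at offset 0. Formats that need structural validation (SCP-ECG, MFER, WFDB, EDF) are left out.

```python
from typing import Optional

# (marker, label) pairs taken from the table above.
# Labels other than "sierra_xml" are made up for this sketch.
XML_MARKERS = [
    (b"<AnnotatedECG", "hl7_aecg"),
    (b"<restingecgdata", "sierra_xml"),
    (b"<RestingECG>", "ge_muse_xml"),
    (b"<BeneHeartR12>", "mindray_r12"),
    (b"<MindrayECG>", "mindray_r12"),
    (b"<MAC2000>", "ge_mac2000"),
    (b"<GE_MAC>", "ge_mac2000"),
]


def sniff_format(head: bytes) -> Optional[str]:
    """Guess a format label from the first bytes of a file, or return None."""
    # Binary signatures first: they are cheap and unambiguous.
    if head[128:132] == b"DICM":
        return "dicom"
    if head.startswith(b"ISHNE1.0"):
        return "ishne"
    # Specific XML root markers before the generic Mortara pair.
    for marker, label in XML_MARKERS:
        if marker in head:
            return label
    # Mortara needs both markers because "<ECG" alone is too generic.
    if b"<ECG" in head and b"<CHANNEL" in head:
        return "mortara_el250"
    return None
```

Reading only the first few hundred bytes (for example `open(path, "rb").read(512)`) is enough for these checks, which keeps detection cheap even on large Holter files.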
Signal Processing
| Category | Capabilities |
|---|---|
| Filtering | Butterworth (lowpass, highpass, bandpass, notch), baseline removal, diagnostic & monitoring presets |
| Peak Detection | Pan-Tompkins, Shannon energy |
| Heart Rate | Average HR, RR intervals, instantaneous beat-by-beat HR |
| HRV Analysis | Time-domain (SDNN, RMSSD, pNN50), frequency-domain (VLF/LF/HF), Poincaré (SD1/SD2) |
| Spectral | FFT, Welch PSD, beat segmentation, ensemble averaging |
| Quality | Signal quality index (SQI), SNR estimation |
| Leads | Derive III, aVR/aVL/aVF, full 12-lead assembly |
| Cleaning | Built-in, BioSPPy, NeuroKit2, combined pipelines |
| Deep Denoising | DeepFADE — a DenseNet encoder-decoder denoising autoencoder trained on a large private ECG database (weights bundled) |
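The "Leads" row relies on the standard limb-lead relations: III = II - I, aVR = -(I + II) / 2, aVL = I - II / 2, aVF = II - I / 2. The sketch below applies them sample by sample on plain Python lists. `derive_limb_leads` is a hypothetical helper written for this illustration; it is not the library's function and does not use its `Lead` type.

```python
from typing import Dict, List, Sequence


def derive_limb_leads(
    lead_i: Sequence[float], lead_ii: Sequence[float]
) -> Dict[str, List[float]]:
    """Derive III, aVR, aVL and aVF from simultaneously sampled leads I and II."""
    if len(lead_i) != len(lead_ii):
        raise ValueError("lead I and lead II must have the same number of samples")
    pairs = list(zip(lead_i, lead_ii))
    return {
        "III": [ii - i for i, ii in pairs],         # Einthoven: III = II - I
        "aVR": [-(i + ii) / 2 for i, ii in pairs],  # Goldberger augmented leads
        "aVL": [i - ii / 2 for i, ii in pairs],
        "aVF": [ii - i / 2 for i, ii in pairs],
    }
```

Because the four derived leads are linear combinations of I and II, both inputs must share the same sampling rate and units.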
Visualization
| Type | Plots |
|---|---|
| ECG Waveforms | Single lead, multi-lead, standard 12-lead grid with paper background |
| Annotations | R-peak markers, RR intervals, heart rate overlay |
| Beat Analysis | Segmented beats, ensemble-averaged beat with SD shading |
| Spectral | Power spectrum (PSD/FFT), spectrogram |
| HRV | Tachogram, Poincaré plot, frequency bands, metrics dashboard |
| Reports | Signal quality per lead, full ECG report with patient info |
| Interactive | All plots available as interactive Plotly versions (zoom, pan, hover) |
Installation
# Core (parsing only)
pip install ecgdatakit
# With signal processing
pip install "ecgdatakit[processing]"
# With static plots (matplotlib)
pip install "ecgdatakit[plotting]"
# With interactive plots (plotly)
pip install "ecgdatakit[plotting-interactive]"
# With ECG cleaning backends
pip install "ecgdatakit[cleaning]"
# With DeepFADE denoising autoencoder (requires torch)
pip install "ecgdatakit[denoising]"
# Everything (except torch — install separately if needed)
pip install "ecgdatakit[all]"
Optional extras for specific formats:
pip install "ecgdatakit[holter]" # ISHNE Holter CRC validation
pip install "ecgdatakit[dicom]" # DICOM waveform support
Quick Start
Parse an ECG file
from ecgdatakit import FileParser
record = FileParser().parse("path/to/ecg_file.xml")
print(record.source_format) # "sierra_xml"
print(record.patient.first_name) # "John"
print(record.patient.age) # 55
print(record.recording.acquisition.signal.sampling_rate) # 500
print(record.measurements.heart_rate) # 75
print(record.device.manufacturer) # "Philips"
print(record.signal.data_encoding) # "base64"
print(len(record.leads)) # 12
json_str = record.to_json()
Process signals
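from ecgdatakit.processing import (
    diagnostic_filter, detect_r_peaks, heart_rate,
    rr_intervals, time_domain, signal_quality_index, clean_ecg,
)
# Pick one lead of the parsed record to work on
lead = record.leads[1]
# Filter, then detect R-peaks (default method, or Shannon energy)
filtered = diagnostic_filter(lead)
peaks = detect_r_peaks(filtered)
peaks_se = detect_r_peaks(filtered, method="shannon_energy")
# Heart rate and RR intervals from the detected peaks
hr = heart_rate(filtered, peaks)
rr = rr_intervals(filtered, peaks)
# Time-domain HRV metrics
hrv = time_domain(rr)
print(hrv["sdnn"], hrv["rmssd"], hrv["pnn50"])
# Signal quality of the unfiltered lead
sqi = signal_quality_index(lead)
# Cleaning: the three calls below are alternatives, pick one backend
cleaned = clean_ecg(lead)
cleaned = clean_ecg(lead, method="neurokit2")
cleaned = clean_ecg(lead, method="deepfade")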
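For readers unfamiliar with the three time-domain metrics printed above, here is a minimal standard-library sketch of their usual definitions. It is not the library's `time_domain` implementation. `time_domain_sketch` is a hypothetical name, RR intervals are assumed to be in milliseconds, and SDNN uses the sample standard deviation (conventions vary between tools).

```python
import math
import statistics
from typing import Dict, Sequence


def time_domain_sketch(rr_ms: Sequence[float]) -> Dict[str, float]:
    """Compute SDNN, RMSSD and pNN50 from RR intervals given in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Successive differences between adjacent RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    # SDNN: standard deviation of all RR intervals (sample SD here).
    sdnn = statistics.stdev(rr_ms)
    # RMSSD: root mean square of the successive differences.
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # pNN50: percentage of successive differences larger than 50 ms.
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    return {"sdnn": sdnn, "rmssd": rmssd, "pnn50": pnn50}
```

SDNN reflects overall variability, while RMSSD and pNN50 respond mainly to beat-to-beat changes, which is why they are usually reported together.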
Visualize
from ecgdatakit.plotting import (
    plot_lead, plot_12lead, plot_peaks, plot_hrv_summary,
    iplot_lead, iplot_12lead,
)
# Static plots auto-display by default
plot_12lead(record)
plot_peaks(filtered, peaks)
plot_hrv_summary(rr)
# To get the figure without displaying (e.g. for saving):
fig = plot_12lead(record, show=False)
fig.savefig("ecg_12lead.png", dpi=150)
# Use sample indices instead of time on the x-axis:
plot_lead(filtered, x_axis="samples")
# Interactive plots (plotly) — opens in browser
iplot_lead(filtered, peaks).show()
iplot_12lead(record).show()
Batch processing
from pathlib import Path
from ecgdatakit import parse_batch
files = list(Path("ecg_data/").glob("*.xml"))
for record in parse_batch(files, max_workers=4):
    print(record.patient.patient_id, record.measurements.heart_rate)
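The `max_workers` argument suggests that files are handed to a pool of workers. How ECGDataKit schedules that work internally is not described here, so the sketch below only shows the general pattern using the standard library's `ThreadPoolExecutor`. `map_in_pool` is a hypothetical helper, not part of the library, and the choice of threads over processes is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_in_pool(
    func: Callable[[T], R], items: Iterable[T], max_workers: int = 4
) -> Iterator[R]:
    """Apply func to each item using a thread pool, yielding results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of the inputs.
        yield from pool.map(func, items)
```

With a single-file parser such as `FileParser().parse` passed as `func`, this would yield one record per path. An exception raised for one file surfaces when its result is reached, so callers that want to skip bad files need their own `try`/`except` inside `func`.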
Data Model
All parsers produce the same ECGRecord:
ECGRecord
    patient: PatientInfo # ID, name, birth date, sex, age, weight, height, medications
    recording: RecordingInfo # date, duration, sample rate, ADC gain, technician, physician
    device: DeviceInfo # manufacturer, model, name, serial number, software version
    filters: FilterSettings # highpass, lowpass, notch frequencies
    signal: SignalCharacteristics # bits/sample, encoding, compression, channel counts
    leads: list[Lead] # label, samples (float64 array), sample rate, units
    interpretation: Interpretation # statements, severity, source, interpreter
    measurements: GlobalMeasurements # HR, PR, QRS, QT, QTc, axes, RR interval
    median_beats: list[Lead] # median/template beats if available
    annotations: dict[str, str] # additional key-value annotations
    source_format: str # parser identifier
    raw_metadata: dict # original format-specific metadata
Exceptions
All exceptions inherit from ECGDataKitError:
| Exception | When raised |
|---|---|
| UnsupportedFormatError | File format not recognized |
| CorruptedFileError | File is truncated or structurally invalid |
| MissingElementError | Required element or field is missing |
| ChecksumError | Checksum validation failed |
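Because every exception shares one base class, callers can handle specific failures first and fall back to the base class for the rest. The sketch below illustrates that ordering with local stand-in classes that mirror the names in the table. The stand-ins exist only so the example runs on its own (the library's import path for these exceptions is not shown here), and `classify` is a hypothetical helper.

```python
# Stand-in hierarchy mirroring the table above; not imported from ecgdatakit.
class ECGDataKitError(Exception):
    pass


class UnsupportedFormatError(ECGDataKitError):
    pass


class CorruptedFileError(ECGDataKitError):
    pass


class MissingElementError(ECGDataKitError):
    pass


class ChecksumError(ECGDataKitError):
    pass


def classify(exc: Exception) -> str:
    """Map an exception to a coarse category; non-library errors propagate."""
    try:
        raise exc
    except (CorruptedFileError, ChecksumError):
        return "damaged"  # the file itself is unusable
    except UnsupportedFormatError:
        return "unsupported"  # no parser matched
    except ECGDataKitError:
        return "other"  # any remaining library error, e.g. MissingElementError
```

The base-class clause must come last: Python checks `except` clauses in order, so placing it first would swallow the more specific cases.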
Testing
pip install -e ".[all,dev,holter,dicom]"
pytest tests/ -v
Author
Ahmad Fall, UMMISCO / IRD
License
Apache 2.0 — see LICENSE for details.