Standardised evaluation metrics for epileptic seizure detection and forecasting.

Maintainers

ywatanabe1989

Project description

SciTeX Seizure Metrics (`scitex-seizure-metrics`)

Unified evaluation library for seizure detection and forecasting — sample-based, alarm-based, and the bridge between them.

Full Documentation · uv pip install scitex-seizure-metrics[all]

Problem and Solution

#	Problem	Solution
1	Cross-paper comparison is broken — Cook 2013 reports time-in-warning, Karoly 2017 reports AUROC + Brier, Maturana 2020 reports AUROC + IoC, Kuhlmann 2018 reports AUROC, Proix 2021 reports IoC + AUC of sensitivity vs proportion-time-in-warning. No two of these can be plotted on the same axis without re-running their methods.	One `MetricsReport` object carries both regimes through one API; `bridge.sample_to_alarm` gives analytic bounds when only one side is reported.
2	Sample- vs alarm-based collapse is documented but untooled — Andrade 2024 showed that 50/56 patients beat chance under sample-based eval but only 6/46 under alarm-based. The community accepts the warning but has no packaged tool to apply both regimes routinely.	`detection.evaluate` + `forecasting.evaluate_stream` through one library; same input, both regimes side-by-side.
3	FP/hr lacks a denominator convention — some papers normalise by total recording time, some by interictal-only time, refractory rules vary or are unstated.	Explicit `AlarmPolicy` required by every alarm-aware function — no silent defaults; every reported number is reproducible.
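As an illustration of why the denominator convention in row 3 matters, the sketch below computes FP/hr for the same alarm count under both conventions. The function name `fp_per_hour` and the parameter `excluded_seconds` are hypothetical and are not the library's API; which periods count as non-interictal is also an assumption left to the caller.

```python
def fp_per_hour(n_false_alarms, total_seconds, excluded_seconds=0.0,
                denominator="total"):
    """False alarms per hour under an explicitly named denominator.

    Hypothetical helper: `excluded_seconds` stands for whatever time the
    chosen convention removes from the recording (e.g. ictal periods).
    """
    if denominator == "total":
        seconds = total_seconds
    elif denominator == "interictal":
        seconds = total_seconds - excluded_seconds
    else:
        raise ValueError("denominator must be 'total' or 'interictal'")
    if seconds <= 0:
        return float("nan")  # undefined, never a silent 0
    return n_false_alarms / (seconds / 3600.0)


# Same 12 false alarms, 24 h recording, 2 h treated as non-interictal.
by_total = fp_per_hour(12, 24 * 3600, denominator="total")
by_interictal = fp_per_hour(12, 24 * 3600, excluded_seconds=2 * 3600,
                            denominator="interictal")
print(round(by_total, 3), round(by_interictal, 3))  # 0.5 0.545
```

The two numbers differ by about 9 % here, and the gap grows as the excluded time grows, which is why an unstated denominator makes FP/hr values hard to compare.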

Comparison with existing tools

Tool	Language	Sample-based	Event-based	Forecasting (SPH/SOP)	IoC vs surrogate	Cross-paper convertor	Status
`timescoring` (SzCORE engine, Dan et al. 2024)	Python	✅	✅	❌	❌	❌	maintained
`szcore-evaluation` (BIDS wrapper)	Python	✅	✅	❌	❌	❌	maintained
`EPILAB` (Direito et al. 2011)	MATLAB	✅	◐	✅	✅	❌	last release 2018
`PySeizure` (2025)	Python	✅	❌	❌	❌	❌	early — focused on detection
`SeizyML` (2024)	Python	✅	✅	❌	❌	❌	detection scope
Andrade et al. 2024 (paper)	—	✅	✅	✅	✅	❌	research code, not a package
scitex-seizure-metrics	Python	✅	✅	✅	✅	✅	this repo

Supported Metrics

Quick definitions for the metrics and policy knobs that recur throughout the README, the docstrings, and the cited papers.

Sample-based metrics

Term	Meaning
AUROC	Area Under the Receiver Operating Characteristic curve. Probability the model ranks a random positive window above a random negative window. Threshold-free; insensitive to class prevalence.
AUPRC	Area Under the Precision–Recall curve. Threshold-free; sensitive to class prevalence — the value to read on heavily-imbalanced seizure data when AUROC looks deceptively high.
Brier	Mean squared error between predicted probability and the 0/1 label. Lower is better. Decomposes into three terms, reliability − resolution + uncertainty (`scitex_seizure_metrics.calibration`).
MCC	Matthews Correlation Coefficient. A single balanced summary statistic robust to class imbalance; ranges from −1 (anti-correlation) through 0 (chance) to +1 (perfect).
Balanced accuracy	(Sensitivity + Specificity) / 2. The accuracy you would get if the prevalence were 50/50.
Sensitivity (recall)	Fraction of true seizures detected. Reported at a chosen threshold.
Precision (PPV)	Fraction of detections that were true seizures. Drops fast under low prevalence.
ECE	Expected Calibration Error. Average gap between predicted probability and observed frequency across bins.
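The threshold-dependent definitions in the table can be written out directly from the confusion counts. The following is a minimal NumPy sketch of those textbook formulas, not the library's implementation; the function name `sample_metrics` is hypothetical.

```python
import math

import numpy as np


def sample_metrics(y_true, y_proba, threshold=0.5):
    """Per-window metrics from the standard confusion-matrix definitions."""
    y_true = np.asarray(y_true, dtype=int)
    y_proba = np.asarray(y_proba, dtype=float)
    y_pred = (y_proba >= threshold).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    def ratio(num, den):
        # Undefined ratios are reported as NaN rather than 0.
        return num / den if den > 0 else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": sensitivity,
        "precision": ratio(tp, tp + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "brier": float(np.mean((y_proba - y_true) ** 2)),
    }


# Toy example: 2 seizure windows out of 8.
m = sample_metrics([1, 1, 0, 0, 0, 0, 0, 0],
                   [0.9, 0.4, 0.6, 0.1, 0.2, 0.1, 0.3, 0.2])
print({k: round(v, 3) for k, v in m.items()})
```

On this toy input one of two seizure windows is detected and one of six background windows is a false positive, so sensitivity and precision are both 0.5 while balanced accuracy is about 0.667.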

Alarm-based metrics

Term	Meaning
Alarm	A single binary "warning is on" event derived from a thresholded probability stream + the `AlarmPolicy`.
FP/hr (false-positive rate per hour)	Number of alarms not followed by a seizure within (SPH, SPH + SOP], normalised by the chosen denominator (`fp_denominator='total'` or `'interictal'`).
IoC	Improvement over Chance. The signed gap between the model's alarm-based sensitivity and the same statistic recomputed under a chance-baseline alarm generator (`scitex_seizure_metrics.surrogates`, default Poisson). Significance is read from a surrogate distribution.
Time-in-warning (TIW, "proportion time in warning")	Fraction of recording time spent inside an active warning window (between alarm onset and refractory end). The natural denominator that pairs with sensitivity in the Proix 2021 operating curve.
Sensitivity vs proportion-time-in-warning	Operating curve introduced by Proix 2021. Plotted instead of sensitivity vs FP/hr when alarm refractory periods make per-hour counts misleading. Same x-axis units as Cook 2013's "time-in-warning" reporting.
Beats chance (alarm)	Boolean — is the model's IoC above the surrogate distribution at the configured significance level? Andrade 2024's headline: 50/56 patients beat chance under sample-based eval but only 6/46 under alarm-based.
Specificity / PPV / NPV / F1 (alarm regime)	Standard confusion-matrix scores on the alarm-vs-prediction-opportunity basis. TP = caught seizures, FN = uncaught seizures, FP = alarms catching nothing; TN = interictal SOP-length "prediction opportunities" with no false alarm (`n_opportunities = floor(interictal_seconds / SOP)`, `TN = max(0, n_opportunities − FP)`, Snyder/Schelter/Mormann tradition). So `specificity = TN/(TN+FP)`, `ppv` (alarm precision) `= TP/(TP+FP)`, `npv = TN/(TN+FN)`, `forecasting_f1 = 2·TP/(2·TP+FP+FN)`. `specificity`/`npv` scale with the SOP-opportunity TN convention (`n_tn` / `n_opportunities` are reported alongside so the denominator is visible); `ppv`/`forecasting_f1` do not depend on TN. Undefined ratios are NaN, never a silent 0.
Observed lead time	Per caught seizure, onset minus the earliest catching alarm (after SPH). Distinct from the SPH constraint: SPH is the minimum required gap, lead time is what the system actually delivered (always SPH ≤ lead ≤ SPH + SOP). `lead_time_mean` / `lead_time_median` summarise the distribution; the per-seizure array is in `extras["lead_times_seconds"]`.
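The alarm-regime confusion scores above reduce to a few lines of arithmetic once the counts are known. This sketch transcribes the formulas from the table (`n_opportunities`, `TN`, `specificity`, `ppv`, `npv`, `forecasting_f1`); the function name `alarm_confusion` and its argument names are hypothetical, and the counts are made-up illustration values.

```python
import math


def alarm_confusion(n_caught, n_missed, n_false_alarms,
                    interictal_seconds, sop_seconds):
    """Alarm-regime confusion scores with the SOP-opportunity TN convention."""
    def ratio(num, den):
        # Undefined ratios are NaN, never a silent 0.
        return num / den if den > 0 else float("nan")

    tp, fn, fp = n_caught, n_missed, n_false_alarms
    n_opportunities = math.floor(interictal_seconds / sop_seconds)
    tn = max(0, n_opportunities - fp)
    return {
        "n_opportunities": n_opportunities,
        "n_tn": tn,
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "forecasting_f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# 8 caught and 2 missed seizures, 20 false alarms,
# 22 h of interictal time, SOP = 600 s.
r = alarm_confusion(8, 2, 20, interictal_seconds=22 * 3600, sop_seconds=600)
print(r["n_opportunities"], r["n_tn"])              # 132 112
print(round(r["specificity"], 3), round(r["ppv"], 3),
      round(r["npv"], 3), round(r["forecasting_f1"], 3))
```

Halving the SOP in this example doubles `n_opportunities` and therefore raises `specificity` and `npv`, while `ppv` and `forecasting_f1` stay the same, which is the dependence on the TN convention that the table describes.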

The AlarmPolicy config knobs (SPH · SOP · cadence · refractory · alarm-threshold · FP-denominator) are documented inline on the dataclass and shown in the forecasting example below — they pin alarm-derivation, not metric definitions.
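To make "alarm-derivation" concrete, here is one plausible way a thresholded probability stream could be turned into alarms and a time-in-warning fraction. It is a sketch under stated assumptions, not the library's rule: an alarm fires on any sample at or above the threshold that falls outside the refractory period of the previous alarm, and the warning window runs from alarm onset to refractory end. Both function names are hypothetical.

```python
import numpy as np


def derive_alarms(proba, times, threshold, refractory_seconds):
    """Assumed rule: fire when proba >= threshold and the refractory
    period of the previous alarm has elapsed."""
    alarms = []
    last_alarm = -np.inf
    for t, p in zip(times, proba):
        if p >= threshold and t - last_alarm >= refractory_seconds:
            alarms.append(float(t))
            last_alarm = t
    return alarms


def time_in_warning_frac(alarms, refractory_seconds, total_seconds):
    """Fraction of the recording between alarm onset and refractory end.

    Windows cannot overlap because alarms are at least one refractory
    period apart; the last window is clipped at the end of the recording.
    """
    covered = sum(min(a + refractory_seconds, total_seconds) - a
                  for a in alarms)
    return covered / total_seconds


# One hour sampled every 60 s: a 5-sample burst and one isolated peak.
times = np.arange(0, 3600, 60)
proba = np.full(times.shape, 0.1)
proba[5:10] = 0.9    # 300 s .. 540 s
proba[40] = 0.8      # 2400 s

alarms = derive_alarms(proba, times, threshold=0.5, refractory_seconds=600)
print(alarms)                                      # [300.0, 2400.0]
print(time_in_warning_frac(alarms, 600, 3600))
```

The five consecutive supra-threshold samples produce a single alarm because the later four fall inside the refractory period, so two alarms cover a third of the hour.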

Installation

pip install scitex-seizure-metrics

Demo

from scitex_seizure_metrics import detection, forecasting, AlarmPolicy

# Per-window detection metrics (sensitivity, false-positives/hour, ...)
m = detection.evaluate(y_true=labels, y_pred=preds, fs=256)
print(m.sensitivity, m.fp_per_hour)

# Forecasting metrics (Improvement-over-chance, AUROC, alarm count)
f = forecasting.evaluate(
    alarm_times=alarms, seizure_times=onsets,
    policy=AlarmPolicy(sph_seconds=300, sop_seconds=600,
                       cadence_seconds=60, refractory_seconds=600),
    total_recording_time=24 * 3600,
)
print(f.ioc, f.roc_auc)

graph LR
    Labels["per-window y_true / y_pred"] --> Det["detection.evaluate"]
    Onsets["seizure_times + alarm_times"] --> Fore["forecasting.evaluate"]
    Det --> Out["sensitivity / FP-per-hour / latency"]
    Fore --> Out2["IoC / AUROC / alarm count"]

Quick Start

from scitex_seizure_metrics import detection, forecasting, AlarmPolicy

# Detection — per-window classification
rep = detection.evaluate(y_true, y_proba, threshold=0.5, fs=1)
print(rep.roc_auc, rep.pr_auc, rep.brier, rep.mcc)

# Forecasting — continuous stream with explicit alarm policy
policy = AlarmPolicy(
    sph_seconds=300, sop_seconds=600, cadence_seconds=60,
    refractory_seconds=600, alarm_threshold=0.5,
    fp_denominator="interictal",   # Mormann tradition
)
rep = forecasting.evaluate_stream(
    proba, times, seizures, policy,
    total_recording_time=24 * 3600,
)
print(rep.sensitivity, rep.fp_per_hour, rep.ioc, rep.time_in_warning_frac)

See examples/01_detection_quick_start.ipynb, examples/02_forecasting_quick_start.ipynb, and the other notebooks under examples/ for end-to-end workflows.

Architecture

flowchart LR
    Probs["per-window proba<br/>+ ground truth"] --> Det["detection.evaluate"]
    Probs --> StreamIn["forecasting.evaluate_stream"]
    Policy["AlarmPolicy<br/>SPH · SOP · cadence · refractory · FP denom"] --> StreamIn
    Det --> RepDet["MetricsReport<br/>AUROC · AUPRC · Brier · MCC"]
    StreamIn --> RepFc["MetricsReport<br/>sensitivity · FP/hr · IoC · TIW"]
    RepDet -.->|"bridge analytic bounds"| RepFc
    RepFc --> Plots["plots: sensitivity vs FP/hr,<br/>IoC vs surrogate, cadence ablation"]

The split mirrors how the seizure-evaluation literature itself is organised — sample-based vs alarm-based vs the bridge — so a paper-faithful re-implementation lives in exactly one place. MetricsReport is the single object that travels between regimes; AlarmPolicy is the single object that pins every reproducibility decision an alarm-based metric requires.

7 Interfaces

scitex_seizure_metrics.forecasting — alarm-based metrics with explicit AlarmPolicy (primary)

from scitex_seizure_metrics import AlarmPolicy, forecasting

policy = AlarmPolicy(
    sph_seconds=300, sop_seconds=600, cadence_seconds=60,
    refractory_seconds=600, alarm_threshold=0.5,
    fp_denominator="interictal",
)
rep = forecasting.evaluate_stream(
    proba, times, seizures, policy,
    total_recording_time=24 * 3600, n_surrogate=1000,
)
print(rep.sensitivity, rep.fp_per_hour, rep.ioc, rep.time_in_warning_frac)
# Alarm-regime confusion metrics + observed lead time
print(rep.specificity, rep.ppv, rep.npv, rep.forecasting_f1)
print(rep.lead_time_mean, rep.lead_time_median,
      rep.extras["lead_times_seconds"])

# Operating curve across thresholds
df = forecasting.sweep_thresholds(proba, times, seizures, policy)

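# Cadence ablation: vary only the cadence and keep every other field of `policy`
# (dataclasses.replace applies because AlarmPolicy is a dataclass)
from dataclasses import replace
policies = [replace(policy, cadence_seconds=c) for c in [30, 60, 120, 300]]
df = forecasting.sweep_policies(proba, times, seizures, policies)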
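The idea behind IoC and the `n_surrogate` argument can be illustrated without the library. The sketch below is an assumption-laden stand-in, not the package's surrogate code: a seizure counts as caught when some alarm precedes its onset by a lead in [SPH, SPH + SOP], and the chance baseline places the same number of alarms at uniformly random times (a homogeneous Poisson process conditioned on its count), ignoring refractory rules. All names are hypothetical.

```python
import numpy as np


def alarm_sensitivity(alarm_times, seizure_times, sph, sop):
    """Fraction of seizures preceded by an alarm with SPH <= lead <= SPH + SOP."""
    alarm_times = np.asarray(alarm_times, dtype=float)
    if len(seizure_times) == 0:
        return float("nan")
    caught = 0
    for onset in seizure_times:
        lead = onset - alarm_times
        if np.any((lead >= sph) & (lead <= sph + sop)):
            caught += 1
    return caught / len(seizure_times)


def ioc_vs_surrogate(alarm_times, seizure_times, sph, sop, total_seconds,
                     n_surrogate=1000, seed=0):
    """IoC = model sensitivity minus mean surrogate sensitivity,
    plus a one-sided permutation-style p-value."""
    rng = np.random.default_rng(seed)
    model = alarm_sensitivity(alarm_times, seizure_times, sph, sop)
    n_alarms = len(alarm_times)
    surrogate = np.array([
        alarm_sensitivity(rng.uniform(0, total_seconds, size=n_alarms),
                          seizure_times, sph, sop)
        for _ in range(n_surrogate)
    ])
    ioc = model - surrogate.mean()
    p_value = (1 + np.sum(surrogate >= model)) / (n_surrogate + 1)
    return ioc, p_value


# Three seizures in 24 h, each preceded by an alarm 10 minutes earlier.
onsets = [20_000.0, 50_000.0, 80_000.0]
alarms = [t - 600.0 for t in onsets]
ioc, p = ioc_vs_surrogate(alarms, onsets, sph=300, sop=600,
                          total_seconds=24 * 3600)
print(round(ioc, 2), p < 0.05)
```

With three well-timed alarms the model sensitivity is 1.0 while random placement almost never catches a seizure, so the IoC is close to 1 and the p-value is small. A forecaster that simply alarms often would see its surrogate sensitivity rise with it, which is what the baseline is meant to expose.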

scitex_seizure_metrics.detection — sample-based metrics (AUROC, AUPRC, Brier, MCC, ...)

from scitex_seizure_metrics import detection
rep = detection.evaluate(y_true, y_proba, threshold=0.5, fs=1)
print(rep.roc_auc, rep.pr_auc, rep.brier, rep.mcc, rep.balanced_accuracy)

scitex_seizure_metrics.bridge — sample↔alarm analytic bounds for cross-paper comparison

from scitex_seizure_metrics import bridge

bnd = bridge.sample_to_alarm(
    sample_sensitivity=0.79, sample_specificity=0.85,
    sop_seconds=600, cadence_seconds=60, refractory_seconds=600,
)
print(bnd.alarm_sensitivity_upper, bnd.fp_per_hour_upper)

scitex_seizure_metrics.sensitivity_tiw — empirical sensitivity vs time-in-warning trade-off (Karoly 2017 Fig 6)

The empirical complement to the analytic bridge: sweep the decision threshold and trace seizure-level sensitivity against time-in-warning, the field-standard forecasting view. Chance is the diagonal (sensitivity == time-in-warning); a forecaster carries signal only above it.

from scitex_seizure_metrics import AlarmPolicy, plots, sensitivity_tiw

policy = AlarmPolicy(sph_seconds=0, sop_seconds=600,
                     cadence_seconds=60, refractory_seconds=600)

curve = sensitivity_tiw.sensitivity_tiw_curve(
    scores, policy, seizure_times=onsets, times=times, target_tiw=0.20,
)
print(curve.improvement_over_chance,        # AUC-like area above the diagonal
      curve.sensitivity_at_target_tiw,      # sensitivity at 20 % time-in-warning
      curve.tiw_at_target_sensitivity)      # time-in-warning at 75 % sensitivity

# Is the operating point above a time-matched coin?
sig = sensitivity_tiw.surrogate_above_chance(
    scores, policy, threshold=0.5, seizure_times=onsets, times=times,
)
print(sig.p_value, sig.ci_low, sig.ci_high)

plots.sensitivity_tiw([curve], save_path="fig_sens_tiw")  # png + pdf

See docs/math/sensitivity_tiw.md for the chance-diagonal derivation and a worked example.
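The chance diagonal follows from a simple argument: a warning that is switched on at random for a fraction ρ of the time covers any given seizure with probability ρ, so its expected sensitivity equals its time-in-warning. One way to summarise a curve against that baseline, sketched below, is the trapezoidal area under the (time-in-warning, sensitivity) curve minus the 0.5 under the diagonal. Whether `improvement_over_chance` is computed exactly this way is an assumption; the function name `area_above_chance` is hypothetical.

```python
import numpy as np


def area_above_chance(tiw, sensitivity):
    """Area between the sensitivity-vs-TIW curve and the chance diagonal.

    The curve is anchored at (0, 0) and (1, 1), the operating points of
    a warning that is never on and always on.
    """
    tiw = np.asarray(tiw, dtype=float)
    sensitivity = np.asarray(sensitivity, dtype=float)
    order = np.argsort(tiw)
    x = np.concatenate(([0.0], tiw[order], [1.0]))
    y = np.concatenate(([0.0], sensitivity[order], [1.0]))
    auc = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)  # trapezoid rule
    return float(auc - 0.5)


# A forecaster on the diagonal carries no signal ...
print(round(area_above_chance([0.2, 0.5, 0.8], [0.2, 0.5, 0.8]), 3))  # 0.0
# ... one above it does.
print(round(area_above_chance([0.1, 0.2, 0.4], [0.5, 0.7, 0.9]), 3))  # 0.315
```

The value ranges from −0.5 to +0.5, with 0 meaning the curve is indistinguishable from chance on this summary.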

scitex_seizure_metrics.papers — paper-replica shims (Karoly 2017, Maturana 2020, Kuhlmann 2018, Andrade 2024)

from scitex_seizure_metrics.papers import andrade2024
out = andrade2024.metrics(
    y_true=labels, y_proba=preds,
    times_seconds=times, seizure_times=onsets,
)
print(out["sample_auroc"], out["alarm_sensitivity"], out["beats_chance_alarm"])
# Reproduces the side-by-side sample-vs-alarm panel from the paper.

Available shims: karoly2017, maturana2020, kuhlmann2018, andrade2024. Each metrics(...) returns a dict in the paper's preferred metric set.

scitex_seizure_metrics.calibration — Brier decomposition + reliability diagram

from scitex_seizure_metrics import calibration, plots
cal = calibration.calibration_report(y_true, y_proba, n_bins=10)
print(cal.brier, cal.reliability, cal.resolution, cal.uncertainty,
      cal.expected_calibration_error)
plots.reliability_diagram(cal)
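For readers who want to see what the report fields mean, the following is a minimal NumPy sketch of the standard binned (Murphy) decomposition, Brier = reliability − resolution + uncertainty, together with ECE. It is not the library's implementation; the function name `brier_decomposition` is hypothetical and equal-width bins are an assumption. The identity is exact only when all predictions inside a bin share one value, as in the toy data here.

```python
import numpy as np


def brier_decomposition(y_true, y_proba, n_bins=10):
    """Binned Murphy decomposition of the Brier score, plus ECE."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_proba, dtype=float)
    n = len(y)
    base_rate = y.mean()
    # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)

    reliability = resolution = ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        weight = mask.sum() / n
        mean_pred = p[mask].mean()   # average forecast in the bin
        obs_freq = y[mask].mean()    # observed event frequency in the bin
        reliability += weight * (mean_pred - obs_freq) ** 2
        resolution += weight * (obs_freq - base_rate) ** 2
        ece += weight * abs(mean_pred - obs_freq)

    return {
        "brier": float(np.mean((p - y) ** 2)),
        "reliability": float(reliability),
        "resolution": float(resolution),
        "uncertainty": float(base_rate * (1 - base_rate)),
        "ece": float(ece),
    }


d = brier_decomposition([0, 0, 0, 1, 1, 1, 1, 0],
                        [0.1, 0.1, 0.1, 0.1, 0.8, 0.8, 0.8, 0.8])
print({k: round(v, 4) for k, v in d.items()})
print(round(d["reliability"] - d["resolution"] + d["uncertainty"], 4))  # 0.2
```

Reliability penalises miscalibration, resolution rewards forecasts that separate high-risk from low-risk windows, and uncertainty depends only on prevalence, so two models evaluated on the same data differ only in the first two terms.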

scitex_seizure_metrics.plots — relationships between metrics

from scitex_seizure_metrics import plots
plots.sensitivity_vs_fp_per_hour(sweep_df)        # operating curve
plots.sensitivity_tiw([curve])                    # sensitivity vs time-in-warning (Karoly 2017 Fig 6)
plots.ioc_vs_surrogate(sweep_df)                  # model vs chance
plots.cadence_ablation(policy_sweep_df)           # FP/hr vs cadence
plots.sample_vs_alarm_scatter(per_patient_df)     # the Andrade 2024 figure
plots.metric_correlation_heatmap(per_patient_df)  # redundancy diagnostic

Empirical validation of the sample↔alarm bridge

The analytic bridge is validated by Monte Carlo. For each setting we synthesise a long per-window stream with a known per-window sensitivity s and specificity 1 − α, add seizures, run the AlarmPolicy, and measure the empirical alarm sensitivity and FP/hr. We then check that (i) the empirical values land inside the analytic sample_to_alarm [lower, upper] bands, and (ii) the reverse alarm_to_sample recovers the true per-window s and specificity. Each seizure's SOP holds K = ceil(SOP / cadence) windows by construction, so the per-seizure detection bound is 1 − (1 − s)^K, independent of prevalence. This is the soundness fix that replaced an earlier prevalence-shrunk K_eff: at realistic low prevalence K_eff collapsed the upper bound to s, so an empirical alarm sensitivity of ≈ 1.0 violated a bound of 0.5.

s	specificity	prevalence	K	empirical alarm-sens	alarm-sens band	empirical FP/hr	FP/hr band	alarm-sens in band	FP/hr in band	reverse s recovered	reverse spec recovered
0.50	0.90	0.05	10	0.950	[0.50, 1.00]	3.010	[0.00, 5.70]	PASS	PASS	PASS	PASS
0.30	0.95	0.02	30	0.942	[0.30, 1.00]	2.382	[0.00, 4.00]	PASS	PASS	PASS	PASS
0.70	0.85	0.10	5	0.850	[0.70, 1.00]	4.930	[0.00, 8.10]	PASS	PASS	PASS	PASS
0.60	0.99	0.01	60	0.992	[0.60, 1.00]	0.748	[0.00, 1.19]	PASS	PASS	PASS	PASS

All four settings pass in both directions. Reproduce with python examples/06_bridge_validation.py (writes the figure to docs/bridge_validation.{png,pdf} and the table to examples/06_bridge_validation_out/); CI guards it via tests/examples/test_06_bridge_validation.py.
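The per-seizure bound 1 − (1 − s)^K is easy to evaluate by hand. The sketch below does so for the four (s, K) pairs in the table, treating the K windows inside an SOP as independent. The SOP and cadence values are hypothetical, chosen only so that ceil(SOP / cadence) reproduces each K; the table does not state them.

```python
import math


def per_seizure_detection_bound(s, sop_seconds, cadence_seconds):
    """Upper bound on alarm sensitivity when each of the K windows in an
    SOP is detected independently with per-window sensitivity s."""
    k = math.ceil(sop_seconds / cadence_seconds)
    return k, 1.0 - (1.0 - s) ** k


# (s, SOP, cadence): SOP and cadence are illustrative, picked to give
# K = 10, 30, 5, 60 as in the table.
settings = [(0.50, 600, 60), (0.30, 600, 20), (0.70, 600, 120), (0.60, 600, 10)]
for s, sop, cadence in settings:
    k, bound = per_seizure_detection_bound(s, sop, cadence)
    print(f"s={s:.2f}  K={k:>2}  bound={bound:.5f}")
```

Every bound rounds to 1.00, which matches the upper edge of the alarm-sens bands in the table, and each empirical alarm sensitivity sits below it. Even a weak per-window sensitivity such as s = 0.30 gives a near-certain per-seizure detection once K is large, which is why a bound that shrinks towards s is too tight.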

References

Andrade I, Teixeira C, Pinto M (2024). On the performance of seizure prediction machine learning methods across different databases: the sample and alarm-based perspectives. Frontiers in Neuroscience. doi:10.3389/fnins.2024.1417748.
Cook MJ et al. (2013). Lancet Neurology. doi:10.1016/S1474-4422(13)70075-9.
Dan J et al. (2024). SzCORE. Epilepsia. doi:10.1111/epi.18113.
Direito B et al. (2011). EPILAB. J Neurosci Methods. doi:10.1016/j.jneumeth.2011.06.022.
Karoly PJ et al. (2017). Brain. doi:10.1093/brain/awx173.
Kuhlmann L et al. (2018). Brain. doi:10.1093/brain/awy210.
Maturana MI et al. (2020). Nature Communications. doi:10.1038/s41467-020-15908-3.
Mormann F et al. (2007). Seizure prediction: the long and winding road. Brain. doi:10.1093/brain/awl241.
Schulze-Bonhage A et al. (2020). Performance Metrics for Online Seizure Prediction. PMC7340210.

Part of SciTeX

scitex-seizure-metrics is part of SciTeX. Installing through the umbrella package with pip install scitex[seizure-metrics] exposes it as scitex.seizure_metrics, the seizure-evaluation surface re-exported from this peer package. This is equivalent to scitex-ml[seizure] / scitex_ml.metrics.seizure and suits users who want only this slice without the rest of scitex-ml.

Four Freedoms for Research

The freedom to run your research anywhere — your machine, your terms.

The freedom to study how every step works — from raw data to final manuscript.

The freedom to redistribute your workflows, not just your papers.

The freedom to modify any module and share improvements with the community.

AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.

Release history

0.2.0 (this version), Jun 29, 2026
0.1.3, May 24, 2026
0.1.1, May 11, 2026
0.1.0, May 7, 2026

Download files

Source Distribution

scitex_seizure_metrics-0.2.0.tar.gz (2.6 MB)

Uploaded Jun 29, 2026 Source

Built Distribution

scitex_seizure_metrics-0.2.0-py3-none-any.whl (2.0 MB)

Uploaded Jun 29, 2026 Python 3

File details

Details for the file scitex_seizure_metrics-0.2.0.tar.gz.

File metadata

Download URL: scitex_seizure_metrics-0.2.0.tar.gz
Upload date: Jun 29, 2026
Size: 2.6 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for scitex_seizure_metrics-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8dfa7da0a0687481abbd10a70a597620798d90e0be52b1abf6e078a18c5ac250`
MD5	`44426f5fa047c88e67e8e60f7cf448f4`
BLAKE2b-256	`58c35bab4e411bb33a54e499855ce9d1ec490ef5333c0770f3cd615893758af5`

File details

Details for the file scitex_seizure_metrics-0.2.0-py3-none-any.whl.

File metadata

Download URL: scitex_seizure_metrics-0.2.0-py3-none-any.whl
Upload date: Jun 29, 2026
Size: 2.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for scitex_seizure_metrics-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e4b8c674519a4f37bc345df6119df9b792e364ed3ac5c176be8bc3500d6159cc`
MD5	`ec90527dcf4db84af95e82882800e191`
BLAKE2b-256	`c4c0494f4e596ef82a42bd67081e20d9579e6632fe6b574e0c29c8ece26ce482`
