Seizure EEG Detector

Prepare detector-ready EEG features and run seizure detection cross-validation on extracted CHB-MIT and EU Epilepsy records.

This package is the companion detector pipeline for seizure-eeg-extractor. Use the extractor first to convert raw dataset files into eeg.npy and info.pkl record folders. This package then computes detector feature arrays, creates seizure/interictal arrays, and trains simple baseline classifiers.

Two temporal feature encodings are implemented. Choose one explicitly with --feature-method when preparing features:

  • Energy-decay-memory (--feature-method edm): O'Leary, G., Groppe, D. M., Valiante, T. A., Verma, N., and Genov, R. (2018). "NURIP: Neural Interface Processor for Brain-State Classification and Programmable-Waveform Neurostimulation." IEEE Journal of Solid-State Circuits, 53(11), 3150-3162. https://doi.org/10.1109/JSSC.2018.2869579
  • Windowed spectral features (--feature-method windowed): Shoeb, A. H. (2009). "Application of Machine Learning to Epileptic Seizure Onset Detection and Treatment." PhD thesis, Massachusetts Institute of Technology.

The raw EEG datasets are not included. Download and use CHB-MIT and EU Epilepsy/EPILEPSIAE data according to their own access, citation, privacy, and data-use terms.

Installation

Use Python 3.10 or newer.

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install seizure-eeg-detector

For local development:

python -m pip install -e ".[dev]"

Expected Input

seizure-eeg-detector expects the NumPy record layout produced by seizure-eeg-extractor, which is also available on PyPI:

extracted_data/
  <patient_id>/
    record_0/
      eeg.npy
      info.pkl
    record_1/
      eeg.npy
      info.pkl

Each info.pkl must include fs, channel_names, num_seizures, and seizure_times. Seizure intervals use record-local sample indices: onset_index and offset_index.
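
As a quick sanity check before running the detector, the required keys can be verified on a loaded info.pkl. The sketch below is not part of the package: check_info and load_info are hypothetical helpers, and the code assumes seizure_times is a list of dicts carrying onset_index and offset_index, which may not match the extractor's exact layout.

```python
import pickle
from pathlib import Path

REQUIRED_KEYS = ("fs", "channel_names", "num_seizures", "seizure_times")


def check_info(info):
    """Return seizure intervals in seconds, raising if a required key is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise KeyError(f"info.pkl is missing keys: {missing}")
    fs = float(info["fs"])
    # Assumption: each seizure entry is a dict with record-local sample indices.
    return [
        (s["onset_index"] / fs, s["offset_index"] / fs)
        for s in info["seizure_times"]
    ]


def load_info(record_dir):
    """Load info.pkl from one record folder (only unpickle files you trust)."""
    with open(Path(record_dir) / "info.pkl", "rb") as handle:
        return pickle.load(handle)


# Hand-written example record metadata, not real dataset values.
example = {
    "fs": 256,
    "channel_names": ["FP1-F7", "F7-T7"],
    "num_seizures": 1,
    "seizure_times": [{"onset_index": 2560, "offset_index": 5120}],
}
intervals = check_info(example)
```

Dividing the sample indices by fs gives onset and offset in seconds relative to the start of the record.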

Command Line Usage

Prepare detector feature arrays for selected CHB-MIT patients:

eeg-detect prepare chbmit /path/to/extracted_data \
  --patients chb01 chb02 \
  --feature-method edm \
  --downsample-factor 256

Prepare windowed spectral features:

eeg-detect prepare chbmit /path/to/extracted_data \
  --patients chb01 chb02 \
  --feature-method windowed \
  --window-seconds 2 \
  --window-count 3 \
  --downsample-factor 256

Prepare selected EU patients using manually chosen channels:

eeg-detect prepare eu /path/to/extracted_data \
  --patients pat_FR_548 pat_FR_1096 pat_FR_1125 \
  --channels HL1 HL2 HL3 HL4 HL5 HL6 HL7 HL8 \
  --channel-mode within-patient \
  --feature-method edm \
  --downsample-factor 1024

--downsample-factor keeps every Nth prepared feature sample when writing seizure_<n>.npy and interictal.npy. With EDM features, the raw EEG is filtered and the EDM state is updated at the original sampling rate before this output downsampling. For example, with 256 Hz CHB-MIT data, --downsample-factor 256 creates one detector feature row per second (effective_fs = 1 Hz).
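
The arithmetic behind the effective sampling rate can be sketched with plain NumPy slicing. This is an illustration only: downsample_rows is a hypothetical name, and which sample of each stride the package actually keeps is an assumption here.

```python
import numpy as np


def downsample_rows(features, factor):
    """Keep every factor-th row of a (samples, features) array."""
    return features[::factor]


fs = 256          # CHB-MIT sampling rate in Hz
factor = 256      # value passed to --downsample-factor
seconds = 10

# Synthetic feature rows computed at the original rate: 10 s x 3 features.
features = np.arange(fs * seconds * 3, dtype=float).reshape(fs * seconds, 3)

kept = downsample_rows(features, factor)
effective_fs = fs / factor  # rows per second after downsampling
```

With these numbers, 2560 feature rows shrink to 10 rows, one per second.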

During preparation, the CLI reports the selected channels, whether it had to fall back to the largest compatible record subset, whether --channel-mode across-patients removed patient-available channels, and any records skipped because they do not contain the selected channel set.
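
The "skipped records" report can be pictured as a subset test on channel names. The helper below is a hypothetical sketch, not the package's selection logic, and it does not attempt the fallback to the largest compatible record subset.

```python
def records_with_channels(record_channels, selected):
    """Split records into usable and skipped, given the selected channel set.

    record_channels maps a record name to the channel names it contains.
    """
    wanted = set(selected)
    usable = [
        name for name, chans in record_channels.items() if wanted <= set(chans)
    ]
    skipped = [name for name in record_channels if name not in usable]
    return usable, skipped


# Illustrative channel lists, not taken from a real patient.
record_channels = {
    "record_0": ["HL1", "HL2", "HL3"],
    "record_1": ["HL1", "HL3"],
}
usable, skipped = records_with_channels(record_channels, ["HL1", "HL2"])
```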

Run cross-validation after preparation:

eeg-detect cross-validate chbmit /path/to/extracted_data /path/to/results \
  --patients chb01 \
  --model lgbm \
  --num-trees 1024 \
  --threshold 0.5 \
  --balance-training undersample

Apply additional thresholds to saved raw prediction scores without retraining:

eeg-detect threshold-sweep /path/to/results --thresholds 0.1 0.3 0.5

Summarize patient-level and overall results:

eeg-detect summarize /path/to/results

This writes:

  • summary/summary.json
  • summary/patient_summary.csv
  • summary/overall_summary.csv

The summary includes the decision threshold, detected seizures, missed seizures, sensitivity, latency statistics for detected seizures, total false positives, and FPR/hour. Overall rows use pooled_* for metrics computed after combining all patients, and mean_patient_* for unweighted means of patient-level metrics.
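
The difference between pooled_* and mean_patient_* values is easiest to see with sensitivity. The following sketch uses made-up counts and illustrative variable names; it is not the package's summarizer.

```python
def sensitivity(detected, missed):
    """Fraction of seizures detected, or NaN when there are no seizures."""
    total = detected + missed
    return detected / total if total else float("nan")


# Made-up per-patient counts for illustration.
patients = {
    "chb01": {"detected": 6, "missed": 1},
    "chb02": {"detected": 1, "missed": 2},
}

# Pooled: combine the counts of all patients first, then compute the metric.
pooled_sensitivity = sensitivity(
    sum(p["detected"] for p in patients.values()),
    sum(p["missed"] for p in patients.values()),
)

# Mean-patient: compute the metric per patient, then take an unweighted mean.
per_patient = [sensitivity(p["detected"], p["missed"]) for p in patients.values()]
mean_patient_sensitivity = sum(per_patient) / len(per_patient)
```

Here the pooled value is 7/10, while the unweighted mean is pulled down by the patient with few seizures, so the two can differ noticeably.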

Supported models are lgbm, svm, and adaboost. The CLI defaults to LightGBM on CPU for portability; pass --lgbm-device gpu only when your local LightGBM build supports GPU training.

By default, cross-validation trains on every prepared seizure and interictal feature row. Pass --balance-training undersample to randomly downsample the majority class inside each training fold so the model sees equal numbers of seizure and interictal rows. This does not change the held-out test records or the reported metrics. Balanced result directories include the balance mode in the path, for example lgbm/trees_1024/balance_undersample/threshold_0.5.
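
Random undersampling of a training fold can be sketched as follows. The function name, the seed handling, and the 0/1 label convention are assumptions for illustration; the package's own implementation may differ.

```python
import numpy as np


def undersample(X, y, seed=0):
    """Randomly drop majority-class rows so both classes have equal counts."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)  # seizure rows
    neg = np.flatnonzero(y == 0)  # interictal rows
    n = min(len(pos), len(neg))
    keep = np.concatenate([
        rng.choice(pos, size=n, replace=False),
        rng.choice(neg, size=n, replace=False),
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


# Synthetic training fold: 5 seizure rows, 95 interictal rows.
X_train = np.arange(200, dtype=float).reshape(100, 2)
y_train = np.array([1] * 5 + [0] * 95)

X_bal, y_bal = undersample(X_train, y_train)
```

Only the training arrays pass through such a step; the held-out arrays are scored as they are.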

Python Usage

from seizure_eeg_detector import (
    CHBPatient,
    CVTraining,
    DataPrepper,
    FeatureExtractor,
    ModelType,
    TrainingBalanceMode,
)

input_path = "/path/to/extracted_data"
patient = CHBPatient("chb01", input_path)

feature_extractor = FeatureExtractor(
    bands=[
        (0.5, 3.5),
        (3.5, 6.5),
        (6.5, 9.5),
        (9.5, 12.5),
        (12.5, 15.5),
        (15.5, 18.5),
        (18.5, 21.5),
        (21.5, 24.5),
    ],
    alphas=[7, 9, 10, 11, 12, 16],
    feature_method="edm",
)
dataprepper = DataPrepper(feature_extractor, downsample_factor=256)
channels = dataprepper.select_channels(patient.record_paths)
dataprepper.prep_data(patient.record_paths, channels)

trainer = CVTraining(
    "/path/to/results",
    threshold=0.5,
    balance_training=TrainingBalanceMode.UNDERSAMPLE,
)
trainer.run_cv(patient, ModelType.LGBM, params={"objective": "binary"}, num_trees=1024)

# Summarize patient-level and overall results.
from seizure_eeg_detector import summarize_results, write_summary_outputs

summary = summarize_results("/path/to/results")
write_summary_outputs(summary, "/path/to/results/summary")

# Apply additional thresholds to the saved raw prediction scores without retraining.
from seizure_eeg_detector import sweep_thresholds

sweep_thresholds("/path/to/results", [0.1, 0.3, 0.5])

Processing Method

Feature preparation always starts by computing spectral features:

  • Spectral energy: selected EEG channels are filtered into configurable frequency bands with FIR bandpass filters, then absolute amplitudes are used as per-band features.
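
The spectral-energy step can be approximated with a windowed-sinc FIR filter in plain NumPy. This is a minimal sketch under stated assumptions: the tap count, the Hamming window, and the function names are illustrative choices, and the package may use a different filter design.

```python
import numpy as np


def fir_bandpass(low_hz, high_hz, fs, num_taps=257):
    """Windowed-sinc band-pass taps: difference of two ideal low-pass filters."""
    n = np.arange(num_taps) - (num_taps - 1) / 2

    def lowpass(cutoff_hz):
        fc = cutoff_hz / fs  # cutoff in cycles per sample
        return 2 * fc * np.sinc(2 * fc * n)

    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(num_taps)


def band_energy(signal, bands, fs):
    """Return a (samples, bands) array of absolute band-limited amplitudes."""
    columns = [
        np.abs(np.convolve(signal, fir_bandpass(lo, hi, fs), mode="same"))
        for lo, hi in bands
    ]
    return np.stack(columns, axis=1)


fs = 256
t = np.arange(fs * 8) / fs
signal = np.sin(2 * np.pi * 5 * t)  # synthetic 5 Hz test tone

energy = band_energy(signal, [(3.5, 6.5), (18.5, 21.5)], fs)
```

For the 5 Hz test tone, the 3.5 to 6.5 Hz column carries most of the amplitude while the 18.5 to 21.5 Hz column stays close to zero.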

The temporal feature encoding must be selected with --feature-method:

  • edm: each spectral-energy feature is expanded with a set of exponential decay traces controlled by alphas, following the EDM feature approach described by O'Leary et al. (2018).
  • windowed: spectral-energy features are averaged over --window-seconds epochs, then the most recent --window-count epoch vectors are concatenated. This follows the temporal feature-vector design described by Shoeb (2009), where L = 2 seconds and W = 3 are the usual settings.
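
Both encodings can be sketched in a few lines of NumPy. The code below is an illustration, not the package's implementation. In particular, it assumes each alpha acts as a shift amount so that the decay coefficient is 2**-alpha, and the function names are hypothetical.

```python
import numpy as np


def edm_traces(energy, alphas):
    """Expand (samples, features) energy into exponential decay traces.

    Assumption: the update is state += 2**-alpha * (x - state) per sample.
    """
    energy = np.asarray(energy, dtype=float)
    n_samples, n_feat = energy.shape
    out = np.zeros((n_samples, n_feat * len(alphas)))
    for j, alpha in enumerate(alphas):
        k = 2.0 ** -alpha
        state = np.zeros(n_feat)
        cols = slice(j * n_feat, (j + 1) * n_feat)
        for t in range(n_samples):
            state = state + k * (energy[t] - state)
            out[t, cols] = state
    return out


def windowed_features(energy, fs, window_seconds=2, window_count=3):
    """Average over epochs, then concatenate the most recent epoch vectors."""
    energy = np.asarray(energy, dtype=float)
    step = int(window_seconds * fs)
    n_epochs = energy.shape[0] // step
    # Mean of each non-overlapping epoch: shape (n_epochs, features).
    epochs = energy[: n_epochs * step].reshape(n_epochs, step, -1).mean(axis=1)
    rows = [
        epochs[i - window_count + 1 : i + 1].ravel()
        for i in range(window_count - 1, n_epochs)
    ]
    return np.array(rows)


edm = edm_traces(np.ones((3, 1)), alphas=[1])
win = windowed_features(np.arange(8).reshape(8, 1), fs=1)
```

With eight bands and the six alphas from the Python example above, the EDM sketch would produce 48 traces per channel; the windowed sketch produces window_count times as many columns as its input.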

Prepared files are saved into each record directory:

  • seizure_<n>.npy for each labeled seizure interval.
  • interictal.npy for records with no labeled seizures.
  • Optional se.npy files when --save-se is passed, and optional edm.npy files for EDM runs when --save-edm is passed.
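
To inspect what preparation wrote into a record directory, the arrays can be loaded directly with NumPy. load_prepared is a hypothetical helper, and the demo writes a small synthetic file into a temporary directory rather than using real prepared data.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_prepared(record_dir):
    """Load seizure_<n>.npy arrays and, if present, interictal.npy."""
    record_dir = Path(record_dir)
    # Note: sorted() is lexicographic, so seizure_10 sorts before seizure_2.
    seizures = [np.load(p) for p in sorted(record_dir.glob("seizure_*.npy"))]
    interictal_path = record_dir / "interictal.npy"
    interictal = np.load(interictal_path) if interictal_path.exists() else None
    return seizures, interictal


with tempfile.TemporaryDirectory() as tmp:
    # Synthetic stand-in for a prepared seizure array: 4 rows x 2 features.
    np.save(Path(tmp) / "seizure_0.npy", np.ones((4, 2)))
    seizures, interictal = load_prepared(tmp)
```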

Cross-validation leaves out one seizure record or one interictal record at a time, trains on the remaining prepared arrays, writes metrics, and saves raw and thresholded predictions under the results directory. The default decision threshold is 0.5, and result directories include it in the run path, for example lgbm/trees_1024/threshold_0.5.

If training balancing is enabled, only the training fold is resampled; held-out seizure and interictal arrays are evaluated unchanged.

The summarizer treats latency: nan as a missed seizure when computing sensitivity, and latency statistics for detected seizures exclude missed seizures. The fpr metric reported by this package is false positives per hour, counted from positive prediction samples rather than event-collapsed false alarms.
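
The sample-based false-positive rate and the handling of NaN latencies can be sketched as below. Both helpers are hypothetical, and the choice to divide by the duration of all evaluated samples is an assumption; the package may normalize differently.

```python
import math


def false_positives_per_hour(preds, labels, effective_fs):
    """Count positive prediction samples on non-seizure rows, per hour."""
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    hours = len(preds) / effective_fs / 3600
    return fp / hours


def sensitivity_and_latency(latencies):
    """Treat NaN latency as a missed seizure; average only detected ones."""
    detected = [v for v in latencies if not math.isnan(v)]
    sens = len(detected) / len(latencies) if latencies else float("nan")
    mean_latency = sum(detected) / len(detected) if detected else float("nan")
    return sens, mean_latency


# Two hours of interictal predictions at 1 Hz with six positive samples.
preds = [1] * 6 + [0] * 7194
labels = [0] * 7200
fpr = false_positives_per_hour(preds, labels, effective_fs=1.0)

# Three seizures, one missed (latency values are made up).
sens, mean_latency = sensitivity_and_latency([4.0, float("nan"), 8.0])
```

Because every positive sample counts, six consecutive positive seconds contribute six false positives here, not one alarm event.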

Threshold sweeps reuse the saved preds_*.txt raw score files and write sibling threshold_<value> result directories with recomputed metrics and binary predictions. Cross-validation saves the effective prediction sampling rate in run_config.json, and prediction files must contain complete raw score arrays because omitted scores cannot be recovered during threshold sweeps.
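
Re-thresholding saved scores needs no model, which is why the complete raw score arrays must be kept. A minimal sketch, assuming scores at or above the threshold count as positive (the package's comparison may be strict) and using a hypothetical helper name:

```python
import numpy as np


def apply_thresholds(scores, thresholds):
    """Map each threshold to a binary prediction array for the same scores."""
    scores = np.asarray(scores, dtype=float)
    return {t: (scores >= t).astype(int) for t in thresholds}


# Made-up raw scores standing in for the contents of one preds_*.txt file.
raw_scores = [0.05, 0.3, 0.6]
binary = apply_thresholds(raw_scores, [0.1, 0.3, 0.5])
```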

Citation

If you use this package in academic work, please cite the detector software:

@software{koerner2026seizure_eeg_detector,
  author = {Koerner, Jamie},
  title = {seizure-eeg-detector},
  year = {2026},
  url = {https://github.com/jamiekoe/seizure-eeg-detector}
}

The repository also includes CITATION.cff for citation managers and GitHub's citation UI.

License

This project is distributed under the MIT License.
