Extract EEG signal arrays and seizure metadata from CHB-MIT and EU Epilepsy datasets.

Seizure EEG Extractor

Extract EEG signal arrays and seizure metadata from seizure EEG datasets into a simple NumPy-based record format.

Supported source datasets:

  • CHB-MIT Scalp EEG Database
  • EU Epilepsy / EPILEPSIAE-style binary dataset folders

The raw datasets are not included in this repository. Download and use them according to their own access, citation, privacy, and data-use terms.

What This Package Does

seizure-eeg-extractor converts each raw recording file into:

  • eeg.npy: a NumPy array with shape (num_samples, num_channels)
  • info.pkl: a Python dictionary with patient ID, file ID, sampling frequency, channel metadata, seizure intervals, timestamps, duration, and output dtype

The package keeps one output folder per patient and one record_<n> folder per source recording. It does not train a seizure detector or create labels beyond the seizure intervals already provided by the source dataset metadata. When a patient is reprocessed, that patient's previous output directory is cleared first so stale records from an earlier run cannot be mixed with the new extraction.
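
As a minimal sketch of how one extracted record might be read back, the snippet below loads eeg.npy with NumPy and info.pkl with the standard pickle module. The helper name load_record and the synthetic chb00 record are illustrative assumptions, not part of the package.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def load_record(record_dir):
    """Return (eeg, info) for one record_<n> folder (hypothetical helper)."""
    record_dir = Path(record_dir)
    eeg = np.load(record_dir / "eeg.npy")  # shape: (num_samples, num_channels)
    # Only unpickle files you produced yourself: pickle can execute code.
    with open(record_dir / "info.pkl", "rb") as fh:
        info = pickle.load(fh)
    return eeg, info


# Demo on a tiny synthetic record so the snippet runs without any dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "chb00" / "record_0"
    demo_dir.mkdir(parents=True)
    np.save(demo_dir / "eeg.npy", np.zeros((512, 4), dtype=np.float32))
    demo_info = {"pid": "chb00", "fs": 256, "num_samples": 512, "num_channels": 4}
    with open(demo_dir / "info.pkl", "wb") as fh:
        pickle.dump(demo_info, fh)

    eeg, info = load_record(demo_dir)
    print(eeg.shape, eeg.dtype, info["fs"])
```

For very large recordings, passing mmap_mode="r" to np.load avoids reading the whole array into memory at once.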

Installation

Use Python 3.10 or newer.

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install seizure-eeg-extractor

To also install the optional dependencies used by the plotting examples:

python -m pip install "seizure-eeg-extractor[examples]"

Command Line Usage

Process all available CHB-MIT patients:

eeg-extract chbmit /path/to/chbmit/1.0.0

Process selected CHB-MIT patients:

eeg-extract chbmit /path/to/chbmit/1.0.0 \
  --patients chb17 chb20 \
  --output-path ./extracted_data

Process selected EU patients:

eeg-extract eu /path/to/Epilepsiae \
  --patients pat_FR_253 pat_FR_384 \
  --output-path ./extracted_data \
  --dtype float32

Options:

  • --patients, -p: patient IDs separated by spaces or commas.
  • --output-path, -o: output directory. Defaults to <input_path>/extracted_data.
  • --workers, -w: number of patient-processing threads.
  • --dtype: float32 or float64 for eeg.npy. The default is float32, which is normally the practical choice for large datasets.
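
To make the --dtype trade-off concrete, here is a rough size estimate for a single recording. The figures (one hour, 256 Hz, 23 channels) are illustrative assumptions chosen to resemble a typical CHB-MIT file, and eeg_bytes is a hypothetical helper.

```python
def eeg_bytes(duration_s, fs, num_channels, bytes_per_sample):
    """Approximate size of the raw sample data in eeg.npy (header excluded)."""
    return int(duration_s * fs) * num_channels * bytes_per_sample


# Assumed example: a one-hour recording, 256 Hz, 23 channels.
one_hour_f32 = eeg_bytes(3600, 256, 23, 4)  # float32 uses 4 bytes per sample
one_hour_f64 = eeg_bytes(3600, 256, 23, 8)  # float64 uses 8 bytes per sample

print(f"float32: {one_hour_f32 / 1e6:.1f} MB")  # float32: 84.8 MB
print(f"float64: {one_hour_f64 / 1e6:.1f} MB")  # float64: 169.6 MB
```

Across hundreds of hours of recordings the factor of two adds up, which is why float32 is usually the more practical output type.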

Python Usage

from seizure_eeg_extractor import CHBMIT, EU

dataset = CHBMIT(
    input_path="/path/to/chbmit/1.0.0",
    output_path="./extracted_chbmit",
    patients_wanted=["chb17", "chb20"],
    output_dtype="float32",
)
dataset.process_patients(max_workers=2)

eu = EU(
    input_path="/path/to/Epilepsiae",
    output_path="./extracted_eu",
    patients_wanted=["pat_FR_253"],
)
eu.process_patients(max_workers=1)

Expected Dataset Layout

CHB-MIT input should contain patient folders named chb01 through chb24. Each patient folder should contain .edf files and the corresponding <patient>-summary.txt; the dataset root may also contain SUBJECT-INFO.

EU input should contain patient folders named like pat_FR_253. The extractor looks recursively inside each patient folder for:

  • .head files containing recording metadata
  • .data files containing binary EEG samples
  • seizurelist.txt containing seizure onset and offset timestamps
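
The recursive lookup described above can be sketched with pathlib. This is not the package's own code: it assumes that each .head file and its .data file share a file stem and sit in the same directory, and the nested folder and file names in the demo are made up.

```python
import tempfile
from pathlib import Path


def find_eu_recordings(patient_dir):
    """Pair every .head file with the .data file that shares its stem."""
    patient_dir = Path(patient_dir)
    pairs = []
    for head in sorted(patient_dir.rglob("*.head")):
        data = head.with_suffix(".data")
        if data.is_file():  # skip headers without a matching binary file
            pairs.append((head, data))
    seizure_lists = sorted(patient_dir.rglob("seizurelist.txt"))
    return pairs, seizure_lists


# Demo on an empty, made-up folder tree.
with tempfile.TemporaryDirectory() as tmp:
    patient = Path(tmp) / "pat_FR_253"
    rec_dir = patient / "adm_1" / "rec_1"  # hypothetical nesting
    rec_dir.mkdir(parents=True)
    (rec_dir / "rec_1_0000.head").touch()
    (rec_dir / "rec_1_0000.data").touch()
    (rec_dir / "rec_1_0001.head").touch()  # no matching .data, so it is skipped
    (patient / "seizurelist.txt").touch()

    pairs, seizure_lists = find_eu_recordings(patient)
    pair_names = [(h.name, d.name) for h, d in pairs]
    print(pair_names, len(seizure_lists))
```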

Output Layout

extracted_data/
  <patient_id>/
    record_0/
      eeg.npy
      info.pkl
    record_1/
      eeg.npy
      info.pkl
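
When walking this layout, note that plain alphabetical sorting puts record_10 before record_2. A small sketch (list_records is a hypothetical helper, not part of the package) that orders the folders by their numeric suffix instead:

```python
import tempfile
from pathlib import Path


def list_records(patient_dir):
    """Return record_<n> folders of one patient, ordered by n."""
    records = []
    for path in Path(patient_dir).glob("record_*"):
        suffix = path.name.split("_", 1)[1]
        if path.is_dir() and suffix.isdigit():
            records.append((int(suffix), path))
    return [path for _, path in sorted(records)]


# Demo with empty folders standing in for extracted records.
with tempfile.TemporaryDirectory() as tmp:
    patient_dir = Path(tmp) / "chb17"
    for n in (10, 0, 2):
        (patient_dir / f"record_{n}").mkdir(parents=True)

    record_names = [p.name for p in list_records(patient_dir)]
    print(record_names)  # ['record_0', 'record_2', 'record_10']
```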

eeg.npy is saved as (samples, channels). info.pkl contains:

Key               Description
pid               Patient ID, for example chb01 or pat_FR_253.
fid               Source recording ID without extension.
fs                Sampling frequency in Hz.
num_samples       Number of rows in eeg.npy.
num_channels      Number of EEG channels.
channel_names     Channel names when present in the source file/header.
eeg_dtype         Saved NumPy dtype, usually float32.
num_seizures      Number of seizure intervals overlapping the record.
seizure_times     List of seizure dictionaries with sample indices and, when available, timestamps.
file_start_time   Source file start timestamp when available.
file_end_time     Source file end timestamp when available.
duration_in_sec   Recording duration when available.

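Seizure sample indices are record-local: they count from the first row of that record's eeg.npy, not from the start of the patient's overall recording timeline. For EU records, seizures that cross a file boundary are clipped to the part that overlaps the current recording.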
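
The clipping rule can be illustrated with a short sketch. It is written under assumptions rather than taken from the package: clip_seizure_to_record is a hypothetical helper, the end index is treated as exclusive, and the timestamps are invented for the demo.

```python
from datetime import datetime, timedelta


def clip_seizure_to_record(onset, offset, record_start, fs, num_samples):
    """Map an absolute seizure interval to record-local sample indices.

    Returns (start_idx, end_idx) with an exclusive end (an assumption of this
    sketch), or None when the seizure does not overlap the record at all.
    """
    record_end = record_start + timedelta(seconds=num_samples / fs)
    if offset <= record_start or onset >= record_end:
        return None
    start_idx = max(0, round((onset - record_start).total_seconds() * fs))
    end_idx = min(num_samples, round((offset - record_start).total_seconds() * fs))
    return start_idx, end_idx


fs = 256
num_samples = fs * 3600  # two consecutive one-hour records
first_start = datetime(2010, 1, 1, 12, 0, 0)
second_start = datetime(2010, 1, 1, 13, 0, 0)

# A seizure that starts 30 s before the file boundary and ends 45 s after it.
onset = datetime(2010, 1, 1, 12, 59, 30)
offset = datetime(2010, 1, 1, 13, 0, 45)

in_first = clip_seizure_to_record(onset, offset, first_start, fs, num_samples)
in_second = clip_seizure_to_record(onset, offset, second_start, fs, num_samples)
print(in_first)   # (913920, 921600)
print(in_second)  # (0, 11520)
```

The same seizure therefore appears in both records, each time restricted to the samples that record actually contains.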

Processing Method

CHB-MIT processing reads each patient's summary text file, extracts per-file start/end times and seizure start/end seconds, then reads EEG signals from EDF files with pyEDFlib.
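
A minimal sketch of the summary-parsing step, using only the standard library. It assumes the common summary layout of "File Name", "File Start Time", "File End Time" and "Seizure [n] Start/End Time: N seconds" lines; the excerpt below is invented for illustration (there is no chb99 patient), and parse_summary is a hypothetical helper, not the package's parser.

```python
import re

# Some summaries number the seizures ("Seizure 1 Start Time"), some do not.
SEIZURE_START = re.compile(r"Seizure(?: \d+)? Start Time:\s*(\d+)\s*seconds")
SEIZURE_END = re.compile(r"Seizure(?: \d+)? End Time:\s*(\d+)\s*seconds")

SAMPLE = """\
File Name: chb99_01.edf
File Start Time: 10:00:00
File End Time: 11:00:00
Number of Seizures in File: 1
Seizure Start Time: 120 seconds
Seizure End Time: 180 seconds

File Name: chb99_02.edf
File Start Time: 11:00:05
File End Time: 12:00:05
Number of Seizures in File: 0
"""


def parse_summary(text):
    """Collect per-file start/end times and seizure intervals in seconds."""
    records, current, pending_start = [], None, None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("File Name:"):
            current = {"file": line.split(":", 1)[1].strip(),
                       "start": None, "end": None, "seizures_s": []}
            records.append(current)
        elif current is None:
            continue  # header lines before the first file block
        elif line.startswith("File Start Time:"):
            current["start"] = line.split(":", 1)[1].strip()
        elif line.startswith("File End Time:"):
            current["end"] = line.split(":", 1)[1].strip()
        elif (m := SEIZURE_START.match(line)):
            pending_start = int(m.group(1))
        elif (m := SEIZURE_END.match(line)) and pending_start is not None:
            current["seizures_s"].append((pending_start, int(m.group(1))))
            pending_start = None
    return records


records = parse_summary(SAMPLE)
fs = 256  # assumed sampling frequency for the demo
seizure_samples = [(s * fs, e * fs) for s, e in records[0]["seizures_s"]]
print(records[0]["file"], seizure_samples)  # chb99_01.edf [(30720, 46080)]
```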

EU processing reads each patient's seizurelist.txt, parses absolute seizure timestamps, reads each .head file for sample frequency, channel count, sample count, start timestamp, binary sample width, and conversion factor, then loads the matching .data file with NumPy. Seizures are assigned to every recording whose time interval overlaps the seizure interval.
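
The header-then-binary flow for EU recordings might look roughly like the sketch below. Several details are assumptions made for illustration and should be checked against your own files: the key=value header syntax and the key names (num_samples, num_channels, sample_bytes, conversion_factor), little-endian signed integers, and samples interleaved channel by channel. read_head and load_data are hypothetical helpers.

```python
import tempfile
from pathlib import Path

import numpy as np


def read_head(head_path):
    """Parse 'key=value' lines into a dict of strings (assumed header format)."""
    meta = {}
    for line in Path(head_path).read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            meta[key.strip()] = value.strip()
    return meta


def load_data(data_path, meta, out_dtype="float32"):
    """Read raw integers, reshape to (samples, channels), scale to physical units."""
    num_samples = int(meta["num_samples"])
    num_channels = int(meta["num_channels"])
    raw_dtype = np.dtype(f"<i{int(meta['sample_bytes'])}")  # assumed little-endian
    raw = np.fromfile(data_path, dtype=raw_dtype, count=num_samples * num_channels)
    eeg = raw.reshape(num_samples, num_channels).astype(out_dtype)
    eeg *= float(meta["conversion_factor"])
    return eeg


# Demo with a tiny synthetic recording: 4 samples x 3 channels of int16.
with tempfile.TemporaryDirectory() as tmp:
    head_path = Path(tmp) / "demo.head"
    data_path = Path(tmp) / "demo.data"
    head_path.write_text(
        "num_samples=4\nnum_channels=3\nsample_bytes=2\nconversion_factor=0.5\n"
    )
    np.arange(12, dtype="<i2").tofile(data_path)

    meta = read_head(head_path)
    eeg = load_data(data_path, meta)
    print(eeg.shape, eeg.dtype, eeg[1, 2])
```

Converting to the output dtype before applying the conversion factor keeps the scaled values in floating point instead of truncating them to integers.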

More details are in the processing method notes and output format notes.

Examples

Plot one channel from an extracted record:

python examples/plot_record.py ./extracted_data/chb17/record_0 \
  --channel 0 \
  --samples 1000

See the examples README.

Citation

If you use this software in academic work, please cite the repository using the metadata in CITATION.cff.

License

This project is distributed under the MIT License.
