Extract EEG signal arrays and seizure metadata from CHB-MIT and EU Epilepsy datasets.

Seizure EEG Extractor

Extract EEG signal arrays and seizure metadata from seizure EEG datasets into a simple NumPy-based record format.

Supported source datasets:

  • CHB-MIT Scalp EEG Database
  • EU Epilepsy / EPILEPSIAE-style binary dataset folders

The raw datasets are not included in this repository. Download and use them according to their own access, citation, privacy, and data-use terms.

What This Package Does

seizure-eeg-extractor converts each raw recording file into:

  • eeg.npy: a NumPy array with shape (num_samples, num_channels)
  • info.pkl: a Python dictionary with patient ID, file ID, sampling frequency, channel metadata, seizure intervals, timestamps, duration, and output dtype

The package keeps one output folder per patient and one record_<n> folder per source recording. It does not train a seizure detector or create labels beyond the seizure intervals already provided by the source dataset metadata. When a patient is reprocessed, that patient's previous output directory is cleared first so stale records from an earlier run cannot be mixed with the new extraction.
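The two files of a record can be read back with NumPy and the standard pickle module. The following is a minimal sketch, not part of the package: load_record is a hypothetical helper, and the demo record is synthetic so the snippet runs without any real dataset.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def load_record(record_dir):
    """Return (eeg, info) for one record_<n> folder (hypothetical helper)."""
    record_dir = Path(record_dir)
    eeg = np.load(record_dir / "eeg.npy")  # shape: (num_samples, num_channels)
    with open(record_dir / "info.pkl", "rb") as fh:
        info = pickle.load(fh)  # only unpickle files you produced or trust
    return eeg, info


# Build a tiny synthetic record; the values are made up for illustration.
demo_dir = Path(tempfile.mkdtemp()) / "record_0"
demo_dir.mkdir()
np.save(demo_dir / "eeg.npy", np.zeros((1000, 4), dtype=np.float32))
with open(demo_dir / "info.pkl", "wb") as fh:
    pickle.dump({"pid": "demo", "fs": 256, "num_samples": 1000, "num_channels": 4}, fh)

eeg, info = load_record(demo_dir)
print(eeg.shape, eeg.dtype, info["fs"])
```

For very large recordings, passing mmap_mode="r" to np.load avoids reading the whole array into memory at once.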

Installation

Use Python 3.10 or newer.

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install seizure-eeg-extractor

For development:

python -m pip install -e ".[dev]"

For plotting examples:

python -m pip install ".[examples]"

Command Line Usage

Process all available CHB-MIT patients:

eeg-extract chbmit /path/to/chbmit/1.0.0

Process selected CHB-MIT patients:

eeg-extract chbmit /path/to/chbmit/1.0.0 \
  --patients chb17 chb20 \
  --output-path ./extracted_data

Process selected EU patients:

eeg-extract eu /path/to/Epilepsiae \
  --patients pat_FR_253 pat_FR_384 \
  --output-path ./extracted_data \
  --dtype float32

Options:

  • --patients, -p: patient IDs separated by spaces or commas.
  • --output-path, -o: output directory. Defaults to <input_path>/extracted_data.
  • --workers, -w: number of patient-processing threads.
  • --dtype: float32 or float64 for eeg.npy. The default is float32, which is normally the practical choice for large datasets.
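To see why float32 is usually the practical choice, it helps to estimate the raw array size. The sketch below is plain arithmetic; the one-hour, 256 Hz, 23-channel recording is an illustrative assumption, and eeg_nbytes is a hypothetical helper that ignores the small .npy header.

```python
def eeg_nbytes(duration_s, fs, num_channels, bytes_per_sample):
    """Approximate size in bytes of a (num_samples, num_channels) array."""
    num_samples = int(duration_s * fs)
    return num_samples * num_channels * bytes_per_sample


# Illustrative recording: one hour at 256 Hz with 23 channels (assumed values).
size32 = eeg_nbytes(3600, 256, 23, 4)  # float32 uses 4 bytes per sample
size64 = eeg_nbytes(3600, 256, 23, 8)  # float64 uses 8 bytes per sample

print(f"float32: {size32 / 1e6:.1f} MB")  # float32: 84.8 MB
print(f"float64: {size64 / 1e6:.1f} MB")  # float64: 169.6 MB
```

Choosing float64 doubles the disk and memory footprint of every record, which adds up quickly across many hours of recordings per patient.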

Python Usage

from seizure_eeg_extractor import CHBMIT, EU

dataset = CHBMIT(
    input_path="/path/to/chbmit/1.0.0",
    output_path="./extracted_chbmit",
    patients_wanted=["chb17", "chb20"],
    output_dtype="float32",
)
dataset.process_patients(max_workers=2)

eu = EU(
    input_path="/path/to/Epilepsiae",
    output_path="./extracted_eu",
    patients_wanted=["pat_FR_253"],
)
eu.process_patients(max_workers=1)

Expected Dataset Layout

CHB-MIT input should contain patient folders named chb01 through chb24. Each patient folder should contain .edf files and the corresponding <patient>-summary.txt; the dataset root may also contain SUBJECT-INFO.

EU input should contain patient folders named like pat_FR_253. The extractor looks recursively inside each patient folder for:

  • .head files containing recording metadata
  • .data files containing binary EEG samples
  • seizurelist.txt containing seizure onset and offset timestamps
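The recursive lookup described above can be sketched with pathlib. This is an illustration rather than the package's own code: find_eu_recordings is a hypothetical helper, and it assumes that each .data file sits next to its .head file and shares the same file stem.

```python
import tempfile
from pathlib import Path


def find_eu_recordings(patient_dir):
    """Pair each .head file with the .data file that shares its stem."""
    patient_dir = Path(patient_dir)
    pairs = []
    for head_path in sorted(patient_dir.rglob("*.head")):
        data_path = head_path.with_suffix(".data")  # assumed naming convention
        if data_path.is_file():
            pairs.append((head_path, data_path))
    seizure_lists = sorted(patient_dir.rglob("seizurelist.txt"))
    return pairs, seizure_lists


# Synthetic folder tree so the sketch runs without the real dataset.
root = Path(tempfile.mkdtemp()) / "pat_demo"
(root / "adm_1" / "rec_1").mkdir(parents=True)
(root / "adm_1" / "rec_1" / "rec_1_0000.head").write_text("sample_freq=256\n")
(root / "adm_1" / "rec_1" / "rec_1_0000.data").write_bytes(b"\x00\x00")
(root / "adm_1" / "rec_1" / "rec_1_0001.head").write_text("sample_freq=256\n")  # no .data
(root / "adm_1" / "seizurelist.txt").write_text("# onset\toffset\n")

pairs, seizure_lists = find_eu_recordings(root)
print([head.name for head, _ in pairs], [p.name for p in seizure_lists])
```

A .head file without a matching .data file is skipped here; a real pipeline might prefer to log or raise instead.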

Output Layout

extracted_data/
  <patient_id>/
    record_0/
      eeg.npy
      info.pkl
    record_1/
      eeg.npy
      info.pkl

eeg.npy is saved as (samples, channels). info.pkl contains:

  • pid: Patient ID, for example chb01 or pat_FR_253.
  • fid: Source recording ID without extension.
  • fs: Sampling frequency in Hz.
  • num_samples: Number of rows in eeg.npy.
  • num_channels: Number of EEG channels.
  • channel_names: Channel names when present in the source file/header.
  • eeg_dtype: Saved NumPy dtype, usually float32.
  • num_seizures: Number of seizure intervals overlapping the record.
  • seizure_times: List of seizure dictionaries with sample indices and, when available, timestamps.
  • file_start_time: Source file start timestamp when available.
  • file_end_time: Source file end timestamp when available.
  • duration_in_sec: Recording duration in seconds when available.

Seizure sample indices are record-local. For EU records, seizures that cross a file boundary are clipped to the part that overlaps the current recording.
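Clipping a seizure to one recording and converting it to record-local sample indices can be sketched as follows. This is an assumption-laden illustration, not the package's implementation: clip_seizure_to_record is a hypothetical helper, and it treats the returned interval as half-open, [start_idx, end_idx).

```python
from datetime import datetime, timedelta


def clip_seizure_to_record(rec_start, fs, num_samples, sz_onset, sz_offset):
    """Return record-local (start_idx, end_idx), or None if there is no overlap."""
    rec_end = rec_start + timedelta(seconds=num_samples / fs)
    if sz_offset <= rec_start or sz_onset >= rec_end:
        return None  # seizure lies entirely outside this recording
    clipped_on = max(sz_onset, rec_start)   # clip to the recording start
    clipped_off = min(sz_offset, rec_end)   # clip to the recording end
    start_idx = int(round((clipped_on - rec_start).total_seconds() * fs))
    end_idx = int(round((clipped_off - rec_start).total_seconds() * fs))
    return start_idx, min(end_idx, num_samples)


# Two consecutive one-hour recordings at 256 Hz (made-up timestamps).
fs, n = 256, 256 * 3600
onset = datetime(2010, 1, 1, 0, 59, 30)
offset = datetime(2010, 1, 1, 1, 0, 20)

first = clip_seizure_to_record(datetime(2010, 1, 1, 0, 0, 0), fs, n, onset, offset)
second = clip_seizure_to_record(datetime(2010, 1, 1, 1, 0, 0), fs, n, onset, offset)
print(first, second)
```

The seizure crosses the file boundary, so each recording receives only the part that overlaps it: the last 30 seconds of the first file and the first 20 seconds of the second.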

Processing Method

CHB-MIT processing reads each patient's summary text file, extracts per-file start/end times and seizure start/end seconds, then reads EEG signals from EDF files with pyEDFlib.
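The summary-file step can be illustrated with a small line-based parser. This is a sketch under assumptions, not the package's parser: parse_summary is a hypothetical helper, the excerpt is synthetic text that imitates the layout of a <patient>-summary.txt file, and real files may contain variations that this sketch does not handle.

```python
import re

# Synthetic excerpt in the style of a CHB-MIT summary file (values are made up).
SUMMARY = """\
File Name: demo_01.edf
File Start Time: 11:42:54
File End Time: 12:42:54
Number of Seizures in File: 1
Seizure Start Time: 2996 seconds
Seizure End Time: 3036 seconds

File Name: demo_02.edf
File Start Time: 12:43:04
File End Time: 13:43:04
Number of Seizures in File: 0
"""


def parse_summary(text):
    """Collect per-file start/end times and seizure intervals in seconds."""
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if m := re.match(r"File Name:\s*(\S+)", line):
            current = {"file": m.group(1), "start": None, "end": None, "seizures": []}
            records.append(current)
        elif current is None:
            continue  # skip header lines that precede the first file entry
        elif m := re.match(r"File Start Time:\s*(\S+)", line):
            current["start"] = m.group(1)  # kept as a string in this sketch
        elif m := re.match(r"File End Time:\s*(\S+)", line):
            current["end"] = m.group(1)
        elif m := re.match(r"Seizure(?:\s+\d+)?\s+Start Time:\s*(\d+)\s*seconds", line):
            current["seizures"].append([int(m.group(1)), None])
        elif m := re.match(r"Seizure(?:\s+\d+)?\s+End Time:\s*(\d+)\s*seconds", line):
            current["seizures"][-1][1] = int(m.group(1))
    return records


records = parse_summary(SUMMARY)
print(records[0]["file"], records[0]["seizures"], records[1]["seizures"])
```

Multiplying a seizure's start and end seconds by the sampling frequency gives record-local sample indices.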

EU processing reads each patient's seizurelist.txt, parses absolute seizure timestamps, reads each .head file for sample frequency, channel count, sample count, start timestamp, binary sample width, and conversion factor, then loads the matching .data file with NumPy. Seizures are assigned to every recording whose time interval overlaps the seizure interval.
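The .head and .data loading step can be sketched with NumPy. Everything format-specific here is an assumption made for illustration: the key=value header layout, the key names (num_channels, sample_bytes, conversion_factor), little-endian signed integers, and samples stored channel-interleaved. parse_head and load_data are hypothetical helpers, and the files are synthetic.

```python
import tempfile
from pathlib import Path

import numpy as np


def parse_head(text):
    """Parse key=value lines into a dict of strings (assumed header layout)."""
    fields = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def load_data(data_path, num_channels, sample_bytes, conversion_factor, dtype=np.float32):
    """Read interleaved integer samples and scale them to physical units."""
    raw_dtype = {2: "<i2", 4: "<i4"}[sample_bytes]  # assumed little-endian signed ints
    raw = np.fromfile(data_path, dtype=raw_dtype)
    samples = raw.reshape(-1, num_channels)  # (num_samples, num_channels)
    return (samples * conversion_factor).astype(dtype)


# Synthetic recording: 4 samples x 3 channels of int16 values 0..11.
tmp = Path(tempfile.mkdtemp())
np.arange(12, dtype="<i2").reshape(4, 3).tofile(tmp / "demo.data")
(tmp / "demo.head").write_text(
    "num_channels=3\nsample_bytes=2\nconversion_factor=0.5\nsample_freq=256\n"
)

head = parse_head((tmp / "demo.head").read_text())
eeg = load_data(
    tmp / "demo.data",
    num_channels=int(head["num_channels"]),
    sample_bytes=int(head["sample_bytes"]),
    conversion_factor=float(head["conversion_factor"]),
)
print(eeg.shape, eeg.dtype)
```

Checking that the file size equals num_samples × num_channels × sample_bytes before reshaping is a cheap way to catch truncated or mismatched files.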

More details are in docs/processing-methods.md and docs/output-format.md.

Examples

Plot one channel from an extracted record:

python examples/plot_record.py ./extracted_data/chb17/record_0 \
  --channel 0 \
  --samples 1000

See examples/README.md.

Citation

If you use this software in academic work, please cite the repository using the metadata in CITATION.cff.

License

This project is distributed under the MIT License.

Testing

Run the lightweight test suite:

pytest

Integration tests require local raw dataset copies and are skipped by default:

CHBMIT_PATH=/path/to/chbmit/1.0.0 CHBMIT_PATIENT=chb01 pytest -m integration
EU_PATH=/path/to/Epilepsiae EU_PATIENT=pat_FR_253 pytest -m integration

The integration tests write extracted records to pytest-managed temporary directories, not into the raw dataset folders.

During public-release cleanup, the extractor was also exercised against real CHB-MIT and EU dataset samples. Processed records were checked for array shape, dtype, seizure interval bounds, and finite values in sampled windows, and the extracted outputs were removed after verification.

Repository Hygiene

Raw EEG files and extracted outputs are intentionally ignored by Git:

  • *.edf
  • *.data
  • *.head
  • *.npy
  • *.pkl
  • data/, datasets/, and extracted_data/

If you work with local data inside the repository tree, inspect git status --ignored before committing changes.
