Extract EEG signal arrays and seizure metadata from CHB-MIT and EU Epilepsy datasets.
Seizure EEG Extractor
Extract EEG signal arrays and seizure metadata from seizure EEG datasets into a simple NumPy-based record format.
Supported source datasets:
- CHB-MIT Scalp EEG Database
- EU Epilepsy / EPILEPSIAE-style binary dataset folders
The raw datasets are not included in this repository. Download and use them according to their own access, citation, privacy, and data-use terms.
What This Package Does
seizure-eeg-extractor converts each raw recording file into:
- eeg.npy: a NumPy array with shape (num_samples, num_channels)
- info.pkl: a Python dictionary with patient ID, file ID, sampling frequency, channel metadata, seizure intervals, timestamps, duration, and output dtype
The package keeps one output folder per patient and one record_<n> folder per
source recording. It does not train a seizure detector or create labels beyond
the seizure intervals already provided by the source dataset metadata.
When a patient is reprocessed, that patient's previous output directory is
cleared first so stale records from an earlier run cannot be mixed with the new
extraction.
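A minimal sketch of that reset step follows. It assumes the per-patient folder sits directly under the output path. `reset_patient_dir` is a hypothetical helper, not part of the package's API.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def reset_patient_dir(output_path: str | Path, patient_id: str) -> Path:
    """Hypothetical helper: delete any earlier output for one patient, then recreate it empty."""
    patient_dir = Path(output_path) / patient_id
    if patient_dir.exists():
        # Remove stale record_<n> folders left over from a previous run.
        shutil.rmtree(patient_dir)
    patient_dir.mkdir(parents=True)
    return patient_dir
```

The sketch deletes and recreates the whole folder instead of overwriting individual record_<n> folders. That way, a rerun that produces fewer records cannot leave higher-numbered stale folders behind.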
Installation
Use Python 3.10 or newer.
```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install .
```
For development:
```bash
python -m pip install -e ".[dev]"
```
For plotting examples:
```bash
python -m pip install ".[examples]"
```
Command Line Usage
Process all available CHB-MIT patients:
```bash
eeg-extract chbmit /path/to/chbmit/1.0.0
```
Process selected CHB-MIT patients:
```bash
eeg-extract chbmit /path/to/chbmit/1.0.0 \
  --patients chb17 chb20 \
  --output-path ./extracted_data
```
Process selected EU patients:
```bash
eeg-extract eu /path/to/Epilepsiae \
  --patients pat_FR_253 pat_FR_384 \
  --output-path ./extracted_data \
  --dtype float32
```
Options:
- --patients, -p: patient IDs separated by spaces or commas.
- --output-path, -o: output directory. Defaults to <input_path>/extracted_data.
- --workers, -w: number of patient-processing threads.
- --dtype: float32 or float64 for eeg.npy. The default is float32, which needs half the storage of float64 and is normally the practical choice for large datasets.
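Because --patients accepts IDs separated by spaces or commas, `-p chb17 chb20` and `-p chb17,chb20` should select the same patients. The sketch below shows one way to normalize that input, assuming argparse-style `nargs="+"` collection. `split_patient_ids` and the parser's `prog` name are hypothetical, not the package's code.

```python
import argparse


def split_patient_ids(values: list[str]) -> list[str]:
    """Hypothetical helper: flatten space- and comma-separated patient IDs into one list."""
    ids: list[str] = []
    for value in values:
        for part in value.split(","):
            part = part.strip()
            if part:  # ignore empty pieces from trailing or doubled commas
                ids.append(part)
    return ids


# Hypothetical parser mirroring the documented --patients/-p option.
parser = argparse.ArgumentParser(prog="eeg-extract-sketch")
parser.add_argument("--patients", "-p", nargs="+", default=[])
args = parser.parse_args(["-p", "chb17,chb20", "chb01"])
print(split_patient_ids(args.patients))  # ['chb17', 'chb20', 'chb01']
```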
Python Usage
```python
from seizure_eeg_extractor import CHBMIT, EU

dataset = CHBMIT(
    input_path="/path/to/chbmit/1.0.0",
    output_path="./extracted_chbmit",
    patients_wanted=["chb17", "chb20"],
    output_dtype="float32",
)
dataset.process_patients(max_workers=2)

eu = EU(
    input_path="/path/to/Epilepsiae",
    output_path="./extracted_eu",
    patients_wanted=["pat_FR_253"],
)
eu.process_patients(max_workers=1)
```
Expected Dataset Layout
CHB-MIT input should contain patient folders named chb01 through chb24.
Each patient folder should contain .edf files and the corresponding
<patient>-summary.txt; the dataset root may also contain SUBJECT-INFO.
EU input should contain patient folders named like pat_FR_253. The extractor
looks recursively inside each patient folder for:
- .head files containing recording metadata
- .data files containing binary EEG samples
- seizurelist.txt containing seizure onset and offset timestamps
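The recursive lookup can be sketched with pathlib's `rglob`. The sketch assumes each .data file sits next to a .head file with the same stem. `find_eu_files` is a hypothetical helper, not the package's API.

```python
from __future__ import annotations

from pathlib import Path


def find_eu_files(patient_dir: str | Path) -> tuple[list[tuple[Path, Path]], Path | None]:
    """Hypothetical helper: pair .head/.data files and locate seizurelist.txt, searching recursively."""
    patient_dir = Path(patient_dir)
    pairs = []
    for head in sorted(patient_dir.rglob("*.head")):
        data = head.with_suffix(".data")  # assumed: same folder, same stem
        if data.exists():  # skip headers whose binary file is missing
            pairs.append((head, data))
    seizure_lists = sorted(patient_dir.rglob("seizurelist.txt"))
    return pairs, (seizure_lists[0] if seizure_lists else None)
```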
Output Layout
```
extracted_data/
  <patient_id>/
    record_0/
      eeg.npy
      info.pkl
    record_1/
      eeg.npy
      info.pkl
```
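Downstream code can walk this layout directly. The minimal sketch below uses a hypothetical helper, `load_records`, to load every record of one patient in numeric order. With this ordering record_10 comes after record_2, not after record_1 as a plain string sort would place it.

```python
import pickle
from pathlib import Path

import numpy as np


def load_records(patient_dir):
    """Hypothetical helper: yield (record_dir, eeg, info) for each record_<n> folder in numeric order."""
    patient_dir = Path(patient_dir)
    record_dirs = sorted(
        (p for p in patient_dir.glob("record_*") if p.is_dir()),
        key=lambda p: int(p.name.split("_", 1)[1]),  # numeric, so record_10 follows record_2
    )
    for record_dir in record_dirs:
        eeg = np.load(record_dir / "eeg.npy")  # shape (num_samples, num_channels)
        with open(record_dir / "info.pkl", "rb") as f:
            info = pickle.load(f)  # metadata dictionary
        yield record_dir, eeg, info
```

Only unpickle info.pkl files you produced yourself. pickle.load can execute arbitrary code embedded in a crafted file.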
eeg.npy is saved as (samples, channels). info.pkl contains:
| Key | Description |
|---|---|
| pid | Patient ID, for example chb01 or pat_FR_253. |
| fid | Source recording ID without extension. |
| fs | Sampling frequency in Hz. |
| num_samples | Number of rows in eeg.npy. |
| num_channels | Number of EEG channels. |
| channel_names | Channel names when present in the source file/header. |
| eeg_dtype | Saved NumPy dtype, usually float32. |
| num_seizures | Number of seizure intervals overlapping the record. |
| seizure_times | List of seizure dictionaries with sample indices and, when available, timestamps. |
| file_start_time | Source file start timestamp when available. |
| file_end_time | Source file end timestamp when available. |
| duration_in_sec | Recording duration when available. |
Seizure sample indices are record-local. For EU records, seizures that cross a file boundary are clipped to the part that overlaps the current recording.
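The overlap-and-clip rule can be sketched as follows, though this is not the package's code. The sketch makes three assumptions:
- Seizure and record times are `datetime` objects on the same clock.
- Seconds are converted to samples by rounding.
- The end index is exclusive, so `eeg[start:end]` selects the seizure.

The package's own conventions may differ. `clip_seizure_to_record` is a hypothetical name.

```python
from datetime import datetime


def clip_seizure_to_record(record_start: datetime, fs: float, num_samples: int,
                           seizure_onset: datetime, seizure_offset: datetime):
    """Hypothetical helper: record-local (start, end) samples of the overlapping part, or None.

    end is exclusive (assumed convention), so eeg[start:end] is the clipped seizure segment.
    """
    record_end_sec = num_samples / fs
    onset_sec = (seizure_onset - record_start).total_seconds()
    offset_sec = (seizure_offset - record_start).total_seconds()
    # Clip the seizure interval to [0, record duration].
    start_sec = max(onset_sec, 0.0)
    end_sec = min(offset_sec, record_end_sec)
    if end_sec <= start_sec:
        return None  # this seizure does not overlap the recording
    start_sample = int(round(start_sec * fs))
    end_sample = min(int(round(end_sec * fs)), num_samples)
    return start_sample, end_sample
```

Under this rule, a seizure that starts 10 s before a file begins gets a start index of 0 in that file. A seizure that runs past the end of a file is cut at num_samples.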
Processing Method
CHB-MIT processing reads each patient's summary text file, extracts per-file
start/end times and seizure start/end seconds, then reads EEG signals from EDF
files with pyEDFlib.
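The summary-file step can be sketched with regular expressions. This is a hypothetical parser, not the package's code. It assumes the usual CHB-MIT summary layout of `File Name:`, `File Start Time:`, `File End Time:` and `Seizure Start Time: N seconds` lines. Some summaries number the seizure lines instead (`Seizure 1 Start Time:`). Clock times are kept as raw strings rather than converted.

```python
import re

_FILE_RE = re.compile(r"^File Name:\s*(\S+)")
_START_RE = re.compile(r"^File Start Time:\s*(\S+)")
_END_RE = re.compile(r"^File End Time:\s*(\S+)")
# Accept both "Seizure Start Time:" and numbered "Seizure 2 Start Time:" lines.
_SZ_START_RE = re.compile(r"^Seizure(?:\s+\d+)?\s+Start Time:\s*(\d+)\s*seconds")
_SZ_END_RE = re.compile(r"^Seizure(?:\s+\d+)?\s+End Time:\s*(\d+)\s*seconds")


def parse_chbmit_summary(text: str) -> dict:
    """Hypothetical parser: map each EDF file name to its clock times and [start, end] seizure seconds."""
    files = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := _FILE_RE.match(line):
            current = {"start": None, "end": None, "seizures": []}
            files[m.group(1)] = current
        elif current is None:
            continue  # header section before the first file (sampling rate, channel list)
        elif m := _START_RE.match(line):
            current["start"] = m.group(1)  # raw text; not parsed into a time object
        elif m := _END_RE.match(line):
            current["end"] = m.group(1)
        elif m := _SZ_START_RE.match(line):
            current["seizures"].append([int(m.group(1)), None])
        elif m := _SZ_END_RE.match(line):
            current["seizures"][-1][1] = int(m.group(1))
    return files
```

Called on the text of a <patient>-summary.txt file, it returns a dictionary keyed by EDF file name. The seizure seconds are relative to the start of each file.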
EU processing reads each patient's seizurelist.txt, parses absolute seizure
timestamps, reads each .head file for sampling frequency, channel count, sample
count, start timestamp, binary sample width, and conversion factor, then loads
the matching .data file with NumPy. Seizures are assigned to every recording
whose time interval overlaps the seizure interval.
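The .head/.data step can be approximated as follows. This is a sketch under stated assumptions, not the package's implementation. It assumes:
- The .head file is a plain `key=value` text file.
- Samples are little-endian signed integers stored channel-interleaved, with all channels for the first sample first, then all channels for the second.
- The sample width is 2 or 4 bytes.

`parse_head` and `load_eu_data` are hypothetical names. Header key names are deliberately not hard-coded. Look them up in your own copy of the dataset and pass the values in.

```python
from pathlib import Path

import numpy as np


def parse_head(path):
    """Hypothetical helper: read an assumed key=value header file into a dict of strings."""
    header = {}
    for line in Path(path).read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            header[key.strip()] = value.strip()
    return header


def load_eu_data(data_path, num_samples, num_channels, sample_bytes, conversion_factor,
                 dtype="float32"):
    """Hypothetical helper: load interleaved integer samples and scale them to physical units."""
    int_type = {2: "<i2", 4: "<i4"}[sample_bytes]  # assumed width-to-dtype mapping
    raw = np.fromfile(data_path, dtype=int_type, count=num_samples * num_channels)
    if raw.size != num_samples * num_channels:
        raise ValueError("data file is shorter than the header claims")
    eeg = raw.reshape(num_samples, num_channels)  # one row per time sample
    return (eeg * conversion_factor).astype(dtype)
```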
More details are in docs/processing-methods.md and docs/output-format.md.
Examples
Plot one channel from an extracted record:
```bash
python examples/plot_record.py ./extracted_data/chb17/record_0 \
  --channel 0 \
  --samples 1000
```
See examples/README.md.
Citation
If you use this software in academic work, please cite the repository using the metadata in CITATION.cff.
License
This project is distributed under the MIT License.
Testing
Run the lightweight test suite:
```bash
pytest
```
Integration tests require local raw dataset copies and are skipped by default:
```bash
CHBMIT_PATH=/path/to/chbmit/1.0.0 CHBMIT_PATIENT=chb01 pytest -m integration
EU_PATH=/path/to/Epilepsiae EU_PATIENT=pat_FR_253 pytest -m integration
```
The integration tests write extracted records to pytest-managed temporary directories, not into the raw dataset folders.
During public-release cleanup, the extractor was also exercised against real CHB-MIT and EU dataset samples. Processed records were checked for array shape, dtype, seizure interval bounds, and finite values in sample windows, and the extracted outputs were cleaned up after verification.
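A comparable check can be written against a loaded record. This is a hypothetical sketch of those checks, not the project's verification script. The keys inside each seizure_times dictionary are not documented here, so `start_key` and `end_key` default to placeholder names. Set them to whatever your info.pkl actually contains. The finite-value test only inspects the first and last `window` samples.

```python
import numpy as np


def check_record(eeg, info, start_key="start", end_key="end", window=1000):
    """Hypothetical check of one record; returns a list of problem descriptions (empty if OK)."""
    problems = []
    if eeg.ndim != 2:
        return [f"expected a 2-D array, got {eeg.ndim}-D"]
    if eeg.shape != (info["num_samples"], info["num_channels"]):
        problems.append(f"shape {eeg.shape} does not match info")
    if str(eeg.dtype) != str(info["eeg_dtype"]):
        problems.append(f"dtype {eeg.dtype} != {info['eeg_dtype']}")
    for seizure in info["seizure_times"]:
        start, end = seizure[start_key], seizure[end_key]  # placeholder key names
        if not 0 <= start < end <= eeg.shape[0]:
            problems.append(f"seizure bounds {start}..{end} outside record")
    # Spot-check the first and last `window` samples for NaN/inf instead of scanning everything.
    if not (np.isfinite(eeg[:window]).all() and np.isfinite(eeg[-window:]).all()):
        problems.append("non-finite samples in checked windows")
    return problems
```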
Repository Hygiene
Raw EEG files and extracted outputs are intentionally ignored by Git:
- *.edf
- *.data
- *.head
- *.npy
- *.pkl
- data/, datasets/, and extracted_data/
Before publishing, inspect git status --ignored if you have worked with local
data inside the repository tree.
Public Release Checklist
- Verify that no raw EEG files, extracted arrays, or patient-derived outputs are committed.
- Confirm the wording and citations required by the source datasets.
Download files
File details
Details for the file seizure_eeg_extractor-0.1.0.tar.gz.
File metadata
- Download URL: seizure_eeg_extractor-0.1.0.tar.gz
- Size: 20.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | caa4ce256890ebdaa19bb17f786283e9223a10d370a40cc0eea93b6cfbe48cba |
| MD5 | f77ca39403ebd4aad9e3a860e5cfbf9e |
| BLAKE2b-256 | 5909d951fe56a5e9925cfc2e6dffa85aec71c5e21f34587a921550be7235d579 |
Provenance
The following attestation bundles were made for seizure_eeg_extractor-0.1.0.tar.gz:
- Publisher: publish.yml on jamiekoe/seizure-eeg-extractor
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: seizure_eeg_extractor-0.1.0.tar.gz
- Subject digest: caa4ce256890ebdaa19bb17f786283e9223a10d370a40cc0eea93b6cfbe48cba
- Sigstore transparency entry: 1495368979
- Permalink: jamiekoe/seizure-eeg-extractor@fe926886aa1741058f8bea411f53671195639922
- Branch / Tag: refs/heads/main
- Owner: https://github.com/jamiekoe
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fe926886aa1741058f8bea411f53671195639922
- Trigger Event: workflow_dispatch
File details
Details for the file seizure_eeg_extractor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: seizure_eeg_extractor-0.1.0-py3-none-any.whl
- Size: 21.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 33bd5bb2ddff917ff33d52a6cc5a546533182a305e4cda8fda4e5381433f4b12 |
| MD5 | 079e32ad02ecabd73bb408df326c6ae1 |
| BLAKE2b-256 | 70c2b17eee1fc7c420c55a15d3fd0c532f0592088ca89569eebc7ceaf59f7791 |
Provenance
The following attestation bundles were made for seizure_eeg_extractor-0.1.0-py3-none-any.whl:
- Publisher: publish.yml on jamiekoe/seizure-eeg-extractor
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: seizure_eeg_extractor-0.1.0-py3-none-any.whl
- Subject digest: 33bd5bb2ddff917ff33d52a6cc5a546533182a305e4cda8fda4e5381433f4b12
- Sigstore transparency entry: 1495369081
- Permalink: jamiekoe/seizure-eeg-extractor@fe926886aa1741058f8bea411f53671195639922
- Branch / Tag: refs/heads/main
- Owner: https://github.com/jamiekoe
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fe926886aa1741058f8bea411f53671195639922
- Trigger Event: workflow_dispatch