seiz-eeg

Data loading and preprocessing of EEG scans for seizure-related ML tasks

To pave the way for reproducibility in seizure-related ML tasks, we implement a unified preprocessing library that provides the functionality needed to extract EEG clips in the format expected by many ML algorithms.

The package documentation is available on readthedocs.

This package provides the following functionalities:

Data fetching
Pre-processing of EEG measurements
Creation of clips dataframe, with relevant start and end times
Dataset class, which handles:
- Data loading
- Data transforms

The first two steps are handled by seiz_eeg.preprocess, whose parameters can be set in a YAML configuration file or passed as CLI arguments. By default, the module looks for a config.yaml file in the working directory, but another file can be specified with the -c option. A dataset must be specified with the corresponding option, either in the .yaml file or as follows:

python -m seiz_eeg.preprocess dataset=DATASET
# python -m seiz_eeg.preprocess -c path/to/config.yaml

The creation of clips is provided by seiz_eeg.clips and the Dataset is implemented in seiz_eeg.dataset.

More details on the parameters are given in the Parameters section.

Installation

The code can be pip-installed directly from git, if you have proper authentication. Just run:

pip install git+https://github.com/LTS4/seizure_eeg.git

Otherwise, you can clone the repository and pip install it:

git clone https://github.com/LTS4/seizure_eeg.git
cd seizure_eeg
pip install .

How to use

Download and pre-processing

Data are downloaded to a subfolder of raw_path, declared in the source-specific configuration. Then, with functions tailored to each dataset, we preprocess the data to give them a source-agnostic structure.

Segments dataframe

EEG measurements come with little structure. To perform any data-driven task, we must identify the relevant information and organize it. This information is generally provided in annotation files, which are separate for each EEG scan. By first reading all such files, we can create a tabular annotation dataframe, where entries are indexed by patient, session, segment, and channel. The following image shows a sample of such a table for the training split of the TUH Seizure corpus. Thanks to this added structure, it is easy to define clips of interest and quickly retrieve the relevant signals file, which can then be read and processed.
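The indexing idea can be illustrated with a plain-Python stand-in for the dataframe. This is only a sketch: the column names (label, start_time, end_time, signals_path), the patient and session identifiers, and the select helper are hypothetical and are not taken from the package. Only the four index levels and the seizure codes bkgd and fnsz come from this document.

```python
from typing import Dict, NamedTuple, Tuple


class Segment(NamedTuple):
    """One annotated stretch of a recording (hypothetical columns)."""

    label: str         # seizure code, e.g. "bkgd" or "fnsz"
    start_time: float  # seconds from the start of the recording
    end_time: float    # seconds from the start of the recording
    signals_path: str  # file holding the corresponding signals


# Index levels named in the text: (patient, session, segment, channel).
Key = Tuple[str, str, int, str]

# Made-up rows, for illustration only.
annotations: Dict[Key, Segment] = {
    ("p01", "s01", 0, "FP1"): Segment("bkgd", 0.0, 30.0, "p01/s01/rec.edf"),
    ("p01", "s01", 1, "FP1"): Segment("fnsz", 30.0, 55.0, "p01/s01/rec.edf"),
    ("p02", "s01", 0, "FP1"): Segment("bkgd", 0.0, 80.0, "p02/s01/rec.edf"),
}


def select(table, patient=None, label=None):
    """Return the rows matching the given patient and/or label."""
    return {
        key: seg
        for key, seg in table.items()
        if (patient is None or key[0] == patient)
        and (label is None or seg.label == label)
    }


# All focal-seizure segments, and the signal files that must be read for them.
seizures = select(annotations, label="fnsz")
files_to_read = sorted({seg.signals_path for seg in seizures.values()})
```

Once every annotation file has been read into one table like this, finding the segments of interest is a lookup rather than a scan over all recordings.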

EEG signals

In the usual pre-processing of EEG signals, we read raw signals from a .edf file and resample them to the desired rate. Then we extract one clip of interest, e.g. the first seconds of a seizure, and optionally split it into windows. These can then be further transformed or fed to a model. Since many clips can be extracted from the same file, it is convenient to save the resampled signal and avoid repeating expensive operations.
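The clip and window arithmetic described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function names clip_bounds and window_slices are hypothetical, and the parameter names clip_length, clip_stride, window_len and sampling_rate are borrowed from the Parameters section. The sketch assumes that clips must fit entirely inside a segment when a numeric stride is used and that windows do not overlap.

```python
from typing import List, Tuple, Union


def clip_bounds(
    seg_start: float,
    seg_end: float,
    clip_length: float,
    clip_stride: Union[float, str],
) -> List[Tuple[float, float]]:
    """Return (start, end) times, in seconds, of the clips in one segment."""
    if clip_stride == "start":
        # One clip per segment, anchored at the segment onset.
        return [(seg_start, seg_start + clip_length)]

    stride = float(clip_stride)
    if stride <= 0:
        raise ValueError("clip_stride must be positive")

    clips = []
    k = 0
    # Index-based loop: avoids accumulating floating-point error.
    while seg_start + k * stride + clip_length <= seg_end:
        start = seg_start + k * stride
        clips.append((start, start + clip_length))
        k += 1
    return clips


def window_slices(
    clip_length: float, window_len: float, sampling_rate: int
) -> List[Tuple[int, int]]:
    """Return (first, last + 1) sample indices of the windows in one clip."""
    n_clip = int(round(clip_length * sampling_rate))
    if window_len < 0:
        # Negative window length: no windowing, the clip is one block.
        return [(0, n_clip)]

    n_win = int(round(window_len * sampling_rate))
    if n_win == 0:
        raise ValueError("window_len is shorter than one sample")
    return [(i, i + n_win) for i in range(0, n_clip - n_win + 1, n_win)]


# A 12 s segment cut into 4 s clips every 4 s gives three clips.
clips = clip_bounds(10.0, 22.0, clip_length=4.0, clip_stride=4.0)
# A 4 s clip at 250 Hz split into 1 s windows gives four windows of 250 samples.
windows = window_slices(4.0, window_len=1.0, sampling_rate=250)
```

Because the bounds depend only on the annotations and the configuration, they can be computed for the whole dataset before any signal file is opened.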

Datasets

TUH Seizure corpus

This corpus consists of many hours of labelled EEG sessions. The seiz_eeg.preprocess.tusz module provides code specific to this dataset's annotations and EEG measurements.

To download the data, you need to register (free account). You will receive a username and a password, which we recommend exporting to the environment variables TUSZ_USER and TUSZ_PW. The credentials must either be included in the config.yaml file or passed on the command line as follows:

python -m seiz_eeg.preprocess dataset=tusz tusz.user=$TUSZ_USER tusz.password=$TUSZ_PW

If you get a "Permission denied, please try again." message, it is probably because your password is wrong.

More information about the TUH seizure corpus can be found on the TUH EEG Corpus website.

Parameters

Many parameters are available for data processing, and they must be provided to our functions as configuration dataclasses (specified in seiz_eeg.config.py).

The minimal parameters needed are:
- dataset, which specifies the dataset to preprocess.
- raw_root, which specifies the root folder where the raw data is stored.

We use OmegaConf to merge .yaml configuration files and CLI options in our runnable script (seiz_eeg.preprocess). An example configuration file for the TUH Seizure corpus is provided in config.yaml. The config file or the CLI options can provide the following parameters:

config (DataConf)
│
├── dataset (str):                              Abbrv. of dataset to preprocess. Currently supported:
│                                                   - tusz: TUH Seizure Corpus
│                                                   - chbmit: CHB-MIT Scalp EEG Database
│
├── raw_root (str):                             Root folder for raw data (downloads)
│
├── processed_root (str):                       Root folder for preprocessed data
│
├── labels (DataLabelsConf):                    Seizure labels specifications
│   ├── map (Dict[str, int]):                       Map from string seizure codes to integers, e.g. ``bkgd -> 0`` and ``fnsz -> 1``
│   │
│   └── binary (bool):                              Whether to read binary labels
│
├── signals (DataSignalsConf):                  Options for signals and clips processing
│   ├── diff_channels (bool):                       Whether to compute channel differences, e.g. "T3-T5", "P4-O2", etc.
│   ├── sampling_rate (int):                        Desired sampling rate, in Hz
│   ├── clip_length (float):                        Length of clips to extract, in seconds
│   ├── clip_stride (Union[float, str]):            Stride to extract the start times of the clips.
│   │                                               Integer or real values give explicit stride, in seconds.
│   │                                               If string, must be one of the following:
│   │                                                   - "start": extract one clip per segment, starting at onset/termination label.
│   │
│   ├── window_len (float):                         Length of windows to split the clip into, in seconds.
│   │                                               If negative no windowing is performed.
│   │
│   ├── fft_coeffs (Optional[List[Optional[int]]]): FFT coefficient interval: *[min_index, max_index]*.
│   │                                               Include all with ``[None]`` or switch off FFT with ``None``.
│   │
│   └── node_level (bool):                          Whether to work with node-level or global labels
│
└── tusz (DataSourceConf):                      Dataset parameters for TUH Seizure Corpus
    ├── version (str):                              Dataset version
    ├── force_download (bool):                      Download data even if they are already present
    ├── raw (str):                                  Path where to save raw data
    ├── processed (str):                            Path where to save preprocessed data
    ├── subsets (List[str]):                        List of subsets to include in preprocessing (e.g. ``["train", "test"]``)
    └── excluded_patients (Dict[str, List[str]]):   Map from subset to list of patients to exclude from it.
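To make the nesting of the tree above concrete, here is a sketch of how it could map onto Python dataclasses. The class and field names are taken from the tree, but the code is not copied from seiz_eeg/config.py: every default value is a placeholder chosen for illustration, and the real definitions may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class DataLabelsConf:
    # Placeholder defaults, not the package's actual defaults.
    map: Dict[str, int] = field(default_factory=dict)
    binary: bool = False


@dataclass
class DataSignalsConf:
    # Placeholder defaults, not the package's actual defaults.
    diff_channels: bool = False
    sampling_rate: int = 250
    clip_length: float = 10.0
    clip_stride: Union[float, str] = "start"
    window_len: float = -1.0  # negative: no windowing
    fft_coeffs: Optional[List[Optional[int]]] = None  # None: FFT switched off
    node_level: bool = False


@dataclass
class DataSourceConf:
    # Placeholder defaults, not the package's actual defaults.
    version: str = ""
    force_download: bool = False
    raw: str = ""
    processed: str = ""
    subsets: List[str] = field(default_factory=list)
    excluded_patients: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class DataConf:
    # The two minimal parameters have no default in this sketch.
    dataset: str
    raw_root: str
    processed_root: str = ""
    labels: DataLabelsConf = field(default_factory=DataLabelsConf)
    signals: DataSignalsConf = field(default_factory=DataSignalsConf)
    tusz: DataSourceConf = field(default_factory=DataSourceConf)


# Only the minimal parameters are required; nested groups are filled in.
conf = DataConf(dataset="tusz", raw_root="data/raw")
conf.labels.map = {"bkgd": 0, "fnsz": 1}
conf.tusz.subsets = ["train", "test"]
```

Nested groups such as signals and tusz correspond to the dotted keys used on the command line, for example tusz.user in the TUH Seizure corpus command shown earlier.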

Release history

0.4.3 (Mar 17, 2025)
0.4.2 (Mar 11, 2025)
0.4.1 (Mar 11, 2025), this version

Download files

Source Distribution

seiz_eeg-0.4.1.tar.gz (34.9 kB)

Uploaded Mar 11, 2025 Source

Built Distribution

seiz_eeg-0.4.1-py3-none-any.whl (39.8 kB)

Uploaded Mar 11, 2025 Python 3

File details

Details for the file seiz_eeg-0.4.1.tar.gz.

File metadata

Download URL: seiz_eeg-0.4.1.tar.gz
Upload date: Mar 11, 2025
Size: 34.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for seiz_eeg-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`d4b67ad35c96d2ca118d2e7db816502d5a9fb760aeeed1febbd7810c99927589`
MD5	`b1bf777c790cb4c05f48a575ec2aba72`
BLAKE2b-256	`2223ea737e12fdaddb95c7c670ea9f1867afa9f90c6c451c21d7c7061b745ccc`

File details

Details for the file seiz_eeg-0.4.1-py3-none-any.whl.

File metadata

Download URL: seiz_eeg-0.4.1-py3-none-any.whl
Upload date: Mar 11, 2025
Size: 39.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for seiz_eeg-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8f310e2e760ca780de2f99610a550129a86ed89198fb8237b9d32c75f30f1d44`
MD5	`235ec1223b7aaad70df35b5ce48d8908`
BLAKE2b-256	`47ae419e7e7b81b1cb013f444c3d1924da09f967b309c340fa2432417311b3f6`
