EEG data for machine learning

EEG-DaSh

To leverage recent and ongoing advancements in large-scale computational methods and to ensure the preservation of scientific data generated from publicly funded research, the EEG-DaSh data archive will create a data-sharing resource for MEEG (EEG, MEG) data contributed by collaborators for machine learning (ML) and deep learning (DL) applications.

Data source

The data in EEG-DaSh originates from a collaboration involving 25 laboratories, encompassing 27,053 participants. The collection consists of MEEG data, that is, EEG and MEG signals, drawn from studies conducted by these labs on both healthy subjects and clinical populations with conditions such as ADHD, depression, schizophrenia, dementia, autism, and psychosis. The data also spans different mental states, such as sleep, meditation, and cognitive tasks. EEG-DaSh will also incorporate a subset of the data converted from NEMAR, which includes 330 MEEG BIDS-formatted datasets, further expanding the archive with well-curated, standardized neuroelectromagnetic data.

Datasets available

Currently, only two datasets are available, for testing purposes.

Dataset ID | Description | Participants | Channels | Task | NEMAR Link
ds002718 | EEG dataset focused on face processing with MRI for source localization | 18 | 70 EEG, 2 EOG | FaceRecognition | NEMAR ds002718
ds004745 | 8-channel SSVEP EEG dataset with trials including voluntary movements to introduce artifacts | 6 | 8 EEG | SSVEP tasks | NEMAR ds004745

Data formatting

The data in EEG-DaSh is formatted to facilitate machine learning (ML) and deep learning (DL) applications by using a simplified structure commonly adopted by these communities. This will involve converting raw MEEG data into a matrix format, where rows represent samples (e.g., individual EEG or MEG recordings) and columns represent values (such as time points or channel data). The data is also divided into training and testing sets, with 80% allocated for training and 20% for testing, while keeping a balanced representation of relevant labels across sets. Labels will be annotated with Hierarchical Event Descriptor (HED) tags and stored in a text table, together with detailed metadata such as dataset origins and methods. This formatting process will ensure that data is ready for ML/DL models, allowing for efficient training and testing of algorithms while preserving data integrity and reusability.
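The 80/20 split with balanced labels described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the EEG-DaSh implementation: the function name stratified_split, the random data, and the two-class labels are all hypothetical.

```python
import numpy as np

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx) so each label is split ~80/20."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)  # rows carrying this label
        rng.shuffle(idx)
        n_train = int(round(train_fraction * len(idx)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return (np.sort(np.array(train_idx, dtype=int)),
            np.sort(np.array(test_idx, dtype=int)))

# Hypothetical sample matrix: 100 samples (rows) x 16 values (columns).
X = np.random.default_rng(1).normal(size=(100, 16))
y = np.array([0] * 60 + [1] * 40)  # made-up labels, 60/40 class balance

train_idx, test_idx = stratified_split(y)
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(X_train.shape, X_test.shape)
```

Splitting per label, rather than over the whole index range at once, is what keeps the class proportions the same in both sets.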

Data access

The data in EEG-DaSh is accessed through Python and MATLAB libraries specifically designed for this platform. These libraries will use objects compatible with the deep learning data storage formats of each language, such as torchvision.datasets in Python and datastore in MATLAB. Users can dynamically fetch data from the EEG-DaSh server; fetched data is then cached locally.
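The fetch-then-cache behaviour mentioned above can be illustrated with a small standard-library sketch. It makes no claim about how the eegdash package does this internally: the class CachedFetcher, the fake_remote function, and the record key are hypothetical stand-ins for a real server call.

```python
import hashlib
import tempfile
from pathlib import Path

class CachedFetcher:
    """Fetch a record once, then serve it from a local cache directory."""

    def __init__(self, fetch_remote, cache_dir):
        self.fetch_remote = fetch_remote  # hypothetical callable: name -> bytes
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _cache_path(self, name):
        # Hash the name so any record key maps to a safe file name.
        digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
        return self.cache_dir / digest

    def get(self, name):
        path = self._cache_path(name)
        if not path.exists():  # cache miss: go to the "server"
            path.write_bytes(self.fetch_remote(name))
        return path.read_bytes()

calls = []

def fake_remote(name):
    """Stand-in for a network request; records each call it receives."""
    calls.append(name)
    return f"payload for {name}".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    fetcher = CachedFetcher(fake_remote, tmp)
    first = fetcher.get("example-record")   # made-up key, fetched remotely
    second = fetcher.get("example-record")  # served from the local cache

print(len(calls))
```

The second call never reaches fake_remote, which is the property a local cache is meant to provide.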

Install

Use your preferred Python environment manager with Python > 3.9 to install the package. The example below uses a Conda environment with Python 3.11.5:

  • Create a new environment with Python 3.11.5 -> conda create --name eegdash python=3.11.5
  • Switch to the new environment -> conda activate eegdash
  • Install the dependencies (this is a temporary link that will be updated soon) -> pip install -r https://raw.githubusercontent.com/sccn/EEG-Dash-Data/refs/heads/develop/requirements.txt
  • Install the eegdash package (this is a temporary link that will be updated soon) -> pip install -i https://test.pypi.org/simple/ eegdash
  • Check the installation: start a Python session and type from eegdash import EEGDash
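As an alternative to the manual import check in the last step, the installation can be inspected with the standard library without importing the package. This is a small sketch; the helper name describe_install is hypothetical, and it assumes the distribution name matches the module name (eegdash).

```python
import importlib.util
from importlib import metadata

def describe_install(module_name, distribution_name=None):
    """Return a short status string for an installed (or missing) package."""
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name}: not importable"
    try:
        # Assumption: the distribution is named like the module unless given.
        version = metadata.version(distribution_name or module_name)
    except metadata.PackageNotFoundError:
        version = "unknown version"
    return f"{module_name}: importable ({version})"

print(describe_install("eegdash"))
```

Running this in the wrong environment reports the package as not importable, which usually means the eegdash environment was not activated.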

Python data access

To create a local object for accessing the database, use the following code:

from eegdash import EEGDash
EEGDashInstance = EEGDash()

Once the object is instantiated, it can be used to search the database. Passing an empty dictionary searches the entire database and returns all available data records.

EEGDashInstance.find({})

A list of records is returned:

[{'schema_ref': 'eeg_signal',
  'data_name': 'ds004745_sub-001_task-unnamed_eeg.set',
  'dataset': 'ds004745',
  'subject': '001',
  'task': 'unnamed',
  'session': '',
  'run': '',
  'modality': 'EEG',
  'sampling_frequency': 1000,
  'version_timestamp': 0,
  'has_file': True,
  'time_of_save': datetime.datetime(2024, 10, 25, 14, 11, 48, 843593, tzinfo=datetime.timezone.utc),
  'time_of_removal': None}, ...
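Judging from the output above, each record is a plain dictionary, so a result list can be summarized with standard Python tools. The sketch below assumes that structure; the helper summarize is hypothetical, and only the first sample record mirrors the output shown (abbreviated), while the second is made up.

```python
from collections import Counter

def summarize(records):
    """Count records per dataset and collect the distinct sampling rates."""
    per_dataset = Counter(record["dataset"] for record in records)
    rates = sorted({record["sampling_frequency"] for record in records})
    return per_dataset, rates

# First record abbreviated from the output above; second one is illustrative.
records = [
    {"dataset": "ds004745", "subject": "001", "sampling_frequency": 1000},
    {"dataset": "ds004745", "subject": "002", "sampling_frequency": 1000},
]

per_dataset, rates = summarize(records)
print(per_dataset, rates)
```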

Additionally, users can search for a specific dataset by specifying criteria.

EEGDashInstance.find({'task': 'FaceRecognition'})
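The two queries shown so far suggest equality matching on record fields: an empty dictionary matches everything, and each key narrows the result. The following pure-Python sketch mimics that behaviour on a local list for illustration only. The functions matches and find_local are hypothetical and are not part of eegdash, whose actual query semantics may be richer.

```python
def matches(record, query):
    """True if every key in `query` is present in `record` with an equal value."""
    return all(key in record and record[key] == value
               for key, value in query.items())

def find_local(records, query):
    """Return the records that satisfy every criterion in `query`."""
    return [record for record in records if matches(record, query)]

# Abbreviated records based on the examples in this document.
records = [
    {"dataset": "ds002718", "subject": "019", "task": "FaceRecognition"},
    {"dataset": "ds004745", "subject": "001", "task": "unnamed"},
]

everything = find_local(records, {})                       # empty query
faces = find_local(records, {"task": "FaceRecognition"})   # one criterion
print(len(everything), len(faces))
```

Adding more keys to the query can only shrink the result, since every criterion has to hold at once.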

After locating the desired dataset or data record, users can download it locally by executing the following command. This will return the matching data as xarray objects.

XArrayData = EEGDashInstance.get({'task': 'FaceRecognition', 'subject': '019'})

Optionally, the raw data for the first record can be accessed as follows. This will return a NumPy array.

npData = EEGDashInstance.get({'task': 'FaceRecognition', 'subject': '019'})[0].values
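Once the raw array is available, it can be cut into fixed-length windows to obtain the samples-by-values matrix described under Data formatting. The sketch below assumes the array has shape (channels, time), which this document does not confirm; the function to_sample_matrix and the small stand-in array are hypothetical.

```python
import numpy as np

def to_sample_matrix(signal, window):
    """Cut a (channels, time) array into non-overlapping windows, one row each."""
    n_channels, n_times = signal.shape
    n_windows = n_times // window
    trimmed = signal[:, : n_windows * window]  # drop the incomplete tail
    # (channels, windows, window) -> (windows, channels, window)
    epochs = trimmed.reshape(n_channels, n_windows, window).transpose(1, 0, 2)
    # Flatten each window's channels into a single row of values.
    return epochs.reshape(n_windows, n_channels * window)

# Stand-in for npData: 2 channels x 10 time points (assumed layout).
signal = np.arange(20, dtype=float).reshape(2, 10)
matrix = to_sample_matrix(signal, window=4)
print(matrix.shape)
```

Each row holds one window, with the values of the first channel followed by those of the second; the last two time points are discarded because they do not fill a whole window.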

Example use

This example demonstrates the full workflow from data retrieval with EEGDash to model definition, data handling, and training in PyTorch.

Education - Coming soon...

We organize workshops and educational events to foster cross-cultural education and student training, offering both online and in-person opportunities in collaboration with US and Israeli partners. No events are planned for 2024. Events for 2025 will be advertised on the EEGLABNEWS mailing list, so make sure to subscribe.

About EEG-DaSh

EEG-DaSh is a collaborative initiative between the United States and Israel, supported by the National Science Foundation (NSF). The partnership brings together experts from the Swartz Center for Computational Neuroscience (SCCN) at the University of California San Diego (UCSD) and Ben-Gurion University (BGU) in Israel.



