Skip to main content

Package for fitting Hidden Semi-Markov Models to electro-encephalographic data

Project description

HsMM-MVpy

hsmm_mvpy is an open-source Python package to estimate Hidden Semi-Markov Models in a Multivariate Pattern Analysis (HsMM-MVPA) of electro-encephalographic (EEG) data based on the method developed by Anderson, Zhang, Borst, & Walsh (2016) and Borst & Anderson (2021)

Documentation

The package is available through pip with the command pip install hsmm_mvpy. A recommended way of using the package is to use a python environment e.g. through conda (see anaconda for how to install conda):

$ conda create -n hsmm xarray mne 
$ conda activate hsmm
$ pip install hsmm_mvpy

Then import hsmm-mvpy in your favorite python IDE through:

    import hsmm_mvpy as hsmm

For the cutting edge version (not recommended) you can clone the repository using git

Open a terminal and type:

$ git clone https://github.com/gweindel/hsmm_mvpy.git

Then move to the clone repository and run

$ pip install -e .

To get started

To get started with the code:

  • Check the demo below
  • Inspect the tutorials in the tutorials repository

To get started or further learn about the method be sure to check the paper by Anderson, Zhang, Borst, & Walsh (2016) as well as the book chapter by Borst & Anderson (2021). The following also contains a non-exhaustive list of papers published with HsMM-MVPA:

  • Berberyan, H. S., van Maanen, L., van Rijn, H., & Borst, J. (2021). EEG-based identification of evidence accumulation stages in decision-making. Journal of Cognitive Neuroscience, 33(3), 510-527. link
  • Van Maanen, L., Portoles, O., & Borst, J. P. (2021). The discovery and interpretation of evidence accumulation stages. Computational brain & behavior, 4(4), 395-415. link
  • Portoles, O., Blesa, M., van Vugt, M., Cao, M., & Borst, J. P. (2022). Thalamic bursts modulate cortical synchrony locally to switch between states of global functional connectivity in a cognitive task. PLoS computational biology, 18(3), e1009407. link

Demo on simulated data

The following section will quickly walk you through an example usage in simulated data

First we load the libraries necessary for the demo on simulated data

Importing libraries

import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import seaborn as sns
from mne import channels

## Importing 
import hsmm_mvpy as hsmm
from hsmm_mvpy import simulations

Simulating data

In the following code block we simulate 30 trials from four known sources. All these four sources are defined by a localization, an activation amplitude and a distribution (here gamma with shape and scale parameters) for the onsets of the bumps on each trial. The simulation functions are based on this MNE tutorial .

cpus = 16 # For multiprocessing, usually a good idea to use multiple CPus as long as you have enough RAM
path = 'simulated/'#Wehre simulated data will go, vreate that folder if you don't have it where you're executing the code

n_events = 30 #Number of trials to simulate

sources = [['lateraloccipital-lh',1e-8, [np.random.gamma,2,30]],#One source = localization, acitvation amplitude and onset latencies
           ['postcentral-lh', 1e-8, [np.random.gamma, 2, 50]],
           ['posteriorcingulate-rh', 1e-8, [np.random.gamma, 2,80]],
           ['postcentral-rh', 1e-8, [np.random.gamma, 2,110]],
           ['postcentral-lh', 1e-10, [np.random.gamma, 2,65]]] #Equivalent to a response trigger as amplitude make it hardly visible

max_trial_length = 3000 #length of a trial (ISI)

bump_frequency = 10. #Frequency of the simulated bumps
file = 'dataset_tutorial' #Name of the file to save
mne_path = path+file+'_raw.fif'

raw, generating_events = simulations.simulate(sources, n_events, max_trial_length, cpus, bump_frequency, file, path, overwrite=True)
Aligning file name to MNE's convention
Simulating dataset_tutorial_raw.fif in simulated/
Overwriting existing file.
Writing /home/gweindel/owncloud/projects/RUGUU/hsmm-mvpy/simulated/dataset_tutorial_raw.fif
Closing /home/gweindel/owncloud/projects/RUGUU/hsmm-mvpy/simulated/dataset_tutorial_raw.fif
[done]
simulated/dataset_tutorial_raw.fif simulated

Creating the event structure and plotting the raw data

To recover the data we need to create the event structure based on the triggers sent during simulation. This is the same as analyzing real EEG data and recovering events in the stimulus channel. In our case 0 signals the onset of the stimulus and 5 the onset of the response. Hence a trial is defined as the times occuring between the triggers 0 and 5.

resp_trigger = int(np.max(np.unique(generating_events[:,2])))#Resp trigger is the last source in each trial
event_id = {'stimulus':0}
resp_id = {'response':resp_trigger}
events = generating_events[(generating_events[:,2] == 0) | (generating_events[:,2] == resp_trigger)]#only retain stimulus and response triggers

raw.copy().pick_types(eeg=True).plot(scalings=dict(eeg=1e-5), events=events);
Removing projector <Projection | PCA-v1, active : True, n_channels : 102>
Removing projector <Projection | PCA-v2, active : True, n_channels : 102>
Removing projector <Projection | PCA-v3, active : True, n_channels : 102>
Using matplotlib as 2D backend.
Opening raw-browser...

png

Recovering number of sources as well as actual by-trial variation

To compare the by-trial duration of bumps that we will estimate later on we first recover the actual number of sources used in the simulation as well as the actual by-trial variation in the onset of the bumps.

%matplotlib inline
number_of_sources = len(np.unique(generating_events[:,2])[1:])#one trigger = one source
random_source_times = np.zeros((int(len(generating_events)/(number_of_sources+1)), number_of_sources))

i,x = 1,0                  
while x < len(random_source_times):
    for j in np.arange(number_of_sources):#recovering the individual duration- of bump onset
        random_source_times[x,j] = generating_events[i,0] - generating_events[i-1,0]
        i += 1
    i += 1
    x += 1

Then we read the EEG data as we would for a single participant:

# Reading the data
eeg_dat = hsmm.utils.read_mne_EEG(mne_path, event_id, resp_id, raw.info['sfreq'],events, verbose=False)
Processing participant simulated/dataset_tutorial_raw.fif
Reading 0 ... 104999  =      0.000 ...   174.819 secs...
N trials without response event: 0
Applying reaction time trim to keep RTs between 0.001 and 5 seconds
30 trials were retained for participant simulated/dataset_tutorial_raw.fif
End sampling frequency is 600.614990234375 Hz

The package uses xarray named dimension matrices, allowing to directly manipulate the data using the name of the dimensions:

#example of usage of xarray
print(eeg_dat)
eeg_dat.sel(epochs=0,electrodes=['EEG 001','EEG 002','EEG 003']).plot.scatter(x='samples', y='data',hue='electrodes');
<xarray.Dataset>
Dimensions:     (participant: 1, epochs: 30, electrodes: 59, samples: 1238)
Coordinates:
  * epochs      (epochs) int64 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
  * electrodes  (electrodes) <U7 'EEG 001' 'EEG 002' ... 'EEG 059' 'EEG 060'
  * samples     (samples) int64 0 1 2 3 4 5 6 ... 1232 1233 1234 1235 1236 1237
Dimensions without coordinates: participant
Data variables:
    data        (participant, epochs, electrodes, samples) float64 -7.847e-07...
    event       (participant, epochs) <U8 'stimulus' 'stimulus' ... 'stimulus'
Attributes:
    sfreq:    600.614990234375

png

Next we transform the data as in Anderson, Zhang, Borst, & Walsh (2016) including standardization of individual variances (not in this case as we have only one simulated participant), z-scoring and spatial principal components analysis (PCA).

Note that if the number of components to retain is not specified, the scree plot of the PCA is displayed and a prompt ask how many PCs should be retained. Hence in your applications be sure to leave the n_comp to the default value.

hsmm_dat, PCs, explained_var, means = hsmm.utils.transform_data(eeg_dat.data,'',
        apply_standard=False, single=True, n_comp=10, return_weights=True)

hsmm_dat = hsmm.utils.stack_data(hsmm_dat,'',single=True)

Estimating an HsMM model

We know that we generate from four sources so let's just try to recover those four sources by directly estimating a 4 bump model (but see tutorial 3 on how to infer the optimal number of bumps).

init = hsmm.models.hsmm(hsmm_dat.data.T[:,:,0], hsmm_dat.starts.data, 
                 hsmm_dat.ends.data, sf=eeg_dat.sfreq, bump_width=50, cpus=16)#Initialization of the model

selected = init.fit_single(number_of_sources-1, starting_points=25)#function to fit an instance of a 4 bumps model with 25 random starting points for the expectation maximization algorithm
Estimating all solutions for 4 bumps model with 24 random starting points
Estimating parameters for 4 bumps model
Parameters estimated for 4 bumps model

Visualizing results of the fit

bump_times_selected = init.bump_times(selected.eventprobs)#computing predicted bump times
mean_bump_times_selected = np.mean(bump_times_selected, axis=0)

positions = np.delete(channels.layout._find_topomap_coords(raw.info, 'eeg'),52,axis=0)#inferring electrode location
electrodes_selected = hsmm.utils.reconstruct(selected.magnitudes, PCs, explained_var, means)#reconstructing electrode activity

We can directly take a look to the topologies and latencies of the bumps by calling hsmm.visu.plot_topo_timecourse

hsmm.visu.plot_topo_timecourse(electrodes_selected, mean_bump_times_selected, positions, 
                               bump_size=init.bump_width_samples, magnify=5, figsize=(12,2),
                                time_step = 1,  times_to_display = np.mean(init.ends - init.starts))

png

And take a closer look to stages and their by-trial variation

ax = hsmm.visu.plot_latencies_average(bump_times_selected, init.bump_width_samples, 1, errs='std', times_to_display = np.mean(init.ends - init.starts))
ax.set_ylabel('your label here');

png

And inspect the probability distribution of bump onsets:

hsmm.visu.plot_distribution(selected.eventprobs.mean(dim=['trial']), xlims=(0,1000))
hsmm.visu.plot_distribution(selected.eventprobs.mean(dim=['trial']), xlims=(0,1000), survival=True);

png

png

As HsMM-MVPA estimates bumps onset per trial we can also look at the predicted bump onsets for a single trial

hsmm.visu.plot_distribution(selected.eventprobs.sel(trial=0), xlims=(0,1000))
<AxesSubplot:xlabel='Time (in samples)', ylabel='p(event)'>

png

Comparing with ground truth

Now as we simulated the data we have access to the ground truth of the underlying generative events. We can then compare the average stage durations compared to the one estimated by HsMM-MVPA

colors = sns.color_palette(None, number_of_sources)
plt.scatter(np.mean(random_source_times, axis=0), selected.parameters.dropna('stage').isel(params=1)*2+np.concatenate([[0],np.repeat(init.bump_width_samples,number_of_sources-1)]), color=colors,s=50)
plt.plot([np.min(np.mean(random_source_times,axis=0)),np.max(np.mean(random_source_times,axis=0))],
         [np.min(np.mean(random_source_times,axis=0)),np.max(np.mean(random_source_times,axis=0))],'--');
plt.title('Actual vs estimated stage durations')
plt.xlabel('Estimated stage duration')
plt.ylabel('Actual stage duration')
plt.show()

png

Or also overlay actual bumps onset with predicted one

positions = np.delete(channels.layout._find_topomap_coords(raw.info, 'eeg'),52,axis=0)#inferring electrode location
electrodes = hsmm.utils.reconstruct(selected.magnitudes, PCs, explained_var, means)

hsmm.visu.plot_topo_timecourse(electrodes, np.mean(init.bump_times(selected.eventprobs), axis=0), positions,#inferring electrode location, 
        bump_size=init.bump_width_samples, time_step = 1, magnify=4, figsize=(13,2), title='Actual vs estimated bump onsets',
        times_to_display = np.mean(np.cumsum(random_source_times,axis=1),axis=0))

png

And even compare single trial onsets with those estimated from the HsMM model

fig, ax= plt.subplots(number_of_sources-1,1, figsize=(5,3.5*number_of_sources))
ax[0].set_title('Comparing true vs estimated single trial stage durations')
i = 0
gen_bump_location = np.cumsum(random_source_times[:,:-1], axis=1)
for bump in init.bump_times(selected.eventprobs)[:,:number_of_sources-1].T:
    sns.regplot(x=gen_bump_location[:,i].T, y=bump, ax=ax[i], color=colors[i])
    ax[i].plot([np.min(bump), np.max(bump)], [np.min(bump), np.max(bump)],'--')
    i+= 1

png

For examples on how to use the package when the number of bumps are unkown, or to compare stage durations across conditions see the tutorial notebooks.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hsmm_mvpy-0.0.4.post25.tar.gz (31.4 kB view hashes)

Uploaded Source

Built Distribution

hsmm_mvpy-0.0.4.post25-py3-none-any.whl (27.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page