eeghdf is a module for reading and writing EEG data in the hdf5 format
eeghdf
Project to develop an easily accessible format for storing EEG in a way that is convenient for machine learning.
Features
Features derived from hdf5 format:
- hdf5 offers reliable, checksummed and compressed storage of digital EEG, and was designed for long-term storage of data
- hdf5 is widely supported: C, C++, JavaScript, Python, Julia, MATLAB and more
- eeghdf offers a numpy-like interface to data without requiring the whole file to be loaded in memory
- efficient reading: accessing a segment of data does not require reading the whole file into memory
- cloud enabled: direct streaming from S3 buckets via the ros3 driver
- "self documenting" and extensible
- advanced features: parallel readers/single writer, MPI, streaming supported
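The partial-read behavior listed above can be illustrated with plain h5py. This is a minimal sketch, not eeghdf's own API: the file layout (a single dataset named `signals` holding int16 samples as channels x samples) and the helper names `write_demo_file` and `read_window` are hypothetical. It writes a chunked, gzip-compressed dataset with Fletcher-32 checksums and then reads one short window; h5py only reads the chunks that overlap the requested slice.

```python
import os
import tempfile

import h5py
import numpy as np


def write_demo_file(path, n_channels=19, n_samples=10_000, seed=0):
    """Write random int16 'digital' EEG to a hypothetical 'signals' dataset
    (not eeghdf's real schema) and return the array that was written."""
    rng = np.random.default_rng(seed)
    data = rng.integers(-32768, 32767, size=(n_channels, n_samples), dtype=np.int16)
    with h5py.File(path, "w") as f:
        f.create_dataset(
            "signals",
            data=data,
            chunks=(n_channels, 1000),  # chunk along time so short windows touch few chunks
            compression="gzip",         # lossless compression of each chunk
            fletcher32=True,            # per-chunk checksum, verified when read
        )
    return data


def read_window(path, start, stop):
    """Read samples [start, stop) for all channels without loading the file."""
    with h5py.File(path, "r") as f:
        dset = f["signals"]          # lazy handle; no sample data read yet
        return dset[:, start:stop]   # numpy-style slice returns an ndarray


with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo.h5")
    full = write_demo_file(demo_path)
    win = read_window(demo_path, 2000, 2500)

print(win.shape)  # (19, 500)
```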
Additional goals/features:
- build a set of tools to visualize and analyze EEG based upon this format
- easy conversion to other formats: the first target is the mne-python "raw" format, with BIDS-EEG possibly next
Alternatives, background research and future goals
- looked at the EDF and neo formats; see also Neurodata Without Borders. Compare with XDF.
- simpler than neo, but may need more of neo's structures as use grows
- compare with the ONE format
- compare with the MNE project's FIF format as eeghdf evolves
Future goals
- look to support multiple records and different sampling rates
- look to add fields for clinical report text
- look to add field for montages and electrode geometry
- look to add an "extension" group
Installation
pip install eeghdf
Simple install for developers
This assumes you want to make changes to the eeghdf code.
- activate the desired python virtual environment
- make sure you have git and git-lfs installed
git clone https://github.com/eegml/eeghdf.git
cd eeghdf
python setup-dev.py develop
Re-sampling
There are many ways to resample signals. In my examples I used an approach based upon libsamplerate because it seemed to give accurate results. Depending on your platform there are many options. Recently I have been using pytorch-based tools a lot: torchaudio has resampling tools, and librosa looks very impressive.
Installation will vary, but on Ubuntu 18.04 I did:
sudo apt install libsamplerate-dev
pip install git+https://github.com/cournape/samplerate/#egg=samplerate
Ultimately I will move the resampling code out of this repo, perhaps into eegml-signal.
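For comparison with the libsamplerate approach, here is a minimal sketch using SciPy's polyphase resampler, `scipy.signal.resample_poly`. This is a swapped-in technique, not the resampling code used in this repo, and the helper name `resample_eeg` is hypothetical. The rational up/down factors are derived with `fractions.Fraction`, so 256 Hz to 200 Hz becomes up=25, down=32.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly


def resample_eeg(signals, fs_in, fs_out, max_denominator=1000):
    """Resample a (channels, samples) array from fs_in to fs_out Hz with
    polyphase filtering along the time axis (hypothetical helper)."""
    ratio = (Fraction(fs_out) / Fraction(fs_in)).limit_denominator(max_denominator)
    return resample_poly(signals, ratio.numerator, ratio.denominator, axis=-1)


fs_in, fs_out = 256, 200
t = np.arange(10 * fs_in) / fs_in                          # 10 s at 256 Hz
x = np.vstack([np.sin(2 * np.pi * 10 * t),                 # 10 Hz channel
               np.cos(2 * np.pi * 5 * t)])                 # 5 Hz channel
y = resample_eeg(x, fs_in, fs_out)
print(y.shape)  # (2, 2000)
```

resample_poly applies an anti-aliasing FIR filter and compensates for its delay, so sample k of the output corresponds to time k / fs_out; expect some distortion only near the edges of the record.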
To Do
- code to write file, target initial release version is 1000
- initial scripts to convert edf to eeghdf and floating point hdf5
- code to subsample and convert edf -> eeghdf
- code to write back to edf
- more visualization code -> push to eegvis
- add a convenience interface to phys_signal with automatic conversion from digital to physical units; should this use a subclass of numpy.ndarray?
- add study administrative codes to the record info (these do not seem to be included now, e.g. an EEG number like V17-105)
- code to clip and create subfiles
- allow patient info to propagate
- keep a hash list/tree of the file's history so that the provenance of waveforms can be tracked if desired
- clip and maintain correct (relative) times
- consider how to handle derived records: for example the downsampled float32 records "frecord200Hz"
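For the digital-to-physical conversion item above, one common convention is the EDF-style linear map, applied per channel: phys = (digital - dig_min) * (phys_max - phys_min) / (dig_max - dig_min) + phys_min. The sketch below assumes that convention and per-channel min/max arrays; the function and argument names, and the calibration values in the example, are hypothetical and not taken from eeghdf.

```python
import numpy as np


def digital_to_physical(digital, dig_min, dig_max, phys_min, phys_max):
    """Convert integer samples (channels, samples) to physical units using an
    EDF-style per-channel linear map (hypothetical helper, not eeghdf API)."""
    # Column vectors so each channel's calibration broadcasts across its samples.
    dig_min = np.asarray(dig_min, dtype=np.float64)[:, None]
    dig_max = np.asarray(dig_max, dtype=np.float64)[:, None]
    phys_min = np.asarray(phys_min, dtype=np.float64)[:, None]
    phys_max = np.asarray(phys_max, dtype=np.float64)[:, None]
    gain = (phys_max - phys_min) / (dig_max - dig_min)
    return (digital.astype(np.float64) - dig_min) * gain + phys_min


# Two channels with different (made-up) calibration ranges.
digital = np.array([[-32768, 0, 32767],
                    [-2048, 0, 2047]], dtype=np.int16)
phys = digital_to_physical(
    digital,
    dig_min=[-32768, -2048],
    dig_max=[32767, 2047],
    phys_min=[-3200.0, -500.0],
    phys_max=[3200.0, 500.0],
)
```

Returning a plain float64 ndarray keeps the result usable by any numpy code; a lazy wrapper that converts only the sliced samples could be an alternative to subclassing numpy.ndarray.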