Load and process brain datasets for deep learning
🍍PNPL Brain Data Deep Learning Library
The current primary use of the PNPL library is for the LibriBrain competition. Click here to learn more and get started!
Welcome to PNPL — a Python toolkit for loading and processing brain datasets for deep learning. The package now ships four MEG dataset loaders (LibriBrain, MEG-MASC, Armeni 2022, MOUS) plus a composable preprocessing pipeline and shared task abstractions.
Features
- Friendly dataset APIs backed by real MEG recordings
- Composable preprocessing pipeline (`bads+headpos+sss+notch+bp+ds`, etc.)
- On-demand download from Hugging Face (LibriBrain), OSF (MEG-MASC), Radboud WebDAV (Armeni, MOUS), and OpenNeuro (LittlePrince)
- Task-based API: pick a task object, get `(x, y)` (or `(x, y, info)`) windows
- Works with PyTorch `DataLoader` out of the box
- Clean namespace and lazy imports to keep startup fast
Installation
pip install pnpl
This installs the package and its core dependencies.
Usage
A common entry point uses a task object:
from pnpl.datasets import LibriBrain
from pnpl.tasks import SpeechDetection
dataset = LibriBrain(
data_path="./data/LibriBrain",
task=SpeechDetection(tmin=0.0, tmax=0.5),
partition="train",
)
sample_data, label = dataset[0]
print(sample_data.shape, label.shape)
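The `tmin` and `tmax` arguments are given in seconds, so the length of the time axis in each window depends on the sampling rate of the preprocessed recording. The arithmetic can be sketched roughly as below. The helper `window_samples` and the 250 Hz rate are illustrative assumptions, not part of the PNPL API, and the library's own rounding may differ.

```python
def window_samples(tmin: float, tmax: float, sfreq: float) -> int:
    """Approximate number of time samples in a window spanning tmin..tmax seconds."""
    if tmax <= tmin:
        raise ValueError("tmax must be greater than tmin")
    return round((tmax - tmin) * sfreq)


# Assumed sampling rate after downsampling; check your own recordings.
SFREQ_HZ = 250.0

# Window used in the SpeechDetection example above.
print(window_samples(0.0, 0.5, SFREQ_HZ))

# A window that starts before the event, as in tmin=-0.2, tmax=0.6.
print(window_samples(-0.2, 0.6, SFREQ_HZ))
```

Comparing this estimate with the last dimension of `sample_data.shape` is a quick sanity check that the window settings and sampling rate are what you expect.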
Dataset-specific wrapper classes are also available:
from pnpl.datasets import LibriBrainSpeech, LibriBrainPhoneme
speech_ds = LibriBrainSpeech(data_path="./data/LibriBrain", partition="train")
phoneme_ds = LibriBrainPhoneme(data_path="./data/LibriBrain", partition="train")
The same task-based pattern works for the other corpora:
from pnpl.datasets import Gwilliams2022, Armeni2022, Schoffelen2019
from pnpl.tasks.gwilliams2022 import PhonemeClassification
meg_masc = Gwilliams2022(
data_path="./data/meg_masc",
task=PhonemeClassification(tmin=-0.2, tmax=0.6),
include_subjects=["01"], include_sessions=["0"], include_tasks=["0"],
preprocessing="notch+bp+ds",
)
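The `preprocessing` value appears to be a `+`-separated chain of step names. To illustrate how such a string breaks down, the sketch below splits it and checks each name against the steps that appear in this README (`bads`, `headpos`, `sss`, `notch`, `bp`, `ds`). `parse_preprocessing` and `KNOWN_STEPS` are hypothetical helpers for illustration only; PNPL's own parser and its full list of supported steps may differ.

```python
# Step names taken from the pipeline strings shown in this README (assumed, not exhaustive).
KNOWN_STEPS = ("bads", "headpos", "sss", "notch", "bp", "ds")


def parse_preprocessing(spec: str) -> list[str]:
    """Split a '+'-separated pipeline string into an ordered list of step names."""
    steps = [part.strip() for part in spec.split("+") if part.strip()]
    unknown = [step for step in steps if step not in KNOWN_STEPS]
    if unknown:
        raise ValueError(f"Unknown preprocessing step(s): {unknown}")
    return steps


print(parse_preprocessing("notch+bp+ds"))
print(parse_preprocessing("bads+headpos+sss+notch+bp+ds"))
```

Validating the string before constructing a dataset catches typos such as `"notch+pb+ds"` early, rather than partway through loading.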
For the full LibriBrain release (deep sub-0 across 9 Sherlock books +
TIMIT + MOCHA-TIMIT + 30 Moth podcasts, plus 32 broad subjects on
Sherlock1 ses-11/ses-12), use LibriBrain100:
from pnpl.datasets import LibriBrain100
from pnpl.tasks import SpeechDetection
ds = LibriBrain100(
data_path="./data/LibriBrain100",
task=SpeechDetection(tmin=0.0, tmax=0.5),
partition="train",
subjects="deep", # or "broad", "all", 0, [1, 2, 3], range(1, 33)
corpus="sherlock", # or "timit", "mocha", "podcasts", "all"
)
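The `subjects` argument accepts several shapes: a named group, a single ID, or an iterable of IDs. One way to picture this is a normalization step that turns every form into a sorted list of integer IDs. The sketch below is an illustration only: `normalize_subjects` is a hypothetical helper, and it assumes the deep subject is ID 0 and the 32 broad subjects are IDs 1 to 32, which the `range(1, 33)` hint suggests but this README does not state outright.

```python
from collections.abc import Iterable

# Assumed ID layout: one deep subject (0) and 32 broad subjects (1..32).
DEEP_IDS = [0]
BROAD_IDS = list(range(1, 33))


def normalize_subjects(subjects) -> list[int]:
    """Turn a group name, a single ID, or an iterable of IDs into a sorted ID list."""
    if isinstance(subjects, str):
        groups = {"deep": DEEP_IDS, "broad": BROAD_IDS, "all": DEEP_IDS + BROAD_IDS}
        if subjects not in groups:
            raise ValueError(f"Unknown subject group: {subjects!r}")
        return list(groups[subjects])
    if isinstance(subjects, int):
        return [subjects]
    if isinstance(subjects, Iterable):
        # De-duplicate and sort so equivalent selections compare equal.
        return sorted({int(s) for s in subjects})
    raise TypeError(f"Unsupported subjects value: {subjects!r}")


print(normalize_subjects("deep"))
print(normalize_subjects([3, 1, 2]))
print(len(normalize_subjects("all")))
```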
Included Datasets
| Class | Source | Auth |
|---|---|---|
| `LibriBrain` (+ `LibriBrainSpeech/Phoneme/Word/Sentence`) | Hugging Face `pnpl/LibriBrain` | none |
| `LibriBrain100` (+ `LibriBrain100Speech/Phoneme/Word`) | Hugging Face `pnpl/LibriBrain` ∪ `pnpl/LibriBrain2` (deep + broad release) | none |
| `Gwilliams2022` (MEG-MASC) | OSF `ag3kj` | none |
| `Armeni2022` | Radboud `DSC_3011085.05_995_v1` | Radboud credentials |
| `Schoffelen2019` (MOUS) | Radboud `DSC_3011020.09_236_v1` | Radboud credentials |
| `Pallier2025` (LittlePrince Listen) | OpenNeuro `ds007523` | none |
For the Radboud-hosted datasets, set RADBOUD_USERNAME and
RADBOUD_PASSWORD (an approved data-sharing agreement is required
before access is granted).
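It can help to confirm that both variables are set before constructing a Radboud-backed dataset. A minimal sketch using only the standard library follows; `missing_radboud_credentials` is a hypothetical helper, not a PNPL function, and how PNPL itself reads these variables is not specified here.

```python
import os

REQUIRED_VARS = ("RADBOUD_USERNAME", "RADBOUD_PASSWORD")


def missing_radboud_credentials(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_radboud_credentials()
if missing:
    print("Set these before loading Armeni2022 or Schoffelen2019:", ", ".join(missing))
else:
    print("Radboud credentials found in the environment.")
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.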
Support
In case of any questions or problems, please get in touch through our Discord server.
Quickstart
Load a single run of the LibriBrain Speech dataset and iterate samples:
from pnpl.datasets.libribrain2025 import constants
from pnpl.datasets import LibriBrainSpeech
ds = LibriBrainSpeech(
data_path="./data/LibriBrain",
preprocessing_str="bads+headpos+sss+notch+bp+ds",
include_run_keys=[constants.RUN_KEYS[0]], # pick a single run
tmin=0.0,
tmax=0.2,
standardize=True,
include_info=True,
)
print(len(ds), "samples")
x, y, info = ds[0]
print(x.shape, y.shape, info["dataset"]) # (channels,time), (time,), "libribrain2025"
Documentation
We publish documentation with Jupyter Book and GitHub Pages.
- Local preview: run `pip install -r docs/requirements.txt && jupyter-book build docs/`, then open `docs/_build/html/index.html`.
- GitHub Pages: once the repository is public, enable Pages in the repo settings to publish automatically from the existing workflow.
The docs cover:
- Per-dataset pages (`docs/libribrain.md`, `docs/gwilliams2022.md`, `docs/armeni2022.md`, `docs/schoffelen2019.md`)
- The preprocessing pipeline (`docs/preprocessing.md`) and tasks (`docs/tasks.md`)
- Tutorials for the LibriBrain competition tracks
Contributing
We welcome contributions from the community!
- Read the Contributor Guide in `docs/contributing.md` for setup, coding style, and PR workflow.
- Open issues for bugs and enhancements with clear, minimal repros when possible.
- Tests: add/update `pytest` tests for any feature or fix.
Quick dev setup:
git clone https://github.com/neural-processing-lab/pnpl.git
cd pnpl
python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install pytest
pytest -q
Support and Questions
- Check the FAQ at `docs/faq.md`.
- If something is unclear in the docs, please open a documentation issue.
License
BSD-3-Clause. See LICENSE for details.
File details
Details for the file pnpl-0.1.1.tar.gz.
File metadata
- Download URL: pnpl-0.1.1.tar.gz
- Upload date:
- Size: 131.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c1eef2de17913536aedb2f51c0f63ef3f6e103551c04c1231d7805ce321c577f` |
| MD5 | `2f225f5bbd8e46e2a1d898e47711e9da` |
| BLAKE2b-256 | `3e88dab3ef0bd9618a1ce91ea38dde4681992a962902ea43c04d76dc7ccd25f1` |
File details
Details for the file pnpl-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pnpl-0.1.1-py3-none-any.whl
- Upload date:
- Size: 164.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `2bfbf28c2363b46c324c187c184cdb4ad5404a0784ebc5822d73f47f8ca4d5e1` |
| MD5 | `76ac7638a4afcc78f986792167c0b99e` |
| BLAKE2b-256 | `816f70bf38bc88d90d099f781240eddea109a3d325362497c70d3721cecc43be` |