Load and process brain datasets for deep learning

🍍PNPL Brain Data Deep Learning Library

PNPL is currently used primarily for the LibriBrain competition. See the competition page to learn more and get started.

Welcome to PNPL, a Python toolkit for loading and processing brain datasets for deep learning. The package ships MEG dataset loaders for LibriBrain, MEG-MASC, Armeni 2022, MOUS, and LittlePrince, plus a composable preprocessing pipeline and shared task abstractions.

Features

  • Friendly dataset APIs backed by real MEG recordings
  • Composable preprocessing pipeline (bads+headpos+sss+notch+bp+ds, etc.)
  • On-demand download from Hugging Face (LibriBrain), OSF (MEG-MASC), Radboud WebDAV (Armeni, MOUS), and OpenNeuro (LittlePrince)
  • Task-based API: pick a task object, get (x, y) (or (x, y, info)) windows
  • Works with PyTorch DataLoader out of the box
  • Clean namespace and lazy imports to keep startup fast

Installation

pip install pnpl

This installs the package and its core dependencies.

Usage

A common entry point uses a task object:

from pnpl.datasets import LibriBrain
from pnpl.tasks import SpeechDetection

dataset = LibriBrain(
    data_path="./data/LibriBrain",
    task=SpeechDetection(tmin=0.0, tmax=0.5),
    partition="train",
)

sample_data, label = dataset[0]
print(sample_data.shape, label.shape)
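
The length of each window's time axis follows from tmin, tmax, and the sampling rate of the (possibly downsampled) recording. The arithmetic can be sketched as below. The 250 Hz rate and the helper name window_samples are assumptions made for illustration, not values or functions taken from PNPL, and the library may treat the endpoint differently (some tools also include the sample at tmax).

```python
def window_samples(tmin: float, tmax: float, sfreq: float) -> int:
    """Number of samples in the half-open window [tmin, tmax) at sfreq Hz."""
    if tmax <= tmin:
        raise ValueError("tmax must be greater than tmin")
    return int(round((tmax - tmin) * sfreq))


SFREQ_HZ = 250.0  # assumed sampling rate; check the dataset documentation

print(window_samples(0.0, 0.5, SFREQ_HZ))   # 125, the SpeechDetection window above
print(window_samples(-0.2, 0.6, SFREQ_HZ))  # 200, a window starting before onset
```

Under these assumptions, comparing the result with sample_data.shape is a quick way to check which sampling rate and endpoint convention a given dataset actually uses.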

Dataset-specific wrapper classes are also available:

from pnpl.datasets import LibriBrainSpeech, LibriBrainPhoneme

speech_ds = LibriBrainSpeech(data_path="./data/LibriBrain", partition="train")
phoneme_ds = LibriBrainPhoneme(data_path="./data/LibriBrain", partition="train")

The same task-based pattern works for the other corpora:

from pnpl.datasets import Gwilliams2022, Armeni2022, Schoffelen2019
from pnpl.tasks.gwilliams2022 import PhonemeClassification

meg_masc = Gwilliams2022(
    data_path="./data/meg_masc",
    task=PhonemeClassification(tmin=-0.2, tmax=0.6),
    include_subjects=["01"], include_sessions=["0"], include_tasks=["0"],
    preprocessing="notch+bp+ds",
)
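
A preprocessing string such as "notch+bp+ds" reads as an ordered chain of steps joined by "+". The sketch below shows one way such a string could be split and validated. It is not PNPL's implementation: the function parse_pipeline is hypothetical, and the step descriptions are guesses based on the abbreviations used in this README.

```python
from typing import List

# Hypothetical descriptions inferred from the abbreviations in this README.
KNOWN_STEPS = {
    "bads": "handle bad channels",
    "headpos": "head-position handling",
    "sss": "signal-space separation",
    "notch": "notch filter",
    "bp": "band-pass filter",
    "ds": "downsample",
}


def parse_pipeline(spec: str) -> List[str]:
    """Split a '+'-joined spec into ordered step names and reject unknown ones."""
    steps = [part.strip() for part in spec.split("+") if part.strip()]
    unknown = [step for step in steps if step not in KNOWN_STEPS]
    if unknown:
        raise ValueError(f"unknown preprocessing step(s): {unknown}")
    return steps


for name in parse_pipeline("notch+bp+ds"):
    print(f"{name}: {KNOWN_STEPS[name]}")
```

Keeping the steps in an ordered list preserves the left-to-right order of the string, which matters when later steps depend on earlier ones.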

For the full LibriBrain release (deep sub-0 across 9 Sherlock books + TIMIT + MOCHA-TIMIT + 30 Moth podcasts, plus 32 broad subjects on Sherlock1 ses-11/ses-12), use LibriBrain100:

from pnpl.datasets import LibriBrain100
from pnpl.tasks import SpeechDetection

ds = LibriBrain100(
    data_path="./data/LibriBrain100",
    task=SpeechDetection(tmin=0.0, tmax=0.5),
    partition="train",
    subjects="deep",       # or "broad", "all", 0, [1, 2, 3], range(1, 33)
    corpus="sherlock",     # or "timit", "mocha", "podcasts", "all"
)
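
The subjects argument accepts several forms: a group name, a single ID, or any iterable of IDs. The sketch below illustrates how such a selector could be normalized into a list of subject IDs. It assumes, based on the description above and the range(1, 33) example, that subject 0 is the deep subject and subjects 1 to 32 are the broad ones. The function resolve_subjects is hypothetical and is not part of PNPL.

```python
from typing import Iterable, List, Union

DEEP = [0]                  # assumed ID of the deeply sampled subject
BROAD = list(range(1, 33))  # assumed IDs of the 32 broad subjects


def resolve_subjects(subjects: Union[str, int, Iterable[int]]) -> List[int]:
    """Normalize a group name, a single ID, or an iterable of IDs to a sorted list."""
    if isinstance(subjects, str):
        named = {"deep": DEEP, "broad": BROAD, "all": DEEP + BROAD}
        if subjects not in named:
            raise ValueError(f"unknown subject group: {subjects!r}")
        return list(named[subjects])
    if isinstance(subjects, int):
        subjects = [subjects]
    ids = sorted(set(int(s) for s in subjects))
    invalid = [s for s in ids if s not in DEEP + BROAD]
    if invalid:
        raise ValueError(f"unknown subject ID(s): {invalid}")
    return ids


print(resolve_subjects("deep"))
print(resolve_subjects([3, 1, 2]))
print(len(resolve_subjects("all")))
```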

Included Datasets

Class | Source | Auth
LibriBrain (+ LibriBrainSpeech/Phoneme/Word/Sentence) | Hugging Face pnpl/LibriBrain | none
LibriBrain100 (+ LibriBrain100Speech/Phoneme/Word) | Hugging Face pnpl/LibriBrain, pnpl/LibriBrain2 (deep + broad release) | none
Gwilliams2022 (MEG-MASC) | OSF ag3kj | none
Armeni2022 | Radboud DSC_3011085.05_995_v1 | Radboud credentials
Schoffelen2019 (MOUS) | Radboud DSC_3011020.09_236_v1 | Radboud credentials
Pallier2025 (LittlePrince Listen) | OpenNeuro ds007523 | none

For the Radboud-hosted datasets, set RADBOUD_USERNAME and RADBOUD_PASSWORD (an approved data-sharing agreement is required before access is granted).
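
Checking for the two variables before constructing a Radboud-hosted dataset gives a clearer error than a failed download. A minimal sketch, using only the standard library; the helper radboud_credentials is hypothetical and is not part of PNPL:

```python
import os
from typing import Mapping, Optional, Tuple

REQUIRED = ("RADBOUD_USERNAME", "RADBOUD_PASSWORD")


def radboud_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (username, password) from the environment, or raise if either is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return env["RADBOUD_USERNAME"], env["RADBOUD_PASSWORD"]


# A dictionary stands in for the real environment in this example.
user, _ = radboud_credentials({"RADBOUD_USERNAME": "alice", "RADBOUD_PASSWORD": "secret"})
print(user)
```

Calling radboud_credentials() with no argument reads the real process environment.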

Support

In case of any questions or problems, please get in touch through our Discord server.

Quickstart

Load a single run of the LibriBrain Speech dataset and iterate samples:

from pnpl.datasets.libribrain2025 import constants
from pnpl.datasets import LibriBrainSpeech

ds = LibriBrainSpeech(
    data_path="./data/LibriBrain",
    preprocessing_str="bads+headpos+sss+notch+bp+ds",
    include_run_keys=[constants.RUN_KEYS[0]],  # pick a single run
    tmin=0.0,
    tmax=0.2,
    standardize=True,
    include_info=True,
)

print(len(ds), "samples")
x, y, info = ds[0]
print(x.shape, y.shape, info["dataset"])  # (channels,time), (time,), "libribrain2025"
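
Because a sample is either (x, y) or (x, y, info) depending on include_info, downstream code that should work with both can normalize the tuple first. The helper below is a hypothetical convenience written for this example, not a PNPL function. Strings stand in for the real arrays.

```python
from typing import Any, Dict, Sequence, Tuple


def split_sample(sample: Sequence[Any]) -> Tuple[Any, Any, Dict[str, Any]]:
    """Return (x, y, info), using an empty info dict for two-element samples."""
    if len(sample) == 3:
        x, y, info = sample
    elif len(sample) == 2:
        x, y = sample
        info = {}
    else:
        raise ValueError(f"expected 2 or 3 elements, got {len(sample)}")
    return x, y, info


x, y, info = split_sample(("signal", "labels"))
print(info)  # {}
x, y, info = split_sample(("signal", "labels", {"dataset": "libribrain2025"}))
print(info["dataset"])  # libribrain2025
```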

Documentation

We publish documentation with Jupyter Book and GitHub Pages.

  • Local preview: run pip install -r docs/requirements.txt && jupyter-book build docs/, then open docs/_build/html/index.html.
  • GitHub Pages: once the repository is public, enable Pages in the repository settings to publish automatically from the existing workflow.

The docs cover:

  • Per-dataset pages (docs/libribrain.md, docs/gwilliams2022.md, docs/armeni2022.md, docs/schoffelen2019.md)
  • The preprocessing pipeline (docs/preprocessing.md) and tasks (docs/tasks.md)
  • Tutorials for the LibriBrain competition tracks

Contributing

We welcome contributions from the community!

  • Read the Contributor Guide in docs/contributing.md for setup, coding style, and PR workflow.
  • Open issues for bugs and enhancements with clear, minimal repros when possible.
  • Tests: add/update pytest tests for any feature or fix.

Quick dev setup:

git clone https://github.com/neural-processing-lab/pnpl.git
cd pnpl
python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install pytest
pytest -q

Support and Questions

  • Check the FAQ at docs/faq.md.
  • If something is unclear in the docs, please open a documentation issue.

License

BSD-3-Clause. See LICENSE for details.
