A unifying framework to benchmark NeuroAI models.

NeuralBench: Unified benchmark for NeuroAI models

NeuralBench: open, reproducible benchmarking of NeuroAI models on EEG, MEG, and fMRI -- 36 EEG tasks, 94 EEG datasets (9.5k+ subjects, 13.6k+ hours of recording), and 14 models

neuralbench is a unified framework to benchmark NeuroAI models. It is designed for evaluating pretrained or randomly initialized models on a diverse suite of downstream tasks for brain modeling -- not for pretraining itself. It supports multiple neuroimaging devices -- EEG, MEG, and fMRI -- with more tasks and devices to come.

Example:

neuralbench eeg audiovisual_stimulus -m eegnet   # EEG audiovisual stimulus classification with EEGNet

See neuralbench in the documentation.

Installation

Install from PyPI:

pip install neuralbench

Or install from source (e.g. for development):

cd neuralbench-repo
pip install -e .

Quick start

As an example, let's run the EEG audiovisual stimulus classification task with the default model. This task uses the MNE sample dataset, which is small (~1.5 GB) and can be downloaded quickly. We use it both as a sanity-check task and as a probe of model behaviour in very-low-data regimes (a single subject, 288 trials, 4-class classification).

When you first run neuralbench, you will be prompted to configure paths for data (DATA_DIR), cache (CACHE_DIR) and results (SAVE_DIR). The configuration is saved to ~/.neuralbench/config.json by default. Set NEURALBENCH_CONFIG=/path/to/config.json to override this location (useful on shared machines or in CI; see Custom config location). If using Weights & Biases, see the Weights & Biases setup section for setup instructions.
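The lookup order described above can be sketched in a few lines of Python. This is an illustration of the documented behaviour, not neuralbench's own code: the helper name resolve_config_path is hypothetical, and treating an empty NEURALBENCH_CONFIG as unset is an assumption.

```python
import os
from pathlib import Path

# Default location stated in this README.
DEFAULT_CONFIG = Path.home() / ".neuralbench" / "config.json"


def resolve_config_path(env=None):
    """Return the config path: NEURALBENCH_CONFIG if set, else the default.

    Hypothetical helper; an empty value is treated as unset (assumption).
    """
    env = os.environ if env is None else env
    override = env.get("NEURALBENCH_CONFIG", "").strip()
    return Path(override).expanduser() if override else DEFAULT_CONFIG


# Example: a CI job pointing at its own config, then the default case.
print(resolve_config_path({"NEURALBENCH_CONFIG": "/path/to/config.json"}))
print(resolve_config_path({}))
```

A helper like this can be handy in CI scripts to confirm which config file a job will pick up before launching a run.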

Evaluate a model on a downstream task in three commands:

neuralbench eeg audiovisual_stimulus --download   # 1. Download the data (here, the Mne2013SampleEeg study)
neuralbench eeg audiovisual_stimulus --prepare    # 2. Prepare cache (preprocessed data and targets)
neuralbench eeg audiovisual_stimulus              # 3. Run the full grid

Steps 2 and 3 dispatch to SLURM when it is auto-detected on your machine; step 3 additionally requires SLURM_PARTITION to be set in your neuralbench config. Pass --debug to either step to force local execution. Step 2 is mostly useful for larger datasets that benefit from parallel preprocessing with SLURM, and is not strictly necessary for audiovisual_stimulus. More generally, adding --debug to any command gives a fast local sanity-check run with a subsampled dataset and a limited number of epochs:

neuralbench eeg audiovisual_stimulus --debug      # Local validation run

By default, experiments use the EEGNet architecture[^1]; use -m <model> to swap models.

[^1]: Lawhern, Vernon J., et al. "EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces." Journal of neural engineering 15.5 (2018): 056013.
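If you prefer to drive the three steps from a script, the following Python sketch assembles the same commands. It only uses flags shown in this README (--download, --prepare, -m, --debug); the helper names build_commands and run_pipeline are hypothetical, and passing -m only to the final step is an assumption.

```python
import subprocess


def build_commands(device, task, model=None, debug=False):
    """Return the download, prepare and run commands as argv lists."""
    base = ["neuralbench", device, task]
    # --debug is documented for steps 2 and 3 (forces local execution).
    extra = ["--debug"] if debug else []
    run = base + (["-m", model] if model else []) + extra
    return [base + ["--download"], base + ["--prepare"] + extra, run]


def run_pipeline(device, task, model=None, debug=False):
    """Run the three steps in order, stopping at the first failure."""
    for argv in build_commands(device, task, model, debug):
        subprocess.run(argv, check=True)


# Print the commands without executing them.
for argv in build_commands("eeg", "audiovisual_stimulus", model="eegnet", debug=True):
    print(" ".join(argv))
```

Calling run_pipeline("eeg", "audiovisual_stimulus", debug=True) would then execute the steps locally, assuming neuralbench is installed and configured.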

Results can be visualized on Weights & Biases, or aggregated locally using --plot-cached (see the Visualizing Results tutorial).

Example output (figure): mean normalized rank (lower is better) of 17 models on the NeuralBench-EEG-Core v1.0 suite (one dataset per task), with foundation models, task-specific architectures, and baselines color-coded.

[!TIP] See the full quickstart tutorial for a walkthrough of the CLI, config system, and model selection.

The same workflow applies to MEG and fMRI tasks -- just swap the device and task name:

neuralbench meg typing --debug          # MEG keystroke classification in debug mode
neuralbench fmri image --debug          # fMRI image retrieval in debug mode

Running the full EEG benchmark

To run all 36 EEG tasks end-to-end:

neuralbench eeg all --download          # 1. Download all datasets (~3.3 TB)
neuralbench eeg all --prepare           # 2. Build preprocessing cache (~35 GB)
neuralbench eeg all                     # 3. Run all 36 tasks

Use -m all_classic, -m all_fm, or -m all_classic all_fm to evaluate across all 8 task-specific EEG models, all 6 EEG foundation models, or all 14 EEG models respectively.

See the full EEG benchmark guide for prerequisites, resource requirements (~3.3 TB disk, 1 GPU with 32 GB VRAM per job), dataset variant options, and computational considerations.
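Before starting a multi-terabyte download, it can help to check free disk space on the target volume. The sketch below uses only the Python standard library; the helper name has_free_space is hypothetical, and interpreting "~3.3 TB" as decimal terabytes (10^12 bytes) is an assumption.

```python
import shutil

TB = 10**12  # decimal terabyte (assumption about how "~3.3 TB" is measured)


def has_free_space(path, required_tb=3.3):
    """Return True if the volume containing `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_tb * TB


# Example: check the current directory against the full-benchmark footprint.
if not has_free_space("."):
    print("Less than ~3.3 TB free here; consider another DATA_DIR.")
```

Point the check at the directory you configured as DATA_DIR rather than the current directory.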

[!IMPORTANT] A handful of datasets cannot be fetched automatically and require a one-time manual step (creating an account, accepting a license agreement, or submitting an application form). The affected tasks are:

  • pathology, artifact, and clinical_event (TUH EEG Corpus)
  • image and meg/image (THINGS-images)
  • emotion (FACED on Synapse)
  • video (SEED-DV)
  • fmri/image (NSD)
  • motor_imagery and mental_arithmetic (Shin2017OpenA and Shin2017OpenB)
  • eeg/typing and meg/typing (Levy2025Brain, not yet publicly released)
  • speech (Brennan2019 on Deep Blue Data): the legacy urlretrieve-based downloader is currently blocked by Cloudflare's bot challenge, so the v1 files must be fetched manually via Globus until upstream restores anonymous HTTP access

See the Datasets requiring manual download section of the benchmark guide for the exact steps for each one.

Weights & Biases setup

If using Weights & Biases for experiment tracking, first set the relevant environment variable:

export WANDB_API_KEY=your_wandb_api_key

Then configure the WandbLoggerConfig (from neuraltrain.utils) accordingly, e.g., by setting the host, name, group and entity fields to those of your project.

Set WANDB_HOST="" (blank) in your neuralbench config to disable W&B logging entirely; results are still written to SAVE_DIR and remain accessible via --plot-cached.

Benchmark suites

NeuralBench packages its tasks and datasets into named, versioned evaluation suites. Any reported result should cite a concrete suite, e.g. NeuralBench-EEG-Core v1.0. The unqualified name NeuralBench refers to the framework only.

Naming scheme: NeuralBench-<Modality>-<Variant> v<Major>.<Minor>

  • Modality: EEG, MEG, fMRI (one version history per modality).
  • Variant: Core (one dataset per task; broad coverage across paradigms) or Full (every dataset registered for each task; reveals within-task variability). Full is always a strict superset of Core at the same version.
  • Version: semver-style tag for the frozen task/dataset/split specification. Minor bumps are additive and back-comparable; major bumps are breaking.

Current release:

  • NeuralBench-EEG-Core v1.0 — one dataset × all EEG tasks.
  • NeuralBench-EEG-Full v1.0 — all datasets × all EEG tasks.
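The naming scheme is regular enough to validate mechanically, for instance when collecting results from several reports. The following is a minimal sketch, not part of neuralbench: SuiteName, parse_suite and same_major_line are hypothetical names, and the regular expression encodes only the modalities and variants listed above.

```python
import re
from dataclasses import dataclass

# NeuralBench-<Modality>-<Variant> v<Major>.<Minor>
_SUITE_RE = re.compile(r"^NeuralBench-(EEG|MEG|fMRI)-(Core|Full) v(\d+)\.(\d+)$")


@dataclass(frozen=True)
class SuiteName:
    modality: str
    variant: str
    major: int
    minor: int

    def __str__(self):
        return f"NeuralBench-{self.modality}-{self.variant} v{self.major}.{self.minor}"


def parse_suite(name):
    """Parse a fully qualified suite name; reject the bare framework name."""
    match = _SUITE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a fully qualified suite name: {name!r}")
    modality, variant, major, minor = match.groups()
    return SuiteName(modality, variant, int(major), int(minor))


def same_major_line(a, b):
    """True if two suites differ at most by a (non-breaking) minor bump."""
    return (a.modality, a.variant, a.major) == (b.modality, b.variant, b.major)


core = parse_suite("NeuralBench-EEG-Core v1.0")
full = parse_suite("NeuralBench-EEG-Full v1.0")
print(core, "|", full, "|", same_major_line(core, full))
```

Rejecting the unqualified name NeuralBench mirrors the rule that a reported result should cite a concrete suite.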

Available Tasks

The following tasks are available in neuralbench, organized by device:

EEG (36 tasks): age, artifact, audiovisual_stimulus, clinical_event, cvep, dementia_diagnosis, depression_diagnosis, emotion, ern, image, lrp, mental_arithmetic, mental_imagery, mental_workload, mismatch_negativity, motor_execution, motor_imagery, n170, n2pc, n400, p3, parkinsons_diagnosis, pathology, psychopathology, reaction_time, schizophrenia_diagnosis, seizure, sentence, sex, sleep_arousal, sleep_stage, speech, ssvep, typing, video, word

MEG (2 tasks): image, typing

fMRI (1 task): image
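When scripting many runs, a small lookup table can catch typos in device or task names before a job is submitted. The sketch below copies the task names listed above into a plain dictionary; it does not query neuralbench itself, and the helper name debug_command is hypothetical.

```python
# Task names copied from the list above, keyed by CLI device name.
TASKS = {
    "eeg": [
        "age", "artifact", "audiovisual_stimulus", "clinical_event", "cvep",
        "dementia_diagnosis", "depression_diagnosis", "emotion", "ern", "image",
        "lrp", "mental_arithmetic", "mental_imagery", "mental_workload",
        "mismatch_negativity", "motor_execution", "motor_imagery", "n170",
        "n2pc", "n400", "p3", "parkinsons_diagnosis", "pathology",
        "psychopathology", "reaction_time", "schizophrenia_diagnosis",
        "seizure", "sentence", "sex", "sleep_arousal", "sleep_stage", "speech",
        "ssvep", "typing", "video", "word",
    ],
    "meg": ["image", "typing"],
    "fmri": ["image"],
}


def debug_command(device, task):
    """Return the debug-mode CLI line for a known (device, task) pair."""
    device = device.lower()
    if device not in TASKS:
        raise ValueError(f"unknown device: {device!r}")
    if task not in TASKS[device]:
        raise ValueError(f"unknown {device} task: {task!r}")
    return f"neuralbench {device} {task} --debug"


print({device: len(tasks) for device, tasks in TASKS.items()})
print(debug_command("meg", "typing"))
```

The table has to be kept in sync by hand as tasks are added, so treat it as a convenience rather than a source of truth.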

[!NOTE] More MEG and fMRI tasks are under development. Contributions are welcome!

Adding a new task

See the Adding a New Task tutorial.

Adding a new model

See the Adding a New Model tutorial.

(Advanced) Modifying the training loop

The training loop is implemented in neuralbench/pl_module.py as a PyTorch Lightning LightningModule (BrainModule). Override or extend this class to customize training, validation, or test steps. See the neuralbench API reference for details.

Contributing

See the CONTRIBUTING file for how to help out.

Citing

@misc{banville2026neuralbench,
  title        = {NeuralBench: A Unifying Framework to Benchmark NeuroAI Models},
  author       = {Banville, Hubert and d'Ascoli, St{\'e}phane and Dahan, Simon and Rapin, J{\'e}r{\'e}my and Careil, Marl{\`e}ne and Benchetrit, Yohann and L{\'e}vy, Jarod and Panchavati, Saarang and Ratouchniak, Antoine and Zhang, Mingfang and Cascardi, Elisa and Begany, Katelyn and Brooks, Teon and King, Jean-R{\'e}mi},
  year         = {2026},
  howpublished = {Brain \& AI team, Meta FAIR},
  url          = {https://ai.meta.com/research/publications/neuralbench-a-unifying-framework-to-benchmark-neuroai-models/},
}

Third-Party Content

Third-party content pulled from other locations is subject to its own licenses, and you may have other legal obligations or restrictions that govern your use of that content.

License

neuralbench is MIT licensed, as found in the LICENSE file. Also check out the Meta Open Source Terms of Use and Privacy Policy.
