A SLURM-friendly M/EEG derivative extraction package leveraging BIDS-like data organization and DAG processing.

NeuroDAGs

An Extensible and Declarative DAG Framework for Reproducible Neuroscience Workflows

M/EEG studies generate many interdependent intermediate derivatives. Recomputing full pipelines is wasteful; reusing valid intermediates is non-trivial. Large-scale studies require reproducible, extensible, and efficient workflows. NeuroDAGs addresses this with a declarative, graph-based framework for scalable and reusable derivative computation.

Poster: BRaIN Symposium 2026, Montreal

Core Idea

Pipelines are defined as a directed acyclic graph (DAG) of computation nodes that output reusable derivatives, executed for each input file.
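To make the graph idea concrete, here is a minimal sketch that orders derivatives so that each one runs after the derivatives it depends on. It uses only the standard-library graphlib module and is not NeuroDAGs' own scheduler; the dependencies mapping and the execution_order helper are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical derivative graph: each key maps to the set of derivatives it needs.
dependencies = {
    "SourceFile": set(),
    "CleanedEEG": {"SourceFile"},
    "PowerSpectrum": {"CleanedEEG"},
}


def execution_order(deps):
    """Return names ordered so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

In a per-file pipeline, an order like this would be walked once for every input file. A cyclic graph makes static_order() raise graphlib.CycleError, which is why the graph has to be acyclic.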

Design Principles

  • Reproducible, transparent workflows defined declaratively in YAML — version-controllable and LLM-friendly.
  • Uniform node abstraction — preprocessing, features, and any custom nodes are treated identically.
  • Directory-agnostic — outputs mirror inputs' organization. Derivatives are labeled with a @DerivativeName suffix.
  • xarray-centered outputs — derivatives stored as language-agnostic, metadata-rich, dimension-aware xarray → NetCDF.
  • Graph-based reuse — if a derivative is already computed and overwrite=False, it is skipped automatically.
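The mirroring and suffix conventions can be sketched with pathlib. This is an illustration under stated assumptions, not the package's actual naming code: the derivative_path helper is hypothetical, and the exact file name layout (source file name, then @DerivativeName, then the artifact extension) is a guess based on the description above.

```python
from pathlib import PurePosixPath


def derivative_path(source, input_root, derivatives_root, name, ext):
    """Mirror the input's sub-directories under derivatives_root.

    Assumed naming: <source file name>@<DerivativeName><ext>.
    """
    source = PurePosixPath(source)
    relative_dir = source.relative_to(input_root).parent
    file_name = f"{source.name}@{name}{ext}"
    return PurePosixPath(derivatives_root) / relative_dir / file_name


out = derivative_path(
    "data/sub-01/eeg/rec.vhdr", "data", "outputs", "CleanedEEG", ".fif"
)
print(out)
```

Because the output path is derived only from the input path, no particular directory hierarchy needs to be imposed on the dataset.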

Features

  • Agnostic to data organization / directory hierarchy
  • SLURM / HPC friendly with file-level parallelism via joblib
  • Graph-based caching: skip already-computed derivatives
  • Extensible node system — add nodes without forking the package
  • YAML-based declarative configuration
  • Built-in nodes for preprocessing, spectral analysis, entropy, complexity, and data transformations
  • Dataframe assembly (wide or long format) from derivative artifacts
  • Dry-run mode — inspect planned computations without executing
  • Built-in Dash-Plotly explorer for .fif and .nc files

Installation

pip install neurodags

With uv (recommended):

uv add neurodags

Quickstart

See the quickstart example — full synthetic pipeline, no real data required.

Development

git clone https://github.com/yjmantilla/neurodags
cd neurodags
uv sync --all-extras --all-groups # creates .venv and installs all deps incl. dev/test/docs
uv run pre-commit install

Key commands (all via uv run):

uv run ruff check src/              # lint  (fix: uv run ruff check src/ --fix)
uv run black --check .              # format check  (fix: uv run black .)
uv run pytest -q                    # run tests
uv run pytest -s -q --no-cov --pdb  # debug a failing test

uv run sphinx-build -b html docs docs/_build/html -W --keep-going  # build docs
rm -rf docs/_build                                                   # clean docs

No uv? Install it with pip install uv or curl -Ls https://astral.sh/uv/install.sh | sh. All commands above also work with plain python/pip: instead of uv run, activate .venv; instead of uv sync, run pip install -e .[dev,test,docs].

Project Structure

my_project/
├── datasets.yml      # Dataset sources and paths
├── pipeline.yml      # Derivative definitions and execution list
└── custom_nodes.py   # Optional custom node definitions

Quick Example

datasets.yml

my_dataset:
  name: MyDataset
  file_pattern:
    local: data/**/*.vhdr
    hpc: /cluster/BIDS/**/*.vhdr
  derivatives_path:
    local: outputs/
    hpc: /cluster/scratch/out
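In the file above, file_pattern and derivatives_path each hold one value per mount (local, hpc). The following sketch shows how a mount name could select one value from each such mapping. It operates on a plain dict that mirrors the YAML, and select_mount is a hypothetical helper, not a NeuroDAGs function.

```python
def select_mount(dataset, mount_point):
    """Replace every per-mount mapping with the value for the chosen mount."""
    resolved = {}
    for key, value in dataset.items():
        if isinstance(value, dict):
            resolved[key] = value[mount_point]  # KeyError if the mount is undefined
        else:
            resolved[key] = value
    return resolved


# Plain-dict mirror of the datasets.yml entry shown above.
dataset = {
    "name": "MyDataset",
    "file_pattern": {"local": "data/**/*.vhdr", "hpc": "/cluster/BIDS/**/*.vhdr"},
    "derivatives_path": {"local": "outputs/", "hpc": "/cluster/scratch/out"},
}

local = select_mount(dataset, "local")
print(local["file_pattern"], local["derivatives_path"])
```

Keeping both mounts in one file means the same dataset definition can be used on a laptop and on a cluster by changing a single setting.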

pipeline.yml

datasets: datasets.yml
mount_point: local
new_definitions: custom_nodes.py  # optional

DerivativeDefinitions:
  CleanedEEG:
    nodes:
      - id: 0
        derivative: SourceFile
      - id: 1
        node: basic_preprocessing
        args:
          mne_object: id.0
          resample: 256
          filter_args:
            l_freq: 0.5
            h_freq: 110

  PowerSpectrum:
    for_dataframe: True
    nodes:
      - id: 0
        derivative: CleanedEEG.fif
      - id: 1
        node: mne_spectrum_array
        args:
          meeg: id.0
          method: multitaper

DerivativeList:
  - CleanedEEG
  - PowerSpectrum
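In the definitions above, argument values such as id.0 refer to the output of an earlier node in the same derivative. A minimal sketch of how such references might be substituted is shown below. The resolve_args helper and the outputs mapping are hypothetical, and the real resolution logic in NeuroDAGs may differ.

```python
import re

# Matches strings of the form "id.<integer>".
_REF = re.compile(r"^id\.(\d+)$")


def resolve_args(args, outputs):
    """Replace 'id.N' strings with the stored output of node N (recursively)."""
    resolved = {}
    for key, value in args.items():
        match = _REF.match(value) if isinstance(value, str) else None
        if match:
            resolved[key] = outputs[int(match.group(1))]
        elif isinstance(value, dict):
            resolved[key] = resolve_args(value, outputs)
        else:
            resolved[key] = value
    return resolved


# Node 0 has already produced something; a string stands in for the loaded data.
outputs = {0: "<raw recording>"}
args = {
    "mne_object": "id.0",
    "resample": 256,
    "filter_args": {"l_freq": 0.5, "h_freq": 110},
}
resolved = resolve_args(args, outputs)
print(resolved)
```

Literal values pass through unchanged, so only the id.N strings are treated as edges of the graph.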

Python

from neurodags.loaders import load_configuration
from neurodags.orchestrators import iterate_derivative_pipeline

config = load_configuration("pipeline.yml")
iterate_derivative_pipeline(config, "CleanedEEG")
iterate_derivative_pipeline(config, "PowerSpectrum")

Custom Nodes

Add nodes without modifying or forking the package:

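# custom_nodes.py
from neurodags.nodes import register_node
from neurodags.definitions import Artifact, NodeResult

@register_node
def my_node(data) -> NodeResult:
    # compute() is a placeholder for your own analysis; it is assumed to
    # return an xarray object, which provides to_netcdf().
    result = compute(data)
    return NodeResult(
        artifacts={
            # Key = file extension of the artifact written to disk.
            ".nc": Artifact(
                item=result,
                writer=lambda path: result.to_netcdf(path),
            ),
        },
    )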

Key rules:

  1. A node is a function decorated with @register_node.
  2. It returns a NodeResult.
  3. A NodeResult contains artifacts — a dict mapping file extension to Artifact(item, writer).
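The three rules describe a decorator-based registry. The sketch below reproduces that pattern with local stand-ins so it runs without the package installed. The Artifact, NodeResult, and register_node defined here are simplified imitations, not the neurodags classes, and NODE_REGISTRY, mean_amplitude, and save_artifacts are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Artifact:  # local stand-in, not neurodags.definitions.Artifact
    item: Any
    writer: Callable[[str], None]


@dataclass
class NodeResult:  # local stand-in, not neurodags.definitions.NodeResult
    artifacts: dict = field(default_factory=dict)


NODE_REGISTRY = {}


def register_node(func):
    """Store the function under its own name so a config can refer to it."""
    NODE_REGISTRY[func.__name__] = func
    return func


@register_node
def mean_amplitude(samples):
    summary = {"mean": sum(samples) / len(samples)}

    def write_json(path):
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(summary, handle)

    return NodeResult(artifacts={".json": Artifact(item=summary, writer=write_json)})


def save_artifacts(result, base_path):
    """Call each artifact's writer with base_path plus its extension."""
    written = []
    for extension, artifact in result.artifacts.items():
        path = base_path + extension
        artifact.writer(path)
        written.append(path)
    return written


result = NODE_REGISTRY["mean_amplitude"]([1.0, 2.0, 3.0])
print(result.artifacts[".json"].item)
```

Pairing each item with its own writer keeps the in-memory value available to downstream nodes while leaving the decision to write it to disk with the caller.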

Dataframe Assembly

from neurodags.orchestrators import build_derivative_dataframe

df = build_derivative_dataframe("pipeline.yml", output_format="wide")

Derivatives marked for_dataframe: True are collected automatically. Supports "wide" (one row per file) and "long" (one row per value) formats.
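The difference between the two layouts can be shown with plain Python records. This only illustrates wide versus long shapes; the column names (file, feature, value) and the sample numbers are made up and do not reflect the actual columns returned by build_derivative_dataframe.

```python
# Hypothetical per-file feature values.
values = {
    "sub-01.vhdr": {"alpha_power": 1.2, "beta_power": 0.7},
    "sub-02.vhdr": {"alpha_power": 0.9, "beta_power": 1.1},
}


def to_wide(values):
    """One row per file, one column per feature."""
    return [{"file": name, **features} for name, features in values.items()]


def to_long(values):
    """One row per (file, feature) value."""
    return [
        {"file": name, "feature": feature, "value": value}
        for name, features in values.items()
        for feature, value in features.items()
    ]


wide_rows = to_wide(values)
long_rows = to_long(values)
print(len(wide_rows), len(long_rows))
```

Wide rows suit per-subject statistics and machine-learning feature matrices; long rows are usually easier to group, filter, and plot.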

Parallel Execution

# pipeline.yml
n_jobs: 4           # -1 = all cores, 1 or null = serial
joblib_backend: loky
joblib_prefer: processes

Or via Python:

iterate_derivative_pipeline(config, "MyDerivative", n_jobs=4)
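The n_jobs convention (-1 = all cores, 1 or null = serial) and file-level parallelism can be sketched as follows. NeuroDAGs uses joblib for this; the sketch deliberately swaps in the standard-library ThreadPoolExecutor so it runs without extra dependencies. resolve_workers and process_files are hypothetical helpers.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_workers(n_jobs):
    """Translate an n_jobs setting into a concrete worker count."""
    if n_jobs is None or n_jobs == 1:
        return 1
    if n_jobs == -1:
        return os.cpu_count() or 1
    if n_jobs > 1:
        return n_jobs
    raise ValueError(f"unsupported n_jobs value: {n_jobs!r}")


def process_files(files, func, n_jobs=None):
    """Apply func to every file, serially or with a pool of workers."""
    workers = resolve_workers(n_jobs)
    if workers == 1:
        return [func(path) for path in files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order.
        return list(pool.map(func, files))


results = process_files(["a.vhdr", "b.vhdr"], str.upper, n_jobs=2)
print(results)
```

Each file is an independent unit of work, which is what makes this kind of parallelism straightforward to spread across cores or cluster jobs.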

Visualization

python -m neurodags.visualization path/to/file.fif
python -m neurodags.visualization path/to/file.nc

Built-in Dash-Plotly explorer with dimension-aware UI — dropdown per axis, plot types: Line, Scatter, Bar, Heatmap.

Inspection (Dry Run)

iterate_derivative_pipeline(config, "MyDerivative", dry_run=True)

Returns a dataframe describing the execution plan without running any nodes. .error marker files prevent silent retry of failed runs.
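A planning step of this kind can be sketched as a pure function over file paths. Several details are assumptions rather than documented behavior: the marker is taken to be the output path plus .error, overwrite is assumed to take precedence over an existing marker, and the action labels are invented. plan_action and plan are hypothetical helpers.

```python
from pathlib import Path


def plan_action(output_path, overwrite=False):
    """Decide what would happen for one output, without computing anything."""
    output = Path(output_path)
    # Assumed marker location: "<output>.error" next to the output file.
    error_marker = output.with_name(output.name + ".error")
    if overwrite:
        return "compute"
    if error_marker.exists():
        return "failed-previously"
    if output.exists():
        return "skip"
    return "compute"


def plan(outputs, overwrite=False):
    """Build one plan record per expected output file."""
    return [
        {"output": str(path), "action": plan_action(path, overwrite)}
        for path in outputs
    ]


print(plan(["outputs/sub-01/rec.vhdr@PowerSpectrum.nc"]))
```

A list of records like this converts directly into a dataframe, which makes it easy to count how many files would be computed before submitting a large job.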

Derivative Flags

Flag           Default  Description
save           True     Persist artifacts to disk. False = compute but don't write.
overwrite      False    Force recompute even if output exists.
for_dataframe  False    Include this derivative in build_derivative_dataframe.

Custom Node Definitions

Point new_definitions to one or more Python files:

new_definitions:
  - custom_nodes/my_nodes.py
  - /abs/path/to/other_nodes.py

Relative paths are resolved from the pipeline YAML location.
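The resolution rule can be sketched with pathlib: relative entries are joined to the directory that contains the pipeline YAML, and absolute entries are left alone. resolve_definition_paths is a hypothetical helper and the /proj directory is made up; PurePosixPath is used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def resolve_definition_paths(pipeline_yaml, entries):
    """Resolve new_definitions entries against the pipeline file's directory."""
    base = PurePosixPath(pipeline_yaml).parent
    if isinstance(entries, str):  # a single file is also accepted
        entries = [entries]
    resolved = []
    for entry in entries:
        path = PurePosixPath(entry)
        resolved.append(path if path.is_absolute() else base / path)
    return resolved


paths = resolve_definition_paths(
    "/proj/pipeline.yml",
    ["custom_nodes/my_nodes.py", "/abs/path/to/other_nodes.py"],
)
print([str(p) for p in paths])
```

Resolving against the YAML file rather than the current working directory means the pipeline behaves the same no matter where it is launched from.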

Documentation

https://yjmantilla.github.io/neurodags/

HDF5 / NetCDF Note

If you encounter RuntimeError: NetCDF: HDF error:

uv run pip install --no-binary=h5py h5py
# or without uv:
pip install --no-binary=h5py h5py

Contributing

See CONTRIBUTING.md.

License

MIT. See LICENSE.
