A SLURM-friendly M/EEG derivative extraction package leveraging BIDS-like data organization and DAG processing.

NeuroDAGs

An Extensible and Declarative DAG Framework for Reproducible Neuroscience Workflows

M/EEG studies generate many interdependent intermediate derivatives. Recomputing full pipelines is wasteful; reusing valid intermediates is non-trivial. Large-scale studies require reproducible, extensible, and efficient workflows. NeuroDAGs addresses this with a declarative, graph-based framework for scalable and reusable derivative computation.

Docs | Comparison with Snakemake/Pydra | Poster BRaIN Symposium 2026 Montreal

Core Idea

Pipelines are defined as a directed acyclic graph (DAG) of computation nodes that output reusable derivatives, executed for each input file.

Design Principles

  • Reproducible, transparent workflows defined declaratively in YAML — version-controllable and LLM-friendly.
  • Uniform node abstraction — preprocessing, features, and any custom nodes are treated identically.
  • Directory-agnostic — outputs mirror inputs' organization. Derivatives are labeled with a @DerivativeName suffix.
  • xarray-centered outputs — derivatives stored as language-agnostic, metadata-rich, dimension-aware xarray → NetCDF.
  • Graph-based reuse — if a derivative is already computed and overwrite=False, it is skipped automatically.
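
The last two ideas in this list (mirrored outputs and graph-based reuse) can be pictured with a small path-and-existence check. This is only a sketch, not NeuroDAGs' implementation: the function names and the exact file-name layout (`<input name>@<DerivativeName><ext>`) are assumptions made for illustration.

```python
from pathlib import Path


def derivative_path(source: Path, input_root: Path, output_root: Path,
                    derivative: str, extension: str) -> Path:
    """Hypothetical layout: mirror the input's sub-folders under output_root
    and append '@<DerivativeName><ext>' to the input file name."""
    relative = source.relative_to(input_root)
    return output_root / relative.parent / f"{relative.name}@{derivative}{extension}"


def needs_compute(output: Path, overwrite: bool = False) -> bool:
    """Recompute when forced, or when the output is not on disk yet."""
    return overwrite or not output.exists()
```

With this rule, a second run over the same files only touches derivatives whose outputs are missing, unless overwrite is set.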

Features

  • Agnostic to data organization / directory hierarchy
  • SLURM / HPC friendly with file-level parallelism via joblib
  • Graph-based caching: skip already-computed derivatives
  • Extensible node system — add nodes without forking the package
  • YAML-based declarative configuration
  • Unified CLI: neurodags run, dry-run, dataframe, dag, view, validate, tui
  • Built-in Terminal User Interface (TUI) for pipeline management and execution
  • Built-in nodes for preprocessing, spectral analysis, entropy, complexity, and data transformations
  • Dataframe assembly (wide or long format) from derivative artifacts
  • Dry-run mode — inspect planned computations without executing
  • Built-in Dash-Plotly explorer for .fif and .nc files

Installation

pip install neurodags
# Or with TUI support
pip install "neurodags[tui]"

With uv (recommended):

uv add neurodags
# Or with TUI support
uv add "neurodags[tui]"

Quickstart

See the quickstart example — full synthetic pipeline, no real data required.

CLI

NeuroDAGs installs a unified neurodags command:

neurodags validate pipeline.yml
neurodags run pipeline.yml                          # all derivatives in DerivativeList
neurodags run pipeline.yml --derivative CleanedEEG  # or a specific one
neurodags dry-run pipeline.yml --output dry_run.csv
neurodags dataframe pipeline.yml --format wide --output features.csv
neurodags dag pipeline.yml --html pipeline_dag.html
neurodags view path/to/file.nc

If you install the optional TUI extra, you also get:

neurodags tui pipeline.yml --datasets datasets.yml

Development

git clone https://github.com/yjmantilla/neurodags
cd neurodags
uv sync --all-extras --all-groups # creates .venv and installs all deps incl. dev/test/docs
uv run pre-commit install

Key commands (all via uv run):

uv run ruff check src/              # lint  (fix: uv run ruff check src/ --fix)
uv run black --check .              # format check  (fix: uv run black .)
uv run pytest -q                    # run tests
uv run pytest -s -q --no-cov --pdb  # debug a failing test

uv run sphinx-build -b html docs docs/_build/html -W --keep-going  # build docs
rm -rf docs/_build                                                   # clean docs

No uv? Install it with pip install uv or curl -Ls https://astral.sh/uv/install.sh | sh. All commands above work with plain python/pip too: instead of uv run, activate .venv; instead of uv sync, run pip install -e ".[dev,test,docs]".

Project Structure

my_project/
├── datasets.yml      # Dataset sources and paths
├── pipeline.yml      # Derivative definitions and execution list
└── custom_nodes.py   # Optional custom node definitions

Quick Example

datasets.yml

my_dataset:
  name: MyDataset
  file_pattern:
    local: data/**/*.vhdr
    hpc: /cluster/BIDS/**/*.vhdr
  derivatives_path:
    local: outputs/
    hpc: /cluster/scratch/out
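
Each entry above registers one path per mount-point label (here local and hpc), so the same dataset definition can serve a laptop and a cluster. The sketch below shows one way such a lookup could work. It writes the YAML as a plain dict to avoid a YAML dependency, and `resolve_for_mount` and `list_inputs` are hypothetical helpers, not NeuroDAGs functions.

```python
from glob import glob

# Mirrors the datasets.yml entry above.
DATASET = {
    "name": "MyDataset",
    "file_pattern": {"local": "data/**/*.vhdr", "hpc": "/cluster/BIDS/**/*.vhdr"},
    "derivatives_path": {"local": "outputs/", "hpc": "/cluster/scratch/out"},
}


def resolve_for_mount(dataset: dict, mount_point: str) -> dict:
    """Pick the input pattern and output path registered under one label."""
    try:
        return {
            "file_pattern": dataset["file_pattern"][mount_point],
            "derivatives_path": dataset["derivatives_path"][mount_point],
        }
    except KeyError as exc:
        raise KeyError(
            f"mount point {mount_point!r} is not defined for {dataset['name']}"
        ) from exc


def list_inputs(pattern: str) -> list[str]:
    """Expand a recursive glob such as 'data/**/*.vhdr' into sorted file paths."""
    return sorted(glob(pattern, recursive=True))
```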

pipeline.yml

datasets: datasets.yml
mount_point: local
new_definitions: custom_nodes.py  # optional

DerivativeDefinitions:
  CleanedEEG:
    nodes:
      - id: 0
        derivative: SourceFile
      - id: 1
        node: basic_preprocessing
        args:
          mne_object: id.0
          resample: 256
          filter_args:
            l_freq: 0.5
            h_freq: 110

  PowerSpectrum:
    for_dataframe: True
    nodes:
      - id: 0
        derivative: CleanedEEG.fif
      - id: 1
        node: mne_spectrum_array
        args:
          meeg: id.0
          method: multitaper

DerivativeList:
  - CleanedEEG
  - PowerSpectrum
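
In this pipeline, PowerSpectrum reads CleanedEEG.fif, so CleanedEEG has to be computed first. As a rough illustration of how such an order can be derived, the sketch below sorts the two definitions with the standard library's graphlib. The `REFERENCES` dict is a hand-written summary of the YAML above, and `dependency_order` is a hypothetical helper; how NeuroDAGs resolves dependencies internally is not shown here.

```python
from graphlib import TopologicalSorter

# Which derivatives each definition reads, taken from the pipeline.yml above.
# "SourceFile" is the raw input; "CleanedEEG.fif" refers to the CleanedEEG derivative.
REFERENCES = {
    "CleanedEEG": ["SourceFile"],
    "PowerSpectrum": ["CleanedEEG.fif"],
}


def dependency_order(references: dict[str, list[str]]) -> list[str]:
    """Return derivative names so that each one follows the derivatives it reads."""
    graph = {}
    for name, refs in references.items():
        # Drop the artifact extension ("CleanedEEG.fif" -> "CleanedEEG") and keep
        # only references that point at other defined derivatives.
        parents = {ref.split(".", 1)[0] for ref in refs}
        graph[name] = {parent for parent in parents if parent in references}
    # static_order() raises graphlib.CycleError if the definitions form a cycle.
    return list(TopologicalSorter(graph).static_order())
```

Calling `dependency_order(REFERENCES)` places CleanedEEG before PowerSpectrum regardless of the order in which they are listed.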

Python

from neurodags.loaders import load_configuration
from neurodags.orchestrators import run_pipeline

config = load_configuration("pipeline.yml")

# Run all derivatives in "DerivativeList", auto-sorted by dependency order
run_pipeline(config)

# Or run specific ones (also sorted by dependency order)
run_pipeline(config, derivatives=["CleanedEEG"])

CLI

neurodags validate pipeline.yml

# Run all derivatives in DerivativeList (dependency-sorted)
neurodags run pipeline.yml

# Or run specific ones
neurodags run pipeline.yml --derivative CleanedEEG

Custom Nodes

Add nodes without modifying or forking the package:

# custom_nodes.py
from neurodags.nodes import register_node
from neurodags.definitions import Artifact, NodeResult

@register_node
def my_node(data) -> NodeResult:
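    # compute() is a placeholder for your own logic; because the writer below
    # calls to_netcdf, it should return an xarray object.
    result = compute(data)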
    return NodeResult(
        artifacts={
            ".nc": Artifact(
                item=result,
                writer=lambda path: result.to_netcdf(path),
            ),
        },
    )

Key rules:

  1. A node is a function decorated with @register_node.
  2. It returns a NodeResult.
  3. A NodeResult contains artifacts — a dict mapping file extension to Artifact(item, writer).
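
To make the third rule concrete, here is a self-contained sketch of the extension-to-artifact mapping and of how a runner might persist it. `SketchArtifact`, `SketchNodeResult`, and `save_artifacts` are stand-ins written for this example; they imitate the shape described above but are not the classes from neurodags.definitions, and the real saving logic may differ.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable


@dataclass
class SketchArtifact:
    """Stand-in for Artifact: the object plus a callable that writes it to a path."""
    item: Any
    writer: Callable[[str], Any]


@dataclass
class SketchNodeResult:
    """Stand-in for NodeResult: artifacts keyed by file extension."""
    artifacts: dict[str, SketchArtifact] = field(default_factory=dict)


def save_artifacts(result: SketchNodeResult, base_path: str) -> list[str]:
    """Write each artifact to '<base_path><extension>' and return the paths written."""
    written = []
    for extension, artifact in result.artifacts.items():
        target = base_path + extension
        artifact.writer(target)
        written.append(target)
    return written


def text_node(text: str) -> SketchNodeResult:
    """Toy node: wraps a string in a '.txt' artifact whose writer saves it as UTF-8."""
    return SketchNodeResult(
        artifacts={
            ".txt": SketchArtifact(
                item=text,
                writer=lambda path: Path(path).write_text(text, encoding="utf-8"),
            )
        }
    )
```

Keeping the writer next to the item lets each node decide its own on-disk format while the runner only chooses the destination path.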

Dataframe Assembly

from neurodags.orchestrators import build_derivative_dataframe

df = build_derivative_dataframe("pipeline.yml", output_format="wide")

Derivatives marked for_dataframe: True are collected automatically. Supports "wide" (one row per file) and "long" (one row per value) formats.
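
The difference between the two layouts can be shown with plain Python records. This is an illustration of the wide and long shapes only: the column names (file, alpha_power, beta_power, variable, value) are made up for the example and are not necessarily the columns build_derivative_dataframe produces.

```python
def wide_to_long(wide_rows: list[dict], id_column: str = "file") -> list[dict]:
    """Turn one-row-per-file records into one-row-per-value records."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], "variable": column, "value": value}
            )
    return long_rows


# Wide: one row per file, one column per value (hypothetical feature names).
WIDE = [
    {"file": "sub-01.vhdr", "alpha_power": 1.2, "beta_power": 0.4},
    {"file": "sub-02.vhdr", "alpha_power": 0.9, "beta_power": 0.5},
]

# Long: one row per (file, variable) pair.
LONG = wide_to_long(WIDE)
```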

CLI equivalent:

neurodags dataframe pipeline.yml --format wide --output derivative_dataframe.csv

Parallel Execution

# pipeline.yml
n_jobs: 4           # -1 = all cores, 1 or null = serial
joblib_backend: loky
joblib_prefer: processes

Or via Python:

run_pipeline(config, derivatives=["MyDerivative"], n_jobs=4)

Or via CLI:

neurodags run pipeline.yml --derivative MyDerivative --n-jobs 4
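
File-level parallelism means each input file is an independent unit of work. The sketch below mimics that idea and the n_jobs convention documented above (-1 = all cores, 1 or null = serial). It deliberately swaps joblib for the standard library's ThreadPoolExecutor so the example runs without extra dependencies; `resolve_workers` and `process_files` are hypothetical names, and joblib itself accepts further negative values that this sketch rejects.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def resolve_workers(n_jobs: Optional[int]) -> int:
    """Map n_jobs to a worker count: -1 = all cores, 1 or None = serial."""
    if n_jobs is None or n_jobs == 1:
        return 1
    if n_jobs == -1:
        return os.cpu_count() or 1
    if n_jobs < 1:
        raise ValueError(f"unsupported n_jobs value: {n_jobs}")
    return n_jobs


def process_files(files: Iterable[str], process_one: Callable[[str], object],
                  n_jobs: Optional[int] = 1) -> list:
    """Apply process_one to every file, serially or in a pool, keeping input order."""
    files = list(files)
    workers = resolve_workers(n_jobs)
    if workers == 1:
        return [process_one(path) for path in files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_one, files))
```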

Visualization

neurodags view path/to/file.fif
neurodags view path/to/file.nc

# Alternative module entry point
python -m neurodags.visualization path/to/file.fif
python -m neurodags.visualization path/to/file.nc

Built-in Dash-Plotly explorer with dimension-aware UI — dropdown per axis, plot types: Line, Scatter, Bar, Heatmap.

Inspection (Dry Run)

# All derivatives in DerivativeList
run_pipeline(config, dry_run=True)

# Or a specific one
run_pipeline(config, derivatives=["MyDerivative"], dry_run=True)

Returns a dataframe describing the execution plan without running any nodes.

When a node fails, a .error marker file is written with the error message, and failed files are retried on the next run. If a retry succeeds, the .error marker is automatically removed.
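
The .error marker behaviour can be sketched in a few lines. This is an assumption-laden illustration rather than the package's code: the helper name `run_with_error_marker` and the marker's exact file name (`<output>.error`) are hypothetical.

```python
from pathlib import Path
from typing import Callable


def run_with_error_marker(output: Path, compute: Callable[[], bytes]) -> bool:
    """Run compute(); write '<output>.error' on failure, clear it on success."""
    marker = output.with_name(output.name + ".error")
    try:
        payload = compute()
    except Exception as exc:  # record the failure instead of aborting the whole run
        marker.write_text(f"{type(exc).__name__}: {exc}", encoding="utf-8")
        return False
    output.write_bytes(payload)
    marker.unlink(missing_ok=True)  # a successful retry removes a stale marker
    return True
```

Because a failed file leaves no output behind, the next run sees it as not yet computed and tries again.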

CLI equivalent:

# All derivatives in DerivativeList
neurodags dry-run pipeline.yml --output dry_run_results.csv

# Or a specific one
neurodags dry-run pipeline.yml --derivative MyDerivative --output dry_run_results.csv

Derivative Flags

Flag           Default  Description
save           True     Persist artifacts to disk. False = compute but don't write.
overwrite      False    True = force recompute even if output exists.
for_dataframe  False    True = include this derivative in build_derivative_dataframe.

Custom Node Definitions

Point new_definitions to one or more Python files:

new_definitions:
  - custom_nodes/my_nodes.py
  - /abs/path/to/other_nodes.py

Relative paths are resolved from the pipeline YAML location.
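
As a sketch of that resolution rule, the helper below joins each entry to the folder that holds the pipeline file and leaves absolute paths untouched. `resolve_definitions` is a hypothetical name, the project location in the usage note is made up, and PurePosixPath is used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def resolve_definitions(pipeline_file: str, entries) -> list[str]:
    """Resolve new_definitions entries against the folder holding the pipeline YAML."""
    if isinstance(entries, str):  # a single file, as in the Quick Example
        entries = [entries]
    base = PurePosixPath(pipeline_file).parent
    # Joining a base with an absolute path yields that absolute path unchanged.
    return [str(base / entry) for entry in entries]
```

For a pipeline at /home/me/my_project/pipeline.yml, the entry custom_nodes/my_nodes.py would resolve to /home/me/my_project/custom_nodes/my_nodes.py, while /abs/path/to/other_nodes.py stays as written.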

Documentation

https://yjmantilla.github.io/neurodags/

HDF5 / NetCDF Note

If you encounter RuntimeError: NetCDF: HDF error:

uv run pip install --no-binary=h5py h5py
# or without uv:
pip install --no-binary=h5py h5py

Contributing

See CONTRIBUTING.md.

License

MIT. See LICENSE.
