A SLURM-friendly M/EEG derivative extraction package leveraging BIDS-like data organization and DAG processing.
NeuroDAGs
An Extensible and Declarative DAG Framework for Reproducible Neuroscience Workflows
M/EEG studies generate many interdependent intermediate derivatives. Recomputing full pipelines is wasteful; reusing valid intermediates is non-trivial. Large-scale studies require reproducible, extensible, and efficient workflows. NeuroDAGs addresses this with a declarative, graph-based framework for scalable and reusable derivative computation.
Docs | Comparison with Snakemake/Pydra | Poster BRaIN Symposium 2026 Montreal
Core Idea
Pipelines are defined as a directed acyclic graph (DAG) of computation nodes that output reusable derivatives, executed for each input file.
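As a rough illustration of the idea, the dependency ordering of derivatives can be sketched with Python's standard library alone. This is not NeuroDAGs' internal code; the `dependencies` mapping and the `execution_order` helper are assumptions made up for this example, reusing derivative names that appear later in this README.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each derivative lists the derivatives it reads.
dependencies = {
    "CleanedEEG": {"SourceFile"},
    "PowerSpectrum": {"CleanedEEG"},
}


def execution_order(deps):
    """Return names ordered so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # prints ['SourceFile', 'CleanedEEG', 'PowerSpectrum']
```

In a per-file workflow, an order like this would be applied once for each input file.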
Design Principles
- Reproducible, transparent workflows defined declaratively in YAML: version-controllable and LLM-friendly.
- Uniform node abstraction: preprocessing, features, and any custom nodes are treated identically.
- Directory-agnostic: outputs mirror inputs' organization. Derivatives are labeled with a @DerivativeName suffix.
- xarray-centered outputs: derivatives stored as language-agnostic, metadata-rich, dimension-aware xarray → NetCDF.
- Graph-based reuse: if a derivative is already computed and overwrite=False, it is skipped automatically.
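The mirroring and reuse principles can be pictured with a small standard-library sketch. The exact file naming NeuroDAGs uses is not spelled out here, so the `stem@Name.ext` layout below, along with the helpers `derivative_path` and `needs_compute`, are assumptions for illustration only.

```python
from pathlib import Path
import tempfile


def derivative_path(source, input_root, output_root, name, ext):
    """Mirror the input's relative location under output_root and tag it with @name."""
    relative = source.relative_to(input_root)
    return output_root / relative.parent / f"{relative.stem}@{name}{ext}"


def needs_compute(target, overwrite=False):
    """Recompute only when forced or when the artifact is missing."""
    return overwrite or not target.exists()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "data" / "sub-01" / "rest.vhdr"
    out = derivative_path(src, root / "data", root / "outputs", "CleanedEEG", ".fif")
    first = needs_compute(out)                    # nothing on disk yet
    out.parent.mkdir(parents=True)
    out.touch()                                   # pretend the derivative was written
    second = needs_compute(out)                   # present, so it would be skipped
    forced = needs_compute(out, overwrite=True)   # overwrite forces a recompute
    relative_out = out.relative_to(root).as_posix()

print(relative_out, first, second, forced)
```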
Features
- Agnostic to data organization / directory hierarchy
- SLURM / HPC friendly with file-level parallelism via joblib
- Graph-based caching: skip already-computed derivatives
- Extensible node system — add nodes without forking the package
- YAML-based declarative configuration
- Unified CLI: neurodags run, dry-run, dataframe, dag, view, validate, tui
- Built-in Terminal User Interface (TUI) for pipeline management and execution
- Built-in nodes for preprocessing, spectral analysis, entropy, complexity, and data transformations
- Dataframe assembly (wide or long format) from derivative artifacts
- Dry-run mode — inspect planned computations without executing
- Built-in Dash-Plotly explorer for .fif and .nc files
Installation
pip install neurodags
# Or with TUI support
pip install neurodags[tui]
With uv (recommended):
uv add neurodags
# Or with TUI support
uv add neurodags[tui]
Quickstart
See the quickstart example — full synthetic pipeline, no real data required.
CLI
NeuroDAGs installs a unified neurodags command:
neurodags validate pipeline.yml
neurodags run pipeline.yml # all derivatives in DerivativeList
neurodags run pipeline.yml --derivative CleanedEEG # or a specific one
neurodags dry-run pipeline.yml --output dry_run.csv
neurodags dataframe pipeline.yml --format wide --output features.csv
neurodags dag pipeline.yml --html pipeline_dag.html
neurodags view path/to/file.nc
If you install the optional TUI extra, you also get:
neurodags tui pipeline.yml --datasets datasets.yml
Development
git clone https://github.com/yjmantilla/neurodags
cd neurodags
uv sync --all-extras --all-groups # creates .venv and installs all deps incl. dev/test/docs
uv run pre-commit install
Key commands (all via uv run):
uv run ruff check src/ # lint (fix: uv run ruff check src/ --fix)
uv run black --check . # format check (fix: uv run black .)
uv run pytest -q # run tests
uv run pytest -s -q --no-cov --pdb # debug a failing test
uv run sphinx-build -b html docs docs/_build/html -W --keep-going # build docs
rm -rf docs/_build # clean docs
No uv? Install it with pip install uv or curl -Ls https://astral.sh/uv/install.sh | sh. All commands above work with plain python/pip too: swap uv run → activate .venv, uv sync → pip install -e .[dev,test,docs].
Project Structure
my_project/
├── datasets.yml # Dataset sources and paths
├── pipeline.yml # Derivative definitions and execution list
└── custom_nodes.py # Optional custom node definitions
Quick Example
datasets.yml
my_dataset:
  name: MyDataset
  file_pattern:
    local: data/**/*.vhdr
    hpc: /cluster/BIDS/**/*.vhdr
  derivatives_path:
    local: outputs/
    hpc: /cluster/scratch/out
pipeline.yml
datasets: datasets.yml
mount_point: local
new_definitions: custom_nodes.py # optional
DerivativeDefinitions:
  CleanedEEG:
    nodes:
      - id: 0
        derivative: SourceFile
      - id: 1
        node: basic_preprocessing
        args:
          mne_object: id.0
          resample: 256
          filter_args:
            l_freq: 0.5
            h_freq: 110
  PowerSpectrum:
    for_dataframe: True
    nodes:
      - id: 0
        derivative: CleanedEEG.fif
      - id: 1
        node: mne_spectrum_array
        args:
          meeg: id.0
          method: multitaper
DerivativeList:
  - CleanedEEG
  - PowerSpectrum
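In the node lists above, an argument such as id.0 refers to the output of the node with that id. A minimal sketch of how such references could be resolved is shown below. It is a toy interpreter written for this README, not the package's implementation: `resolve_args`, `run_nodes`, the `scale` node, and the in-memory `source` are all hypothetical.

```python
def resolve_args(args, outputs):
    """Swap "id.N" strings for the output already produced by node N."""
    resolved = {}
    for key, value in args.items():
        if isinstance(value, str) and value.startswith("id."):
            resolved[key] = outputs[int(value[3:])]
        else:
            resolved[key] = value
    return resolved


def run_nodes(node_specs, registry, source):
    """Run node specs in listed order and return every node's output by id."""
    outputs = {}
    for spec in node_specs:
        if "derivative" in spec:
            # Stand-in for loading an upstream derivative from disk.
            outputs[spec["id"]] = source
        else:
            func = registry[spec["node"]]
            outputs[spec["id"]] = func(**resolve_args(spec.get("args", {}), outputs))
    return outputs


def scale(values, factor):
    """Toy node: multiply every value by a constant factor."""
    return [v * factor for v in values]


specs = [
    {"id": 0, "derivative": "SourceFile"},
    {"id": 1, "node": "scale", "args": {"values": "id.0", "factor": 2}},
]
outputs = run_nodes(specs, {"scale": scale}, source=[1, 2, 3])
print(outputs[1])  # prints [2, 4, 6]
```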
Python
from neurodags.loaders import load_configuration
from neurodags.orchestrators import run_pipeline
config = load_configuration("pipeline.yml")
# Run all derivatives in "DerivativeList", auto-sorted by dependency order
run_pipeline(config)
# Or run specific ones (also sorted by dependency order)
run_pipeline(config, derivatives=["CleanedEEG"])
CLI
neurodags validate pipeline.yml
# Run all derivatives in DerivativeList (dependency-sorted)
neurodags run pipeline.yml
# Or run specific ones
neurodags run pipeline.yml --derivative CleanedEEG
Custom Nodes
Add nodes without modifying or forking the package:
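# custom_nodes.py
from neurodags.nodes import register_node
from neurodags.definitions import Artifact, NodeResult

@register_node
def my_node(data) -> NodeResult:
    # compute() is a placeholder for your own computation; here it is expected
    # to return an object with a to_netcdf method (for example an xarray object).
    result = compute(data)
    return NodeResult(
        artifacts={
            ".nc": Artifact(
                item=result,
                # The writer receives the output path and persists the artifact.
                writer=lambda path: result.to_netcdf(path),
            ),
        },
    )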
Key rules:
- A node is a function decorated with @register_node.
- It returns a NodeResult.
- A NodeResult contains artifacts: a dict mapping file extension to Artifact(item, writer).
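To make the three rules concrete without requiring NeuroDAGs or xarray, here is a toy re-creation of the pattern. The names `ToyArtifact`, `ToyNodeResult`, `toy_register_node`, `TOY_REGISTRY`, and `save_artifacts` are stand-ins invented for this sketch; the real classes and decorator live in the package and may behave differently.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ToyArtifact:
    item: Any                       # the in-memory result
    writer: Callable[[str], None]   # called with the destination path


@dataclass
class ToyNodeResult:
    artifacts: Dict[str, ToyArtifact] = field(default_factory=dict)


TOY_REGISTRY: Dict[str, Callable[..., ToyNodeResult]] = {}


def toy_register_node(func):
    """Record the function under its own name so it can be looked up later."""
    TOY_REGISTRY[func.__name__] = func
    return func


@toy_register_node
def to_text(values) -> ToyNodeResult:
    text = ",".join(str(v) for v in values)

    def write(path: str) -> None:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)

    return ToyNodeResult(artifacts={".txt": ToyArtifact(item=text, writer=write)})


def save_artifacts(result, stem):
    """Call each artifact's writer with stem + extension; return written paths."""
    written = []
    for ext, artifact in result.artifacts.items():
        artifact.writer(stem + ext)
        written.append(stem + ext)
    return written


with tempfile.TemporaryDirectory() as tmp:
    result = TOY_REGISTRY["to_text"]([1, 2, 3])
    written = save_artifacts(result, os.path.join(tmp, "demo@ToText"))
    with open(written[0], encoding="utf-8") as handle:
        content = handle.read()

print(content)  # prints 1,2,3
```

Keying artifacts by extension lets one node emit several files, each with its own writer.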
Dataframe Assembly
from neurodags.orchestrators import build_derivative_dataframe
df = build_derivative_dataframe("pipeline.yml", output_format="wide")
Derivatives marked for_dataframe: True are collected automatically. Supports "wide" (one row per file) and "long" (one row per value) formats.
CLI equivalent:
neurodags dataframe pipeline.yml --format wide --output derivative_dataframe.csv
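The difference between the two layouts can be shown with plain Python rows. This sketch only illustrates the shapes; the file names, feature names, and the helpers `to_wide` and `to_long` are hypothetical, and the columns actually produced by build_derivative_dataframe may differ.

```python
# Hypothetical per-file feature values.
features = {
    "sub-01_rest.vhdr": {"alpha_power": 1.5, "beta_power": 0.7},
    "sub-02_rest.vhdr": {"alpha_power": 1.1, "beta_power": 0.9},
}


def to_wide(table):
    """One row per file, one column per feature."""
    return [{"file": name, **values} for name, values in table.items()]


def to_long(table):
    """One row per (file, feature) value."""
    return [
        {"file": name, "feature": feature, "value": value}
        for name, values in table.items()
        for feature, value in values.items()
    ]


wide_rows = to_wide(features)
long_rows = to_long(features)
print(len(wide_rows), len(long_rows))  # prints 2 4
```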
Parallel Execution
# pipeline.yml
n_jobs: 4 # -1 = all cores, 1 or null = serial
joblib_backend: loky
joblib_prefer: processes
Or via Python:
run_pipeline(config, derivatives=["MyDerivative"], n_jobs=4)
Or via CLI:
neurodags run pipeline.yml --derivative MyDerivative --n-jobs 4
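The n_jobs convention noted above (-1 for all cores, 1 or null for serial) can be sketched as follows. NeuroDAGs uses joblib for this; the example swaps in a standard-library thread pool purely so it runs without extra dependencies, and `resolve_workers` and `process_files` are hypothetical helpers.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_workers(n_jobs):
    """Translate an n_jobs setting into a concrete worker count."""
    if n_jobs is None or n_jobs == 1:
        return 1                      # serial execution
    if n_jobs == -1:
        return os.cpu_count() or 1    # use every available core
    if n_jobs < 1:
        raise ValueError(f"unsupported n_jobs value: {n_jobs}")
    return n_jobs


def process_files(files, func, n_jobs=None):
    """Apply func to each file, in parallel when more than one worker is requested."""
    workers = resolve_workers(n_jobs)
    if workers == 1:
        return [func(path) for path in files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, files))  # map preserves input order


results = process_files(["a.vhdr", "b.vhdr", "c.vhdr"], str.upper, n_jobs=4)
print(results)  # prints ['A.VHDR', 'B.VHDR', 'C.VHDR']
```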
Visualization
neurodags view path/to/file.fif
neurodags view path/to/file.nc
# Alternative module entry point
python -m neurodags.visualization path/to/file.fif
python -m neurodags.visualization path/to/file.nc
Built-in Dash-Plotly explorer with dimension-aware UI — dropdown per axis, plot types: Line, Scatter, Bar, Heatmap.
Inspection (Dry Run)
# All derivatives in DerivativeList
run_pipeline(config, dry_run=True)
# Or a specific one
run_pipeline(config, derivatives=["MyDerivative"], dry_run=True)
Returns a dataframe describing the execution plan without running any nodes. When a node fails, a .error marker file is written with the error message — failed files are retried on the next run. If a retry succeeds, the .error marker is automatically removed.
CLI equivalent:
# All derivatives in DerivativeList
neurodags dry-run pipeline.yml --output dry_run_results.csv
# Or a specific one
neurodags dry-run pipeline.yml --derivative MyDerivative --output dry_run_results.csv
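The .error marker behavior described above can be pictured with a small sketch. The marker naming (the output name plus .error) and the helper `run_with_marker` are assumptions for illustration; the package's actual file layout may differ.

```python
from pathlib import Path
import tempfile


def run_with_marker(target, compute):
    """Write compute()'s text to target; keep a .error marker only while it fails."""
    marker = target.with_name(target.name + ".error")
    try:
        target.write_text(compute(), encoding="utf-8")
    except Exception as exc:
        marker.write_text(f"{type(exc).__name__}: {exc}", encoding="utf-8")
        return False
    marker.unlink(missing_ok=True)  # a successful retry clears the old marker
    return True


def failing():
    raise ValueError("bad channel")


def working():
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "rest@CleanedEEG.txt"
    marker = Path(tmp) / "rest@CleanedEEG.txt.error"
    first_ok = run_with_marker(target, failing)
    marker_after_failure = marker.exists()
    message = marker.read_text(encoding="utf-8")
    second_ok = run_with_marker(target, working)
    marker_after_retry = marker.exists()

print(first_ok, marker_after_failure, second_ok, marker_after_retry)
# prints False True True False
```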
Derivative Flags
| Flag | Default | Description |
|---|---|---|
| save | True | Persist artifacts to disk. False = compute but don't write. |
| overwrite | False | Force recompute even if output exists. |
| for_dataframe | False | Include this derivative in build_derivative_dataframe. |
Custom Node Definitions
Point new_definitions to one or more Python files:
new_definitions:
- custom_nodes/my_nodes.py
- /abs/path/to/other_nodes.py
Relative paths are resolved from the pipeline YAML location.
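The resolution rule can be sketched in a few lines. This assumes POSIX-style paths and uses a hypothetical helper, `resolve_definitions`; it illustrates the rule rather than the package's own loader.

```python
from pathlib import PurePosixPath


def resolve_definitions(pipeline_yaml, entries):
    """Resolve relative entries against the folder that holds the pipeline YAML."""
    base = PurePosixPath(pipeline_yaml).parent
    if isinstance(entries, str):
        entries = [entries]  # a single file may be given as a plain string
    resolved = []
    for entry in entries:
        path = PurePosixPath(entry)
        resolved.append(path if path.is_absolute() else base / path)
    return resolved


paths = resolve_definitions(
    "/projects/study/pipeline.yml",
    ["custom_nodes/my_nodes.py", "/abs/path/to/other_nodes.py"],
)
print([str(p) for p in paths])
# prints ['/projects/study/custom_nodes/my_nodes.py', '/abs/path/to/other_nodes.py']
```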
Documentation
https://yjmantilla.github.io/neurodags/
HDF5 / NetCDF Note
If you encounter RuntimeError: NetCDF: HDF error:
uv run pip install --no-binary=h5py h5py
# or without uv:
pip install --no-binary=h5py h5py
Contributing
See CONTRIBUTING.md.
License
MIT. See LICENSE.