Automated BIDS standardization tool powered by LLM-first architecture
autobidsify
Automated Brain Imaging Data Structure (BIDS) standardization tool powered by LLM-first architecture.
Features
- General compatibility: Handles diverse dataset structures (flat, hierarchical, multi-site)
- Multi-modal support: MRI, fNIRS, EEG, and mixed modality datasets
- Intelligent metadata extraction: Automatic participant demographics from DICOM headers, documents, and filenames
- Format conversion: DICOM→NIfTI, JNIfTI→NIfTI, .mat/.nirs→SNIRF, and more
- Multi-LLM support: OpenAI (gpt-4o, gpt-5.1) and Qwen (via Ollama locally, REST API, or DashScope)
- Evidence-based reasoning: Confidence scoring and provenance tracking for all decisions
Supported Formats
Input formats:
- MRI: DICOM (.dcm), NIfTI (.nii, .nii.gz), JNIfTI (.jnii, .bnii)
- fNIRS: SNIRF (.snirf), Homer3 (.nirs), MATLAB (.mat)
- EEG: EDF/EDF+ (.edf), BrainVision (.vhdr), EEGLAB (.set), Biosemi (.bdf)
- Documents: PDF, DOCX, TXT, Markdown
Output: Compliant with the BIDS specification (v1.10.0)
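The input format list above can be read as a lookup table from file extension to modality. The following is a minimal sketch of that idea, not autobidsify's actual detection logic: the function name `guess_modality`, the `"document"` and `"unknown"` labels, and the `.md` extension for Markdown are assumptions made for illustration.

```python
from pathlib import Path

# Extension table transcribed from the "Input formats" list.
# ".mat" is listed under fNIRS there; a real .mat file can hold anything,
# so this entry is only a hint. ".md" for Markdown is an assumption.
EXTENSION_MODALITY = {
    ".dcm": "mri", ".nii": "mri", ".nii.gz": "mri", ".jnii": "mri", ".bnii": "mri",
    ".snirf": "nirs", ".nirs": "nirs", ".mat": "nirs",
    ".edf": "eeg", ".vhdr": "eeg", ".set": "eeg", ".bdf": "eeg",
    ".pdf": "document", ".docx": "document", ".txt": "document", ".md": "document",
}


def guess_modality(filename: str) -> str:
    """Return a modality label for a file name, or "unknown" if unrecognized."""
    name = Path(filename).name.lower()
    # Handle the compound suffix first: Path.suffix alone would return ".gz".
    if name.endswith(".nii.gz"):
        return EXTENSION_MODALITY[".nii.gz"]
    return EXTENSION_MODALITY.get(Path(name).suffix, "unknown")
```

A lookup like this only narrows down candidates; DICOM files without a `.dcm` extension, for example, would need header inspection instead.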
Installation
pip install autobidsify
Optional dependencies:
# For BIDS validation
npm install -g bids-validator
# For DICOM conversion
pip install dcm2niix # or: apt-get install dcm2niix / brew install dcm2niix
Set API key:
# OpenAI
export OPENAI_API_KEY="your-key-here"
# Qwen via DashScope (optional cloud alternative to Ollama)
export DASHSCOPE_API_KEY="your-key-here"
Run all test datasets:
./run_all_tests.sh
Quick Start
# Full pipeline (one command)
autobidsify full \
--input /path/to/your/data \
--output outputs/my_dataset \
--model gpt-4o \
--modality mri \
--nsubjects 10 \
--id-strategy auto \
--describe "Your dataset description here"
# Step-by-step execution
autobidsify ingest --input data/ --output outputs/run
autobidsify evidence --output outputs/run --modality mri
autobidsify trio --output outputs/run --model gpt-4o
autobidsify plan --output outputs/run --model gpt-4o
autobidsify execute --output outputs/run
autobidsify validate --output outputs/run
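The step-by-step commands above can also be driven from a script. The sketch below builds the same six command lines and runs them in order, stopping at the first failure. It assumes the `autobidsify` CLI exits with a non-zero status when a stage fails, which the README does not state; the helper names `build_stage_commands` and `run_stages` are hypothetical.

```python
import subprocess


def build_stage_commands(input_dir, output_dir, model="gpt-4o", modality="mri"):
    """Return the six stage commands from the step-by-step listing, in order."""
    return [
        ["autobidsify", "ingest", "--input", input_dir, "--output", output_dir],
        ["autobidsify", "evidence", "--output", output_dir, "--modality", modality],
        ["autobidsify", "trio", "--output", output_dir, "--model", model],
        ["autobidsify", "plan", "--output", output_dir, "--model", model],
        ["autobidsify", "execute", "--output", output_dir],
        ["autobidsify", "validate", "--output", output_dir],
    ]


def run_stages(commands, runner=subprocess.run):
    """Run each command in turn; check=True raises on a non-zero exit code.

    Assumption: a failed stage returns a non-zero exit status.
    """
    for cmd in commands:
        print("Running:", " ".join(cmd))
        runner(cmd, check=True)


# Example (requires autobidsify to be installed and an API key to be set):
# run_stages(build_stage_commands("data/", "outputs/run"))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact commands before anything is run.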
Command Options
--input PATH              Input data (archive or directory)
--output PATH             Output directory
--model MODEL             LLM model (default: gpt-4o)
--modality TYPE           Data modality: mri | nirs | eeg | mixed
--nsubjects N             Number of subjects (optional, auto-detected if omitted)
--describe "TEXT"         Dataset description (recommended for metadata accuracy)
--id-strategy STRATEGY    Subject ID strategy: auto | numeric | semantic (default: auto)
Supported Models
OpenAI:
--model gpt-4o # Recommended, stable
--model gpt-4o-mini # Faster, cheaper
--model gpt-5.1 # Latest
Qwen (via local Ollama):
--model qwen3-coder-next:latest # Recommended
--model qwen3-coder-careful:latest # Recommended
--model qwen2.5-coder:7b # Not recommended, slow and sometimes inaccurate
Qwen (via remote Ollama REST API):
export OLLAMA_BASE_URL=http://your-server.com:xxxx
--model qwen3-coder-next:latest
Qwen (via DashScope cloud API):
export DASHSCOPE_API_KEY="your-key-here"
--model qwen-max
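Before starting a long run, it can help to check that the environment matches the chosen model. The sketch below is a preflight check based only on the model names and environment variables listed above. The routing rules (a `gpt-` prefix means OpenAI, an Ollama-style `name:tag` means Ollama, any other `qwen` name means DashScope) are assumptions for illustration and may not match how autobidsify selects a backend internally.

```python
import os


def guess_backend(model, env=None):
    """Return (backend, missing_env_vars) for a model name.

    The prefix and tag heuristics here are assumptions, not autobidsify's rules.
    """
    env = os.environ if env is None else env
    if model.startswith("gpt-"):
        backend, required = "openai", ["OPENAI_API_KEY"]
    elif ":" in model:
        # Ollama model names carry a tag, e.g. "qwen3-coder-next:latest".
        backend = "ollama-remote" if env.get("OLLAMA_BASE_URL") else "ollama-local"
        required = []
    elif model.startswith("qwen"):
        backend, required = "dashscope", ["DASHSCOPE_API_KEY"]
    else:
        backend, required = "unknown", []
    missing = [name for name in required if not env.get(name)]
    return backend, missing
```

For example, `guess_backend("gpt-4o", {})` reports that `OPENAI_API_KEY` is missing, which is cheaper to discover before the pipeline starts than partway through it.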
Pipeline Stages
| Stage | Command | Input | Output | Purpose |
|---|---|---|---|---|
| 1 | ingest | Raw data | ingest_info.json | Extract/reference data |
| 2 | evidence | All files | evidence_bundle.json | Analyze structure, detect subjects, scan auxiliary files |
| 3 | classify | Mixed data | classification_plan.json, pool directories | Separate MRI/fNIRS/EEG (optional, mixed only) |
| 4 | trio | Evidence | BIDS trio files | Generate dataset_description.json, README, participants.tsv |
| 5 | plan | Evidence + trio | BIDSPlan.yaml | Create conversion strategy, generate modality-specific mappings |
| 6 | execute | Plan | bids_compatible/, conversion_log.json, BIDSManifest.yaml | Execute conversions, generate BIDS sidecars |
| 7 | validate | BIDS dataset | Validation report | Check compliance (Tier 1: Python bids_validator, Tier 2: npm bids-validator) |
Output Structure
outputs/my_dataset/
├── bids_compatible/                          # Final BIDS dataset
│   ├── dataset_description.json
│   ├── README.md
│   ├── participants.tsv
│   ├── sub-001/
│   │   ├── anat/
│   │   │   └── sub-001_T1w.nii.gz
│   │   ├── func/
│   │   │   └── sub-001_task-rest_bold.nii.gz
│   │   ├── nirs/
│   │   │   ├── sub-001_task-rest_nirs.snirf
│   │   │   └── sub-001_task-rest_nirs.json
│   │   └── eeg/
│   │       ├── sub-001_task-rest_eeg.edf
│   │       ├── sub-001_task-rest_eeg.json
│   │       ├── sub-001_task-rest_channels.tsv
│   │       ├── sub-001_optodes.tsv           # fNIRS only
│   │       ├── sub-001_electrodes.tsv        # EEG only
│   │       └── sub-001_coordsystem.json
│   └── derivatives/                          # Unprocessed files (original structure)
└── _staging/                                 # Intermediate files
    ├── evidence_bundle.json
    ├── BIDSPlan.yaml
    ├── mat_mapping.json                      # fNIRS .mat datasets only
    ├── eeg_event_mapping.json                # EEG datasets with event files
    ├── eeg_aux_mapping.json                  # EEG datasets with auxiliary metadata
    └── conversion_log.json
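The file names in the tree above follow the BIDS pattern of underscore-separated `key-value` entities, then a suffix, then an extension. As a rough illustration of how such names decompose, here is a small parser. It is a simplified sketch (the function name `parse_bids_name` is hypothetical) and is not a replacement for a real BIDS validator.

```python
def parse_bids_name(filename: str) -> dict:
    """Split a BIDS-style file name into entities, suffix, and extension.

    Simplified: assumes every part before the suffix has the form "key-value"
    and raises ValueError otherwise.
    """
    # Everything after the first dot is the extension (e.g. "nii.gz").
    stem, _, ext = filename.partition(".")
    *entity_parts, suffix = stem.split("_")
    entities = dict(part.split("-", 1) for part in entity_parts)
    return {
        "entities": entities,
        "suffix": suffix,
        "extension": "." + ext if ext else "",
    }
```

Applied to `sub-001_task-rest_bold.nii.gz`, this yields the entities `sub=001` and `task=rest`, the suffix `bold`, and the extension `.nii.gz`.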
Examples
MRI dataset
autobidsify full \
--input brain_scans/ \
--output outputs/study1 \
--model gpt-4o \
--modality mri \
--nsubjects 30 \
--id-strategy numeric \
--describe "Single-site T1w MRI study, 30 healthy adults"
fNIRS dataset
autobidsify full \
--input fnirs_data/ \
--output outputs/fnirs \
--model gpt-4o \
--modality nirs \
--describe "Prefrontal fNIRS, 20 subjects, resting state and finger tapping"
EEG dataset
autobidsify full \
--input eeg_data/ \
--output outputs/eeg \
--model gpt-4o \
--modality eeg \
--nsubjects 36 \
--describe "EEG during mental arithmetic tasks, 36 subjects, EDF format"
Using Qwen (local, no API cost)
ollama serve
autobidsify full \
--input data/ \
--output outputs/run \
--model qwen3-coder-next:latest \
--modality mri
Architecture
LLM-First Design:
- Python: Deterministic operations — file I/O, regex-based subject detection, format conversion, BIDS validation, standard 10-20 electrode lookup
- LLM: Semantic understanding — dataset description, metadata extraction, scan type classification, license normalization, event file column mapping, auxiliary file analysis
- Hybrid: Python analyzes ALL files for completeness; LLM sees representative samples for semantic decisions
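The hybrid split described above can be sketched in a few lines: a deterministic regex pass groups every file by subject, and only a small sample per subject is kept for the LLM prompt. The regex, the function names, and the sampling rule below are illustrative assumptions, not the patterns autobidsify actually uses.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "sub", "subj", or "subject", an optional separator, digits.
SUBJECT_PATTERN = re.compile(r"(?:subject|subj|sub)[-_]?(\d+)", re.IGNORECASE)


def detect_subjects(paths):
    """Deterministic pass: group ALL paths by the subject number found in them."""
    by_subject = defaultdict(list)
    for path in paths:
        match = SUBJECT_PATTERN.search(path)
        if match:
            # Normalize "01" and "001" to the same key.
            key = match.group(1).lstrip("0") or "0"
        else:
            key = "unmatched"
        by_subject[key].append(path)
    return dict(by_subject)


def representative_sample(by_subject, per_subject=2):
    """Pick a few paths per group to show to the LLM instead of the full list."""
    sample = []
    for key in sorted(by_subject):
        sample.extend(sorted(by_subject[key])[:per_subject])
    return sample
```

The point of the split is that completeness (every file is counted) comes from the deterministic pass, while the prompt size stays bounded no matter how large the dataset is.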
Requirements
- Python 3.10+
- OpenAI API key (or Ollama for local Qwen models)
- bids-validator (npm) for full structural validation (optional)
- dcm2niix for DICOM conversion (optional)
Current Status
Version: 0.9.5
Contributing
We need YOUR datasets to improve robustness. Please test and report issues at: https://github.com/cotilab/autobidsify/issues