Automated BIDS standardization tool powered by LLM-first architecture
autobidsify
Automated Brain Imaging Data Structure (BIDS) standardization tool powered by LLM-first architecture.
Features
- General compatibility: Handles diverse dataset structures (flat, hierarchical, multi-site)
- Multi-modal support: MRI, fNIRS, and mixed modality datasets
- Intelligent metadata extraction: Automatic participant demographics from DICOM headers, documents, and filenames
- Format conversion: DICOM→NIfTI, JNIfTI→NIfTI, .mat/.nirs→SNIRF, and more
- Multi-LLM support: OpenAI (gpt-4o, gpt-5.1, o1, o3) and Qwen (via Ollama or DashScope)
- Evidence-based reasoning: Confidence scoring and provenance tracking for all decisions
Supported Formats
Input formats:
- MRI: DICOM (.dcm), NIfTI (.nii, .nii.gz), JNIfTI (.jnii, .bnii)
- fNIRS: SNIRF (.snirf), Homer3 (.nirs), MATLAB (.mat)
- Documents: PDF, DOCX, TXT, Markdown
Output: Compliant with the BIDS specification (v1.10.0)
Installation
pip install autobidsify
Optional dependencies:
# For DICOM conversion
apt-get install dcm2niix # Ubuntu/Debian
brew install dcm2niix # macOS
# For BIDS validation
npm install -g bids-validator
# For Qwen models (local)
# Install Ollama from https://ollama.com/download
ollama pull qwen2.5-coder:7b
pip install ollama
Set API key:
# OpenAI
export OPENAI_API_KEY="your-key-here"
# Qwen via DashScope (optional cloud alternative to Ollama)
export DASHSCOPE_API_KEY="your-key-here"
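Before running the pipeline, it can help to confirm that the optional tools and keys listed above are visible from your environment. The following is a minimal sketch, not part of autobidsify: the helper name `check_environment` is hypothetical, and it only looks for the executables and environment variables named in this section.

```python
import os
import shutil


def check_environment(env=None):
    """Report which optional tools and API keys are visible.

    Hypothetical helper, not part of autobidsify. It checks only the
    executables and environment variables named in this README.
    """
    env = os.environ if env is None else env
    return {
        # shutil.which returns None when the executable is not on PATH.
        "dcm2niix": shutil.which("dcm2niix") is not None,
        "bids-validator": shutil.which("bids-validator") is not None,
        "ollama": shutil.which("ollama") is not None,
        # Keys are reported as present/absent only; values are never printed.
        "OPENAI_API_KEY": bool(env.get("OPENAI_API_KEY")),
        "DASHSCOPE_API_KEY": bool(env.get("DASHSCOPE_API_KEY")),
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:20s} {'ok' if found else 'missing'}")
```

Only one of the two API keys (or a local Ollama install) is needed, so a "missing" entry is not necessarily a problem.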
Quick Start
# Full pipeline (one command)
# With dataset description (recommended for better metadata extraction)
autobidsify full \
--input /path/to/your/data \
--output outputs/my_dataset \
--model gpt-4o \
--modality mri \
--nsubjects 10 \
--id-strategy auto \
--describe "Your dataset description here"
# Step-by-step execution
autobidsify ingest --input data/ --output outputs/run
autobidsify evidence --output outputs/run --modality mri
autobidsify trio --output outputs/run --model gpt-4o
autobidsify plan --output outputs/run --model gpt-4o
autobidsify execute --output outputs/run
autobidsify validate --output outputs/run
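If you prefer to drive the step-by-step commands from a script, the sequence above can be sketched in Python as follows. This is an illustrative wrapper, not an autobidsify API: `build_stage_commands` and `run_stages` are hypothetical names, and the sketch assumes the CLI reports failure through a non-zero exit code, which this README does not state.

```python
import subprocess


def build_stage_commands(input_path, output_dir, model="gpt-4o", modality="mri"):
    """Return the six stage commands from the step-by-step example."""
    return [
        ["autobidsify", "ingest", "--input", input_path, "--output", output_dir],
        ["autobidsify", "evidence", "--output", output_dir, "--modality", modality],
        ["autobidsify", "trio", "--output", output_dir, "--model", model],
        ["autobidsify", "plan", "--output", output_dir, "--model", model],
        ["autobidsify", "execute", "--output", output_dir],
        ["autobidsify", "validate", "--output", output_dir],
    ]


def run_stages(commands, runner=subprocess.run):
    """Run each stage in order and stop at the first non-zero exit code.

    Returns the name of the failing stage, or None if every stage succeeded.
    Assumption: a non-zero exit code means the stage failed.
    """
    for cmd in commands:
        print("Running:", " ".join(cmd))
        result = runner(cmd)
        if result.returncode != 0:
            return cmd[1]
    return None


if __name__ == "__main__":
    failed = run_stages(build_stage_commands("data/", "outputs/run"))
    print("All stages finished" if failed is None else f"Stopped at: {failed}")
```

Stopping at the first failure lets you inspect the intermediate files in the output directory before rerunning the failed stage.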
Command Options
--input PATH Input data (archive or directory)
--output PATH Output directory
--model MODEL LLM model (default: gpt-4o)
--modality TYPE Data modality: mri | nirs | mixed
--nsubjects N Number of subjects (optional, auto-detected if omitted)
--describe "TEXT" Dataset description (recommended for metadata accuracy)
--id-strategy STRATEGY Subject ID strategy: auto | numeric | semantic (default: auto)
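When the `full` command is assembled programmatically, the allowed values listed above can be checked before the tool is launched. The sketch below is a hypothetical helper (`build_full_command` is not part of autobidsify); it uses only the flags and choices documented in this section and leaves out `--nsubjects` and `--describe` when they are not given.

```python
import shlex

# Allowed values copied from the option list above.
MODALITIES = ("mri", "nirs", "mixed")
ID_STRATEGIES = ("auto", "numeric", "semantic")


def build_full_command(input_path, output_dir, modality, model="gpt-4o",
                       nsubjects=None, id_strategy="auto", describe=None):
    """Build the argument list for `autobidsify full` (hypothetical helper)."""
    if modality not in MODALITIES:
        raise ValueError(f"modality must be one of {MODALITIES}, got {modality!r}")
    if id_strategy not in ID_STRATEGIES:
        raise ValueError(f"id_strategy must be one of {ID_STRATEGIES}, got {id_strategy!r}")
    cmd = [
        "autobidsify", "full",
        "--input", str(input_path),
        "--output", str(output_dir),
        "--model", model,
        "--modality", modality,
        "--id-strategy", id_strategy,
    ]
    # Optional flags are only added when a value was supplied.
    if nsubjects is not None:
        cmd += ["--nsubjects", str(int(nsubjects))]
    if describe:
        cmd += ["--describe", describe]
    return cmd


if __name__ == "__main__":
    # shlex.join quotes the description so the line can be pasted into a shell.
    print(shlex.join(build_full_command("data/", "outputs/run", "mri",
                                        describe="Single-site MRI study")))
```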
Supported Models
OpenAI:
--model gpt-4o # Highly recommended, stable
--model gpt-4o-mini # Faster, cheaper
--model gpt-5.1 # Latest, less recommended
Qwen (via Ollama, local):
--model qwen3-coder-next:latest # Recommended
--model qwen3-coder-careful:latest # Recommended
--model qwen2.5-coder:7b # Not recommended: slow and sometimes inaccurate
Qwen (via REST API):
export OLLAMA_BASE_URL=http://your-server.com:xxxx
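To confirm that a remote Ollama server is reachable and has the model you intend to use, you can query Ollama's model-listing endpoint (`GET /api/tags`). The sketch below is a standalone check, not part of autobidsify: the function names are hypothetical, and the fallback address is Ollama's default local port, used here only when `OLLAMA_BASE_URL` is unset.

```python
import json
import os
import urllib.request


def tags_url(base_url):
    """Build the URL of Ollama's model-listing endpoint (GET /api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def list_remote_models(base_url=None, timeout=5.0):
    """Return the model names reported by an Ollama server.

    Uses OLLAMA_BASE_URL when no URL is passed, then falls back to
    Ollama's default local address.
    """
    base_url = base_url or os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    for name in list_remote_models():
        print(name)
```

If the model passed to `--model` does not appear in the output, pull it on the server first with `ollama pull`.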
Pipeline Stages
| Stage | Command | Input | Output | Purpose |
|---|---|---|---|---|
| 1 | ingest | Raw data | ingest_info.json | Extract/reference data |
| 2 | evidence | All files | evidence_bundle.json | Analyze structure, detect subjects |
| 3 | classify | Mixed data | classification_plan.json | Separate MRI/fNIRS (optional) |
| 4 | trio | Evidence | BIDS trio files | Generate metadata files |
| 5 | plan | Evidence + trio | BIDSPlan.yaml | Create conversion strategy |
| 6 | execute | Plan | bids_compatible/ | Execute conversions |
| 7 | validate | BIDS dataset | Validation report | Check compliance |
Output Structure
outputs/my_dataset/
├── bids_compatible/ # Final BIDS dataset
│ ├── dataset_description.json
│ ├── README.md
│ ├── participants.tsv
│ ├── sub-001/
│ │ ├── anat/
│ │ │ └── sub-001_T1w.nii.gz
│ │ └── func/
│ │ └── sub-001_task-rest_bold.nii.gz
│ └── derivatives/ # Unprocessed files (original structure)
│ └── sub-001/
│ └── ...
└── _staging/ # Intermediate files
├── evidence_bundle.json
├── BIDSPlan.yaml
└── conversion_log.json
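After a run, a quick look at the output tree can tell you whether the top-level files and subject folders shown above were produced. The following sketch is a hypothetical helper, not an autobidsify feature, and it does not replace `autobidsify validate`; it assumes the layout matches the tree in this section.

```python
from pathlib import Path

# Top-level files shown in the output tree above.
EXPECTED_FILES = ("dataset_description.json", "README.md", "participants.tsv")


def summarize_bids_output(output_dir):
    """List subject folders and missing top-level files under bids_compatible/."""
    bids_root = Path(output_dir) / "bids_compatible"
    missing = [name for name in EXPECTED_FILES if not (bids_root / name).is_file()]
    # Only direct children are matched, so derivatives/sub-*/ is not counted.
    subjects = sorted(p.name for p in bids_root.glob("sub-*") if p.is_dir())
    return {"subjects": subjects, "missing": missing}


if __name__ == "__main__":
    report = summarize_bids_output("outputs/my_dataset")
    print("Subjects:", ", ".join(report["subjects"]) or "none found")
    print("Missing files:", ", ".join(report["missing"]) or "none")
```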
Examples
Example 1: Single-site MRI study
autobidsify full \
--input brain_scans/ \
--output outputs/study1 \
--nsubjects 50 \
--model gpt-4o \
--modality mri \
--id-strategy auto \
--describe "Single-site MRI study"
Example 2: Multi-site dataset with description
autobidsify full \
--input camcan_data/ \
--output outputs/camcan \
--model gpt-4o \
--modality mri \
--id-strategy semantic \
--describe "Multi-site dataset with description"
Example 3: fNIRS dataset using Qwen (local, no API cost)
autobidsify full \
--input fnirs_study/ \
--output outputs/fnirs \
--model qwen3-coder-next:latest \
--modality nirs \
--id-strategy auto \
--describe "fNIRS dataset"
Architecture
LLM-First Design:
- Python: Deterministic operations — file I/O, regex-based subject detection, format conversion, BIDS validation
- LLM: Semantic understanding — dataset description, metadata extraction, scan type classification, license normalization
- Hybrid: Python analyzes ALL files for completeness; LLM sees representative samples for semantic decisions
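The hybrid split described above can be illustrated with a small sketch: a deterministic pass groups every file by subject, and only a small sample per subject is set aside for semantic decisions. This is not autobidsify's implementation. The regex, the function names, and the sampling rule are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches "sub-01", "sub_01", "subject_2", "Subject12".
SUBJECT_RE = re.compile(r"sub(?:ject)?[-_]?(\d+)", re.IGNORECASE)


def group_by_subject(paths):
    """Deterministic pass: assign every path to a subject label."""
    groups = defaultdict(list)
    for path in paths:
        match = SUBJECT_RE.search(path)
        # Files without a recognizable subject ID are kept, not dropped.
        key = f"sub-{int(match.group(1)):03d}" if match else "unassigned"
        groups[key].append(path)
    return dict(groups)


def representative_sample(groups, per_subject=2, max_subjects=3):
    """Pick a few files from a few subjects to show to the LLM."""
    sample = []
    for subject in sorted(groups)[:max_subjects]:
        sample.extend(sorted(groups[subject])[:per_subject])
    return sample


if __name__ == "__main__":
    files = [
        "site1/sub-01/T1.dcm",
        "site1/sub-01/bold.dcm",
        "site2/subject_2/T1.dcm",
        "notes.pdf",
    ]
    groups = group_by_subject(files)
    print(groups)
    print(representative_sample(groups))
```

Grouping over all files keeps the subject count complete, while the sample bounds how much text is sent to the model regardless of dataset size.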
Requirements
- Python
- OpenAI API key (or Ollama for local Qwen models)
- dcm2niix for DICOM conversion
- bids-validator for validation
Current Status
Version: 0.6.1
Tested datasets:
- Visible Human Project (flat structure, DICOM CT)
- CamCAN (hierarchical, multi-site, 30+ subjects)
- FRESH-Motor (fNIRS, existing BIDS format)
- fNIRS tinnitus dataset (flat structure, .nirs files)
Known limitations:
- Mixed modality classification (Stage 3) is experimental
- .mat fNIRS conversion assumes Homer3-compatible variable naming
Contributing
We need YOUR datasets to improve robustness. Please test and report issues at: https://github.com/cotilab/autobidsify/issues