autobidsify
Automated Brain Imaging Data Structure (BIDS) standardization tool powered by an LLM-first architecture.
Features
- General compatibility: Handles diverse dataset structures (flat, hierarchical, multi-site)
- Multi-modal support: MRI, fNIRS, and mixed modality datasets
- Intelligent metadata extraction: Automatic participant demographics from DICOM headers, documents, and filenames
- Format conversion: DICOM→NIfTI, JNIfTI→NIfTI, .mat/.nirs→SNIRF, and more
- Multi-LLM support: OpenAI (gpt-4o, gpt-5.1) and Qwen (locally via Ollama, through a remote Ollama REST API, or via DashScope)
- Evidence-based reasoning: Confidence scoring and provenance tracking for all decisions
Supported Formats
Input formats:
- MRI: DICOM (.dcm), NIfTI (.nii, .nii.gz), JNIfTI (.jnii, .bnii)
- fNIRS: SNIRF (.snirf), Homer3 (.nirs), MATLAB (.mat)
- Documents: PDF, DOCX, TXT, Markdown
Output: Compliant with the BIDS specification (v1.10.0)
Installation
pip install autobidsify
Optional dependencies:
# For BIDS validation
npm install -g bids-validator
Set API key:
# OpenAI
export OPENAI_API_KEY="your-key-here"
# Qwen via DashScope (optional cloud alternative to Ollama)
export DASHSCOPE_API_KEY="your-key-here"
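Before starting a run, it can help to confirm that the key for the chosen provider is actually exported. The following is a minimal sketch of such a check. The helper `missing_key` and the provider labels are hypothetical and not part of autobidsify; only the two variable names come from the commands above.

```python
import os

# Environment variable names taken from this README, keyed by provider.
# The provider labels ("openai", "dashscope") are illustrative only.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "dashscope": "DASHSCOPE_API_KEY",
}


def missing_key(provider, env=None):
    """Return the name of the unset variable for `provider`, or None if it is set."""
    env = os.environ if env is None else env
    name = REQUIRED_KEYS[provider]
    # An empty string is treated the same as an unset variable.
    return None if env.get(name) else name


if __name__ == "__main__":
    for provider in REQUIRED_KEYS:
        name = missing_key(provider)
        status = "set" if name is None else f"missing {name}"
        print(f"{provider}: {status}")
```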
Quick Start
# Full pipeline (one command)
# With dataset description (recommended for better metadata extraction)
autobidsify full \
--input /path/to/your/data \
--output outputs/my_dataset \
--model gpt-4o \
--modality mri \
--nsubjects 10 \
--id-strategy auto \
--describe "Your dataset description here"
# Step-by-step execution
autobidsify ingest --input data/ --output outputs/run
autobidsify evidence --output outputs/run --modality mri
autobidsify trio --output outputs/run --model gpt-4o
autobidsify plan --output outputs/run --model gpt-4o
autobidsify execute --output outputs/run
autobidsify validate --output outputs/run
Command Options
--input PATH             Input data (archive or directory)
--output PATH            Output directory
--model MODEL            LLM model (default: gpt-4o)
--modality TYPE          Data modality: mri | nirs | mixed
--nsubjects N            Number of subjects (optional, auto-detected if omitted)
--describe "TEXT"        Dataset description (recommended for metadata accuracy)
--id-strategy STRATEGY   Subject ID strategy: auto | numeric | semantic (default: auto)
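When the tool is driven from a script, the options above can be assembled into an argument list rather than a hand-written shell string. The sketch below assumes only the flags documented in this README. The helper `build_full_command` is hypothetical, and treating every flag other than `--input` and `--output` as optional is an assumption of this sketch.

```python
import shlex

MODALITIES = ("mri", "nirs", "mixed")
ID_STRATEGIES = ("auto", "numeric", "semantic")


def build_full_command(input_path, output_path, model="gpt-4o", modality=None,
                       nsubjects=None, id_strategy=None, describe=None):
    """Assemble the argument list for `autobidsify full` from the documented flags."""
    cmd = ["autobidsify", "full",
           "--input", str(input_path),
           "--output", str(output_path),
           "--model", model]
    if modality is not None:
        if modality not in MODALITIES:
            raise ValueError(f"unknown modality: {modality}")
        cmd += ["--modality", modality]
    if nsubjects is not None:
        cmd += ["--nsubjects", str(int(nsubjects))]
    if id_strategy is not None:
        if id_strategy not in ID_STRATEGIES:
            raise ValueError(f"unknown id strategy: {id_strategy}")
        cmd += ["--id-strategy", id_strategy]
    if describe:
        cmd += ["--describe", describe]
    return cmd


if __name__ == "__main__":
    cmd = build_full_command("data/", "outputs/run", modality="mri",
                             describe="Resting-state MRI, 10 adults")
    # Print a shell-safe version; pass `cmd` to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Passing a list to `subprocess.run` avoids quoting problems when the description contains spaces or quotes.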
Supported Models
OpenAI:
--model gpt-4o # Highly recommended, stable
--model gpt-4o-mini # Faster, cheaper
--model gpt-5.1 # Latest, but less recommended
Qwen (via Ollama, local):
--model qwen3-coder-next:latest # Recommended
--model qwen3-coder-careful:latest # Recommended
--model qwen2.5-coder:7b # Not recommended: slow and sometimes inaccurate
Qwen (via a remote Ollama REST API):
export OLLAMA_BASE_URL=http://your-server.com:xxxx
Pipeline Stages
| Stage | Command | Input | Output | Purpose |
|---|---|---|---|---|
| 1 | ingest | Raw data | ingest_info.json | Extract/reference data |
| 2 | evidence | All files | evidence_bundle.json | Analyze structure, detect subjects |
| 3 | classify | Mixed data | classification_plan.json, nirs_pool/, mri_pool/, unknown/ | Separate MRI/fNIRS (optional) |
| 4 | trio | Evidence | BIDS trio files | Generate metadata files |
| 5 | plan | Evidence + trio | BIDSPlan.yaml, subject_analysis.json | Create conversion strategy |
| 6 | execute | Plan | bids_compatible/, conversion_log.json, BIDSManifest.yaml | Execute conversions |
| 7 | validate | BIDS dataset | Validation report | Check compliance |
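The stages can also be chained from Python so that a failure stops the run at the stage that caused it. This is a sketch under stated assumptions: it uses only the per-stage flags shown in the step-by-step commands of this README, and it leaves out the optional `classify` stage because its flags are not documented here. The function names are hypothetical.

```python
import subprocess


def stage_commands(input_path, output_dir, modality="mri", model="gpt-4o"):
    """Return one argument list per stage, in pipeline order."""
    out = ["--output", str(output_dir)]
    return [
        ["autobidsify", "ingest", "--input", str(input_path)] + out,
        ["autobidsify", "evidence"] + out + ["--modality", modality],
        ["autobidsify", "trio"] + out + ["--model", model],
        ["autobidsify", "plan"] + out + ["--model", model],
        ["autobidsify", "execute"] + out,
        ["autobidsify", "validate"] + out,
    ]


def run_pipeline(input_path, output_dir, dry_run=True, **kwargs):
    """Run the stages in order; with dry_run=True, only return the commands."""
    commands = stage_commands(input_path, output_dir, **kwargs)
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError, so later stages never run
            # on the output of a failed stage.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_pipeline("data/", "outputs/run"):
        print(" ".join(cmd))
```

Because every stage reads from and writes to the same output directory, a failed stage can be re-run on its own after the problem is fixed.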
Output Structure
outputs/my_dataset/
├── bids_compatible/              # Final BIDS dataset
│   ├── dataset_description.json
│   ├── README.md
│   ├── participants.tsv
│   ├── sub-001/
│   │   ├── anat/
│   │   │   └── sub-001_T1w.nii.gz
│   │   └── func/
│   │       └── sub-001_task-rest_bold.nii.gz
│   └── derivatives/              # Unprocessed files (original structure)
│       └── sub-001/
│           └── ...
└── _staging/                     # Intermediate files
    ├── evidence_bundle.json
    ├── BIDSPlan.yaml
    └── conversion_log.json
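A quick sanity check of this layout can be scripted before running the full validator. The sketch below only looks for the top-level files and `sub-*` directories shown in the tree above. The helper `check_output` is hypothetical, and the check is not a substitute for `bids-validator`.

```python
from pathlib import Path

# Top-level files shown in the example tree above.
EXPECTED_TOP_LEVEL = ("dataset_description.json", "README.md", "participants.tsv")


def check_output(output_dir):
    """Return a list of problems found under `output_dir` (empty if none)."""
    bids_root = Path(output_dir) / "bids_compatible"
    if not bids_root.is_dir():
        return [f"missing directory: {bids_root}"]
    problems = [f"missing file: {name}"
                for name in EXPECTED_TOP_LEVEL
                if not (bids_root / name).is_file()]
    # At least one subject directory is expected in a converted dataset.
    subjects = [p for p in bids_root.glob("sub-*") if p.is_dir()]
    if not subjects:
        problems.append("no sub-* directories found")
    return problems


if __name__ == "__main__":
    for problem in check_output("outputs/my_dataset"):
        print(problem)
```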
Architecture
LLM-First Design:
- Python: Deterministic operations — file I/O, regex-based subject detection, format conversion, BIDS validation
- LLM: Semantic understanding — dataset description, metadata extraction, scan type classification, license normalization
- Hybrid: Python analyzes ALL files for completeness; LLM sees representative samples for semantic decisions
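The split described above can be illustrated with a small sketch: a deterministic pass groups every path by subject, and only a handful of representative paths is then selected for the semantic step. The regular expression, the function names, and the sampling limits are assumptions made for illustration. They are not taken from the autobidsify source.

```python
import re
from collections import defaultdict

# Hypothetical pattern covering names such as "sub-01", "subject02", "Subj_3".
SUBJECT_RE = re.compile(r"sub(?:ject|j)?[-_]?(\d+)", re.IGNORECASE)


def group_by_subject(paths):
    """Deterministic pass: assign every path to a numeric subject ID, or None."""
    groups = defaultdict(list)
    for path in paths:
        match = SUBJECT_RE.search(path)
        key = int(match.group(1)) if match else None
        groups[key].append(path)
    return dict(groups)


def representative_samples(groups, per_subject=2, max_subjects=3):
    """Pick a small, stable subset of paths that could be shown to an LLM."""
    samples = []
    known = sorted(key for key in groups if key is not None)
    for key in known[:max_subjects]:
        samples.extend(sorted(groups[key])[:per_subject])
    # Paths with no detectable subject are often documents worth sampling too.
    samples.extend(sorted(groups.get(None, []))[:per_subject])
    return samples


if __name__ == "__main__":
    files = ["sub-01/anat/T1.nii", "sub-01/func/bold.nii",
             "subject02_rest.nirs", "notes.txt"]
    groups = group_by_subject(files)
    print(representative_samples(groups))
```

Sorting before sampling keeps the selection reproducible across runs, which matters when the sampled paths feed a prompt.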
Requirements
- Python
- OpenAI API key (or Ollama for local Qwen models)
- bids-validator (optional, for validation)
Current Status
Version: 0.8.6
Tested datasets:
- Visible Human Project (flat structure, DICOM CT)
- CamCAN (hierarchical, multi-site, 30+ subjects)
- FRESH-Motor (fNIRS, existing BIDS format)
- fNIRS tinnitus dataset (flat structure, .nirs files)
Known limitations:
- Mixed modality classification (Stage 3) is experimental
- .mat fNIRS conversion assumes Homer3-compatible variable naming
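For the `.mat` limitation, a pre-check of the variable names in a file can show early whether it looks like Homer-style data. The sketch below works on a plain list of names. The names it expects (`d`, `t`, `SD`, plus optional `s`, `aux`, `ml`, `CondNames`) are those commonly found in Homer `.nirs` files; which of them autobidsify actually requires is an assumption here, and the helper name is hypothetical.

```python
# Variable names commonly found in Homer-style .nirs files. Treating the first
# set as required and the second as optional is an assumption of this sketch.
HOMER_REQUIRED = {"d", "t", "SD"}
HOMER_OPTIONAL = {"s", "aux", "ml", "CondNames"}


def check_homer_variables(names):
    """Compare variable names from a .mat file against Homer-style naming."""
    # Drop loader metadata keys such as "__header__".
    present = {name for name in names if not name.startswith("__")}
    return {
        "missing": sorted(HOMER_REQUIRED - present),
        "unrecognized": sorted(present - HOMER_REQUIRED - HOMER_OPTIONAL),
    }


if __name__ == "__main__":
    # The names could come from a third-party reader, for example
    # [entry[0] for entry in scipy.io.whosmat("recording.mat")].
    report = check_homer_variables(["d", "t", "s", "stim_times"])
    print(report)
```

A non-empty `missing` list suggests the file would need its variables renamed before conversion is attempted.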
Contributing
We need YOUR datasets to improve robustness. Please test and report issues at: https://github.com/cotilab/autobidsify/issues