Automated BIDS standardization tool powered by LLM-first architecture

autobidsify

Automated Brain Imaging Data Structure (BIDS) standardization tool powered by LLM-first architecture.

Features

General compatibility: Handles diverse dataset structures (flat, hierarchical, multi-site)
Multi-modal support: MRI, fNIRS, and mixed modality datasets
Intelligent metadata extraction: Automatic participant demographics from DICOM headers, documents, and filenames
Format conversion: DICOM→NIfTI, JNIfTI→NIfTI, .mat/.nirs→SNIRF, and more
Multi-LLM support: OpenAI (gpt-4o, gpt-5.1, o1, o3) and Qwen (via Ollama or DashScope)
Evidence-based reasoning: Confidence scoring and provenance tracking for all decisions

Supported Formats

Input formats:

MRI: DICOM (.dcm), NIfTI (.nii, .nii.gz), JNIfTI (.jnii, .bnii)
fNIRS: SNIRF (.snirf), Homer3 (.nirs), MATLAB (.mat)
Documents: PDF, DOCX, TXT, Markdown

Output: Compliant with the BIDS specification (v1.10.0)

Installation

pip install autobidsify

Optional dependencies:

# For DICOM conversion
apt-get install dcm2niix          # Ubuntu/Debian
brew install dcm2niix             # macOS

# For BIDS validation
npm install -g bids-validator

# For Qwen models (local)
# Install Ollama from https://ollama.com/download
ollama pull qwen2.5-coder:7b
pip install ollama

Set the API key for your provider:

# OpenAI
export OPENAI_API_KEY="your-key-here"

# Qwen via DashScope (optional cloud alternative to Ollama)
export DASHSCOPE_API_KEY="your-key-here"
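Before launching a run, it can help to confirm that the key for the chosen provider is actually visible to the process. The sketch below is not part of autobidsify: `missing_keys` is a hypothetical helper, the provider labels are illustrative, and the variable names are taken from the two `export` lines above.

```python
import os

# Environment variables named in the setup instructions above.
# The labels "openai" and "dashscope" are illustrative, not autobidsify identifiers.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "dashscope": "DASHSCOPE_API_KEY",
}


def missing_keys(providers, env=None):
    """Return the variable names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [REQUIRED_KEYS[p] for p in providers if not env.get(REQUIRED_KEYS[p])]


# Example: report what is missing for an OpenAI-backed run.
missing = missing_keys(["openai"])
print("Missing keys:", ", ".join(missing) if missing else "none")
```

Passing `env` explicitly keeps the check easy to test without touching the real environment.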

Quick Start

# Full pipeline (one command)
# With dataset description (recommended for better metadata extraction)
autobidsify full \
  --input /path/to/your/data \
  --output outputs/my_dataset \
  --model gpt-4o \
  --modality mri \
  --nsubjects 10 \
  --id-strategy auto \
  --describe "Your dataset description here"

# Step-by-step execution
autobidsify ingest  --input data/ --output outputs/run
autobidsify evidence --output outputs/run --modality mri
autobidsify trio   --output outputs/run --model gpt-4o
autobidsify plan   --output outputs/run --model gpt-4o
autobidsify execute  --output outputs/run
autobidsify validate --output outputs/run
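The step-by-step commands above can also be driven from a script. This is a minimal sketch, assuming the `autobidsify` executable is on `PATH` and that it exits with a non-zero status when a stage fails (not confirmed by this README). `stage_commands` and `run_pipeline` are hypothetical helper names; the subcommands and flags are copied from the commands above.

```python
import subprocess


def stage_commands(input_path, output_dir, model="gpt-4o", modality="mri"):
    """Build one argv list per stage, in the order shown in the step-by-step commands."""
    return [
        ["autobidsify", "ingest", "--input", input_path, "--output", output_dir],
        ["autobidsify", "evidence", "--output", output_dir, "--modality", modality],
        ["autobidsify", "trio", "--output", output_dir, "--model", model],
        ["autobidsify", "plan", "--output", output_dir, "--model", model],
        ["autobidsify", "execute", "--output", output_dir],
        ["autobidsify", "validate", "--output", output_dir],
    ]


def run_pipeline(input_path, output_dir, **kwargs):
    """Run the stages in order; check=True raises on the first non-zero exit status."""
    for argv in stage_commands(input_path, output_dir, **kwargs):
        print("Running:", " ".join(argv))
        subprocess.run(argv, check=True)


# Usage (requires autobidsify to be installed):
# run_pipeline("data/", "outputs/run")
```

Stopping at the first failing stage avoids running `plan` or `execute` on incomplete intermediate files.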

Command Options

--input PATH            Input data (archive or directory)
--output PATH           Output directory
--model MODEL           LLM model (default: gpt-4o)
--modality TYPE         Data modality: mri | nirs | mixed
--nsubjects N           Number of subjects (optional, auto-detected if omitted)
--describe "TEXT"       Dataset description (recommended for metadata accuracy)
--id-strategy STRATEGY  Subject ID strategy: auto | numeric | semantic (default: auto)
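When the command line is assembled programmatically, the allowed values listed above can be checked before the tool is invoked. The following is a sketch only: `build_full_command` is a hypothetical helper, not an autobidsify API, and it treats `--modality` as required simply because every example in this README passes it.

```python
import shlex

# Allowed values copied from the option list above.
MODALITIES = ("mri", "nirs", "mixed")
ID_STRATEGIES = ("auto", "numeric", "semantic")


def build_full_command(input_path, output_dir, modality, model="gpt-4o",
                       nsubjects=None, id_strategy="auto", describe=None):
    """Return the argv list for `autobidsify full`, validating enumerated options."""
    if modality not in MODALITIES:
        raise ValueError(f"modality must be one of {MODALITIES}, got {modality!r}")
    if id_strategy not in ID_STRATEGIES:
        raise ValueError(f"id_strategy must be one of {ID_STRATEGIES}, got {id_strategy!r}")
    if nsubjects is not None and nsubjects < 1:
        raise ValueError("nsubjects must be a positive integer")

    argv = ["autobidsify", "full", "--input", input_path, "--output", output_dir,
            "--model", model, "--modality", modality, "--id-strategy", id_strategy]
    if nsubjects is not None:  # omitted means the tool auto-detects the count
        argv += ["--nsubjects", str(nsubjects)]
    if describe:
        argv += ["--describe", describe]
    return argv


# Print a shell-safe version of the command for inspection.
print(shlex.join(build_full_command("brain_scans/", "outputs/study1", "mri", nsubjects=50)))
```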

Supported Models

OpenAI:

--model gpt-4o           # Highly recommended, stable
--model gpt-4o-mini      # Faster, cheaper
--model gpt-5.1          # Less recommended, latest

Qwen (via Ollama, local):

--model qwen3-coder-next:latest     # Recommended
--model qwen3-coder-careful:latest  # Recommended
--model qwen2.5-coder:7b            # Not recommended, slow and sometimes inaccurate

Qwen (via the REST API of a remote Ollama server):

export OLLAMA_BASE_URL=http://your-server.com:xxxx
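Before pointing a run at a remote server, one way to confirm it is reachable and see which models it serves is to query Ollama's model-listing endpoint, `GET /api/tags`. The sketch below is independent of autobidsify; `tags_url` and `list_remote_models` are hypothetical helper names, and the fallback to `http://localhost:11434` (Ollama's default address) is an assumption about what you want when the variable is unset.

```python
import json
import os
import urllib.request


def tags_url(base_url):
    """Build the URL of Ollama's model-listing endpoint (GET /api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def list_remote_models(base_url=None, timeout=5.0):
    """Return the model names reported by the Ollama server at base_url."""
    base_url = base_url or os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


# Usage (requires a reachable Ollama server):
# print(list_remote_models())
```

If the model you plan to pass to `--model` is not in the returned list, pull it on the server first.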

Pipeline Stages

Stage	Command	Input	Output	Purpose
1	`ingest`	Raw data	`ingest_info.json`	Extract/reference data
2	`evidence`	All files	`evidence_bundle.json`	Analyze structure, detect subjects
3	`classify`	Mixed data	`classification_plan.json`	Separate MRI/fNIRS (optional)
4	`trio`	Evidence	BIDS trio files	Generate metadata files
5	`plan`	Evidence + trio	`BIDSPlan.yaml`	Create conversion strategy
6	`execute`	Plan	`bids_compatible/`	Execute conversions
7	`validate`	BIDS dataset	Validation report	Check compliance

Output Structure

outputs/my_dataset/
├── bids_compatible/              # Final BIDS dataset
│   ├── dataset_description.json
│   ├── README.md
│   ├── participants.tsv
│   ├── sub-001/
│   │   ├── anat/
│   │   │   └── sub-001_T1w.nii.gz
│   │   └── func/
│   │       └── sub-001_task-rest_bold.nii.gz
│   └── derivatives/              # Unprocessed files (original structure)
│       └── sub-001/
│           └── ...
└── _staging/                     # Intermediate files
    ├── evidence_bundle.json
    ├── BIDSPlan.yaml
    └── conversion_log.json
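A quick sanity check on a finished run is to compare the `sub-*` directories against `participants.tsv`. This is a minimal sketch that assumes the layout shown above and the standard BIDS `participant_id` column; `summarize_bids` is a hypothetical helper, not something autobidsify provides.

```python
import csv
from pathlib import Path


def summarize_bids(output_dir):
    """List subject directories and participants.tsv entries under bids_compatible/."""
    bids_root = Path(output_dir) / "bids_compatible"
    # Non-recursive glob, so sub-* folders under derivatives/ are not counted.
    subject_dirs = sorted(p.name for p in bids_root.glob("sub-*") if p.is_dir())

    participants = []
    tsv_path = bids_root / "participants.tsv"
    if tsv_path.exists():
        with tsv_path.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh, delimiter="\t")
            participants = [row["participant_id"] for row in reader]

    return {
        "subject_dirs": subject_dirs,
        "participants": participants,
        "missing_from_tsv": sorted(set(subject_dirs) - set(participants)),
    }


# Usage:
# print(summarize_bids("outputs/my_dataset"))
```

A non-empty `missing_from_tsv` list points at subjects that were converted but not recorded in the participants table.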

Examples

Example 1: Single-site MRI study

autobidsify full \
  --input brain_scans/ \
  --output outputs/study1 \
  --nsubjects 50 \
  --model gpt-4o \
  --modality mri \
  --id-strategy auto \
  --describe "Single-site MRI study"

Example 2: Multi-site dataset with description

autobidsify full \
  --input camcan_data/ \
  --output outputs/camcan \
  --model gpt-4o \
  --modality mri \
  --id-strategy semantic \
  --describe "Multi-site dataset with description"

Example 3: fNIRS dataset using Qwen (local, no API cost)

autobidsify full \
  --input fnirs_study/ \
  --output outputs/fnirs \
  --model qwen3-coder-next:latest \
  --modality nirs \
  --id-strategy auto \
  --describe "fNIRS dataset"

Architecture

LLM-First Design:

Python: Deterministic operations — file I/O, regex-based subject detection, format conversion, BIDS validation
LLM: Semantic understanding — dataset description, metadata extraction, scan type classification, license normalization
Hybrid: Python analyzes ALL files for completeness; LLM sees representative samples for semantic decisions
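The split described above can be illustrated with a small sketch: a deterministic pass groups every file by subject, then a small sample per subject is selected, the kind of subset that could be placed in an LLM prompt. The regular expression and both function names are hypothetical and are not autobidsify's actual detection rules.

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches "sub-01", "sub_01", "Subject_002", "subject7", ...
SUBJECT_RE = re.compile(r"(?:subject|sub)[-_]?(\d+)", re.IGNORECASE)


def group_by_subject(paths):
    """Deterministic step: assign every path to a subject key (None if undetected)."""
    groups = defaultdict(list)
    for path in paths:
        match = SUBJECT_RE.search(path)
        if match:
            key = match.group(1).lstrip("0") or "0"  # normalize "002" -> "2"
        else:
            key = None
        groups[key].append(path)
    return dict(groups)


def representative_sample(groups, per_subject=1):
    """Sampling step: keep a few files per subject instead of the full listing."""
    sample = []
    for key in sorted(groups, key=str):
        sample.extend(sorted(groups[key])[:per_subject])
    return sample


paths = ["site1/sub-01/T1.dcm", "site1/sub-01/rest.dcm",
         "site2/Subject_002/T1.dcm", "notes/readme.txt"]
print(representative_sample(group_by_subject(paths)))
```

Keeping the grouping in plain Python means every file is accounted for, while the prompt size stays bounded by the number of subjects rather than the number of files.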

Requirements

Python
OpenAI API key (or Ollama for local Qwen models)
dcm2niix for DICOM conversion
bids-validator for validation
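Whether the two external tools are installed can be checked from Python with `shutil.which`. This is a sketch under the assumption that both tools are invoked by the executable names `dcm2niix` and `bids-validator`; `find_tools` is a hypothetical helper.

```python
import shutil

# External executables listed in the requirements above, with their purpose.
EXTERNAL_TOOLS = {
    "dcm2niix": "DICOM conversion",
    "bids-validator": "BIDS validation",
}


def find_tools(which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in EXTERNAL_TOOLS}


for name, path in find_tools().items():
    status = path if path else "not found"
    print(f"{name} ({EXTERNAL_TOOLS[name]}): {status}")
```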

Current Status

Version: 0.6.1

Tested datasets:

Visible Human Project (flat structure, DICOM CT)
CamCAN (hierarchical, multi-site, 30+ subjects)
FRESH-Motor (fNIRS, existing BIDS format)
fNIRS tinnitus dataset (flat structure, .nirs files)

Known limitations:

Mixed modality classification (Stage 3) is experimental
.mat fNIRS conversion assumes Homer3-compatible variable naming

Contributing

We need YOUR datasets to improve robustness. Please test and report issues at: https://github.com/cotilab/autobidsify/issues

Release history

0.9.1 (Apr 1, 2026)
0.9.0 (Mar 31, 2026)
0.8.6 (Mar 24, 2026)
0.8.5 (Mar 24, 2026)
0.8.0 (Mar 19, 2026)
0.7.0 (Mar 18, 2026)
0.6.2 (Mar 14, 2026)
0.6.1 (Mar 13, 2026), this version
0.6.0 (Mar 13, 2026)
0.5.0 (Feb 19, 2026)

Download files

Source Distribution

autobidsify-0.6.1.tar.gz (95.9 kB), uploaded Mar 13, 2026

Built Distribution

autobidsify-0.6.1-py3-none-any.whl (90.1 kB), uploaded Mar 13, 2026, Python 3

File details

Details for the file autobidsify-0.6.1.tar.gz.

File metadata

Download URL: autobidsify-0.6.1.tar.gz
Upload date: Mar 13, 2026
Size: 95.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for autobidsify-0.6.1.tar.gz
Algorithm	Hash digest
SHA256	`dcd7d859183347dc1790e132011b1e358c3018cd67b810ca4c2bc666516fcfa7`
MD5	`58a5bb95bd0ef7db0a4b823dc96632e5`
BLAKE2b-256	`d7cf3464506091896f640bbc2ebd2a5ebe39d4fd49125f43415f37bb30a75ffd`

File details

Details for the file autobidsify-0.6.1-py3-none-any.whl.

File metadata

Download URL: autobidsify-0.6.1-py3-none-any.whl
Upload date: Mar 13, 2026
Size: 90.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for autobidsify-0.6.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`70f69149e07a7da7ae12e0dc313169caa8aa472dbe6de03bdca722330614fc06`
MD5	`0a70977a9d5d93702abd85deed73f227`
BLAKE2b-256	`6c0c83873778cb151a2c927ee089986ccb38bfff46b855787592faae7ab5865e`
