# auto-bidsify

Automated BIDS standardization tool powered by LLM-first architecture.
## Features
- General compatibility: Handles diverse dataset structures (flat, hierarchical, multi-site)
- Multi-modal support: MRI, fNIRS, and mixed modality datasets
- Intelligent metadata extraction: Automatic participant demographics from DICOM headers, documents, and filenames
- Format conversion: DICOM→NIfTI, CSV→SNIRF, and more
- Evidence-based reasoning: Confidence scoring and provenance tracking for all decisions
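The evidence-based reasoning above can be pictured with a minimal sketch. The `Evidence` record and `overall_confidence` helper here are illustrative, not the tool's actual API; combining scores with `min` is one plausible choice (a decision is only as strong as its weakest supporting evidence):

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """One piece of evidence backing a standardization decision (hypothetical shape)."""
    claim: str         # e.g. "file X is a T1w anatomical scan"
    source: str        # provenance: DICOM header, filename, document, ...
    confidence: float  # 0.0 - 1.0

def overall_confidence(items: list[Evidence]) -> float:
    """Combine scores conservatively: take the weakest link."""
    return min(e.confidence for e in items) if items else 0.0

decision = [
    Evidence("sub-001 scan is T1w", "DICOM SeriesDescription", 0.95),
    Evidence("participant age is 34", "demographics.csv", 0.80),
]
print(round(overall_confidence(decision), 2))  # -> 0.8
```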
## Supported Formats

Input formats:

- MRI: DICOM, NIfTI (.nii, .nii.gz)
- fNIRS: SNIRF, Homer3 (.nirs), CSV/TSV tables
- Documents: PDF, DOCX, TXT, Markdown, ...

Output: BIDS-compliant dataset (v1.10.0)
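Input formats can typically be recognized by extension before any content inspection. This is a sketch of such a dispatch, assuming a hypothetical `FORMAT_MAP`; the real tool's detection logic may differ:

```python
from pathlib import Path

# Illustrative extension -> (modality, format) mapping (not the tool's actual table)
FORMAT_MAP = {
    ".dcm": ("mri", "DICOM"),
    ".nii": ("mri", "NIfTI"),
    ".nii.gz": ("mri", "NIfTI"),
    ".snirf": ("nirs", "SNIRF"),
    ".nirs": ("nirs", "Homer3"),
}

def detect_format(path: str) -> tuple[str, str]:
    name = Path(path).name.lower()
    # Try longer extensions first so ".nii.gz" wins over ".nii"
    for ext, info in sorted(FORMAT_MAP.items(), key=lambda kv: -len(kv[0])):
        if name.endswith(ext):
            return info
    return ("unknown", "unknown")

print(detect_format("sub-01_T1w.nii.gz"))  # -> ('mri', 'NIfTI')
```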
## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/auto-bidsify.git
cd auto-bidsify

# Set up the environment
conda create -n bidsify python=3.10
conda activate bidsify
pip install -r requirements.txt

# Set your OpenAI API key
export OPENAI_API_KEY="your-key-here"
```
### Basic Usage

```bash
# Full pipeline (one command)
python cli.py full \
  --input /path/to/your/data \
  --output outputs/my_dataset \
  --model gpt-4o \
  --modality mri

# Step-by-step execution
python cli.py ingest --input data.zip --output outputs/run
python cli.py evidence --output outputs/run --modality mri
python cli.py trio --output outputs/run --model gpt-4o
python cli.py plan --output outputs/run --model gpt-4o
python cli.py execute --output outputs/run
python cli.py validate --output outputs/run
```
### Command Options

```text
--input PATH        Input data (archive or directory)
--output PATH       Output directory
--model MODEL       LLM model (default: gpt-4o)
--modality TYPE     Data modality: mri|nirs|mixed
--nsubjects N       Number of subjects (optional)
--describe "TEXT"   Dataset description (recommended)
```
## Pipeline Stages

| Stage | Command | Input | Output | Purpose |
|---|---|---|---|---|
| 1 | ingest | Raw data | ingest_info.json | Extract/reference data |
| 2 | evidence | All files | evidence_bundle.json | Analyze structure, detect subjects |
| 3 | classify | Mixed data | classification_plan.json | Separate MRI/fNIRS (optional) |
| 4 | trio | Evidence | BIDS trio files | Generate metadata files |
| 5 | plan | Evidence + trio | BIDSPlan.yaml | Create conversion strategy |
| 6 | execute | Plan | bids_compatible/ | Execute conversions |
| 7 | validate | BIDS dataset | Validation report | Check compliance |
## Output Structure

```text
outputs/my_dataset/
  bids_compatible/            # Final BIDS dataset
    dataset_description.json
    README.md
    participants.tsv
    sub-001/
      anat/
        sub-001_T1w.nii.gz
      func/
        sub-001_task-rest_bold.nii.gz
  _staging/                   # Intermediate files
    evidence_bundle.json
    BIDSPlan.yaml
    conversion_log.json
```
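A quick sanity check on the final dataset is verifying the top-level metadata files shown above are present. This `missing_trio` helper is a minimal sketch, not part of the tool:

```python
from pathlib import Path

# Top-level files expected in bids_compatible/ (per the layout above)
REQUIRED = ["dataset_description.json", "README.md", "participants.tsv"]

def missing_trio(bids_root: str) -> list[str]:
    """Return the required top-level files absent from bids_root."""
    root = Path(bids_root)
    return [f for f in REQUIRED if not (root / f).exists()]

# On an empty directory, all three are reported missing.
import tempfile
with tempfile.TemporaryDirectory() as d:
    print(missing_trio(d))
```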
## Examples

### Example 1: Single-site MRI study

```bash
python cli.py full \
  --input brain_scans/ \
  --output outputs/study1 \
  --nsubjects 50 \
  --model gpt-4o \
  --modality mri
```

### Example 2: Multi-site dataset with description

```bash
python cli.py full \
  --input camcan_data/ \
  --output outputs/camcan \
  --model gpt-4o \
  --modality mri \
  --describe "Cambridge Centre for Ageing and Neuroscience: 650 participants, ages 18-88, multi-site MRI study"
```

### Example 3: fNIRS dataset from CSV

```bash
python cli.py full \
  --input fnirs_study/ \
  --output outputs/fnirs \
  --model gpt-4o \
  --modality nirs \
  --describe "Prefrontal cortex activation during cognitive tasks, 30 subjects"
```
## Architecture

LLM-First Design:

- Python: deterministic operations (file I/O, format conversion, validation)
- LLM: semantic understanding (file classification, metadata extraction, pattern recognition)
- Hybrid: combines the reliability of deterministic code with the flexibility of semantic reasoning
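The split above can be sketched in a few lines. Here `llm_classify` is a stub standing in for the real LLM call, and both function names are hypothetical; only the division of labor mirrors the design described:

```python
from pathlib import Path

def deterministic_metadata(path: str) -> dict:
    """Python side: facts computable without any model."""
    p = Path(path)
    return {"name": p.name, "suffixes": p.suffixes}

def llm_classify(metadata: dict) -> dict:
    """LLM side (stubbed for illustration): semantic judgment plus a confidence score."""
    label = "anat/T1w" if "t1w" in metadata["name"].lower() else "unknown"
    return {"label": label, "confidence": 0.9 if label != "unknown" else 0.3}

record = deterministic_metadata("sub-001_T1w.nii.gz")
record.update(llm_classify(record))
print(record["label"])  # -> anat/T1w
```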
## Requirements

- Python 3.10+
- OpenAI API key
- Optional: `dcm2niix` for DICOM conversion
- Optional: `bids-validator` for validation
## Current Status

Version: 1.0 (LLM-First Architecture with Evidence-Based Reasoning)

Tested datasets:

- Visible Human Project (flat structure, CT scans)
- CamCAN (hierarchical, multi-site, 1288 subjects)
- [Your dataset here - help us test!]

Known limitations:

- The classification stage (Stage 3) and MAT/spreadsheet conversion are experimental
- Some edge cases in participant metadata extraction
## Contributing

We need YOUR datasets to improve robustness! Please test and report:

- Success cases
- Failure cases
- Edge cases