BiDoRA/LoRA fine-tuning toolkit for 3D code generation and spatial intelligence

BiDoRA: Bi-Level Optimization for Parameter-Efficient Fine-Tuning

BiDoRA is a Python package implementing true BiDoRA (Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation) for efficient fine-tuning of Large Language Models. It is specifically optimized for:

  • 3D Code Generation (Rust, Blender, CAD)
  • Spatial Intelligence Tasks
  • Small Datasets (<10k samples)
  • Automatic Hardware Adaptation (Laptop to A100)

🔬 What is BiDoRA?

BiDoRA uses bi-level optimization to separately optimize magnitude and direction components of weight updates:

W' = m ⊙ (W₀ + BA) / ||W₀ + BA||

where m is the magnitude component (optimized at the upper level) and (W₀ + BA) / ||W₀ + BA|| is the direction component (optimized at the lower level).

Training Process:

  1. Lower Level: Optimize direction (A, B matrices) on training set
  2. Upper Level: Optimize magnitude (m) on validation set via hypergradients
  3. Final Phase: Direction fine-tuning on combined data with fixed magnitude
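
As a rough illustration only (a simplified sketch, not the package's actual implementation), the following alternates the two levels on a single linear layer; the upper-level hypergradient step is approximated here by a plain gradient step on a validation batch:

# Simplified BiDoRA-style update for one linear layer (illustration only).
# Assumes column-wise normalization; real BiDoRA computes hypergradients for m.
import torch

d_out, d_in, rank = 64, 64, 8
W0 = torch.randn(d_out, d_in)                              # frozen pretrained weight
A = (0.01 * torch.randn(rank, d_in)).requires_grad_(True)  # direction (low-rank), lower level
B = torch.zeros(d_out, rank, requires_grad=True)
m = W0.norm(dim=0, keepdim=True).clone().requires_grad_(True)  # magnitude, upper level

def adapted_weight():
    V = W0 + B @ A                               # updated direction before normalization
    return m * V / V.norm(dim=0, keepdim=True)   # W' = m ⊙ (W0 + BA) / ||W0 + BA||

def loss_fn(W, batch):
    x, y = batch
    return torch.nn.functional.mse_loss(x @ W.T, y)

lower_opt = torch.optim.AdamW([A, B], lr=2e-4)      # direction, trained on the training set
upper_opt = torch.optim.AdamW([m], lr=2e-4 * 2.0)   # magnitude, trained on the validation set

for step in range(100):
    # 1. Lower level: optimize direction (A, B) on a training batch
    train_batch = (torch.randn(16, d_in), torch.randn(16, d_out))
    lower_opt.zero_grad()
    loss_fn(adapted_weight(), train_batch).backward()
    lower_opt.step()
    # 2. Upper level: optimize magnitude m on a validation batch
    val_batch = (torch.randn(16, d_in), torch.randn(16, d_out))
    upper_opt.zero_grad()
    loss_fn(adapted_weight(), val_batch).backward()
    upper_opt.step()
# 3. Final phase (not shown): freeze m and fine-tune A, B on train + validation combined.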

Benefits:

  • ✅ Reduces overfitting on small datasets (<10k samples)
  • ✅ Better alignment with full fine-tuning (correlation: -8.042 vs -1.784 for DoRA)
  • ✅ Statistically significant improvements on GLUE (p < 0.001)

Important Notes:

  • ⚠️ Training Time: 3-4x slower than standard LoRA due to bi-level optimization
  • ⚠️ No Quantization: BiDoRA requires full precision (bfloat16); quantization is disabled automatically
  • ⚠️ Memory: Uses an 8-bit AdamW optimizer (roughly 75% less optimizer-state memory) to compensate, as sketched after this list
  • Best For: Small, specialized datasets where quality matters more than speed
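
The memory note above refers to optimizer state. A minimal sketch of what an 8-bit AdamW looks like with bitsandbytes (bidora wires this up internally; the Linear layer here is just a stand-in for a full model):

import bitsandbytes as bnb
import torch

# Stand-in for a full bf16 model; this only illustrates the optimizer choice
model = torch.nn.Linear(4096, 4096, dtype=torch.bfloat16)

# AdamW8bit keeps its moment estimates in 8-bit, cutting optimizer-state memory roughly 4x
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-4)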

🚀 Features

  • BiDoRA Bi-Level Optimization: True magnitude-direction decomposition
  • Auto Hardware Detection: Automatically adapts config to available hardware
  • Full Precision Training: Optimized for bfloat16 (quantization is not used with BiDoRA)
  • Flexible Data Formats: JSONL, HuggingFace Datasets
  • Type-Safe Config: Pydantic-validated configuration
  • CLI Interface: Simple command-line interface with Typer

📦 Installation

From PyPI (recommended)

pip install bidora

As a project dependency

# With uv (recommended)
uv add bidora

# With pip
pip install bidora

From source (for development)

git clone https://github.com/bjoernbethge/bidora.git
cd bidora
uv sync --dev

🎯 Quick Start

1. Show hardware info

bidora info

Shows available hardware and recommended configuration.

2. Show recommended models

bidora list-models

3. Start BiDoRA training

Important: BiDoRA requires separate train and validation files for bi-level optimization.

Basic training

bidora train \
  --train-file data/train.jsonl \
  --val-file data/val.jsonl \
  --model Qwen/Qwen3-4B \
  --output ./output \
  --rank 8 \
  --epochs 3

With custom learning rates

bidora train \
  --train-file data/train.jsonl \
  --val-file data/val.jsonl \
  --model Qwen/Qwen3-4B \
  --lr 2e-4 \
  --upper-lr-mult 2.0 \
  --rank 8

With HuggingFace dataset

bidora train \
  --dataset "code_search_net" \
  --model Qwen/Qwen3-8B \
  --output ./output \
  --rank 8

📊 Data Format

JSONL Format (Instruction-Tuning)

{"instruction": "Generate a Rust function to create a 3D cube mesh", "output": "fn create_cube() -> Mesh { ... }"}
{"instruction": "Write Blender Python code to add a sphere", "input": "radius: 2.0", "output": "import bpy\nbpy.ops.mesh.primitive_uv_sphere_add(radius=2.0)"}

JSONL Format (Code Completion)

{"prompt": "// Generate 3D mesh\nfn create_mesh()", "completion": " -> Mesh {\n    let vertices = vec![...];\n    Mesh::new(vertices)\n}"}

JSONL Format (Code-Only)

{"code": "use bevy::prelude::*;\n\nfn setup_3d_scene(mut commands: Commands) { ... }"}

⚙️ Hardware-Specific Setups

Laptop (8GB GPU)

bidora train \
  --train-file data/train.jsonl \
  --val-file data/val.jsonl \
  --model Qwen/Qwen3-4B \
  --rank 4 \
  --batch-size 1 \
  --auto-hardware  # Automatic adaptation

Config automatically adjusted:

  • Precision: bfloat16 (full precision - BiDoRA requirement)
  • Batch Size: 1-2
  • Gradient Accumulation: 8-16
  • Max Seq Length: 1024-2048

Desktop (16GB GPU)

bidora train \
  --train-file data/train.jsonl \
  --val-file data/val.jsonl \
  --model Qwen/Qwen3-8B \
  --rank 16 \
  --batch-size 2 \
  --auto-hardware

Auto-Config:

  • Precision: bfloat16 (full precision - BiDoRA requirement)
  • Batch Size: 2-4
  • Gradient Accumulation: 4-8
  • Max Seq Length: 2048

A100 (40GB)

bidora train \
  --train-file data/train.jsonl \
  --val-file data/val.jsonl \
  --model Qwen/Qwen3-32B \
  --rank 16 \
  --batch-size 8 \
  --auto-hardware

Auto-Config:

  • Precision: bfloat16 (full precision - BiDoRA requirement)
  • Batch Size: 4-8
  • Gradient Accumulation: 2-4
  • Max Seq Length: 4096
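
Across all three tiers, the effective batch size is the per-device batch size multiplied by the gradient accumulation steps. A quick sanity check of the ranges above (plain arithmetic, not package code):

# effective batch size = per-device batch size x gradient accumulation steps
configs = {
    "Laptop (8GB)":   (2, 8),   # batch size 1-2, accumulation 8-16
    "Desktop (16GB)": (4, 4),   # batch size 2-4, accumulation 4-8
    "A100 (40GB)":    (8, 2),   # batch size 4-8, accumulation 2-4
}
for tier, (batch, accum) in configs.items():
    print(f"{tier}: effective batch size = {batch * accum}")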

🎛️ Advanced Options

All CLI Parameters

bidora train --help

Most Important Parameters:

Parameter          Description                     Default
--model, -m        Model name or path              Qwen/Qwen3-4B
--train-file, -t   Training JSONL file             Required
--val-file, -v     Validation JSONL file           Required for BiDoRA
--dataset, -d      HuggingFace dataset             -
--output, -o       Output directory                ./output
--rank, -r         LoRA rank                       8
--epochs, -e       Training epochs                 3
--batch-size, -b   Batch size                      4
--lr               Learning rate (lower level)     2e-4
--upper-lr-mult    Upper-level LR multiplier       2.0
--max-samples      Max training samples            All
--auto-hardware    Automatic hardware adjustment   True

Manual Config (without Auto-Hardware)

bidora train \
  --train-file data/train.jsonl \
  --val-file data/val.jsonl \
  --model Qwen/Qwen3-8B \
  --rank 16 \
  --batch-size 8 \
  --lr 3e-4 \
  --epochs 5 \
  --no-auto-hardware  # Manual config

💾 Memory Requirements

Qwen3 Model Sizes (BiDoRA - Full Precision)

⚠️ Note: BiDoRA requires full precision (bfloat16) with no quantization, so memory requirements are higher than with standard LoRA.

Model        Parameters   VRAM (bf16)   Training VRAM   Recommended For
Qwen3-0.6B   0.6B         ~2GB          ~6GB            Laptop GPU (6-8GB)
Qwen3-1.7B   1.7B         ~4GB          ~10GB           Laptop GPU (8GB+)
Qwen3-4B     4B           ~8GB          ~16GB           Desktop GPU (12-16GB)
Qwen3-8B     8B           ~16GB         ~24GB           Desktop GPU (24GB+) / A100
Qwen3-14B    14B          ~28GB         ~40GB           A100 (40GB)
Qwen3-32B    32B          ~64GB         ~80GB           A100 (80GB)

💡 Memory Optimization: An 8-bit AdamW optimizer (roughly 75% less optimizer-state memory) compensates for the full-precision requirement.
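
The VRAM (bf16) column follows directly from 2 bytes per parameter. A rough check (assumed arithmetic, not package code); the training column is larger because of adapter gradients, optimizer state, and activations:

def bf16_weight_gb(num_params: float) -> float:
    # bf16 stores 2 bytes per parameter; weights only, no activations or optimizer state
    return num_params * 2 / 1e9

for name, params in [("Qwen3-4B", 4e9), ("Qwen3-8B", 8e9), ("Qwen3-32B", 32e9)]:
    print(f"{name}: ~{bf16_weight_gb(params):.0f} GB of weights in bf16")
# Qwen3-4B: ~8 GB, Qwen3-8B: ~16 GB, Qwen3-32B: ~64 GB, matching the VRAM (bf16) column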

Trainable Parameters (LoRA Rank=8)

Base Model   LoRA Params   Reduction
7B           ~2M           3500×
14B          ~4M           3500×
32B          ~8M           4000×
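
The adapter counts above follow from the LoRA parameterization: each adapted projection adds rank × (d_in + d_out) parameters. A back-of-the-envelope sketch (illustrative dimensions; exact numbers depend on rank and which modules are targeted):

def lora_params(d_model: int, n_layers: int, rank: int = 8, adapted_per_layer: int = 1) -> int:
    # each adapted d_model x d_model projection adds rank * (d_in + d_out) parameters
    return n_layers * adapted_per_layer * rank * (d_model + d_model)

n = lora_params(d_model=4096, n_layers=32)  # dimensions typical of a 7B-class model
print(f"{n:,} adapter params, ~{round(7e9 / n)}x fewer than full fine-tuning")
# 2,097,152 adapter params, roughly the ~2M / 3500x row in the table above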

🧪 Example Workflow: 3D Rust Code Fine-Tuning

1. Prepare data

# data/rust_3d_train.jsonl
{"instruction": "Create a three-rs mesh for a cube", "output": "use three::*;\n\nfn create_cube(size: f32) -> Mesh {\n    let geometry = Geometry::cuboid(size, size, size);\n    Mesh::new(geometry, Material::default())\n}"}
{"instruction": "Generate Bevy 3D scene setup", "output": "use bevy::prelude::*;\n\nfn setup(mut commands: Commands) {\n    commands.spawn(Camera3dBundle::default());\n    commands.spawn(PbrBundle {\n        mesh: meshes.add(Mesh::from(shape::Cube { size: 1.0 })),\n        ..default()\n    });\n}"}

2. Start training

bidora train \
  --train-file data/rust_3d_train.jsonl \
  --val-file data/rust_3d_val.jsonl \
  --model Qwen/Qwen3-4B \
  --output ./rust_3d_model \
  --rank 8 \
  --epochs 3 \
  --batch-size 2

3. Use model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load base model with BiDoRA adapters
model = AutoModelForCausalLM.from_pretrained(
    "./rust_3d_model/final_model",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

# Generate
prompt = "### Instruction:\nCreate a three-rs function to render a sphere\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

🔧 Programmatic Usage

from bidora import (
    FullConfig, ModelConfig, BiDoRAConfig, TrainingConfig, DataConfig,
    load_model_and_tokenizer, prepare_bidora_model,
    load_and_prepare_dataset, prepare_dataset_for_training,
    train_bidora
)
from pathlib import Path

# Create config
config = FullConfig(
    model=ModelConfig(
        model_name="Qwen/Qwen3-4B",
        quantization="none"  # BiDoRA requires full precision (bfloat16)
    ),
    bidora=BiDoRAConfig(
        rank=8,
        use_bidora=True,  # Enable BiDoRA bi-level optimization
        upper_lr_multiplier=2.0
    ),
    training=TrainingConfig(
        batch_size=2,
        learning_rate=2e-4,
        num_epochs=3
    ),
    data=DataConfig(
        train_file=Path("data/train.jsonl"),
        val_file=Path("data/val.jsonl")  # Required for BiDoRA
    ),
    output_dir=Path("./output")
)

# Auto-adjust for hardware (will keep full precision for BiDoRA)
config.auto_adjust_for_hardware()

# Load model with BiDoRA layers
model, tokenizer = load_model_and_tokenizer(config.model)
model = prepare_bidora_model(model, config.bidora, quantized=False)

# Load data
dataset = load_and_prepare_dataset(config.data)
tokenized_dataset = prepare_dataset_for_training(
    dataset, tokenizer, config.training.max_seq_length
)

# Train with bi-level optimization
trainer = train_bidora(model, tokenizer, tokenized_dataset, config)

🐛 Troubleshooting

CUDA Out of Memory

# Reduce batch size
bidora train --batch-size 1 ...

# Or use smaller model
bidora train --model Qwen/Qwen3-1.7B ...

# Note: BiDoRA cannot use quantization (requires full precision)

Flash Attention Error

If Flash Attention 2 is not available:

  • It is disabled automatically
  • Or disable it manually by setting use_flash_attention=False in ModelConfig, as in the sketch below
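
A minimal sketch of the manual option (assuming ModelConfig accepts this field as a constructor keyword; the other fields follow the programmatic example above):

from bidora import ModelConfig

model_config = ModelConfig(
    model_name="Qwen/Qwen3-4B",
    quantization="none",          # BiDoRA requires full precision
    use_flash_attention=False,    # fall back to standard attention
)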

Import Errors

# Reinstall dependencies
uv pip install --force-reinstall transformers accelerate peft bitsandbytes

📖 Citation

If you use BiDoRA in your research, please cite:

@article{liu2024bidora,
  title={BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation},
  author={Liu, Peiran and Wang, Luning and Sun, Yanchao and Tang, Zhongwei and Xu, Dawei and Li, Jiaxi and Xu, Zhili},
  journal={arXiv preprint arXiv:2410.09758},
  year={2024}
}

📝 License

MIT License - see LICENSE file.
