Brute Force Training
No frills. Unoptimized. Training.
A no-frills, unoptimized Python package for finetuning Vision-Language Models (VLMs). This package provides simple training utilities for various VLM architectures with HuggingFace datasets integration.
Supported Models
- Qwen2-VL: Vision-language models from the Qwen2-VL series
- Qwen2.5-VL: Enhanced vision-language models with improved capabilities
- LFM2-VL: Liquid AI's vision-language models
- Qwen3: Text-only models from the Qwen3 series
Features
- Simple, unoptimized training loops - well suited to research and experimentation
- HuggingFace datasets integration out of the box
- Configurable data filtering and preprocessing
- Automatic model checkpointing during training
- Built-in validation loops
- Automatic image preprocessing and resizing
- Modular architecture with base classes for easy extension
- Comprehensive documentation generation - a README.md for each checkpoint
- Training visualizations - loss curves and evaluation charts
- HuggingFace model cards - automatic metadata generation
- Pre/post training evaluation - compare model performance before and after finetuning
- Training metrics tracking - detailed training history
Installation
From PyPI (when published)
pip install brute-force-training
From Source
git clone https://github.com/wjbmattingly/brute-force-training.git
cd brute-force-training
pip install -e .
Requirements
- Python 3.8+
- PyTorch 1.11.0+
- transformers 4.37.0+
- datasets 2.14.0+
Quick Start
Vision-Language Model Training (Qwen2-VL)
from brute_force_training import Qwen2VLTrainer

# Initialize trainer
trainer = Qwen2VLTrainer(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    output_dir="./my_finetuned_model"
)

# Train the model
trainer.train_and_validate(
    dataset_name="your_dataset_name",
    image_column="image",
    text_column="text",
    user_text="Describe this image",
    max_steps=1000,
    train_batch_size=2,
    learning_rate=1e-5,
    validate_before=True,  # Pre-training evaluation
    generate_docs=True     # Generate documentation
)
Text-Only Model Training (Qwen3)
from brute_force_training import Qwen3Trainer

# Initialize trainer
trainer = Qwen3Trainer(
    model_name="Qwen/Qwen3-4B-Thinking-2507",
    output_dir="./my_finetuned_qwen3"
)

# Train the model
trainer.train_and_validate(
    dataset_name="your_text_dataset",
    input_column="input",
    output_column="output",
    system_prompt="You are a helpful assistant.",  # System prompt support
    max_steps=1000,
    train_batch_size=4,
    learning_rate=1e-5
)
System Prompts for Text Models
Just like vision models have user_text, text models now support system_prompt:
# Math tutoring model
trainer.train_and_validate(
    dataset_name="math_problems",
    system_prompt="You are a mathematics tutor. Provide step-by-step solutions."
)

# Code assistant model
trainer.train_and_validate(
    dataset_name="code_questions",
    system_prompt="You are a coding assistant. Write clean, efficient code."
)

# Creative writing model
trainer.train_and_validate(
    dataset_name="writing_prompts",
    system_prompt="You are a creative writer. Write engaging stories."
)

# No system prompt (original behavior)
trainer.train_and_validate(
    dataset_name="general_qa",
    system_prompt=None  # Or just omit this parameter
)
Documentation & Visualization Features
Automatic Documentation Generation
Every checkpoint now includes comprehensive documentation:
trainer.train_and_validate(
    dataset_name="your_dataset",
    # ... other parameters ...
    validate_before=True,  # Run evaluation before training starts
    generate_docs=True     # Generate docs and visualizations
)
Each saved checkpoint will contain:
- README.md - Detailed model card with training info
- training_curves.png - Loss and learning rate visualizations
- evaluation_comparison.png - Before/after training performance
- training_metrics.json - Complete training history
- model_card_metadata.json - HuggingFace metadata
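Since training_metrics.json is plain JSON, the recorded history can be inspected directly. A minimal sketch, assuming only that the file sits inside the checkpoint directory; the `load_training_metrics` helper is our own, and the file's exact schema is whatever the trainer wrote, so inspect the keys rather than assuming field names:

```python
import json
from pathlib import Path

def load_training_metrics(checkpoint_dir):
    """Read the training_metrics.json saved alongside a checkpoint.

    Hypothetical helper: the schema of the file depends on the
    trainer, so inspect the returned keys before relying on them.
    """
    path = Path(checkpoint_dir) / "training_metrics.json"
    return json.loads(path.read_text())

# Example usage (path is illustrative):
# metrics = load_training_metrics("./my_finetuned_model/checkpoint-1000")
# print(sorted(metrics.keys()))
```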
Pre/Post Training Evaluation
Compare your model's performance before and after training:
# This will automatically run if validate_before=True
# Example output:
# Running pre-training evaluation...
# Pre-training - Loss: 2.456789, Perplexity: 11.67
#
# [training happens]
#
# Running post-training evaluation...
# Post-training - Loss: 1.234567, Perplexity: 3.44
# Loss improvement: +49.75% (from 2.456789 to 1.234567)
Training Visualizations
Automatic generation of:
- Loss curves showing training and validation loss over time
- Learning rate schedules
- Evaluation comparisons with before/after metrics
- Training progress with step-by-step metrics
Advanced Usage
Custom Data Filtering
def my_filter_function(example):
    # Only include examples whose text is between 50 and 1000 characters long
    return 50 <= len(example['text']) <= 1000

trainer = Qwen2VLTrainer(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    output_dir="./filtered_model"
)

# Override the default filtering
trainer.filter_dataset = lambda dataset: dataset.filter(my_filter_function)

trainer.train_and_validate(
    dataset_name="your_dataset",
    image_column="image",
    text_column="text"
)
Training Configuration
trainer.train_and_validate(
    dataset_name="CATMuS/medieval",
    image_column="im",
    text_column="text",
    user_text="Transcribe this medieval manuscript line",

    # Training parameters
    max_steps=10000,
    eval_steps=500,
    num_accumulation_steps=4,
    learning_rate=1e-5,

    # Data selection
    train_select_start=0,
    train_select_end=5000,
    val_select_start=5000,
    val_select_end=6000,

    # Batch sizes
    train_batch_size=2,
    val_batch_size=2,

    # Image preprocessing
    max_image_size=500
)
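With gradient accumulation, the effective batch size is the per-step batch size times the number of accumulation steps; for the configuration above that is 2 × 4 = 8. A quick check:

```python
# Effective batch size under gradient accumulation: gradients from
# several small batches are accumulated before each optimizer step.
train_batch_size = 2
num_accumulation_steps = 4
effective_batch_size = train_batch_size * num_accumulation_steps
print(effective_batch_size)  # 8
```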
Model-Specific Examples
LFM2-VL Training
from brute_force_training import LFM2VLTrainer

trainer = LFM2VLTrainer(
    model_name="LiquidAI/LFM2-VL-450M",
    output_dir="./lfm2_finetuned"
)

trainer.train_and_validate(
    dataset_name="your_dataset",
    image_column="image",
    text_column="caption",
    user_text="What is in this image?",
    max_steps=5000,
    train_batch_size=1,  # LFM2-VL typically needs smaller batch sizes
    learning_rate=1e-5
)
Qwen2.5-VL Training
from brute_force_training import Qwen25VLTrainer

trainer = Qwen25VLTrainer(
    model_name="Qwen/Qwen2.5-VL-3B-Instruct",
    output_dir="./qwen25_finetuned",
    min_pixel=256,
    max_pixel=384,
    image_factor=28
)

trainer.train_and_validate(
    dataset_name="your_dataset",
    image_column="image",
    text_column="text",
    max_steps=8000,
    eval_steps=1000
)
Dataset Format
Vision-Language Datasets
Your HuggingFace dataset should have:
- An image column (PIL Images or base64 strings)
- A text column (string descriptions/captions)
Text-Only Datasets
Your HuggingFace dataset should have:
- An input column (input text)
- An output column (target text)
Project Structure
brute_force_training/
├── __init__.py
├── datasets/
│   ├── __init__.py
│   ├── vision_language.py   # VisionLanguageDataset class
│   └── text_only.py         # TextOnlyDataset class
├── trainers/
│   ├── __init__.py
│   ├── base.py              # BaseTrainer abstract class
│   ├── qwen2_vl.py          # Qwen2VLTrainer
│   ├── qwen25_vl.py         # Qwen25VLTrainer
│   ├── lfm2_vl.py           # LFM2VLTrainer
│   └── qwen3.py             # Qwen3Trainer
└── utils/
    ├── __init__.py
    ├── image_utils.py       # Image preprocessing utilities
    └── tokenization.py      # Tokenization utilities
Contributing
This is a research-focused package intended for experimentation. Contributions are welcome! Please feel free to:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
License
MIT License - see LICENSE file for details.
Acknowledgments
The original training scripts were adapted from zhangfaen/finetune-Qwen2-VL. We are deeply grateful for their foundational work.
Limitations
This package is intentionally "brute force" and unoptimized. It's designed for:
- Research and experimentation
- Quick prototyping
- Educational purposes
For production use cases, consider more optimized training frameworks.
Support
For questions, issues, or feature requests, please open an issue on GitHub.