
Brute Force Training

A no-frills, unoptimized Python package for finetuning Vision-Language Models (VLMs). This package provides simple training utilities for various VLM architectures with HuggingFace datasets integration.

Supported Models

  • Qwen2-VL: Vision-language models from the Qwen2-VL series
  • Qwen2.5-VL: Enhanced vision-language models with improved capabilities
  • LFM2-VL: Liquid AI's vision-language models
  • Qwen3: Text-only models from the Qwen3 series

Features

  • 🚀 Simple, unoptimized training loops - perfect for research and experimentation
  • 📊 HuggingFace datasets integration out of the box
  • 🔧 Configurable data filtering and preprocessing
  • 💾 Automatic model checkpointing during training
  • 🎯 Built-in validation loops
  • 📸 Automatic image preprocessing and resizing
  • 🏗️ Modular architecture with base classes for easy extension
  • 📈 Comprehensive documentation generation - a README.md for each checkpoint
  • 🎨 Training visualizations - loss curves and evaluation charts
  • 📋 HuggingFace model cards - automatic metadata generation
  • 🔍 Pre/post-training evaluation - compare model performance
  • 📊 Training metrics tracking - detailed training history

Installation

From PyPI

pip install brute-force-training

From Source

git clone https://github.com/wjbmattingly/brute-force-training.git
cd brute-force-training
pip install -e .

Requirements

  • Python 3.8+
  • PyTorch 1.11.0+
  • transformers 4.37.0+
  • datasets 2.14.0+

Quick Start

Vision-Language Model Training (Qwen2-VL)

from brute_force_training import Qwen2VLTrainer

# Initialize trainer
trainer = Qwen2VLTrainer(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    output_dir="./my_finetuned_model"
)

# Train the model
trainer.train_and_validate(
    dataset_name="your_dataset_name",
    image_column="image",
    text_column="text", 
    user_text="Describe this image",
    max_steps=1000,
    train_batch_size=2,
    learning_rate=1e-5,
    validate_before=True,    # Pre-training evaluation
    generate_docs=True       # Generate documentation
)

Text-Only Model Training (Qwen3)

from brute_force_training import Qwen3Trainer

# Initialize trainer
trainer = Qwen3Trainer(
    model_name="Qwen/Qwen3-4B-Thinking-2507",
    output_dir="./my_finetuned_qwen3"
)

# Train the model
trainer.train_and_validate(
    dataset_name="your_text_dataset",
    input_column="input",
    output_column="output",
    max_steps=1000,
    train_batch_size=4,
    learning_rate=1e-5
)

Documentation & Visualization Features

Automatic Documentation Generation

Every checkpoint now includes comprehensive documentation:

trainer.train_and_validate(
    dataset_name="your_dataset",
    # ... other parameters ...
    validate_before=True,    # Run evaluation before training starts
    generate_docs=True       # Generate docs and visualizations
)

Each saved checkpoint will contain:

  • README.md - Detailed model card with training info
  • training_curves.png - Loss and learning rate visualizations
  • evaluation_comparison.png - Before/after training performance
  • training_metrics.json - Complete training history
  • model_card_metadata.json - HuggingFace metadata

Pre/Post Training Evaluation

Compare your model's performance before and after training:

# This will automatically run if validate_before=True
# Shows output like:
# ๐Ÿ” Running pre-training evaluation...
# ๐Ÿ“Š Pre-training - Loss: 2.456789, Perplexity: 11.67
# 
# [training happens]
#
# ๐Ÿ” Running post-training evaluation...  
# ๐Ÿ“Š Post-training - Loss: 1.234567, Perplexity: 3.44
# ๐ŸŽฏ Loss improvement: +49.75% (from 2.456789 to 1.234567)
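The numbers in the sample output are related by standard formulas: perplexity is the exponential of the cross-entropy loss, and the improvement figure is the relative loss reduction. A quick sketch of the arithmetic:

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity is exp(cross-entropy loss)."""
    return math.exp(loss)

def loss_improvement_pct(before: float, after: float) -> float:
    """Relative loss reduction, as a percentage."""
    return (before - after) / before * 100

# Reproduce the figures from the sample output above
print(round(perplexity(2.456789), 2))                       # 11.67
print(round(perplexity(1.234567), 2))                       # 3.44
print(round(loss_improvement_pct(2.456789, 1.234567), 2))   # 49.75
```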

Training Visualizations

Automatic generation of:

  • Loss curves showing training and validation loss over time
  • Learning rate schedules
  • Evaluation comparisons with before/after metrics
  • Training progress with step-by-step metrics

Advanced Usage

Custom Data Filtering

def my_filter_function(example):
    # Only include examples with text length between 50-1000 characters
    return 50 <= len(example['text']) <= 1000

trainer = Qwen2VLTrainer(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    output_dir="./filtered_model"
)

# Override the default filtering
trainer.filter_dataset = lambda dataset: dataset.filter(my_filter_function)

trainer.train_and_validate(
    dataset_name="your_dataset",
    image_column="image",
    text_column="text"
)

Training Configuration

trainer.train_and_validate(
    dataset_name="CATMuS/medieval",
    image_column="im",
    text_column="text",
    user_text="Transcribe this medieval manuscript line",
    
    # Training parameters
    max_steps=10000,
    eval_steps=500,
    num_accumulation_steps=4,
    learning_rate=1e-5,
    
    # Data selection
    train_select_start=0,
    train_select_end=5000,
    val_select_start=5000,
    val_select_end=6000,
    
    # Batch sizes
    train_batch_size=2,
    val_batch_size=2,
    
    # Image preprocessing
    max_image_size=500
)
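Presumably, max_image_size caps the longer image side while preserving aspect ratio (an assumption - see utils/image_utils.py for the actual behavior). The dimension math would look something like:

```python
def fit_within(width: int, height: int, max_size: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_size,
    preserving aspect ratio. Images already small enough are untouched."""
    longer = max(width, height)
    if longer <= max_size:
        return width, height
    scale = max_size / longer
    return round(width * scale), round(height * scale)

# A 1000x500 image capped at max_image_size=500 becomes 500x250;
# with Pillow this would be img.resize(fit_within(*img.size, 500))
print(fit_within(1000, 500, 500))  # (500, 250)
print(fit_within(300, 200, 500))   # unchanged: (300, 200)
```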

Model-Specific Examples

LFM2-VL Training

from brute_force_training import LFM2VLTrainer

trainer = LFM2VLTrainer(
    model_name="LiquidAI/LFM2-VL-450M",
    output_dir="./lfm2_finetuned"
)

trainer.train_and_validate(
    dataset_name="your_dataset",
    image_column="image",
    text_column="caption",
    user_text="What is in this image?",
    max_steps=5000,
    train_batch_size=1,  # LFM2-VL typically needs smaller batch sizes
    learning_rate=1e-5
)

Qwen2.5-VL Training

from brute_force_training import Qwen25VLTrainer

trainer = Qwen25VLTrainer(
    model_name="Qwen/Qwen2.5-VL-3B-Instruct",
    output_dir="./qwen25_finetuned",
    min_pixel=256,
    max_pixel=384,
    image_factor=28
)

trainer.train_and_validate(
    dataset_name="your_dataset",
    image_column="image", 
    text_column="text",
    max_steps=8000,
    eval_steps=1000
)
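The min_pixel, max_pixel, and image_factor parameters suggest that image side lengths are clamped to a range and snapped to multiples of the vision encoder's patch factor (28 for Qwen2.5-VL). The exact semantics live inside the trainer, but one plausible sketch of the snapping, borrowing the parameter values from the call above, is:

```python
def snap_side(side: int, min_px: int = 256, max_px: int = 384, factor: int = 28) -> int:
    """Clamp a side length to [min_px, max_px], then snap it to the nearest
    multiple of `factor`, re-clamping so the result stays within range.
    Illustrative only -- the package's actual preprocessing may differ."""
    side = max(min_px, min(side, max_px))
    snapped = round(side / factor) * factor
    if snapped > max_px:
        snapped = (max_px // factor) * factor     # largest multiple <= max_px
    if snapped < min_px:
        snapped = -(-min_px // factor) * factor   # smallest multiple >= min_px
    return snapped

print(snap_side(300))   # 308 (11 * 28)
print(snap_side(1000))  # 364 (13 * 28, largest multiple under 384)
print(snap_side(10))    # 280 (10 * 28, smallest multiple over 256)
```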

Dataset Format

Vision-Language Datasets

Your HuggingFace dataset should have:

  • An image column (PIL Images or base64 strings)
  • A text column (string descriptions/captions)

Text-Only Datasets

Your HuggingFace dataset should have:

  • An input column (input text)
  • An output column (target text)
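In code, individual rows for the two cases would look roughly like the dicts below. The column names are whatever you pass to train_and_validate; these values are purely illustrative:

```python
# Illustrative rows; a real dataset would come from datasets.load_dataset(...)
vision_language_row = {
    "image": "iVBORw0KGgo...",              # PIL.Image.Image or base64 string
    "text": "A medieval manuscript line.",  # caption / transcription target
}

text_only_row = {
    "input": "Summarize: The quick brown fox ...",
    "output": "A fox jumps over a dog.",
}

def has_columns(row: dict, required: tuple) -> bool:
    """Check that a row carries the columns the trainer will ask for."""
    return all(col in row for col in required)

print(has_columns(vision_language_row, ("image", "text")))  # True
print(has_columns(text_only_row, ("input", "output")))      # True
```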

Project Structure

brute_force_training/
├── __init__.py
├── datasets/
│   ├── __init__.py
│   ├── vision_language.py    # VisionLanguageDataset class
│   └── text_only.py          # TextOnlyDataset class
├── trainers/
│   ├── __init__.py
│   ├── base.py               # BaseTrainer abstract class
│   ├── qwen2_vl.py           # Qwen2VLTrainer
│   ├── qwen25_vl.py          # Qwen25VLTrainer
│   ├── lfm2_vl.py            # LFM2VLTrainer
│   └── qwen3.py              # Qwen3Trainer
└── utils/
    ├── __init__.py
    ├── image_utils.py        # Image preprocessing utilities
    └── tokenization.py       # Tokenization utilities
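The BaseTrainer abstract class in trainers/base.py is the extension point for new architectures. Its real abstract interface may differ from the hypothetical stand-in below, which only illustrates the subclassing pattern:

```python
from abc import ABC, abstractmethod

class BaseTrainer(ABC):
    """Hypothetical stand-in for brute_force_training.trainers.base.BaseTrainer;
    check trainers/base.py for the actual abstract methods."""
    def __init__(self, model_name: str, output_dir: str):
        self.model_name = model_name
        self.output_dir = output_dir

    @abstractmethod
    def load_model(self):
        """Load the model and processor/tokenizer for this architecture."""

class MyVLMTrainer(BaseTrainer):
    def load_model(self):
        # e.g. AutoModelForVision2Seq.from_pretrained(self.model_name)
        return f"loaded {self.model_name}"

trainer = MyVLMTrainer("some/model", "./out")
print(trainer.load_model())  # loaded some/model
```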

Contributing

This is a research-focused package intended for experimentation. Contributions are welcome! Please feel free to:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

The original training scripts were adapted from zhangfaen/finetune-Qwen2-VL. We are deeply grateful for their foundational work.

Limitations

This package is intentionally "brute force" and unoptimized. It's designed for:

  • Research and experimentation
  • Quick prototyping
  • Educational purposes

For production use cases, consider more optimized training frameworks.

Support

For questions, issues, or feature requests, please open an issue on GitHub.
