
bft

No thrills. Un-Optimized. Training.


Brute Force Training

A no-thrills, unoptimized Python package for finetuning Vision-Language Models (VLMs). This package provides simple training utilities for various VLM architectures with HuggingFace datasets integration.

Supported Models

  • Qwen2-VL: Vision-language models from the Qwen2-VL series
  • Qwen2.5-VL: Enhanced vision-language models with improved capabilities
  • LFM2-VL: Liquid AI's vision-language models
  • Qwen3: Text-only models from the Qwen3 series

Features

  • 🚀 Simple, unoptimized training loops - perfect for research and experimentation
  • 📊 HuggingFace datasets integration out of the box
  • 🔧 Configurable data filtering and preprocessing
  • 💾 Automatic model checkpointing during training
  • 🎯 Built-in validation loops
  • 📸 Automatic image preprocessing and resizing
  • 🏗️ Modular architecture with base classes for easy extension
  • 📈 Comprehensive documentation generation - a README.md for each checkpoint
  • 🎨 Training visualizations - loss curves and evaluation charts
  • 📋 HuggingFace model cards - automatic metadata generation
  • 🔍 Pre/post training evaluation - compare model performance
  • 📊 Training metrics tracking - detailed training history

Installation

From PyPI

pip install brute-force-training

From Source

git clone https://github.com/wjbmattingly/brute-force-training.git
cd brute-force-training
pip install -e .

Requirements

  • Python 3.8+
  • PyTorch 1.11.0+
  • transformers 4.37.0+
  • datasets 2.14.0+

Quick Start

Vision-Language Model Training (Qwen2-VL)

from brute_force_training import Qwen2VLTrainer

# Initialize trainer
trainer = Qwen2VLTrainer(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    output_dir="./my_finetuned_model"
)

# Train the model
trainer.train_and_validate(
    dataset_name="your_dataset_name",
    image_column="image",
    text_column="text", 
    user_text="Describe this image",
    max_steps=1000,
    train_batch_size=2,
    learning_rate=1e-5,
    validate_before=True,    # Pre-training evaluation
    generate_docs=True       # Generate documentation
)

Text-Only Model Training (Qwen3)

from brute_force_training import Qwen3Trainer

# Initialize trainer
trainer = Qwen3Trainer(
    model_name="Qwen/Qwen3-4B-Thinking-2507",
    output_dir="./my_finetuned_qwen3"
)

# Train the model
trainer.train_and_validate(
    dataset_name="your_text_dataset",
    input_column="input",
    output_column="output",
    system_prompt="You are a helpful assistant.",  # ✨ System prompt support
    max_steps=1000,
    train_batch_size=4,
    learning_rate=1e-5
)

System Prompts for Text Models

Just like vision models have user_text, text models now support system_prompt:

# Math tutoring model
trainer.train_and_validate(
    dataset_name="math_problems",
    system_prompt="You are a mathematics tutor. Provide step-by-step solutions."
)

# Code assistant model  
trainer.train_and_validate(
    dataset_name="code_questions",
    system_prompt="You are a coding assistant. Write clean, efficient code."
)

# Creative writing model
trainer.train_and_validate(
    dataset_name="writing_prompts", 
    system_prompt="You are a creative writer. Write engaging stories."
)

# No system prompt (original behavior)
trainer.train_and_validate(
    dataset_name="general_qa",
    system_prompt=None  # Or just omit this parameter
)
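How the trainer applies system_prompt internally is specific to the package, but the usual pattern with HuggingFace chat templates is to prepend a system message to the conversation when one is given. The helper below is a hypothetical illustration of that pattern, not part of brute_force_training:

```python
def build_messages(user_input, system_prompt=None):
    """Assemble a chat-template style message list. The system message is
    included only when a system_prompt is given, which mirrors the
    system_prompt=None behavior shown above."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_input})
    return messages

print(build_messages("Solve 2x = 10",
                     system_prompt="You are a mathematics tutor."))
```

A list shaped like this is what tokenizers' `apply_chat_template` typically consumes before tokenization.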

Documentation & Visualization Features

Automatic Documentation Generation

Every checkpoint now includes comprehensive documentation:

trainer.train_and_validate(
    dataset_name="your_dataset",
    # ... other parameters ...
    validate_before=True,    # Run evaluation before training starts
    generate_docs=True       # Generate docs and visualizations
)

Each saved checkpoint will contain:

  • README.md - Detailed model card with training info
  • training_curves.png - Loss and learning rate visualizations
  • evaluation_comparison.png - Before/after training performance
  • training_metrics.json - Complete training history
  • model_card_metadata.json - HuggingFace metadata

Pre/Post Training Evaluation

Compare your model's performance before and after training:

# This will automatically run if validate_before=True
# Shows output like:
# 🔍 Running pre-training evaluation...
# 📊 Pre-training - Loss: 2.456789, Perplexity: 11.67
#
# [training happens]
#
# 🔍 Running post-training evaluation...
# 📊 Post-training - Loss: 1.234567, Perplexity: 3.44
# 🎯 Loss improvement: +49.75% (from 2.456789 to 1.234567)
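The perplexity figures in the sample output are consistent with perplexity = exp(loss), the standard relationship for cross-entropy language-model evaluation. Assuming that relationship, the numbers can be reproduced in a few lines:

```python
import math

# Loss values taken from the sample output above
pre_loss, post_loss = 2.456789, 1.234567

# Perplexity assumed to be exp(mean cross-entropy loss)
print(round(math.exp(pre_loss), 2))   # 11.67
print(round(math.exp(post_loss), 2))  # 3.44

# Relative loss improvement, as a percentage
improvement = (pre_loss - post_loss) / pre_loss * 100
print(round(improvement, 2))          # 49.75
```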

Training Visualizations

Automatic generation of:

  • Loss curves showing training and validation loss over time
  • Learning rate schedules
  • Evaluation comparisons with before/after metrics
  • Training progress with step-by-step metrics

Advanced Usage

Custom Data Filtering

def my_filter_function(example):
    # Only include examples with text length between 50-1000 characters
    return 50 <= len(example['text']) <= 1000

trainer = Qwen2VLTrainer(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    output_dir="./filtered_model"
)

# Override the default filtering
trainer.filter_dataset = lambda dataset: dataset.filter(my_filter_function)

trainer.train_and_validate(
    dataset_name="your_dataset",
    image_column="image",
    text_column="text"
)

Training Configuration

trainer.train_and_validate(
    dataset_name="CATMuS/medieval",
    image_column="im",
    text_column="text",
    user_text="Transcribe this medieval manuscript line",
    
    # Training parameters
    max_steps=10000,
    eval_steps=500,
    num_accumulation_steps=4,
    learning_rate=1e-5,
    
    # Data selection
    train_select_start=0,
    train_select_end=5000,
    val_select_start=5000,
    val_select_end=6000,
    
    # Batch sizes
    train_batch_size=2,
    val_batch_size=2,
    
    # Image preprocessing
    max_image_size=500
)
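The exact resize policy behind max_image_size is internal to the package, but a typical aspect-preserving downscale to a maximum side length can be sketched as follows (fit_within is a hypothetical helper, not part of brute_force_training):

```python
def fit_within(width, height, max_size):
    """Scale (width, height) so the longer side is at most max_size,
    preserving aspect ratio; images already small enough are untouched."""
    scale = min(1.0, max_size / max(width, height))
    return round(width * scale), round(height * scale)

# A 1200x800 manuscript scan shrinks to fit max_image_size=500
print(fit_within(1200, 800, 500))  # (500, 333)
# A small image is left alone
print(fit_within(300, 200, 500))   # (300, 200)
```

Note also that with train_batch_size=2 and num_accumulation_steps=4 as above, each optimizer step sees an effective batch of 2 × 4 = 8 examples.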

Model-Specific Examples

LFM2-VL Training

from brute_force_training import LFM2VLTrainer

trainer = LFM2VLTrainer(
    model_name="LiquidAI/LFM2-VL-450M",
    output_dir="./lfm2_finetuned"
)

trainer.train_and_validate(
    dataset_name="your_dataset",
    image_column="image",
    text_column="caption",
    user_text="What is in this image?",
    max_steps=5000,
    train_batch_size=1,  # LFM2-VL typically needs smaller batch sizes
    learning_rate=1e-5
)

Qwen2.5-VL Training

from brute_force_training import Qwen25VLTrainer

trainer = Qwen25VLTrainer(
    model_name="Qwen/Qwen2.5-VL-3B-Instruct",
    output_dir="./qwen25_finetuned",
    min_pixel=256,
    max_pixel=384,
    image_factor=28
)

trainer.train_and_validate(
    dataset_name="your_dataset",
    image_column="image", 
    text_column="text",
    max_steps=8000,
    eval_steps=1000
)

Dataset Format

Vision-Language Datasets

Your HuggingFace dataset should have:

  • An image column (PIL Images or base64 strings)
  • A text column (string descriptions/captions)

Text-Only Datasets

Your HuggingFace dataset should have:

  • An input column (input text)
  • An output column (target text)
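Before launching a long run, it can be worth sanity-checking a few records against this schema. The helper below is illustrative only (not part of the package) and covers the text-only case; the column names default to the input/output names used above:

```python
def check_text_record(record, input_column="input", output_column="output"):
    """Return a list of problems with one dataset record (empty list = OK)."""
    problems = []
    for col in (input_column, output_column):
        if col not in record:
            problems.append(f"missing column: {col}")
        elif not isinstance(record[col], str) or not record[col].strip():
            problems.append(f"column {col!r} should be a non-empty string")
    return problems

good = {"input": "What is 2 + 2?", "output": "4"}
bad = {"input": "What is 2 + 2?"}
print(check_text_record(good))  # []
print(check_text_record(bad))   # ['missing column: output']
```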

Project Structure

brute_force_training/
├── __init__.py
├── datasets/
│   ├── __init__.py
│   ├── vision_language.py   # VisionLanguageDataset class
│   └── text_only.py         # TextOnlyDataset class
├── trainers/
│   ├── __init__.py
│   ├── base.py              # BaseTrainer abstract class
│   ├── qwen2_vl.py          # Qwen2VLTrainer
│   ├── qwen25_vl.py         # Qwen25VLTrainer
│   ├── lfm2_vl.py           # LFM2VLTrainer
│   └── qwen3.py             # Qwen3Trainer
└── utils/
    ├── __init__.py
    ├── image_utils.py       # Image preprocessing utilities
    └── tokenization.py      # Tokenization utilities

Contributing

This is a research-focused package intended for experimentation. Contributions are welcome! Please feel free to:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

The original training scripts were adapted from zhangfaen/finetune-Qwen2-VL. We are deeply grateful for their foundational work.

Limitations

This package is intentionally "brute force" and unoptimized. It's designed for:

  • Research and experimentation
  • Quick prototyping
  • Educational purposes

For production use cases, consider more optimized training frameworks.

Support

For questions, issues, or feature requests, please open an issue on GitHub.
