A small VLM for training and experiments

These details have been verified by PyPI

Project links

Repository

GitHub Statistics

Maintainers

leo1oel

These details have not been verified by PyPI

Development Status
- 4 - Beta
Intended Audience
- Developers
Operating System
- OS Independent
Programming Language
Typing
- Typed

Project description

small-vlm

A flexible and configurable Vision Language Model (VLM) framework built with PyTorch, designed for experimentation and ease of use. This framework allows for modular replacement of core components and fine-grained control over training parameters.

Features

Modular Design: Easily swap out the Language Model (LLM), Visual Encoder, and Connector components to experiment with different architectures.
Configuration Management: Utilizes Hydra for robust and flexible configuration management, allowing you to define and override parameters easily.
Environment Setup: Uses uv for fast and reliable Python environment and package management.
Granular Training Control:
- Independently set learning rates and weight decay for the LLM, visual encoder, and connector.
- Independently freeze or unfreeze these components during different training stages.
LLaVA Implementation: Includes a straightforward reproduction of the LLaVA model (pretraining and finetuning).
Hugging Face Hub Integration:
- Easily push your trained models to the Hugging Face Hub using a simple script.
- Load models pushed to the Hub using the standard AutoModel and AutoProcessor classes from the transformers library.

Architecture

The VLM consists of three main components:

Visual Encoder: Extracts visual features from images. Supports various vision transformers (e.g., CLIP). Configurable via model.visual_encoder in Hydra configs.
Language Model: Processes text and generates responses. Supports various Hugging Face language models. Configurable via model.language_model in Hydra configs.
Connector: Bridges the visual and language modalities. Supports different projection mechanisms (e.g., MLP). Configurable via model.connector in Hydra configs.
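To make the data flow between the three components concrete, the sketch below tracks tensor shapes using plain arithmetic. The dimensions are assumptions chosen to match a CLIP ViT-L/14 encoder at 336 px input and a 4096-wide language model, as in LLaVA-1.5. The names `VLMShapes` and `trace_shapes` are illustrative and are not part of small-vlm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VLMShapes:
    """Illustrative dimensions; not read from any small-vlm config."""

    image_size: int = 336      # input resolution in pixels (assumed)
    patch_size: int = 14       # ViT patch size in pixels (assumed)
    vision_hidden: int = 1024  # visual encoder feature width (assumed)
    text_hidden: int = 4096    # language model embedding width (assumed)


def trace_shapes(cfg: VLMShapes, num_text_tokens: int) -> dict[str, tuple[int, int]]:
    """Return the (sequence_length, width) shape after each stage."""
    # The visual encoder splits the image into a grid of non-overlapping patches.
    patches_per_side = cfg.image_size // cfg.patch_size
    num_patches = patches_per_side ** 2
    return {
        # One feature vector per patch (the class token is ignored here).
        "visual_encoder": (num_patches, cfg.vision_hidden),
        # The connector maps each patch feature into the LLM embedding space.
        "connector": (num_patches, cfg.text_hidden),
        # Projected image tokens sit alongside the text embeddings.
        "language_model_input": (num_patches + num_text_tokens, cfg.text_hidden),
    }


shapes = trace_shapes(VLMShapes(), num_text_tokens=32)
for stage, shape in shapes.items():
    print(f"{stage}: {shape}")
```

The point of the sketch is that only the connector changes the feature width, which is why it can be swapped independently of the encoder and the language model as long as its input and output widths match its neighbours.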

Setup and Installation

Environment Setup with uv: This project uses uv for Python environment and dependency management. For instructions on installing uv and setting up Python, please refer to installation.md.
Install Dependencies: Once uv is installed and you have cloned the repository, install the necessary dependencies:
```
make install
```

Training

Training is managed via Hydra configurations and executed using DeepSpeed.

LLaVA Pretraining

To pretrain the LLaVA model, run:

```
deepspeed --module vlm -cn pretrain-llava
```

Customization

You can customize various aspects of the model and training process through Hydra configurations located in src/vlm/config/. This includes:

Model Components:
- model.visual_encoder.hf_name: Hugging Face name of the visual encoder.
- model.language_model.hf_name: Hugging Face name of the language model.
- model.connector.name and model.connector.type: Define the type and specifics of the connector module.
Training Parameters per Component:
- trainer.unfreeze: Booleans train_vision_model, train_language_model, train_connector to control which parts are trainable.
- trainer.learning_rate: Specific learning rates like visual_encoder_learning_rate, language_model_learning_rate, connector_learning_rate.
- trainer.weight_decay: Specific weight decays like visual_encoder_weight_decay, language_model_weight_decay, connector_weight_decay.
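Per-component freezing, learning rates, and weight decay map naturally onto optimizer parameter groups. The following is a minimal sketch of that idea, not the project's implementation: the parameter-name prefixes, the simplified dictionary keys, and the function `build_param_groups` are all hypothetical, and stand-in strings replace real tensors so the example runs without PyTorch.

```python
from typing import Any, Iterable

# Hypothetical parameter-name prefixes; the real module names in small-vlm may differ.
COMPONENT_PREFIXES = {
    "visual_encoder": "visual_encoder.",
    "language_model": "language_model.",
    "connector": "connector.",
}


def build_param_groups(
    named_params: Iterable[tuple[str, Any]],
    trainable: dict[str, bool],
    learning_rate: dict[str, float],
    weight_decay: dict[str, float],
) -> list[dict[str, Any]]:
    """Group parameters by component, leaving out frozen components."""
    groups = {
        name: {"params": [], "lr": learning_rate[name], "weight_decay": weight_decay[name]}
        for name in COMPONENT_PREFIXES
        if trainable[name]
    }
    for param_name, param in named_params:
        for component, prefix in COMPONENT_PREFIXES.items():
            if param_name.startswith(prefix) and component in groups:
                groups[component]["params"].append(param)
    return list(groups.values())


# Stand-in parameters; with PyTorch this would be model.named_parameters().
fake_params = [
    ("visual_encoder.layer0.weight", "w_v"),
    ("connector.fc1.weight", "w_c"),
    ("language_model.embed.weight", "w_l"),
]
param_groups = build_param_groups(
    fake_params,
    trainable={"visual_encoder": False, "language_model": True, "connector": True},
    learning_rate={"visual_encoder": 2e-6, "language_model": 2e-5, "connector": 2e-5},
    weight_decay={"visual_encoder": 0.0, "language_model": 0.0, "connector": 0.0},
)
print(param_groups)
```

A list of dictionaries with `params`, `lr`, and `weight_decay` keys is the format that PyTorch optimizers such as `torch.optim.AdamW` accept. In a real training loop, frozen parameters would typically also have `requires_grad` set to `False` so that no gradients are computed for them.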

For example, to change the learning rate for the language model during finetuning, you could modify src/vlm/config/trainer/learning_rate/llava-finetune.yaml or override it via the command line:

```
deepspeed --module vlm -cn finetune-llava trainer.learning_rate.language_model_learning_rate=5e-6
```
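Conceptually, a dotted command-line override replaces one leaf in the nested configuration that Hydra composes from the YAML files. The sketch below illustrates that idea with a plain dictionary. It is a simplification and not Hydra's implementation (Hydra's override grammar is considerably richer); the helper `apply_override` and the default value shown are assumptions for illustration.

```python
from typing import Any


def apply_override(config: dict[str, Any], override: str) -> None:
    """Apply one 'a.b.c=value' override to a nested dict in place."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        # Walk down the tree, creating intermediate levels if they are missing.
        node = node.setdefault(key, {})
    # Try to interpret the value as a number; otherwise keep the string.
    try:
        value: Any = float(raw_value)
    except ValueError:
        value = raw_value
    node[leaf] = value


# Placeholder default; the real values live in the YAML files under src/vlm/config/.
config = {"trainer": {"learning_rate": {"language_model_learning_rate": 2e-5}}}
apply_override(config, "trainer.learning_rate.language_model_learning_rate=5e-6")
print(config["trainer"]["learning_rate"]["language_model_learning_rate"])
```

Only the named leaf changes, so the other learning rates keep the values from the selected config file.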

Inference

For an inference example, refer to src/vlm/inference/eval.py.

LLaVA Reproduction Results (Using lmms-eval)

Task	Metric	Reproduced LLaVA (Value ± Stderr)	Original LLaVA (Value ± Stderr)
gqa	exact_match	0.6201 ± 0.0043	0.6192 ± 0.0043
mmbench_cn_cc	gpt_eval_score	25.2941 ± N/A	23.5294 ± N/A
mmbench_cn_dev	gpt_eval_score	54.8969 ± N/A	55.6701 ± N/A
mmbench_en_dev	gpt_eval_score	66.0653 ± N/A	64.0893 ± N/A
mmbench_ru_dev	gpt_eval_score	54.9282 ± N/A	53.0144 ± N/A
mme	mme_cognition_score	321.4286 ± N/A	355.7143 ± N/A
mme	mme_perception_score	1505.4650 ± N/A	1509.1289 ± N/A
scienceqa	exact_match	0.6977 ± 0.0071	0.6572 ± 0.0073
seedbench	seed_image	0.6593 ± N/A	0.6616 ± N/A
textvqa_val	exact_match	0.4902 ± 0.0068	0.4600 ± 0.0068
mmmu_val	mmmu_acc	0.3789 ± N/A	0.3611 ± N/A
ai2d	exact_match	0.5379 ± 0.009	0.5518 ± 0.009
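For the four tasks that report a standard error, one rough way to read the table is to divide each difference by the combined standard error of the two runs. The sketch below does this with values copied from the table. It assumes the two estimates are independent, which is only an approximation because both models are scored on the same examples; the helper `z_score` is illustrative.

```python
import math

# (reproduced, reproduced_stderr, original, original_stderr), copied from the table.
results = {
    "gqa": (0.6201, 0.0043, 0.6192, 0.0043),
    "scienceqa": (0.6977, 0.0071, 0.6572, 0.0073),
    "textvqa_val": (0.4902, 0.0068, 0.4600, 0.0068),
    "ai2d": (0.5379, 0.009, 0.5518, 0.009),
}


def z_score(rep: float, rep_se: float, orig: float, orig_se: float) -> float:
    """Difference divided by the combined standard error (assumes independence)."""
    return (rep - orig) / math.sqrt(rep_se**2 + orig_se**2)


z_scores = {task: z_score(*row) for task, row in results.items()}
for task, z in z_scores.items():
    diff = results[task][0] - results[task][2]
    print(f"{task}: diff={diff:+.4f}, z={z:+.2f}")
```

Under this assumption, the gqa and ai2d differences stay within about two combined standard errors, while the scienceqa and textvqa_val differences exceed three.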

Pushing Models to Hugging Face Hub

This project provides a script to easily upload your trained models and processors to the Hugging Face Hub.

Run the push script: Execute the push-to-hub command (which calls the push_vlm_to_hub function):
```
push-to-hub
```
The script will interactively ask for:
- Path to your pretrained/finetuned model checkpoint directory.
- The desired repository name on the Hub (e.g., your-username/your-model-name).
- Whether to force push if the repository already exists.
Loading from Hub: Once pushed, your model can be loaded by anyone using the standard transformers library:
```
from transformers import AutoModel, AutoProcessor

repo_id = "your-username/your-model-name"
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)

# ... proceed with inference
```
The push-to-hub script automatically prepares the necessary code files (modeling_vlm.py, processing_vlm.py, configuration_vlm.py, connectors.py) and updates config.json and processor_config.json so that the model loads through the standard Auto classes.
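In the transformers library, loading custom classes with `trust_remote_code=True` relies on `auto_map` entries of the form `module.ClassName` in the repository's JSON config files. The sketch below shows what such entries could look like for the files named above. The class names `VLMConfig`, `VLM`, and `VLMProcessor` are hypothetical, and the exact contents that push-to-hub writes are not confirmed by this document.

```python
import json

# Hypothetical class names; only the module file names come from the text above.
model_auto_map = {
    "AutoConfig": "configuration_vlm.VLMConfig",
    "AutoModel": "modeling_vlm.VLM",
}
processor_auto_map = {"AutoProcessor": "processing_vlm.VLMProcessor"}

# Fragments of the kind that could appear in config.json and processor_config.json.
config_json = {"auto_map": model_auto_map}
processor_config_json = {"auto_map": processor_auto_map}

print(json.dumps(config_json, indent=2))
print(json.dumps(processor_config_json, indent=2))
```

Each entry tells the matching Auto class which Python file in the repository to import and which class inside it to instantiate.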

This project builds on simple-modern-uv, LLaVA, and LLaVA-NeXT.

Release history

- 0.9.2 (this version): May 22, 2025
- 0.9.1: May 16, 2025
- 0.9.0: May 16, 2025
- 0.8.3: May 16, 2025
- 0.8.2: May 14, 2025
- 0.8.1: May 13, 2025
- 0.8.0: May 12, 2025
- 0.7.1: May 10, 2025
- 0.7.0: May 8, 2025
- 0.6.0: May 7, 2025
- 0.5.2: Apr 15, 2025
- 0.5.1: Apr 15, 2025
- 0.5.0: Apr 12, 2025
- 0.4.0: Apr 2, 2025
- 0.3.0: Mar 31, 2025
- 0.2.0: Mar 26, 2025
- 0.1.1: Mar 24, 2025
- 0.1.0: Mar 24, 2025
- 0.0.0: Mar 23, 2025

Download files

Source Distribution

small_vlm-0.9.2.tar.gz (374.2 kB), uploaded May 22, 2025, Source

Built Distribution

small_vlm-0.9.2-py3-none-any.whl (66.5 kB), uploaded May 22, 2025, Python 3

File details

Details for the file small_vlm-0.9.2.tar.gz.

File metadata

Download URL: small_vlm-0.9.2.tar.gz
Upload date: May 22, 2025
Size: 374.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for small_vlm-0.9.2.tar.gz
Algorithm	Hash digest
SHA256	`e1b6d4e50b3c17c7faf169a131f65fb59f123faf94aa4203ba3b632e061a0338`
MD5	`d8680b3a4edbe9f89528806eedecf963`
BLAKE2b-256	`8274f6d0a8becb1b67429c8c2e570e1cc55bd237a425d5757f79f088c664a43e`
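A downloaded archive can be checked against the published SHA256 digest with the standard library. The sketch below is one way to do that; the helper names and the assumed download location are illustrative, and the check only runs if the file is present in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the published value, ignoring case."""
    return sha256_of(path) == expected_hex.lower()


# Digest listed above for small_vlm-0.9.2.tar.gz.
EXPECTED_SHA256 = "e1b6d4e50b3c17c7faf169a131f65fb59f123faf94aa4203ba3b632e061a0338"
archive = Path("small_vlm-0.9.2.tar.gz")  # assumed download location
if archive.exists():
    print("hash ok:", matches_published_hash(archive, EXPECTED_SHA256))
```

Reading in chunks keeps memory use constant regardless of the archive size.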

Provenance

The following attestation bundles were made for small_vlm-0.9.2.tar.gz:

Publisher: publish.yml on leo1oel/small-vlm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: small_vlm-0.9.2.tar.gz
- Subject digest: e1b6d4e50b3c17c7faf169a131f65fb59f123faf94aa4203ba3b632e061a0338
- Sigstore transparency entry: 217678031
- Sigstore integration time: May 22, 2025
Source repository:
- Permalink: leo1oel/small-vlm@1a63a30ac73ad60417c1bada106255f0459e6c5d
- Branch / Tag: refs/tags/v0.9.2
- Owner: https://github.com/leo1oel
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1a63a30ac73ad60417c1bada106255f0459e6c5d
- Trigger Event: release

File details

Details for the file small_vlm-0.9.2-py3-none-any.whl.

File metadata

Download URL: small_vlm-0.9.2-py3-none-any.whl
Upload date: May 22, 2025
Size: 66.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for small_vlm-0.9.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1bf7318db7f270836f480c90ef2f606047e86689cb90b898a7f7c14c572df852`
MD5	`359150df860e465306c98606ade6c439`
BLAKE2b-256	`24b67c1c1ad91cbd22328464c37d08f985ab04b8e367958af6f45cc85412981a`

Provenance

The following attestation bundles were made for small_vlm-0.9.2-py3-none-any.whl:

Publisher: publish.yml on leo1oel/small-vlm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: small_vlm-0.9.2-py3-none-any.whl
- Subject digest: 1bf7318db7f270836f480c90ef2f606047e86689cb90b898a7f7c14c572df852
- Sigstore transparency entry: 217678037
- Sigstore integration time: May 22, 2025
Source repository:
- Permalink: leo1oel/small-vlm@1a63a30ac73ad60417c1bada106255f0459e6c5d
- Branch / Tag: refs/tags/v0.9.2
- Owner: https://github.com/leo1oel
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1a63a30ac73ad60417c1bada106255f0459e6c5d
- Trigger Event: release
