Efficient LoRA Fine-Tuning for Vision LLMs with advanced CLI and model zoo
Langvision: LoRA Fine-Tuning for Vision LLMs
What You'll Need
# Quick system check
python --version
# Check GPU support (Optional but recommended)
python -c "import torch; print('GPU ready!' if torch.cuda.is_available() else 'CPU mode - still works!')"
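The same checks can be run as one small script that also works before PyTorch is installed. This is a minimal sketch: `environment_report` is a hypothetical helper name, not part of Langvision, and it only uses the standard library plus `torch.cuda.is_available()` when PyTorch is present.

```python
import importlib.util
import sys


def environment_report():
    """Return the Python version and whether PyTorch and CUDA are usable."""
    report = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the script also runs without PyTorch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated as "no GPU" rather than a crash.
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False`, training falls back to CPU mode, as the shell check above notes.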
Install Langvision
# Step 1: Create a clean environment (recommended)
python -m venv langtrain-env
source langtrain-env/bin/activate # Windows: langtrain-env\Scripts\activate
# Step 2: Install LangVision
pip install langvision
# Step 3: Verify it worked
python -c "import langvision; print('✅ LangVision installed!')"
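To see which version was installed without importing the package, the standard library's `importlib.metadata` can be queried. This is a sketch; `installed_version` is a hypothetical helper name introduced here for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("langvision")
if version:
    print(f"langvision {version}")
else:
    print("langvision is not installed in this environment")
```

A `None` result usually means the virtual environment from Step 1 is not activated.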
Train Your First Model
from langvision import LoRATrainer
# Step 1: Define your training data (Images + QA)
training_data = [
    {
        "image": "./images/cat.jpg",
        "question": "What is in this image?",
        "answer": "A cute tabby cat sitting on a rug."
    },
    {
        "image": "./images/dog.jpg",
        "question": "Describe the animal.",
        "answer": "A golden retriever playing with a ball."
    }
]
# Step 2: Create the trainer
# Configures Vision Encoder + LLM Adapter automatically
trainer = LoRATrainer(
    model_name="llava-v1.6-7b",  # Works with LLaVA, Qwen-VL, BLIP-2 etc.
    output_dir="./my_vision_model",
)
# Step 3: Train!
trainer.train(training_data)
# Step 4: Test your model
model = trainer.load_model()
response = model.chat("./images/cat.jpg", "What do you see?")
print(f"AI: {response}")
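Before starting a training run, it can help to confirm that every record has the three fields used above and that each image path exists. The following is a minimal sketch that does not call Langvision at all; `validate_records` and `REQUIRED_KEYS` are hypothetical names, and the required fields are assumed from the example data rather than from a documented schema.

```python
from pathlib import Path

# Assumption: the record fields shown in the example above are all required.
REQUIRED_KEYS = ("image", "question", "answer")


def validate_records(records, check_files=True):
    """Return a list of problems; an empty list means the data looks usable."""
    problems = []
    for index, record in enumerate(records):
        for key in REQUIRED_KEYS:
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {index}: missing or empty '{key}'")
        image = record.get("image")
        if (
            check_files
            and isinstance(image, str)
            and image.strip()
            and not Path(image).is_file()
        ):
            problems.append(f"record {index}: image not found: {image}")
    return problems
```

Calling `validate_records(training_data)` and printing the result before `trainer.train(training_data)` surfaces missing files early instead of partway through training.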
Use Your Trained Model
from langvision import ChatModel
# Load your trained model
model = ChatModel.load("./my_vision_model")
# Analyze images
print(model.chat("image1.jpg", "Describe this scene."))
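To analyze several images with the same prompt, the `chat(image, question)` call shown above can be wrapped in a loop. This sketch assumes only that the loaded model exposes that method; `describe_images` is a hypothetical helper, not a Langvision function.

```python
def describe_images(model, image_paths, question):
    """Ask the same question about each image and map path -> response."""
    results = {}
    for path in image_paths:
        try:
            results[path] = model.chat(path, question)
        except Exception as error:
            # Record the failure and continue with the remaining images.
            results[path] = f"ERROR: {error}"
    return results
```

For example, `describe_images(model, ["image1.jpg", "image2.jpg"], "Describe this scene.")` returns a dictionary keyed by image path, with an error string for any image that could not be processed.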
Using Your Own Data
from langvision import LoRATrainer
trainer = LoRATrainer(
    model_name="llava-v1.6-7b",
    output_dir="./custom_vlm",
)
# Method 1: Load from Hugging Face datasets
trainer.train_from_hub("your_username/your_vqa_dataset")
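If the data lives in a local file rather than on the Hub, one option is to build the same list-of-dicts structure used in the first example and pass it to `trainer.train(...)`. The sketch below assumes a JSON Lines file with `image`, `question`, and `answer` fields; that file layout and the `load_vqa_jsonl` helper are illustrative assumptions, not a documented Langvision format.

```python
import json
from pathlib import Path


def load_vqa_jsonl(jsonl_path, image_root="."):
    """Read one JSON object per line and return records shaped like training_data."""
    records = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row = json.loads(line)
            try:
                records.append({
                    # Image paths in the file are taken relative to image_root.
                    "image": str(Path(image_root) / row["image"]),
                    "question": row["question"],
                    "answer": row["answer"],
                })
            except KeyError as missing:
                raise ValueError(
                    f"{jsonl_path}:{line_number}: missing field {missing}"
                ) from None
    return records
```

The returned list has the same shape as `training_data` in "Train Your First Model", so it can be handed to `trainer.train(records)` in the same way.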
Next Steps
- Train with QLoRA: Use `QLoRATrainer` to fine-tune LLaVA-7B on consumer GPUs (under 12GB VRAM).
- Explore Model Zoo: Run `langvision model-zoo list` to see supported models (LLaVA, Qwen, CogVLM, etc.).
- Read the Docs: Check out langtrain.xyz/docs.
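A rough calculation shows why 4-bit quantization matters for the QLoRA memory figure above. The sketch counts memory for the frozen weights only, assuming roughly 7 billion parameters; activations, LoRA adapters, optimizer state, and framework overhead come on top and are not modeled here.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 7e9  # assumption: roughly 7 billion parameters for a "7B" model
print(f"16-bit weights: {weight_memory_gb(params, 16):.1f} GB")
print(f" 4-bit weights: {weight_memory_gb(params, 4):.1f} GB")
```

Under these assumptions the weights take about 14 GB at 16 bits and about 3.5 GB at 4 bits, which is what leaves headroom on a 12 GB card.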
Architecture Overview
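Langvision pairs a Vision Transformer (ViT) encoder with a Large Language Model (LLM) and fine-tunes the combination with LoRA. As the diagram shows, the vision encoder stays frozen and the LoRA adapters are applied to the LLM.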
flowchart TD
A(["Input Image"]) --> B(["Vision Encoder (Frozen)"])
B --> C(["Projector"])
C --> D(["LLM (LoRA Adapted)"])
D --> E(["Text Output"])
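The "LoRA Adapted" step can be illustrated with the standard LoRA formulation: a frozen weight matrix W is augmented with a low-rank update, so the layer computes W x + (alpha / r) · B (A x), and only A and B are trained. The code below is a dependency-free sketch of that general technique, not Langvision's internal implementation; all function names are hypothetical.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x); W stays frozen, only A and B would train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

For a 4096 x 4096 layer at rank 8, the adapter has 65,536 trainable parameters against 16,777,216 in the full matrix, about 0.39%. When B starts at zero, as is conventional for LoRA, `lora_forward` returns exactly the frozen layer's output, so training begins from the base model's behavior.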
Contributing
Contributions are welcome! See CONTRIBUTING.md.
License
MIT License. See LICENSE.
Citation
@software{langvision2025,
author = {Pritesh Raj},
title = {Langvision: Efficient LoRA Fine-Tuning for Vision LLMs},
url = {https://github.com/langtrain-ai/langvision},
year = {2025}
}