Efficient LoRA Fine-Tuning for Vision LLMs with advanced CLI and model zoo

Langvision: LoRA Fine-Tuning for Vision LLMs

Fine-tune Vision LLMs (LLaVA, Qwen-VL) in minutes


What You'll Need

# Quick system check
python --version 

# Check GPU support (Optional but recommended)
python -c "import torch; print('GPU ready!' if torch.cuda.is_available() else 'CPU mode - still works!')"
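
The same check can be run as a small script that also reports the interpreter version. This is a minimal sketch: the function name `preflight` is illustrative and not part of Langvision, and it treats PyTorch as optional so that it still runs on a machine without it.

```python
import importlib.util
import sys


def preflight():
    """Collect basic facts about the local Python/GPU setup."""
    info = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if info["torch_installed"]:
        try:
            import torch  # imported lazily so the script runs without PyTorch

            info["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is reported as "no CUDA" rather than crashing.
            info["cuda_available"] = False
    return info


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```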

Install Langvision

# Step 1: Create a clean environment (recommended)
python -m venv langtrain-env
source langtrain-env/bin/activate  # Windows: langtrain-env\Scripts\activate

# Step 2: Install LangVision
pip install langvision

# Step 3: Verify it worked
python -c "import langvision; print('✅ LangVision installed!')"
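
To see which release was installed, the standard library's `importlib.metadata` can be queried instead of relying on the import alone. The helper name `installed_version` below is an illustrative assumption, not a Langvision function.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("langvision:", installed_version("langvision") or "not installed")
```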

Train Your First Model

from langvision import LoRATrainer

# Step 1: Define your training data (Images + QA)
training_data = [
    {
        "image": "./images/cat.jpg", 
        "question": "What is in this image?", 
        "answer": "A cute tabby cat sitting on a rug."
    },
    {
        "image": "./images/dog.jpg", 
        "question": "Describe the animal.", 
        "answer": "A golden retriever playing with a ball."
    }
]

# Step 2: Create the trainer
# Configures Vision Encoder + LLM Adapter automatically
trainer = LoRATrainer(
    model_name="llava-v1.6-7b",  # Works with LLaVA, Qwen-VL, BLIP-2 etc.
    output_dir="./my_vision_model",
)

# Step 3: Train!
trainer.train(training_data)

# Step 4: Test your model
model = trainer.load_model()
response = model.chat("./images/cat.jpg", "What do you see?")
print(f"AI: {response}")
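
Before starting a long run, it can help to check that every example has the three fields used above and that the image paths exist. The following is a sketch under that assumption only; `validate_examples` is a hypothetical helper, and whether `LoRATrainer` performs its own validation is not stated here.

```python
from pathlib import Path

# Field names taken from the training_data example above.
REQUIRED_KEYS = ("image", "question", "answer")


def validate_examples(examples, check_files=True):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for index, example in enumerate(examples):
        for key in REQUIRED_KEYS:
            value = example.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"example {index}: missing or empty '{key}'")
        image = example.get("image")
        if check_files and isinstance(image, str) and image.strip():
            if not Path(image).is_file():
                problems.append(f"example {index}: image not found: {image}")
    return problems
```

Calling `validate_examples(training_data)` before `trainer.train(training_data)` and printing the returned list surfaces typos in paths or keys early.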

Use Your Trained Model

from langvision import ChatModel

# Load your trained model
model = ChatModel.load("./my_vision_model")

# Analyze images
print(model.chat("image1.jpg", "Describe this scene."))
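
To run the same prompt over a folder of images, a small loop around `model.chat` is enough. This sketch assumes only the `chat(image_path, question)` call shown above; `describe_images` and the list of file suffixes are illustrative choices, not part of Langvision.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def describe_images(model, image_dir, prompt="Describe this scene."):
    """Map each image path in image_dir to the model's reply for prompt."""
    results = {}
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[str(path)] = model.chat(str(path), prompt)
    return results
```

With the model loaded as above, `describe_images(model, "./images")` returns a dictionary keyed by file path.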

Using Your Own Data

from langvision import LoRATrainer

trainer = LoRATrainer(
    model_name="llava-v1.6-7b",
    output_dir="./custom_vlm",
)

# Load a dataset from the Hugging Face Hub
trainer.train_from_hub("your_username/your_vqa_dataset")
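
For data that lives on disk rather than on the Hub, one option is to convert it into the list-of-dicts format from the first example. The sketch below assumes a JSON Lines file with `image`, `question`, and `answer` fields per line; the file layout and the helper `load_vqa_jsonl` are assumptions for illustration.

```python
import json
from pathlib import Path


def load_vqa_jsonl(jsonl_path, image_root="."):
    """Read one JSON object per line and return trainer-style examples."""
    examples = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            try:
                examples.append({
                    "image": str(Path(image_root) / record["image"]),
                    "question": record["question"],
                    "answer": record["answer"],
                })
            except KeyError as missing:
                raise ValueError(
                    f"line {line_number}: missing field {missing}"
                ) from None
    return examples
```

The returned list has the same shape as `training_data` in the first example, so it can be passed to `trainer.train(...)` in the same way.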

Next Steps

  1. Train with QLoRA: use QLoRATrainer to fine-tune LLaVA-7B on consumer GPUs (under 12 GB of VRAM).
  2. Explore the Model Zoo: run langvision model-zoo list to see supported models (LLaVA, Qwen, CogVLM, etc.).
  3. Read the docs: see langtrain.xyz/docs.
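
As rough context for the QLoRA item, the memory taken by the model weights alone scales with the number of bits stored per parameter. The arithmetic below is a back-of-the-envelope sketch, not a measurement: it assumes a round 7 billion parameters and ignores activations, adapter weights, optimizer state, and quantization overhead, all of which add to the real footprint.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


params = 7_000_000_000  # assumed round figure for a "7B" model
for label, bits in (("fp16", 16), ("int8", 8), ("4-bit", 4)):
    print(f"{label:>5}: {weight_memory_gib(params, bits):.1f} GiB")
```

Under these assumptions the weights need about 13 GiB at 16 bits and about 3.3 GiB at 4 bits, which is why 4-bit quantization leaves headroom on a 12 GB card.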

Architecture Overview

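Langvision pairs a frozen vision encoder (for example a Vision Transformer, ViT) with a Large Language Model (LLM): image features pass through a projector into the LLM, and LoRA adapters are applied to the LLM.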

flowchart TD
    A(["Input Image"]) --> B(["Vision Encoder (Frozen)"])
    B --> C(["Projector"])
    C --> D(["LLM (LoRA Adapted)"])
    D --> E(["Text Output"])
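
To make the "LoRA Adapted" box concrete, here is the general LoRA idea in plain Python: the pretrained weight matrix W stays frozen, and a trainable low-rank product B·A, scaled by alpha / r, is added to its output. This is a generic sketch of the technique, not Langvision's implementation; the function names and the 4096-wide layer are illustrative assumptions.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x), where r is the LoRA rank."""
    rank = len(A)                     # A has shape (r, d_in)
    base = matvec(W, x)               # frozen pretrained weights
    update = matvec(B, matvec(A, x))  # trainable low-rank path, B is (d_out, r)
    return [b + (alpha / rank) * u for b, u in zip(base, update)]


def lora_param_counts(d_in, d_out, rank):
    """Return (parameters in W, trainable parameters in A and B)."""
    return d_in * d_out, rank * (d_in + d_out)


full, trainable = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  LoRA: {trainable:,}  ({trainable / full:.2%})")
```

For this assumed 4096 x 4096 layer at rank 8, the adapter trains 65,536 parameters against 16,777,216 in the frozen matrix, about 0.39%. When B is initialized to zeros, as is common for LoRA, the adapted layer initially produces the same output as the original one.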

Contributing

Contributions are welcome! See CONTRIBUTING.md.

License

MIT License. See LICENSE.

Citation

@software{langvision2025,
  author = {Pritesh Raj},
  title = {Langvision: Efficient LoRA Fine-Tuning for Vision LLMs},
  url = {https://github.com/langtrain-ai/langvision},
  year = {2025}
}
