Efficient LoRA Fine-Tuning for Vision LLMs with advanced CLI and model zoo

Langvision: LoRA Fine-Tuning for Vision LLMs

Fine-tune Vision LLMs (LLaVA, Qwen-VL) in minutes

What You'll Need

# Quick system check
python --version 

# Check GPU support (Optional but recommended)
python -c "import torch; print('GPU ready!' if torch.cuda.is_available() else 'CPU mode - still works!')"

Install Langvision

# Step 1: Create a clean environment (recommended)
python -m venv langtrain-env
source langtrain-env/bin/activate  # Windows: langtrain-env\Scripts\activate

# Step 2: Install Langvision
pip install langvision

# Step 3: Verify it worked
python -c "import langvision; print('✅ Langvision installed!')"
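
To see which version ended up in the environment, the standard library can query the installed package metadata. This is a minimal sketch that uses only `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of Langvision.

```python
from importlib import metadata


def installed_version(package="langvision"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version if the package is installed, otherwise a short notice.
print(installed_version() or "langvision is not installed in this environment")
```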

Train Your First Model

from langvision import LoRATrainer

# Step 1: Define your training data (Images + QA)
training_data = [
    {
        "image": "./images/cat.jpg", 
        "question": "What is in this image?", 
        "answer": "A cute tabby cat sitting on a rug."
    },
    {
        "image": "./images/dog.jpg", 
        "question": "Describe the animal.", 
        "answer": "A golden retriever playing with a ball."
    }
]

# Step 2: Create the trainer
# Configures Vision Encoder + LLM Adapter automatically
trainer = LoRATrainer(
    model_name="llava-v1.6-7b",  # Works with LLaVA, Qwen-VL, BLIP-2 etc.
    output_dir="./my_vision_model",
)

# Step 3: Train!
trainer.train(training_data)

# Step 4: Test your model
model = trainer.load_model()
response = model.chat("./images/cat.jpg", "What do you see?")
print(f"AI: {response}")
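
Before starting a long run, it can help to sanity-check the training list. The sketch below is not a Langvision API: `validate_examples` is a hypothetical helper that assumes the record format shown above (`image`, `question`, and `answer` keys holding strings) and reports records with missing keys, empty text, or image paths that do not exist.

```python
from pathlib import Path

REQUIRED_KEYS = ("image", "question", "answer")


def validate_examples(examples, check_files=True):
    """Return a list of (index, problem) tuples; an empty list means no problems.

    Hypothetical helper, not part of Langvision. Assumes each record is a
    dict with the "image", "question", and "answer" keys used above.
    """
    problems = []
    for i, record in enumerate(examples):
        # Every required key must be present and hold non-empty text.
        for key in REQUIRED_KEYS:
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
        # Optionally confirm that the image path points at a real file.
        image = record.get("image")
        if check_files and isinstance(image, str) and image.strip():
            if not Path(image).is_file():
                problems.append((i, f"image not found: {image}"))
    return problems
```

Calling `validate_examples(training_data)` before `trainer.train(training_data)` surfaces bad paths and incomplete records early.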

Use Your Trained Model

from langvision import ChatModel

# Load your trained model
model = ChatModel.load("./my_vision_model")

# Analyze images
print(model.chat("image1.jpg", "Describe this scene."))
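
To run the same prompt over several images, a small loop around `chat` is enough. This sketch assumes only what the example above shows, namely that the loaded model exposes `chat(image_path, question)` and returns the response text; `describe_images` is a hypothetical helper, not a Langvision function.

```python
def describe_images(model, image_paths, question="Describe this scene."):
    """Ask the same question about each image and collect the answers.

    Assumes `model` exposes chat(image_path, question), as in the example
    above. Hypothetical helper, not part of Langvision.
    """
    results = {}
    for path in image_paths:
        results[path] = model.chat(path, question)
    return results
```

For example, `describe_images(model, ["image1.jpg", "image2.jpg"])` returns a dict that maps each path to its answer.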

Using Your Own Data

from langvision import LoRATrainer

trainer = LoRATrainer(
    model_name="llava-v1.6-7b",
    output_dir="./custom_vlm",
)

# Load a dataset from the Hugging Face Hub
trainer.train_from_hub("your_username/your_vqa_dataset")
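
For data kept in a local file, one option is to convert it into the list-of-dicts format that `trainer.train` accepts in the first training example. The sketch below assumes a JSONL file with one `{"image", "question", "answer"}` object per line. That layout is an assumption for illustration, not a documented Langvision format, and `load_jsonl_examples` is a hypothetical helper.

```python
import json
from pathlib import Path

REQUIRED = {"image", "question", "answer"}


def load_jsonl_examples(jsonl_path, image_root="."):
    """Read one JSON object per line and return training records.

    Hypothetical helper: the JSONL layout is an assumption, not a
    Langvision format. Relative image paths are joined onto image_root.
    """
    examples = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            missing = REQUIRED - set(record)
            if missing:
                raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
            examples.append({
                "image": str(Path(image_root) / record["image"]),
                "question": record["question"],
                "answer": record["answer"],
            })
    return examples
```

The returned list can then be passed to `trainer.train(...)` in the same way as the hand-written `training_data` list.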

Next Steps

  1. Train with QLoRA: Use QLoRATrainer to fine-tune LLaVA-7B on consumer GPUs (under 12 GB VRAM).
  2. Explore the Model Zoo: Run langvision model-zoo list to see supported models (LLaVA, Qwen, CogVLM, etc.).
  3. Read the Docs: See langtrain.xyz/docs.
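
As a rough illustration of why quantized training can fit in a 12 GB budget, the memory needed for weights alone is approximately the parameter count times the bytes per parameter. The sketch below counts weights only; activations, LoRA parameters, optimizer state, and the vision encoder all add to the total, so treat the numbers as a lower bound. The 7e9 parameter count is a nominal assumption for a 7B model.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 7e9  # nominal size of a 7B language model (assumption)
for label, bits in (("fp16", 16), ("int8", 8), ("4-bit", 4)):
    print(f"{label:>5}: {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, fp16 weights alone come to about 14 GB, which already exceeds 12 GB, while 4-bit weights take about 3.5 GB and leave room for the rest of the training state.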

Architecture Overview

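Langvision adapts vision-language models, which pair a Vision Transformer (ViT) encoder with a Large Language Model (LLM), using LoRA. As the flowchart shows, the vision encoder stays frozen and the LoRA adapters are applied to the LLM.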

flowchart TD
    A(["Input Image"]) --> B(["Vision Encoder (Frozen)"])
    B --> C(["Projector"])
    C --> D(["LLM (LoRA Adapted)"])
    D --> E(["Text Output"])
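
The "LoRA Adapted" step can be illustrated with the standard LoRA formulation: a frozen weight W is augmented with a trainable low-rank product, giving W x + (alpha / r) B (A x). The sketch below uses plain Python lists to stay dependency-free. It illustrates the general technique only, not Langvision's internal implementation, and all function names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x).

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    r = len(A)  # LoRA rank
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one LoRA pair (A and B)."""
    return r * (d_in + d_out)
```

With B initialized to zeros, as is conventional in LoRA, the adapted layer starts out identical to the frozen one. For a 4096 x 4096 weight and rank 8, `lora_param_count(4096, 4096, 8)` gives 65,536 trainable parameters, compared with 16,777,216 in the full matrix.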

Contributing

Contributions are welcome! See CONTRIBUTING.md.

License

MIT License. See LICENSE.

Citation

@software{langvision2025,
  author = {Pritesh Raj},
  title = {Langvision: Efficient LoRA Fine-Tuning for Vision LLMs},
  url = {https://github.com/langtrain-ai/langvision},
  year = {2025}
}

