A package for finetuning vision models.

Langvision: Vision LLMs with Efficient LoRA Fine-Tuning

Langvision provides modular components for vision models and LoRA-based fine-tuning.
Adapt and fine-tune vision models for a range of tasks.

Features

LoRA adapters for parameter-efficient fine-tuning
Modular Vision Transformer (ViT) backbone
Model zoo for open-source visual models
Configurable and extensible codebase
Checkpointing and resume support
Mixed precision and distributed training
Built-in metrics and visualization tools
CLI for fine-tuning and evaluation
Extensible callbacks (early stopping, logging, etc.)

Showcase

Langvision is a framework for building and fine-tuning vision models with LoRA support. It is suitable for tasks such as image classification, visual question answering, and custom vision applications.

Getting Started

Install with pip:

pip install langvision

Minimal example:

import torch
from langvision.models.vision_transformer import VisionTransformer
from langvision.utils.config import default_config

# Dummy batch of 2 RGB images, 224x224 pixels.
x = torch.randn(2, 3, 224, 224)

# Build the ViT from the packaged defaults, including the LoRA settings.
model = VisionTransformer(
    img_size=default_config['img_size'],
    patch_size=default_config['patch_size'],
    in_chans=default_config['in_chans'],
    num_classes=default_config['num_classes'],
    embed_dim=default_config['embed_dim'],
    depth=default_config['depth'],
    num_heads=default_config['num_heads'],
    mlp_ratio=default_config['mlp_ratio'],
    lora_config=default_config['lora'],
)

# Inference only: disable gradient tracking for the forward pass.
with torch.no_grad():
    out = model(x)
    print('Output shape:', out.shape)

For more details, see the Documentation and src/langvision/cli/finetune.py.

Supported Python Versions

Python 3.8+

Why langvision?

Parameter-efficient fine-tuning with LoRA adapters
Modular ViT backbone for flexible model design
Unified interface for open-source vision models
Designed for both research and production
Efficient memory usage for large models

Architecture Overview

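Langvision uses a modular Vision Transformer backbone with LoRA adapters in the attention and MLP layers. During fine-tuning the pre-trained weights stay frozen and only the low-rank adapter weights are updated, so a pre-trained model can be adapted with far fewer trainable parameters than full fine-tuning.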
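
To make the adapter idea concrete, here is a minimal sketch of a LoRA-style linear layer. It is not Langvision's LoRALinear implementation: the class name LoRALinearSketch, the initialisation values, and the alpha / rank scaling are assumptions taken from the usual LoRA formulation, and plain Python lists stand in for tensors so the sketch has no dependencies.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinearSketch:
    """Hypothetical illustration: frozen weight W plus a low-rank update B @ A."""

    def __init__(self, in_features, out_features, rank, alpha, seed=0):
        rng = random.Random(seed)
        # Frozen pre-trained weight, shape (out_features, in_features).
        self.weight = [[rng.gauss(0, 0.02) for _ in range(in_features)]
                       for _ in range(out_features)]
        # Trainable A: (rank, in_features), small random init.
        self.lora_a = [[rng.gauss(0, 0.01) for _ in range(in_features)]
                       for _ in range(rank)]
        # Trainable B: (out_features, rank), zero init so training starts
        # from the unmodified pre-trained behaviour.
        self.lora_b = [[0.0] * rank for _ in range(out_features)]
        self.scaling = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scaling * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        return (sum(len(row) for row in self.lora_a)
                + sum(len(row) for row in self.lora_b))


layer = LoRALinearSketch(in_features=8, out_features=4, rank=2, alpha=4)
x = [1.0] * 8

# B is zero at initialisation, so the adapter contributes nothing yet.
assert layer.forward(x) == matvec(layer.weight, x)
print(layer.trainable_parameters())  # 2*8 + 4*2 = 24, versus 32 frozen weights
```

The saving grows with layer size: the adapter costs rank * (in_features + out_features) parameters, while the frozen weight costs in_features * out_features.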

Model Data Flow

---
config:
  layout: dagre
---
flowchart TD
 subgraph LoRA_Adapters["LoRA Adapters in Attention and MLP"]
        LA1(["LoRA Adapter 1"])
        LA2(["LoRA Adapter 2"])
        LA3(["LoRA Adapter N"])
  end
    A(["Input Image"]) --> B(["Patch Embedding"])
    B --> C(["CLS Token & Positional Encoding"])
    C --> D1(["Encoder Layer 1"])
    D1 --> D2(["Encoder Layer 2"])
    D2 --> D3(["Encoder Layer N"])
    D3 --> E(["LayerNorm"])
    E --> F(["MLP Head"])
    F --> G(["Output Class Logits"])
    LA1 -.-> D1
    LA2 -.-> D2
    LA3 -.-> D3
     LA1:::loraStyle
     LA2:::loraStyle
     LA3:::loraStyle
    classDef loraStyle fill:#e1f5fe,stroke:#0277bd,stroke-width:2px

Core Modules

Module	Description	Key Features
PatchEmbedding	Image-to-patch conversion and embedding	Configurable patch sizes, position embeddings
TransformerEncoder	Multi-layer transformer backbone	Self-attention, LoRA integration, checkpointing
LoRALinear	Low-rank adaptation layers	Configurable rank, memory-efficient updates
MLPHead	Output projection layer	Classification, regression, dropout
Config System	Centralized configuration	YAML/JSON config, CLI overrides
Data Utils	Preprocessing and augmentation	Built-in transforms, custom loaders
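
As a rough illustration of what the PatchEmbedding stage produces, the sketch below computes the token sequence length for a given image and patch size. The helper names are hypothetical and the single CLS token is an assumption based on the data flow diagram; the 224 and 16 values match the example training config.

```python
def vit_token_count(img_size: int, patch_size: int, use_cls_token: bool = True) -> int:
    """Number of tokens the encoder sees for a square image."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    patches_per_side = img_size // patch_size
    num_patches = patches_per_side ** 2
    return num_patches + (1 if use_cls_token else 0)


def patch_vector_length(patch_size: int, in_chans: int = 3) -> int:
    """Length of one flattened patch before it is projected to embed_dim."""
    return patch_size * patch_size * in_chans


print(vit_token_count(224, 16))  # 14 * 14 patches + 1 CLS token = 197
print(patch_vector_length(16))   # 16 * 16 * 3 = 768
```

Halving the patch size quadruples the number of patches, which is why smaller patches raise attention cost noticeably.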

Performance & Efficiency

Metric	Full Fine-tuning	LoRA Fine-tuning	Improvement
Trainable Parameters	86M	2.4M	97% reduction
Memory Usage	12GB	4GB	67% reduction
Training Time	4h	1.5h	62% less time
Storage per Task	344MB	9.6MB	97% smaller

Benchmark setup: ViT-Base fine-tuned on CIFAR-100 on an RTX 3090.
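
The parameter and storage figures can be sanity-checked with some arithmetic. The sketch below is one plausible accounting, not the authors' stated method: it assumes the standard ViT-Base dimensions (width 768, 12 layers, MLP ratio 4), rank-16 adapters on the four target modules named in the example LoRA config under Advanced Configuration, a 100-class linear head, and 4 bytes per fp32 parameter.

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A is (rank x in_features), B is (out_features x rank)."""
    return rank * (in_features + out_features)


EMBED_DIM = 768          # assumed ViT-Base width
DEPTH = 12               # assumed number of encoder layers
MLP_DIM = 4 * EMBED_DIM  # assumed MLP ratio of 4
RANK = 16
NUM_CLASSES = 100        # CIFAR-100

# (in_features, out_features) of each adapted linear layer in one encoder block.
target_shapes = {
    "attention.qkv": (EMBED_DIM, 3 * EMBED_DIM),
    "attention.proj": (EMBED_DIM, EMBED_DIM),
    "mlp.fc1": (EMBED_DIM, MLP_DIM),
    "mlp.fc2": (MLP_DIM, EMBED_DIM),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
adapter_total = per_layer * DEPTH
head_params = EMBED_DIM * NUM_CLASSES + NUM_CLASSES  # weights + biases

print(per_layer)                    # 196608
print(adapter_total)                # 2359296
print(adapter_total + head_params)  # 2436196, roughly the 2.4M in the table

# Storage at 4 bytes per parameter, in MB (10**6 bytes).
print(round(2.4e6 * 4 / 1e6, 1))    # 9.6
print(round(86e6 * 4 / 1e6))        # 344
```

Under these assumptions the counts land close to the table's 2.4M trainable parameters and 9.6MB per task, which suggests the storage row is simply parameter count times four bytes.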

Supported model sizes: ViT-Tiny, ViT-Small, ViT-Base, ViT-Large

Advanced Configuration

Example LoRA config:

lora_config = {
    "rank": 16,
    "alpha": 32,
    "dropout": 0.1,
    "target_modules": ["attention.qkv", "attention.proj", "mlp.fc1", "mlp.fc2"],
    "merge_weights": False
}
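
For intuition about alpha and merge_weights, the following sketch shows the usual LoRA convention, which is assumed here rather than confirmed for Langvision: the update is scaled by alpha / rank (32 / 16 = 2.0 for the config above), and merging folds the scaled update into the frozen weight so inference needs a single matrix. The tiny matrices and helper functions are made up for illustration.

```python
def matmul(a, b):
    """Multiply two row-major matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


rank, alpha = 2, 4            # toy values with the same alpha / rank ratio as 16 / 32
scaling = alpha / rank        # 2.0

W = [[1.0, 0.0, 2.0],         # frozen weight, shape (2, 3)
     [0.0, 1.0, -1.0]]
A = [[0.5, 0.0, 1.0],         # adapter A, shape (rank, 3)
     [0.0, 2.0, 0.0]]
B = [[1.0, 0.0],              # adapter B, shape (2, rank)
     [0.5, 1.0]]
x = [1.0, 2.0, 3.0]

# Unmerged: keep W and the adapter separate and add their outputs.
base = matvec(W, x)
update = matvec(B, matvec(A, x))
unmerged_out = [b + scaling * u for b, u in zip(base, update)]

# Merged: fold the scaled low-rank product into W once, then use W alone.
delta = matmul(B, A)
W_merged = [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
merged_out = matvec(W_merged, x)

print(unmerged_out)  # [14.0, 10.5]
print(merged_out)    # [14.0, 10.5]
```

Keeping the weights unmerged, as the example config does, leaves the adapter separable from the base model, so one backbone can serve several tasks by swapping adapters.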

Example training config:

model:
  name: "vit_base"
  img_size: 224
  patch_size: 16
  num_classes: 1000
training:
  epochs: 10
  batch_size: 32
  learning_rate: 1.0e-4  # decimal point included so YAML 1.1 parsers such as PyYAML read a float
  weight_decay: 0.01
  warmup_steps: 1000
lora:
  rank: 16
  alpha: 32
  dropout: 0.1
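
The Core Modules table mentions CLI overrides on top of YAML/JSON configs. Langvision's actual override syntax is not shown here, so the sketch below is only one way such a mechanism could work: the apply_overrides helper and the dotted "section.key=value" format are hypothetical, and the config is written as a Python dict to avoid a YAML dependency.

```python
import copy

base_config = {
    "model": {"name": "vit_base", "img_size": 224, "patch_size": 16, "num_classes": 1000},
    "training": {"epochs": 10, "batch_size": 32, "learning_rate": 1e-4,
                 "weight_decay": 0.01, "warmup_steps": 1000},
    "lora": {"rank": 16, "alpha": 32, "dropout": 0.1},
}


def apply_overrides(config, overrides):
    """Return a copy of config with overrides such as 'lora.rank=8' applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node[name]  # raises KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        # Cast to the type of the existing value: '8' -> 8, '3e-4' -> 0.0003.
        node[leaf] = type(node[leaf])(raw_value)
    return result


cfg = apply_overrides(
    base_config,
    ["training.epochs=20", "lora.rank=8", "training.learning_rate=3e-4"],
)
print(cfg["training"]["epochs"], cfg["lora"]["rank"], cfg["training"]["learning_rate"])
# 20 8 0.0003
print(base_config["lora"]["rank"])  # 16, the base config is left untouched
```

Rejecting unknown keys catches typos such as training.epoch early instead of silently adding an unused entry.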

Testing & Quality

Run tests:

pytest tests/

Code quality tools:

flake8 src/
black src/ --check
mypy src/
bandit -r src/

Examples & Use Cases

Image classification:

from langvision import VisionTransformer
from langvision.datasets import CIFAR10Dataset

model = VisionTransformer.from_pretrained("vit_base_patch16_224")
dataset = CIFAR10Dataset(train=True, transform=model.default_transform)
model.finetune(dataset, epochs=10, lora_rank=16)

Custom dataset:

from langvision.datasets import ImageFolderDataset

# `model` is the VisionTransformer created in the image classification example above.
dataset = ImageFolderDataset(
    root="/path/to/dataset",
    split="train",
    transform=model.default_transform
)
model.finetune(dataset, config_path="configs/custom_config.yaml")

Extending the Framework

Add datasets in src/langvision/data/datasets.py
Add callbacks in src/langvision/callbacks/
Add models in src/langvision/models/
Add CLI tools in src/langvision/cli/
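
As an example of the kind of callback that could live under src/langvision/callbacks/, here is a minimal early-stopping sketch. The callback interface Langvision expects is not documented here, so the EarlyStopping class and its on_epoch_end method are hypothetical; adapt the hook names to the base class in the repository.

```python
class EarlyStopping:
    """Hypothetical callback: stop when validation loss stops improving."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def on_epoch_end(self, val_loss: float) -> bool:
        """Record the epoch's loss and return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
losses = [0.90, 0.70, 0.71, 0.72, 0.60]  # made-up validation losses

stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.on_epoch_end(loss):
        stopped_at = epoch
        break

print(stopped_at)    # 3: two epochs in a row failed to beat 0.70
print(stopper.best)  # 0.7
```

Note that the run stops before the later 0.60 loss is seen, which is the usual trade-off when choosing a small patience value.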

Documentation

See code comments and docstrings for details.
For advanced usage, see src/langvision/cli/finetune.py.

Contributing

We welcome contributions. See the Contributing Guide for details.

License & Citation

This project is licensed under the MIT License. See LICENSE for details.

If you use langvision in your research, please cite:

@software{langtrain2025,
  author = {Pritesh Raj},
  title = {Langvision: Vision LLMs with Efficient LoRA Fine-Tuning},
  url = {https://github.com/langtrain-ai/langvision},
  year = {2025},
  version = {1.0.0}
}

Made in India 🇮🇳 with ❤️ by the langtrain team
Star ⭐ this repo if you find it useful!


