
Train models with self-supervised learning in a single command


LightlyTrain - SOTA Pretraining, Fine-tuning and Distillation


Train Better Models, Faster

LightlyTrain is the leading framework for transforming your data into state-of-the-art computer vision models. It covers the entire model development lifecycle from pretraining DINOv2/v3 vision foundation models on your unlabeled data to fine-tuning transformer and YOLO models on detection and segmentation tasks for edge deployment.

Contact us to request a license for commercial use.

News

  • [0.14.0] - 2026-01-19: 🐣 PicoDet, Tiny Models, and ONNX/TensorRT FP16 Support: PicoDet object detection models for low-power embedded devices! All tasks now support tiny DINOv3 models and ONNX/TensorRT export in FP16 precision for faster inference! 🐣
  • [0.13.0] - 2025-12-15: 🐥 New Tiny Object Detection Models: We release tiny DINOv3 models pretrained on COCO for object detection! 🐥
  • [0.12.0] - 2025-11-06: 💡 New DINOv3 Object Detection: Run inference or fine-tune DINOv3 models for object detection! 💡
  • [0.11.0] - 2025-08-15: 🚀 New DINOv3 Support: Pretrain your own model with distillation from DINOv3 weights. Or fine-tune our SOTA EoMT semantic segmentation model with a DINOv3 backbone! 🚀
  • [0.10.0] - 2025-08-04: 🔥 Train state-of-the-art semantic segmentation models with our new DINOv2 semantic segmentation fine-tuning method! 🔥
  • [0.9.0] - 2025-07-21: DINOv2 pretraining is now officially available!

Installation

Install LightlyTrain on Python 3.8+ for Windows, Linux, or macOS with:

pip install lightly-train

Workflows

Tasks

Object Detection

Train LTDETR detection models with DINOv2 or DINOv3 backbones.

COCO Results

| Implementation | Model | Val mAP50:95 | Latency (ms) | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | picodet-s-coco | 26.7* | 2.2* | 1.17 | 416×416 |
| LightlyTrain | picodet-l-coco | 32.0* | 2.4* | 3.75 | 416×416 |
| LightlyTrain | dinov3/vitt16-ltdetr-coco | 49.8 | 5.4 | 10.1 | 640×640 |
| LightlyTrain | dinov3/vitt16plus-ltdetr-coco | 52.5 | 7.0 | 18.1 | 640×640 |
| LightlyTrain | dinov3/vits16-ltdetr-coco | 55.4 | 10.5 | 36.4 | 640×640 |
| LightlyTrain | dinov2/vits14-noreg-ltdetr-coco | 55.7 | 16.9 | 55.3 | 644×644 |
| LightlyTrain | dinov3/convnext-tiny-ltdetr-coco | 54.4 | 13.3 | 61.1 | 640×640 |
| LightlyTrain | dinov3/convnext-small-ltdetr-coco | 56.9 | 17.7 | 82.7 | 640×640 |
| LightlyTrain | dinov3/convnext-base-ltdetr-coco | 58.6 | 24.7 | 121.0 | 640×640 |
| LightlyTrain | dinov3/convnext-large-ltdetr-coco | 60.0 | 42.3 | 230.0 | 640×640 |

*PicoDet models are in preview; reported results are preliminary.

Models are trained on the COCO 2017 dataset and evaluated on the validation set with single-scale testing. Latency is measured with TensorRT on an NVIDIA T4 GPU with batch size 1. All models are optimized using tensorrt==10.13.3.9.

Usage

Documentation Colab

import lightly_train

if __name__ == "__main__":
    # Train an object detection model with a DINOv3 backbone
    lightly_train.train_object_detection(
        out="out/my_experiment",
        model="dinov3/vitt16-ltdetr-coco",
        data={
            "path": "my_data_dir",
            "train": "images/train",
            "val": "images/val",
            "names": {
                0: "person",
                1: "bicycle",
                2: "car",
                # ...
            },
        },
    )

    # Load model and run inference
    model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
    # Or use one of the models provided by LightlyTrain
    # model = lightly_train.load_model("dinov3/vitt16-ltdetr-coco")
    results = model.predict("image.jpg")
    results["labels"]   # Class labels, tensor of shape (num_boxes,)
    results["bboxes"]   # Bounding boxes in (xmin, ymin, xmax, ymax) absolute pixel
                        # coordinates of the original image. Tensor of shape (num_boxes, 4).
    results["scores"]   # Confidence scores, tensor of shape (num_boxes,)
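The prediction dictionary can be post-processed with plain Python once its tensors are converted to lists. The sketch below is hypothetical (the `filter_detections` helper is not part of the LightlyTrain API) and assumes the outputs described above have been converted with `.tolist()`:

```python
# Hypothetical helper (not a LightlyTrain API): keep only detections whose
# confidence score meets a threshold. Inputs mirror results["labels"],
# results["bboxes"], and results["scores"] after .tolist() conversion.
def filter_detections(labels, bboxes, scores, threshold=0.5):
    return [
        (label, bbox, score)
        for label, bbox, score in zip(labels, bboxes, scores)
        if score >= threshold
    ]

# Dummy values standing in for real predictions.
labels = [0, 2, 2]
bboxes = [[10, 20, 50, 80], [5, 5, 40, 40], [100, 100, 160, 200]]
scores = [0.92, 0.31, 0.77]
print(filter_detections(labels, bboxes, scores, threshold=0.5))
# Keeps the two detections with score >= 0.5
```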
Panoptic Segmentation

Train state-of-the-art panoptic segmentation models with DINOv3 backbones using the EoMT method from CVPR 2025.

COCO Results

| Implementation | Model | Val PQ | Avg. Latency (ms) | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | dinov3/vitt16-eomt-panoptic-coco | 38.0 | 13.5 | 6.0 | 640×640 |
| LightlyTrain | dinov3/vittplus16-eomt-panoptic-coco | 41.4 | 14.1 | 7.7 | 640×640 |
| LightlyTrain | dinov3/vits16-eomt-panoptic-coco | 46.8 | 21.2 | 23.4 | 640×640 |
| LightlyTrain | dinov3/vitb16-eomt-panoptic-coco | 53.2 | 39.4 | 92.5 | 640×640 |
| LightlyTrain | dinov3/vitl16-eomt-panoptic-coco | 57.0 | 80.1 | 315.1 | 640×640 |
| LightlyTrain | dinov3/vitl16-eomt-panoptic-coco-1280 | 59.0 | 500.1 | 315.1 | 1280×1280 |
| EoMT (CVPR 2025 paper, current SOTA) | dinov3/vitl16-eomt-panoptic-coco-1280 | 58.9 | - | 315.1 | 1280×1280 |

Tiny models are trained for 48 epochs, small and base models for 24 epochs, and large models for 12 epochs on the COCO 2017 dataset, then evaluated on the validation set with single-scale testing. Avg. latency is measured on a single NVIDIA T4 GPU with batch size 1. All models are optimized using torch.compile.

Usage

Documentation Colab

import lightly_train

if __name__ == "__main__":
    # Train a panoptic segmentation model with a DINOv3 backbone
    lightly_train.train_panoptic_segmentation(
        out="out/my_experiment",
        model="dinov3/vitb16-eomt-panoptic-coco",
        data={
            "train": {
                "images": "images/train",
                "masks": "annotations/train",
                "annotations": "annotations/train.json",
            },
            "val": {
                "images": "images/val",
                "masks": "annotations/val",
                "annotations": "annotations/val.json",
            },
        },
    )

    model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
    results = model.predict("image.jpg")
    results["masks"]    # Masks with (class_label, segment_id) for each pixel, tensor of
                        # shape (height, width, 2). Height and width correspond to the
                        # original image size.
    results["segment_ids"]    # Segment ids, tensor of shape (num_segments,).
    results["scores"]   # Confidence scores, tensor of shape (num_segments,)
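One common follow-up is measuring how many pixels each panoptic segment covers. This is a hypothetical sketch (the `segment_areas` helper is not part of the LightlyTrain API); it assumes the `(height, width, 2)` mask tensor has been converted with `.tolist()`:

```python
from collections import Counter

# Hypothetical helper (not a LightlyTrain API): count the pixel area of
# each (class_label, segment_id) pair in a panoptic mask.
def segment_areas(mask):
    return Counter(tuple(pixel) for row in mask for pixel in row)

# 2x3 dummy mask; each pixel holds [class_label, segment_id].
mask = [
    [[0, 1], [0, 1], [1, 2]],
    [[0, 1], [1, 2], [1, 2]],
]
print(segment_areas(mask))  # Counter({(0, 1): 3, (1, 2): 3})
```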
Instance Segmentation

Train state-of-the-art instance segmentation models with DINOv3 backbones using the EoMT method from CVPR 2025.

COCO Results

| Implementation | Model | Val mAP mask | Avg. Latency (ms) | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | dinov3/vitt16-eomt-inst-coco | 25.4 | 12.7 | 6.0 | 640×640 |
| LightlyTrain | dinov3/vitt16plus-eomt-inst-coco | 27.6 | 13.3 | 7.7 | 640×640 |
| LightlyTrain | dinov3/vits16-eomt-inst-coco | 32.6 | 19.4 | 21.6 | 640×640 |
| LightlyTrain | dinov3/vitb16-eomt-inst-coco | 40.3 | 39.7 | 85.7 | 640×640 |
| LightlyTrain | dinov3/vitl16-eomt-inst-coco | 46.2 | 80.0 | 303.2 | 640×640 |
| EoMT (CVPR 2025 paper, current SOTA) | dinov3/vitl16-eomt-inst-coco | 45.9 | - | 303.2 | 640×640 |

Tiny models are trained for 48 epochs, while all other models are trained for 12 epochs on the COCO 2017 dataset and evaluated on the validation set with single-scale testing. Average latency is measured on a single NVIDIA T4 GPU with batch size 1. All models are optimized using torch.compile.

Usage

Documentation Colab

import lightly_train

if __name__ == "__main__":
    # Train an instance segmentation model with a DINOv3 backbone
    lightly_train.train_instance_segmentation(
        out="out/my_experiment",
        model="dinov3/vitb16-eomt-inst-coco",
        data={
            "path": "my_data_dir",
            "train": "images/train",
            "val": "images/val",
            "names": {
                0: "background",
                1: "vehicle",
                2: "pedestrian",
                # ...
            },
        },
    )

    model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
    results = model.predict("image.jpg")
    results["labels"]   # Class labels, tensor of shape (num_instances,)
    results["masks"]    # Binary masks, tensor of shape (num_instances, height, width).
                        # Height and width correspond to the original image size.
    results["scores"]   # Confidence scores, tensor of shape (num_instances,)
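The binary instance masks can be reduced to per-instance pixel areas with plain Python. A hypothetical sketch (the `instance_areas` helper is not part of the LightlyTrain API), assuming the `(num_instances, height, width)` tensor was converted with `.tolist()`:

```python
# Hypothetical helper (not a LightlyTrain API): pixel area of each
# predicted instance, computed from its binary mask.
def instance_areas(masks):
    return [sum(sum(row) for row in mask) for mask in masks]

# Two dummy 2x2 binary masks.
masks = [
    [[1, 1], [0, 1]],  # instance 0 covers 3 pixels
    [[0, 0], [1, 0]],  # instance 1 covers 1 pixel
]
print(instance_areas(masks))  # [3, 1]
```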
Semantic Segmentation

Train state-of-the-art semantic segmentation models with DINOv2 or DINOv3 backbones using the EoMT method from CVPR 2025.

COCO-Stuff Results

| Implementation | Model | Val mIoU | Avg. Latency (ms) | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | dinov3/vitt32-eomt-coco | 34.0 | 4.2 | 6.0 | 512×512 |
| LightlyTrain | dinov3/vitt32plus-eomt-coco | 36.0 | 4.4 | 7.7 | 512×512 |
| LightlyTrain | dinov3/vits32-eomt-coco | 42.4 | 5.4 | 21.6 | 512×512 |
| LightlyTrain | dinov3/vitb32-eomt-coco | 48.3 | 9.4 | 85.7 | 512×512 |
| LightlyTrain | dinov3/vitl32-eomt-coco | 51.2 | 17.5 | 303.2 | 512×512 |
| LightlyTrain | dinov3/vitt16-eomt-coco | 37.9 | 6.0 | 6.0 | 512×512 |
| LightlyTrain | dinov3/vitt16plus-eomt-coco | 39.5 | 6.4 | 7.7 | 512×512 |
| LightlyTrain | dinov3/vits16-eomt-coco | 45.0 | 11.3 | 21.6 | 512×512 |
| LightlyTrain | dinov3/vitb16-eomt-coco | 50.1 | 23.1 | 85.7 | 512×512 |
| LightlyTrain | dinov3/vitl16-eomt-coco | 52.5 | 49.0 | 303.2 | 512×512 |

Models are trained for 12 epochs with num_queries=200 on the COCO-Stuff dataset and evaluated on the validation set with single-scale testing. Average latency is measured on a single NVIDIA T4 GPU with batch size 1. All models are optimized using torch.compile.

Cityscapes Results

| Implementation | Model | Val mIoU | Avg. Latency (ms) | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | dinov3/vits16-eomt-cityscapes | 78.6 | 53.8 | 21.6 | 1024×1024 |
| LightlyTrain | dinov3/vitb16-eomt-cityscapes | 81.0 | 114.9 | 85.7 | 1024×1024 |
| LightlyTrain | dinov3/vitl16-eomt-cityscapes | 84.4 | 256.4 | 303.2 | 1024×1024 |
| EoMT (CVPR 2025 paper, current SOTA) | dinov2/vitl16-eomt | 84.2 | - | 319 | 1024×1024 |

Average latency is measured on a single NVIDIA T4 GPU with batch size 1. All models are optimized using torch.compile.

Usage

Documentation Colab

import lightly_train

if __name__ == "__main__":
    # Train a semantic segmentation model with a DINOv3 backbone
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",
        model="dinov3/vits16-eomt",
        data={
            "train": {
                "images": "my_data_dir/train/images",
                "masks": "my_data_dir/train/masks",
            },
            "val": {
                "images": "my_data_dir/val/images",
                "masks": "my_data_dir/val/masks",
            },
            "classes": {
                0: "background",
                1: "road",
                2: "building",
                # ...
            },
        },
    )

    # Load model and run inference
    model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
    # Or use one of the models provided by LightlyTrain
    # model = lightly_train.load_model("dinov3/vits16-eomt")
    masks = model.predict("image.jpg")
    # Masks is a tensor of shape (height, width) with class labels as values.
    # It has the same height and width as the input image.
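For a quick sanity check, the semantic mask can be summarized as per-class pixel counts. This is a hypothetical sketch (the `class_histogram` helper is not part of the LightlyTrain API), assuming the `(height, width)` mask tensor was converted with `masks.tolist()`:

```python
from collections import Counter

# Hypothetical helper (not a LightlyTrain API): per-class pixel counts
# from a (height, width) semantic segmentation mask.
def class_histogram(mask):
    return Counter(label for row in mask for label in row)

# 2x3 dummy mask with class labels as values.
mask = [
    [0, 0, 1],
    [2, 1, 1],
]
print(class_histogram(mask))  # Counter({1: 3, 0: 2, 2: 1})
```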
Distillation (DINOv2/v3)

Pretrain any model architecture with unlabeled data by distilling the knowledge from DINOv2 or DINOv3 foundation models into your model. On the COCO dataset, YOLOv8-s models pretrained with LightlyTrain achieve strong performance across all tested label fractions, and the gains hold for other architectures like YOLOv11, RT-DETR, and Faster R-CNN. See our announcement post for more benchmarks and details.

Benchmark Results

Usage

Documentation Google Colab

import lightly_train
from ultralytics import YOLO

if __name__ == "__main__":
    # Distill the knowledge from a DINOv3 teacher into a YOLOv8 model
    lightly_train.pretrain(
        out="out/my_experiment",
        data="my_data_dir",
        model="ultralytics/yolov8s",
        method="distillation",
        method_args={
            "teacher": "dinov3/vitb16",
        },
    )

    # Load model for fine-tuning
    model = YOLO("out/my_experiment/exported_models/exported_last.pt")
    model.train(data="coco8.yaml")
Pretraining (DINOv2 Foundation Models)

With LightlyTrain you can train your very own foundation model like DINOv2 on your data.

ImageNet-1K Results

| Implementation | Model | Val ImageNet k-NN |
|---|---|---|
| LightlyTrain | dinov2/vitl16 | 81.9% |
| DINOv2 | dinov2/vitl16 | 81.6% |

Models are pretrained on ImageNet-1k for 100 epochs and evaluated with a k-NN classifier on the ImageNet validation set.

Usage

Documentation

import lightly_train

if __name__ == "__main__":
    # Pretrain a DINOv2 vision foundation model
    lightly_train.pretrain(
        out="out/my_experiment",
        data="my_data_dir",
        model="dinov2/vitb14",
        method="dinov2",
    )
Autolabeling

LightlyTrain provides simple commands to autolabel your unlabeled data using DINOv2 or DINOv3 pretrained models. This lets you efficiently boost the performance of your smaller models by leveraging all your unlabeled images.

ADE20K Results

| Implementation | Model | Autolabel | Val mIoU | Params (M) | Input Size |
|---|---|---|---|---|---|
| LightlyTrain | dinov3/vits16-eomt | ❌ | 0.466 | 21.6 | 518×518 |
| LightlyTrain | dinov3/vits16-eomt-ade20k | ✅ | 0.533 | 21.6 | 518×518 |
| LightlyTrain | dinov3/vitb16-eomt | ❌ | 0.544 | 85.7 | 518×518 |
| LightlyTrain | dinov3/vitb16-eomt-ade20k | ✅ | 0.573 | 85.7 | 518×518 |

The improved autolabeled results were obtained by first fine-tuning a ViT-H+ model on the ADE20K dataset, where it reaches 0.595 validation mIoU. This model was then used to autolabel 100k images from the SUN397 dataset. The smaller models were subsequently fine-tuned on these autolabels and validated on ADE20K.

Usage

Documentation

import lightly_train

if __name__ == "__main__":
    # Autolabel your data with a DINOv3 semantic segmentation model
    lightly_train.predict_semantic_segmentation(
        out="out/my_autolabeled_data",
        data="my_data_dir",
        model="dinov3/vitb16-eomt-coco",
        # Or use one of your own model checkpoints
        # model="out/my_experiment/exported_models/exported_best.pt",
    )

    # The autolabeled masks will be saved in this format:
    # out/my_autolabeled_data
    # ├── <image name>.png
    # ├── <image name>.png
    # └── …
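Downstream training code typically needs to match each source image to its autolabeled mask. The following is a hypothetical sketch (the `pair_images_with_masks` helper is not part of the LightlyTrain API); it assumes the mask file name is the image file name with a `.png` extension, as in the layout above:

```python
from pathlib import Path

# Hypothetical helper (not a LightlyTrain API): map each image file name
# to the expected autolabeled mask path in the output directory.
def pair_images_with_masks(image_names, mask_dir="out/my_autolabeled_data"):
    return {name: f"{mask_dir}/{Path(name).stem}.png" for name in image_names}

print(pair_images_with_masks(["street.jpg", "park.png"]))
# {'street.jpg': 'out/my_autolabeled_data/street.png',
#  'park.png': 'out/my_autolabeled_data/park.png'}
```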

Features

  • Python, Command Line, and Docker support
  • Built for high performance including multi-GPU and multi-node support
  • Monitor training progress with MLflow, TensorBoard, Weights & Biases, and more
  • Runs fully on-premises with no API authentication
  • Export models in their native format for fine-tuning or inference
  • Export models in ONNX or TensorRT format for edge deployment

Models

LightlyTrain supports the following model and workflow combinations.

Fine-tuning

| Model | Object Detection | Instance Segmentation | Panoptic Segmentation | Semantic Segmentation |
|---|---|---|---|---|
| DINOv3 | ✅ 🔗 | ✅ 🔗 | ✅ 🔗 | ✅ 🔗 |
| DINOv2 | ✅ 🔗 | | | ✅ 🔗 |

Distillation & Pretraining

| Model | Distillation | Pretraining |
|---|---|---|
| DINOv3 | ✅ 🔗 | |
| DINOv2 | ✅ 🔗 | ✅ 🔗 |
| Torchvision ResNet, ConvNext, ShuffleNetV2 | ✅ 🔗 | ✅ 🔗 |
| TIMM models | ✅ 🔗 | ✅ 🔗 |
| Ultralytics YOLOv5–YOLO12, RT-DETR | ✅ 🔗 | ✅ 🔗 |
| RT-DETR, RT-DETRv2 | ✅ 🔗 | ✅ 🔗 |
| RF-DETR | ✅ 🔗 | ✅ 🔗 |
| YOLOv12 | ✅ 🔗 | ✅ 🔗 |
| Custom PyTorch Model | ✅ 🔗 | ✅ 🔗 |

Contact us if you need support for additional models.

Usage Events

LightlyTrain collects anonymous usage events to help us improve the product. We only track the training method, model architecture, and system information (OS, GPU). To opt out, set the environment variable: export LIGHTLY_TRAIN_EVENTS_DISABLED=1

License

LightlyTrain offers flexible licensing options to suit your specific needs:

  • AGPL-3.0 License: Perfect for open-source projects, academic research, and community contributions. Share your innovations with the world while benefiting from community improvements.

  • Commercial License: Ideal for businesses and organizations that need proprietary development freedom. Enjoy all the benefits of LightlyTrain while keeping your code and models private.

  • Free Community License: Available for students, researchers, startups in early stages, or anyone exploring or experimenting with LightlyTrain. Empower the next generation of innovators with full access to the world of pretraining.

We're committed to supporting both open-source and commercial users. Contact us to discuss the best licensing option for your project!

Contact

Website
Discord
GitHub
X
YouTube
LinkedIn

