FlashDet: Ultra-lightweight real-time object detection with LoRA fine-tuning and Knowledge Distillation

FlashDet

Ultra-lightweight real-time object detection with advanced training methods, LoRA fine-tuning, tracking, and analytics

What is FlashDet?

FlashDet is an end-to-end object detection framework built for speed, accuracy, and extensibility. The core FlashDet model features a dual detection head (NMS-free one-to-one + dense one-to-many), STAL (Small-Target-Aware Label Assignment), ProgLoss (Progressive Loss Balancing), and the MuSGD (Muon+SGD hybrid) optimizer.

The framework supports 6 training methods — all through a unified, registry-based, pluggable design.

Training Pipeline:
  Dataset → Augmentation → FlashDet Model
    ├── Classification Loss (BCE)
    ├── Box Loss (CIoU + L1, ProgLoss weighted)
    └── STAL Assignment
        → MuSGD → Updated Weights

Model Sizes

| Model | Backbone | Params (Inference) | FP16 Size | Notes |
| --- | --- | --- | --- | --- |
| FlashDet-P (Pico) | LiteBackbone-0.5x / PicoBackbone + PicoNeck | ~298K | 0.57 MB | Sub-1MB, depthwise heads |
| FlashDet-N (Nano) | FlashBackbone (w=0.25, d=0.33) | ~1.06M | 2.01 MB | Lightweight |
| FlashDet-S (Small) | FlashBackbone (w=0.50, d=0.33) | ~5.4M | 10.3 MB | Balanced |
| FlashDet-M (Medium) | FlashBackbone (w=1.00, d=0.67) | ~18M | 34.3 MB | High accuracy |
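
The FP16 sizes listed for each model are consistent with storing every parameter as 2 bytes and reporting the total in binary megabytes (MiB). The sketch below reproduces that arithmetic. The helper name `fp16_size_mib` is hypothetical, and the parameter counts are the rounded figures quoted for each model, so the last digit may differ slightly from the published sizes.

```python
def fp16_size_mib(num_params: int) -> float:
    """Size of a model stored in FP16 (2 bytes per parameter), in MiB."""
    return num_params * 2 / (1024 ** 2)


# Rounded parameter counts quoted for each model size (approximate).
approx_params = {
    "FlashDet-P": 298_000,
    "FlashDet-N": 1_060_000,
    "FlashDet-S": 5_400_000,
    "FlashDet-M": 18_000_000,
}

for name, count in approx_params.items():
    print(f"{name}: {fp16_size_mib(count):.2f} MiB")
```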

FlashDet-P (Pico) is designed for extreme edge deployment (microcontrollers, mobile, browser). It uses:

  • LiteBackbone-0.5x with pretrained weights (channel mixing + depthwise convolutions)
  • PicoNeck with 64-ch output (lightweight modules for efficient feature generation)
  • Depthwise-separable E2E dual head (DW-conv + pointwise instead of full convolutions)
  • Same STAL + ProgLoss training recipe as larger variants

Installation

pip (recommended)

pip install flashdet

# With all extras (tracking, analytics, ONNX export)
pip install "flashdet[all]"

From source (for development)

git clone https://github.com/FlashVision/FlashDet.git
cd FlashDet
pip install -e ".[all]"

Optional extras

pip install -e ".[export]"      # ONNX export support
pip install -e ".[tracker]"     # FlashTracker, MotionTracker, AppearanceTracker
pip install -e ".[solutions]"   # Counting, speed, heatmaps
pip install -e ".[analytics]"   # Benchmarking, plots
pip install -e ".[all]"         # Everything

Verify installation

flashdet check       # runs full health check
flashdet settings    # shows Python, PyTorch, CUDA, GPU info
flashdet version     # prints version

Usage

Python API

from flashdet import FlashDet, Trainer

# Build sub-1MB Pico model for edge deployment
pico = FlashDet(num_classes=80, size="p")
print(pico.get_model_info())  # inference_fp16_mb: 0.57

# Build with reparameterizable backbone
pico_v2 = FlashDet(num_classes=80, size="p", backbone_type="repnext")

# Build larger model
model_n = FlashDet(num_classes=80, size="n")

# Train
trainer = Trainer(
    model_size="p",   # "p" (Pico), "n", "s", "m", "l", "x"
    train_images="data/train",
    val_images="data/val",
    epochs=100,
    device="cuda",
)
trainer.train()

CLI

# Train (use --model-size p for Pico, n for Nano, s for Small, etc.)
flashdet train --model-size p --epochs 100 --device cuda \
  --train-images data/train --val-images data/val

# Validate
flashdet val --model best.pth --val-images data/val

Standalone Scripts

# Full training with LoRA
python train.py --lora --lora-rank 8 --epochs 50 --device cuda

# Inference
python test.py --model best.pth --image photo.jpg

Training Methods

The table below lists five of FlashDet's training paradigms, each with a dedicated trainer class and CLI script. Knowledge distillation, the sixth method, lives in flashdet/engine/training/kd_trainer.py.

| Method | Trainer Class | CLI Script | Description |
| --- | --- | --- | --- |
| Standard | Trainer | train.py | Full supervised training with all augmentations |
| Self-Supervised (SSL) | SSLTrainer | scripts/train_ssl.py | BYOL pretraining on unlabeled data |
| Semi-Supervised | SemiSupervisedTrainer | scripts/train_semi_supervised.py | Teacher-student with pseudo-labels |
| Few-Shot | FewShotTrainer | scripts/train_few_shot.py | Learn from very few labeled examples |
| Active Learning | ActiveLearningTrainer | scripts/train_active_learning.py | Intelligently select samples for labeling |

Self-Supervised Pretraining

python scripts/train_ssl.py \
  --method byol \
  --data-dir path/to/unlabeled/images \
  --epochs 100 --backbone-size n
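
For context on `--method byol`: BYOL trains an online network against a slowly moving target network whose weights are an exponential moving average (EMA) of the online weights. The sketch below shows only that update rule on plain lists of floats. The function name `ema_update` and the momentum value `tau` are illustrative assumptions, and the README does not say how SSLTrainer implements this step.

```python
def ema_update(target, online, tau=0.99):
    """Move each target weight a small step toward the online weight.

    target, online: equal-length lists of floats (stand-ins for tensors).
    tau: momentum; values close to 1.0 make the target change slowly.
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


target_weights = [1.0, -2.0]
online_weights = [0.0, 0.0]
target_weights = ema_update(target_weights, online_weights, tau=0.9)
print(target_weights)
```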

Semi-Supervised Learning

python scripts/train_semi_supervised.py \
  --train-images data/train \
  --unlabeled-dir path/to/unlabeled/images \
  --pseudo-threshold 0.7
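
The `--pseudo-threshold` flag suggests that teacher predictions are kept as pseudo-labels only when their confidence is high enough. A minimal sketch of that filtering step follows. The function name, the dictionary layout of a detection, and the use of `>=` at the boundary are assumptions for illustration, not FlashDet's actual code.

```python
def filter_pseudo_labels(detections, threshold=0.7):
    """Keep only teacher detections confident enough to act as labels."""
    return [det for det in detections if det["score"] >= threshold]


# Hypothetical teacher output for one unlabeled image.
teacher_output = [
    {"box": (10, 20, 50, 80), "cls": 0, "score": 0.92},
    {"box": (30, 40, 60, 90), "cls": 2, "score": 0.55},
    {"box": (5, 5, 25, 25), "cls": 1, "score": 0.70},
]

pseudo_labels = filter_pseudo_labels(teacher_output, threshold=0.7)
print(len(pseudo_labels), "of", len(teacher_output), "detections kept")
```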

Few-Shot Learning

python scripts/train_few_shot.py \
  --base-checkpoint path/to/base.pth \
  --n-shot 10 --freeze-backbone

Active Learning

python scripts/train_active_learning.py \
  --train-images data/train \
  --unlabeled-pool path/to/unlabeled/images \
  --query-strategy entropy --budget 50 --rounds 5
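
An entropy query strategy typically ranks unlabeled images by how uncertain the model is about them and sends the top `budget` images for labeling. The sketch below assumes one class-probability vector per image, which simplifies the detection setting. The function names and the data layout are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pool, budget):
    """Return the `budget` image ids with the highest prediction entropy."""
    ranked = sorted(pool, key=lambda image_id: entropy(pool[image_id]), reverse=True)
    return ranked[:budget]


# Hypothetical pool: image id -> predicted class probabilities.
pool = {
    "img_a": [0.5, 0.5],
    "img_b": [0.99, 0.01],
    "img_c": [0.7, 0.3],
}
print(select_for_labeling(pool, budget=2))
```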

LoRA / QLoRA Fine-Tuning

Parameter-efficient — freeze backbone, train only low-rank adapters:

# LoRA (6 variants: standard, dora, lora_plus, adalora, ortho, lora_fa)
python train.py --lora --lora-variant dora --lora-rank 8 --lora-alpha 16

# QLoRA (quantized base weights + LoRA)
python train.py --qlora --qlora-dtype nf4 --lora-rank 8
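
To make the `--lora-rank` and `--lora-alpha` flags concrete: standard LoRA keeps the original weight W frozen and learns two small matrices, B (out × r) and A (r × in), so the effective weight is W + (alpha / r) · B · A. The pure-Python sketch below illustrates that formula and the parameter savings. It is not FlashDet's lora.py, and all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * (B @ A); W itself stays frozen."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def lora_trainable_params(out_features, in_features, rank):
    """Trainable parameters in the adapter: B (out x r) plus A (r x in)."""
    return rank * (out_features + in_features)


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight
B = [[1.0], [0.0]]             # 2x1, rank 1
A = [[0.0, 2.0]]               # 1x2, rank 1
print(lora_effective_weight(W, A, B, alpha=2, rank=1))
print(lora_trainable_params(256, 256, rank=8), "trainable vs", 256 * 256, "frozen")
```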

Mixed Precision & Multi-GPU

python train.py --amp --multi-gpu --device cuda

Core Components

STAL (Small-Target-Aware Label Assignment)

Task-Aligned Assignment with small-target protection — temporarily expands tiny GT boxes during candidate selection so small objects always get positive anchor supervision.
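
The small-target protection step can be pictured as growing any ground-truth box that is smaller than some minimum size, around its own center, before candidate anchors are selected. The sketch below shows one way to do that. The function name and the `min_size` value are assumptions, since the README does not give the actual threshold or expansion rule.

```python
def expand_small_box(box, min_size=8.0):
    """Expand a box (x1, y1, x2, y2) around its center to at least min_size.

    Boxes already larger than min_size on a side keep that side unchanged.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    new_w = max(x2 - x1, min_size)
    new_h = max(y2 - y1, min_size)
    return (cx - new_w / 2.0, cy - new_h / 2.0, cx + new_w / 2.0, cy + new_h / 2.0)


print(expand_small_box((10, 10, 12, 12)))   # tiny 2x2 box is enlarged
print(expand_small_box((0, 0, 100, 50)))    # large box is left as-is
```

The expanded box would be used only for candidate selection; the regression target remains the original box.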

ProgLoss (Progressive Loss Balancing)

Linearly shifts training emphasis from the dense one-to-many head (exploration) to the NMS-free one-to-one head (refinement) over the course of training: alpha(t): 1.0 → 0.0.
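
A linear schedule of this kind can be sketched as follows. The README states only that alpha moves linearly from 1.0 to 0.0. The function names, the per-step granularity, and the `alpha` / `1 - alpha` blend of the two head losses are assumptions made for illustration.

```python
def progloss_alpha(step, total_steps):
    """Linear schedule: 1.0 at the first step, 0.0 at the last step."""
    if total_steps <= 1:
        return 0.0
    clamped = min(max(step, 0), total_steps - 1)
    return 1.0 - clamped / (total_steps - 1)


def blend_head_losses(loss_one_to_many, loss_one_to_one, alpha):
    """Weight the dense head by alpha and the NMS-free head by 1 - alpha."""
    return alpha * loss_one_to_many + (1.0 - alpha) * loss_one_to_one


for step in (0, 50, 100):
    alpha = progloss_alpha(step, total_steps=101)
    print(step, alpha, blend_head_losses(2.0, 4.0, alpha))
```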

MuSGD (Muon + SGD Hybrid Optimizer)

Applies Muon-style orthogonal updates to multi-dimensional parameters (conv weights, attention) while using standard SGD for 1D parameters (biases, norms), combining faster convergence with training stability.
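
The routing rule described here can be sketched by looking only at each parameter's shape: tensors with two or more dimensions go to the Muon-style group and 1D tensors go to the SGD group. The function name and the example parameter names below are hypothetical, and the orthogonalized update itself is not shown.

```python
def route_parameters(param_shapes):
    """Split parameter names into (muon_group, sgd_group) by dimensionality."""
    muon_group, sgd_group = [], []
    for name, shape in param_shapes.items():
        if len(shape) >= 2:
            muon_group.append(name)   # conv kernels, linear/attention weights
        else:
            sgd_group.append(name)    # biases, normalization scales
    return muon_group, sgd_group


# Hypothetical parameter shapes from a small network.
shapes = {
    "conv1.weight": (16, 3, 3, 3),
    "conv1.bias": (16,),
    "bn1.weight": (16,),
    "attn.qkv.weight": (48, 16),
}
muon_group, sgd_group = route_parameters(shapes)
print("Muon-style:", muon_group)
print("SGD:", sgd_group)
```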

E2E Detection Loss

Combines CIoU box loss, BCE classification loss, and L1 regression loss across both dual heads, weighted by the ProgLoss schedule.
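
For readers unfamiliar with the box term, the sketch below implements the standard CIoU loss for a single pair of boxes in pure Python: 1 minus IoU, plus a normalized center-distance penalty, plus an aspect-ratio penalty. It follows the commonly published CIoU definition rather than FlashDet's e2e_loss.py, and it omits the BCE and L1 terms.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between box centers.
    rho2 = ((px1 + px2) - (tx1 + tx2)) ** 2 / 4 + ((py1 + py2) - (ty1 + ty2)) ** 2 / 4

    # Squared diagonal of the smallest box enclosing both.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # identical boxes
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))   # disjoint boxes
```

Unlike plain IoU loss, the center-distance term still provides a useful signal when the boxes do not overlap at all.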


Solutions

Built-in high-level applications for real-world use cases:

from flashdet.solutions import ObjectCounter, SpeedEstimator, Heatmap
from flashdet.trackers import FlashTracker

tracker = FlashTracker()
# Solutions integrate with any detection model for real-world applications

| Solution | Description |
| --- | --- |
| ObjectCounter | Count objects crossing lines or entering regions |
| SpeedEstimator | Estimate real-world speed from tracked objects |
| Heatmap | Visualize detection density over time |
| RegionCounter | Count objects in polygon zones |
| QueueManager | Monitor queue lengths and wait times |
| DistanceCalculator | Measure real-world distances between objects |
| ParkingManager | Track parking spot occupancy |
| SecurityAlarm | Alert on intrusions into restricted zones |
| WorkoutMonitor | Track exercise repetitions and form |
| LiveInference | Real-time webcam/stream detection |
| AnalyticsDashboard | Aggregated detection statistics and visualization |

Trackers

Multi-object tracking with persistent IDs across frames:

from flashdet.trackers import FlashTracker, MotionTracker, AppearanceTracker

tracker = FlashTracker(max_age=30, min_hits=3, iou_threshold=0.3)
tracks = tracker.update(detections)  # [x1,y1,x2,y2,track_id,score,cls]

| Tracker | Method | Best For |
| --- | --- | --- |
| FlashTracker | IoU + Kalman filter | General purpose, fast |
| MotionTracker | Kalman + Hungarian matching | Speed-critical applications |
| AppearanceTracker | Appearance + motion fusion | Crowded scenes, re-identification |
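
To illustrate what an `iou_threshold` such as 0.3 controls, the sketch below associates existing track boxes with new detections by IoU. It uses simple greedy matching in place of Hungarian matching and omits the Kalman prediction step, so it is a simplified stand-in rather than any FlashDet tracker. Both function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, detections, iou_threshold=0.3):
    """Pair tracks with detections, best IoU first, skipping weak overlaps."""
    candidates = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for score, ti, di in candidates:
        if score < iou_threshold:
            break  # remaining candidates overlap even less
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
print(greedy_match(tracks, detections))
```

Detections left unmatched (the third one here) would normally start new tracks, and tracks left unmatched for more than `max_age` frames would be dropped.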

Analytics

from flashdet.analytics import Benchmark, Profiler

bench = Benchmark(model_path="best.pth", device="cuda")
results = bench.run()  # {'fps': ..., 'latency_ms': ..., 'params': ..., ...}

profiler = Profiler(model_path="best.pth")
profiler.run()  # prints per-layer timing breakdown

Training Callbacks

Extend the training loop without modifying source code:

from flashdet import Trainer
from flashdet.engine.core.callbacks import EarlyStopping, CSVLogger, TensorBoardCallback

trainer = Trainer(model_size="n", train_images="data/train", val_images="data/val")

trainer.add_callback(EarlyStopping(patience=20, metric="val_mAP"))
trainer.add_callback(CSVLogger("metrics.csv"))
trainer.add_callback(TensorBoardCallback("runs/exp1"))

trainer.train()

Built-in callbacks: EarlyStopping, CSVLogger, TensorBoardCallback, LRSchedulerCallback.
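
The idea behind a patience-based callback such as EarlyStopping can be sketched independently of the framework: remember the best metric seen so far and stop once it has failed to improve for `patience` consecutive evaluations. The class below is a hypothetical stand-in and does not use FlashDet's callback interface, which this README does not describe.

```python
class PatienceTracker:
    """Signal a stop after `patience` evaluations without improvement."""

    def __init__(self, patience=20, higher_is_better=True):
        self.patience = patience
        self.higher_is_better = higher_is_better
        self.best = None
        self.bad_evals = 0

    def step(self, value):
        """Record a new metric value; return True if training should stop."""
        improved = (
            self.best is None
            or (value > self.best if self.higher_is_better else value < self.best)
        )
        if improved:
            self.best = value
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


tracker = PatienceTracker(patience=2)
for epoch, val_map in enumerate([0.10, 0.20, 0.20, 0.15]):
    if tracker.step(val_map):
        print(f"stopping after epoch {epoch}, best val_mAP={tracker.best}")
        break
```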


Registry System

FlashDet uses a pluggable registry for all major components. Adding a new architecture, backbone, head, or loss is as simple as decorating your class:

import torch.nn as nn

from flashdet.registry import DETECTORS, BACKBONES, HEADS

@DETECTORS.register("MyDetector")
class MyDetector(nn.Module):
    ...

# Later, build from config
model = DETECTORS.build("MyDetector", num_classes=80)

Available registries: DETECTORS, BACKBONES, NECKS, HEADS, LOSSES, DATASETS, TRANSFORMS, TRACKERS.
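
For readers curious how a decorator-based registry works in general, here is a minimal, framework-free sketch that mirrors the `register` / `build` calls used with DETECTORS. The `Registry` class and `ToyDetector` are illustrative assumptions and may differ from what flashdet/registry.py actually contains.

```python
class Registry:
    """Map string names to classes so they can be built from configuration."""

    def __init__(self, name):
        self.name = name
        self._classes = {}

    def register(self, key):
        """Return a decorator that stores the class under `key`."""
        def decorator(cls):
            if key in self._classes:
                raise KeyError(f"{key!r} is already registered in {self.name}")
            self._classes[key] = cls
            return cls
        return decorator

    def build(self, key, **kwargs):
        """Instantiate the class registered under `key`."""
        if key not in self._classes:
            raise KeyError(f"{key!r} is not registered in {self.name}")
        return self._classes[key](**kwargs)


DETECTORS = Registry("detectors")


@DETECTORS.register("ToyDetector")
class ToyDetector:
    def __init__(self, num_classes):
        self.num_classes = num_classes


model = DETECTORS.build("ToyDetector", num_classes=80)
print(type(model).__name__, model.num_classes)
```

Because the decorator returns the class unchanged, registered classes can still be imported and instantiated directly.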

Examples

Ready-to-run scripts in examples/:

| Script | What it does |
| --- | --- |
| train_custom_dataset.py | Train on your own COCO-format dataset |
| train_with_lora.py | LoRA fine-tuning (DoRA variant) |

cd examples
python train_custom_dataset.py

Project Structure

FlashDet/
├── flashdet/                        # Main package
│   ├── __init__.py                  # Public API
│   ├── cli.py                       # CLI entry point
│   ├── registry.py                  # Pluggable component registry
│   ├── cfg/                         # Configuration
│   ├── data/                        # Datasets, loaders, transforms, download
│   ├── engine/
│   │   ├── core/                    # Callbacks, EMA, MuSGD optimizer
│   │   ├── training/                # All training paradigms
│   │   │   ├── trainer.py           # Standard Trainer
│   │   │   ├── kd_trainer.py        # Knowledge Distillation
│   │   │   ├── ssl_trainer.py       # Self-Supervised Learning
│   │   │   ├── semi_supervised_trainer.py
│   │   │   ├── few_shot_trainer.py
│   │   │   └── active_learning_trainer.py
│   │   └── evaluation/              # Validator
│   ├── models/
│   │   ├── architectures/
│   │   │   └── flashdet.py          # FlashDet + FlashDetPico
│   │   ├── backbone/                # LiteBackbone, PicoBackbone, FlashBackbone
│   │   ├── neck/                    # PicoNeck, YOLO necks
│   │   ├── head/                    # E2E dual detection head
│   │   ├── layers/                  # ConvBlock, PicoBlock, SpatialPool, RepNeXt blocks
│   │   ├── assignment/              # STAL
│   │   ├── detector.py              # build_model() factory
│   │   └── lora.py                  # LoRA / QLoRA (6 variants)
│   ├── losses/
│   │   ├── e2e_loss.py              # E2E dual-head loss + ProgLoss
│   │   └── kd_loss.py               # Knowledge distillation losses
│   ├── utils/                       # Metrics, visualization, checkpoints
│   ├── trackers/                    # SORT, ByteTrack, BoT-SORT, DeepSORT, OC-SORT, StrongSORT
│   ├── solutions/                   # 17 ready-to-use vision solutions
│   └── analytics/                   # Benchmark, profiling, plots
├── scripts/                         # Training scripts (SSL, few-shot, etc.)
├── examples/                        # Ready-to-run example scripts
├── tests/                           # Unit & integration tests (pytest)
├── docs/                            # Documentation
├── docker/                          # Dockerfile + docker-compose
├── train.py                         # Main training entry point
├── test.py                          # Main inference entry point
└── pyproject.toml                   # Package configuration

Docker

# Build
docker build -t flashdet -f docker/Dockerfile .

# Run inference
docker run --gpus all -v "$(pwd)/data:/app/data" flashdet \
  predict --model best.pth --source data/test.jpg

# Or use docker-compose
cd docker && docker compose up

Supported Formats

| Import | Export |
| --- | --- |
| COCO JSON | ONNX |
| TXT labels | FP16 weights |
| Pascal VOC XML | TorchScript |

Documentation

Full documentation is in the docs/ folder:

| Document | Description |
| --- | --- |
| Installation | Detailed installation guide |
| LoRA Fine-Tuning | LoRA/QLoRA variants and usage |
| Trackers | Multi-object tracking guide |
| FAQ | Frequently asked questions |
| Changelog | Version history |

Contributing

We welcome contributions!

git clone https://github.com/FlashVision/FlashDet.git
cd FlashDet
pip install -e ".[dev,all]"
pytest tests/
ruff check flashdet/
flashdet check

License

MIT License — see LICENSE for details.


FlashVision — Open-source lightweight AI
