GroundedDINO-VL vision-language backend
GroundedDINO-VL
Modern Vision-Language Foundation Models for PyTorch 2.7 + CUDA 12.8
Overview
GroundedDINO-VL is a modern, independent vision-language framework derived from GroundingDINO, fully refactored and maintained for current GPU infrastructure with PyTorch 2.7 and CUDA 12.8 support.
The codebase no longer depends on the original GroundingDINO implementation. It remains fully compatible with GroundingDINO model weights, and existing code keeps working through a backward-compatible legacy namespace.
Key Features
- Fully Independent: Clean break from legacy code - single source of truth in groundeddino_vl/
- Modern Stack: PyTorch 2.7 + CUDA 12.8 support
- Zero-Shot Detection: Detect objects using natural language descriptions
- High Performance: Compatible with GroundingDINO's COCO zero-shot 52.5 AP weights
- Backward Compatible: Legacy groundingdino namespace supported via compatibility shim
- Clean Architecture: Refactored package structure with better organization
- Label Studio Integration: Real-time ML backend for auto-annotation workflows
- 95% Package Size Reduction: Eliminated duplicate code (9,000+ lines removed)
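One common way to keep a legacy namespace importable is to alias it to the new package in `sys.modules`. The sketch below illustrates that general technique only; it is not taken from the project's shim, and the module names `newpkg_demo` and `legacypkg_demo` are hypothetical stand-ins so the example runs without the real package installed.

```python
import importlib
import sys
import types

# Hypothetical stand-in for the new package, registered so it can be imported.
new_pkg = types.ModuleType("newpkg_demo")
new_pkg.load_model = lambda: "model loaded via newpkg_demo"
sys.modules["newpkg_demo"] = new_pkg


def install_alias(legacy_name: str, target_name: str) -> None:
    """Make `import legacy_name` resolve to the target module object."""
    target = importlib.import_module(target_name)
    sys.modules[legacy_name] = target


install_alias("legacypkg_demo", "newpkg_demo")

# The import system checks sys.modules first, so no file on disk is needed.
import legacypkg_demo

print(legacypkg_demo is new_pkg)
print(legacypkg_demo.load_model())
```

Because both names point at the same module object, state set through one name is visible through the other. A real shim would typically also alias submodules and may emit a deprecation warning.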
Documentation Index
Getting Started
- Installation Guide - System requirements, installation methods, and verification
- Quick Start Guide - Basic usage examples and common workflows
- API Reference - Complete API documentation with examples
- Batch Inference - Process thousands of images with COCO/YOLO/Label Studio export
Advanced Topics
- Label Studio Integration - Auto-annotation and ML backend setup
- Building from Source - Detailed compilation and build instructions
- Project Structure - Codebase organization and architecture
Migration & Compatibility
- GroundingDINO Migration Guide - Migrating from legacy groundingdino namespace
- API Migration Guide - Upgrading from previous versions
- Independence Note: GroundedDINO-VL is now fully independent with all code in groundeddino_vl/. The groundingdino/ namespace provides backward compatibility via a lightweight shim.
Integration & Deployment
- Testing & Validation - Test suite, CI/CD, and quality assurance
- Security Best Practices - Security guidelines and considerations
Contributing & Support
- Contributing Guidelines - How to contribute to the project
- Changelog - Version history and release notes
- Troubleshooting - Common issues and solutions
Quick Installation
Via PyPI (Recommended)
pip install groundeddino_vl
With GPU Support (CUDA 12.8)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install groundeddino_vl
From Source
git clone https://github.com/ghostcipher1/GroundedDINO-VL.git
cd GroundedDINO-VL
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -e .
See Installation Guide for detailed instructions and system requirements.
Quick Start Example
from groundeddino_vl import load_model, predict, annotate

# Load model (auto-downloads weights on first run)
model = load_model(
    config_path="path/to/config.py",
    checkpoint_path="path/to/weights.pth",
    device="cuda",
)

# Image to run detection on; reused below for visualization
image = "path/to/image.jpg"

# Run detection with text prompt
result = predict(
    model=model,
    image=image,
    text_prompt="car . person . dog",
    box_threshold=0.35,
    text_threshold=0.25,
)

# Visualize results (assumes annotate accepts the same image input as predict)
annotated_image = annotate(image, result, show_labels=True)
See Quick Start Guide for more examples and usage patterns.
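The text prompt in the example separates categories with " . ". A small helper can build that string from a list of class names. This is a sketch, not part of the package: `build_text_prompt` is a hypothetical name, and the lowercasing and de-duplication are assumptions rather than documented requirements.

```python
from typing import Iterable


def build_text_prompt(class_names: Iterable[str]) -> str:
    """Join class names into the ' . '-separated form used in the example."""
    cleaned = []
    for name in class_names:
        name = name.strip().lower()  # assumption: lowercase, trimmed labels
        if name and name not in cleaned:  # drop blanks and duplicates, keep order
            cleaned.append(name)
    return " . ".join(cleaned)


print(build_text_prompt(["Car", "person", " dog ", "car"]))  # car . person . dog
```

The returned string can be passed as `text_prompt` in the call shown in the Quick Start example.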
Label Studio Integration
GroundedDINO-VL includes an optional Label Studio ML Backend for real-time auto-annotation:
- On-demand inference via FastAPI service
- Auto-labeling with the "magic wand" feature
- Batch annotation assistance
- PostgreSQL/SQLite history logging
Complete setup guide: Label Studio Integration Documentation
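Label Studio stores rectangle regions as percentages of the image size, so a backend has to convert pixel boxes before returning them. The sketch below shows that conversion under stated assumptions: boxes arrive as pixel `(x_min, y_min, x_max, y_max)` tuples, and the `from_name` and `to_name` defaults match a labeling config the reader defines. It is not taken from this project's backend, and `to_label_studio_rect` is a hypothetical helper name.

```python
from typing import Any, Dict, Tuple


def to_label_studio_rect(
    box_xyxy: Tuple[float, float, float, float],
    label: str,
    score: float,
    image_width: int,
    image_height: int,
    from_name: str = "label",  # assumed control tag name in the labeling config
    to_name: str = "image",    # assumed object tag name in the labeling config
) -> Dict[str, Any]:
    """Convert one pixel-space box into a Label Studio rectanglelabels region."""
    x_min, y_min, x_max, y_max = box_xyxy
    return {
        "from_name": from_name,
        "to_name": to_name,
        "type": "rectanglelabels",
        "original_width": image_width,
        "original_height": image_height,
        "score": score,
        "value": {
            # Label Studio expects percentages of the image size (0-100).
            "x": 100.0 * x_min / image_width,
            "y": 100.0 * y_min / image_height,
            "width": 100.0 * (x_max - x_min) / image_width,
            "height": 100.0 * (y_max - y_min) / image_height,
            "rotation": 0,
            "rectanglelabels": [label],
        },
    }


region = to_label_studio_rect((100, 50, 300, 250), "car", 0.9, 1000, 500)
print(region["value"])
```

If the detector returns normalized or center-based boxes instead, convert them to pixel corner coordinates first.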
Research Foundation
This project is based on the groundbreaking work:
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
@article{liu2023grounding,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}
Original Project: IDEA-Research/GroundingDINO
Paper: arXiv:2303.05499
System Requirements
| Component | Requirement |
|---|---|
| Python | 3.9, 3.10, 3.11, or 3.12 |
| PyTorch | 2.7.0+ |
| CUDA (optional) | 12.6 or 12.8 |
| C++ Compiler | GCC 7+, Clang 5+, or MSVC 2019+ |
| GPU (optional) | NVIDIA with Compute Capability 6.0+ |
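The requirements in the table can be checked from Python before installing. The following is a minimal sketch, not a tool shipped with the package: `check_environment` and `parse_major_minor` are hypothetical helpers, and PyTorch is treated as optional so the script still runs when it is not installed.

```python
import importlib.util
import sys
from typing import Dict, Optional, Tuple

SUPPORTED_PYTHON = {(3, 9), (3, 10), (3, 11), (3, 12)}  # from the table above
MIN_TORCH = (2, 7)  # PyTorch 2.7.0+


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn a version string such as '2.7.0+cu128' into (2, 7)."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def check_environment() -> Dict[str, Optional[bool]]:
    """Report whether the interpreter and PyTorch meet the stated requirements."""
    report: Dict[str, Optional[bool]] = {
        "python_ok": sys.version_info[:2] in SUPPORTED_PYTHON,
        "torch_ok": None,        # None means PyTorch is not installed
        "cuda_available": None,  # None means PyTorch is not installed
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_ok"] = parse_major_minor(torch.__version__) >= MIN_TORCH
        report["cuda_available"] = torch.cuda.is_available()
    return report


print(check_environment())
```

The check does not cover the C++ compiler or GPU compute capability, which matter mainly when building from source.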
License
Copyright (c) 2025 GhostCipher. All rights reserved.
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Original GroundingDINO License
Copyright (c) 2023 IDEA. All Rights Reserved.
Licensed under the Apache License, Version 2.0
This project maintains the original Apache 2.0 license and properly attributes the original GroundingDINO research team.
Acknowledgments
- GroundingDINO Team at IDEA Research for the original research and implementation
- Deformable DETR for the multi-scale deformable attention mechanism
- DINO for the transformer-based detection architecture
- PyTorch Team for the excellent deep learning framework
Links
- Homepage: github.com/ghostcipher1/GroundedDINO-VL
- PyPI: pypi.org/project/groundeddino_vl
- Original GroundingDINO: github.com/IDEA-Research/GroundingDINO
- Issues: github.com/ghostcipher1/GroundedDINO-VL/issues
Built with ❤️ for the computer vision community