Aegis Vision
Cloud-native computer vision model training toolkit for Aegis AI
Overview
Aegis Vision is a streamlined toolkit for training computer vision models in cloud environments (Kaggle, Colab, etc.) with built-in support for:
- 🎯 YOLO Models (v8, v9, v10, v11) - Object detection training
- 📊 Wandb Integration - Experiment tracking and visualization
- 🔄 COCO Format - Dataset conversion and handling
- ☁️ Cloud-Optimized - Designed for Kaggle/Colab workflows
- 📦 Model Export - ONNX, CoreML, OpenVINO, TensorRT, TFLite
Installation
Standard Installation
# Basic installation
pip install aegis-vision
# With Kaggle support
pip install aegis-vision[kaggle]
# Development installation
pip install aegis-vision[dev]
# All features
pip install aegis-vision[all]
Headless Environments (Docker, CI/CD)
The package uses opencv-python-headless by default, which works in both GUI and headless environments:
# Standard installation (works in all environments)
pip install aegis-vision
No special configuration needed - the package automatically works in Docker containers, CI/CD systems, and GUI environments.
Nvidia DGX / High-Performance GPU Systems
For NVIDIA DGX Spark or other systems with the latest NVIDIA GPUs (Blackwell architecture), installation is the same:
# Standard installation with automatic environment checking
pip install aegis-vision
# Login and start (agent will auto-check and fix environment)
aegis-agent login
aegis-agent start
The agent automatically:
- Detects environment issues (NumPy, PyTorch compatibility)
- Explains what's wrong and why
- Offers one-click fixes
- Starts agent after fixes
See QUICKSTART_DGX.txt for detailed guide.
GPU Detection & Support
The Aegis Vision training agent includes comprehensive GPU detection that supports modern NVIDIA architectures:
Supported GPU Architectures
| Architecture | Compute Capability | GPU Examples | Status |
|---|---|---|---|
| Volta | 7.0 | V100, Titan V | ✓ Supported |
| Turing | 7.5 | RTX 2080, RTX 2080 Ti, Titan RTX | ✓ Supported |
| Ampere | 8.0-8.6 | A100, RTX 3090, RTX 3080 | ✓ Supported |
| Ada | 8.9 | RTX 4090, RTX 4080, L40S | ✓ Supported |
| Hopper | 9.0-9.1 | H100, H200 | ✓ Supported |
| Blackwell | 9.2 | B100, B200 | ✓ Supported |
GPU Detection Methods
The agent uses a dual-detection approach for maximum reliability:
1. PyTorch Detection (Primary)
   - Queries `torch.cuda.is_available()` and device properties
   - Provides compute capability and device memory
   - Fastest detection method
2. nvidia-smi Fallback (Secondary)
   - Runs the `nvidia-smi` command for GPU discovery
   - Detects GPUs even if the PyTorch CUDA runtime is unavailable
   - Captures the NVIDIA driver version, and the CUDA version from `nvcc`
   - Handles edge cases: PyTorch built without CUDA, mismatched CUDA versions, etc.
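The dual-detection approach can be sketched as follows. This is an illustrative outline, not the agent's actual implementation; the returned field names are assumptions:

```python
import shutil
import subprocess

def detect_gpus():
    """Two-stage GPU detection: try PyTorch first, then nvidia-smi."""
    # Stage 1: PyTorch (primary) -- fastest, yields compute capability and memory
    try:
        import torch
        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            return {
                "method": "pytorch",
                "name": props.name,
                "memory_gb": props.total_memory / 2**30,
                "compute_capability": f"{props.major}.{props.minor}",
            }
    except ImportError:
        pass  # PyTorch not installed; fall through to nvidia-smi

    # Stage 2: nvidia-smi (secondary) -- works even if PyTorch lacks CUDA
    if shutil.which("nvidia-smi"):
        out = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=name,memory.total,driver_version",
             "--format=csv,noheader"],
            capture_output=True, text=True)
        if out.returncode == 0 and out.stdout.strip():
            name, mem, driver = [s.strip() for s in
                                 out.stdout.splitlines()[0].split(",")]
            return {"method": "nvidia-smi", "name": name,
                    "memory": mem, "driver": driver}

    return {"method": "cpu"}  # neither stage found a usable GPU
```

The fallback ordering matters: a PyTorch wheel built without CUDA will report no devices even on a GPU machine, which is exactly the case `nvidia-smi` catches.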
Check GPU Detection
# Show detailed GPU information
aegis-agent info
# Example output with H100:
# 🎮 GPU Information:
# Detection Method: PyTorch
# CUDA Version: 12.1
# Driver Version: 550.120
# GPU 0:
# Name: NVIDIA H100 80GB
# Memory: 80.0 GB
# Compute Capability: 9.0
CUDA Architecture Auto-Configuration
The agent automatically configures optimal CUDA architectures for training:
TORCH_CUDA_ARCH_LIST=7.0 7.5 8.0 8.6 8.9 9.0 9.1 9.2+PTX
This includes:
- PTX flag for forward compatibility with future GPU architectures
- All major consumer and data center GPUs
- Optimal compilation for the target system
Custom architectures can be set via environment variable:
# Force specific GPU architecture
export TORCH_CUDA_ARCH_LIST="9.0 9.2+PTX"
aegis-agent start
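How such an override might be honored can be sketched as below (a hypothetical helper, not the agent's real code):

```python
import os

# Default list mirrors the TORCH_CUDA_ARCH_LIST shown above
DEFAULT_ARCHES = ["7.0", "7.5", "8.0", "8.6", "8.9", "9.0", "9.1", "9.2+PTX"]

def cuda_arch_list():
    """Return the user's TORCH_CUDA_ARCH_LIST override if set, else the default."""
    override = os.environ.get("TORCH_CUDA_ARCH_LIST")
    return override.split() if override else DEFAULT_ARCHES
```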
Troubleshooting GPU Detection
If GPU is not detected despite having NVIDIA GPUs installed:
# 1. Verify NVIDIA driver is installed
nvidia-smi
# 2. Check CUDA version
nvcc --version
# 3. View detailed system info
aegis-agent info
# 4. Check CUDA compatibility
aegis-agent check-env
If detection shows "CPU Only" but GPU is available:
- The PyTorch in the environment may have been built without CUDA support
- The agent will automatically use the `nvidia-smi` fallback method
- Check driver and CUDA toolkit compatibility with your PyTorch version
Quick Start
Training a YOLO Model
from aegis_vision import YOLOTrainer

# Initialize trainer
trainer = YOLOTrainer(
    model_variant="yolov11l",
    dataset_path="/kaggle/input/my-dataset",
    epochs=100,
    batch_size=16,
)

# Configure Wandb tracking (optional)
trainer.setup_wandb(
    project="my-project",
    entity="my-team",
    api_key="your-api-key",
)

# Train
results = trainer.train()

# Export to multiple formats
trainer.export(formats=["onnx", "coreml", "openvino"])
Converting COCO to YOLO Format
from aegis_vision import COCOConverter

# Convert dataset
converter = COCOConverter(
    annotations_file="annotations.json",
    images_dir="images/",
    output_dir="yolo_dataset/",
)
stats = converter.convert()
print(f"Converted {stats['total_annotations']} annotations")
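The core of COCO-to-YOLO conversion is a bounding-box transform: COCO stores `[x_min, y_min, width, height]` in pixels, while YOLO expects `[x_center, y_center, width, height]` normalized to the image size. A minimal standalone version of that transform (illustrative; not the package's internal function):

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert a COCO [x_min, y_min, width, height] box (pixels)
    to YOLO [x_center, y_center, width, height] normalized to [0, 1]."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

# A 100x50 box at (200, 100) in a 640x480 image:
# x_center = (200 + 50) / 640 = 0.390625
box = coco_bbox_to_yolo([200, 100, 100, 50], 640, 480)
```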
Command-Line Interface
# Train a model
aegis-train \
--model yolov11l \
--data /path/to/dataset \
--epochs 100 \
--batch 16 \
--wandb-project my-project
# Convert COCO to YOLO
aegis-train convert-coco \
--annotations annotations.json \
--images images/ \
--output yolo_dataset/
Features
🎯 YOLO Training
- Multi-version support: YOLOv8, v9, v10, v11
- Fine-tuning & from-scratch training modes
- Automatic augmentation configuration
- Early stopping with patience
- Validation metrics: mAP50, mAP50-95, precision, recall
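For intuition on the metrics above: mAP50 counts a prediction as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5, while mAP50-95 averages over IoU thresholds from 0.5 to 0.95. A minimal IoU helper (illustrative, not part of the package API):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area (0 if disjoint)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```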
📊 Experiment Tracking
- Wandb integration for metrics, charts, and artifacts
- Automatic logging of hyperparameters, metrics, and model outputs
- Run resumption support
🔄 Dataset Handling
- COCO format support
- Auto-conversion to YOLO format
- Label filtering and validation
- Dataset statistics reporting
📦 Model Export
- ONNX - Cross-platform inference
- CoreML - iOS/macOS deployment
- OpenVINO - Intel hardware optimization
- TensorRT - NVIDIA GPU optimization
- TFLite - Mobile/edge deployment
☁️ Cloud Environment Support
- Kaggle - Kernel execution and dataset management
- Google Colab - Ready-to-use notebooks
- Environment detection - Auto-configuration for different platforms
Configuration
Training Configuration
config = {
    # Model settings
    "model_variant": "yolov11l",
    "training_mode": "fine_tune",  # or "from_scratch"

    # Training hyperparameters
    "epochs": 100,
    "batch_size": 16,
    "img_size": 640,
    "learning_rate": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,

    # Augmentation
    "augmentation": {
        "hsv_h": 0.015,
        "hsv_s": 0.7,
        "hsv_v": 0.4,
        "degrees": 0.0,
        "translate": 0.1,
        "scale": 0.5,
        "shear": 0.0,
        "perspective": 0.0,
        "flipud": 0.0,
        "fliplr": 0.5,
        "mosaic": 1.0,
        "mixup": 0.0,
    },

    # Early stopping
    "early_stopping": {
        "enabled": True,
        "patience": 50,
        "min_delta": 0.0001,
    },

    # Wandb
    "wandb_enabled": True,
    "wandb_project": "my-project",
    "wandb_entity": "my-team",

    # Export
    "output_formats": ["onnx", "coreml", "openvino"],
}

trainer = YOLOTrainer(**config)
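The early-stopping keys read as patience-based stopping on a monitored metric. An illustrative sketch of that logic (assumed semantics: stop after `patience` consecutive epochs without an improvement of at least `min_delta`; not the package's actual implementation):

```python
class EarlyStopping:
    """Patience-based early stopping on a metric to maximize (e.g. mAP50)."""

    def __init__(self, patience=50, min_delta=0.0001):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0  # consecutive epochs without improvement

    def step(self, metric):
        """Record one epoch's metric; return True when training should stop."""
        if metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```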
Examples
Kaggle Kernel
# In a Kaggle kernel
from aegis_vision import YOLOTrainer

trainer = YOLOTrainer(
    model_variant="yolov11l",
    dataset_path="/kaggle/input/my-dataset",
    epochs=100,
    wandb_api_key="/kaggle/input/secrets/wandb_api_key.txt",
)
results = trainer.train()
trainer.save_to_kaggle_output()
Custom Dataset
from aegis_vision import YOLOTrainer, COCOConverter

# 1. Convert your COCO dataset
converter = COCOConverter(
    annotations_file="my_annotations.json",
    images_dir="my_images/",
    output_dir="yolo_dataset/",
    labels_filter=["person", "car", "dog"],  # Optional filtering
)
converter.convert()

# 2. Train
trainer = YOLOTrainer(
    model_variant="yolov11m",
    dataset_path="yolo_dataset/",
    epochs=50,
)
results = trainer.train()
API Reference
YOLOTrainer
Main class for training YOLO models.
Methods:
- `train()` - Start training
- `setup_wandb()` - Configure Wandb tracking
- `export()` - Export the trained model
- `validate()` - Run validation
- `get_metrics()` - Retrieve training metrics
COCOConverter
Convert COCO format datasets to YOLO format.
Methods:
- `convert()` - Perform the conversion
- `validate()` - Check dataset integrity
- `get_statistics()` - Dataset statistics
Development
# Clone repository
git clone https://github.com/your-org/aegis-vision.git
cd aegis-vision
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black src/
# Lint
ruff src/
Testing & Debugging
Programmatic Task Submission
Test the agent without using the UI:
# Submit a basic training task
python test_submit_task.py
# Submit with CoreML export
python test_submit_task.py --coreml --epochs 5
# Submit with custom configuration
python test_submit_task.py --model yolo11n --epochs 10 --batch-size 16
See TEST_TASK_SUBMISSION.md for complete documentation.
Debugging with VS Code/Cursor
1. Set up debugging:
   - Debug configurations are pre-configured in .vscode/launch.json
   - Just open the project in VS Code/Cursor
2. Start debugging:
   - Set breakpoints in `src/aegis_vision/agent.py` or `trainer.py`
   - Press F5 and select "Debug Aegis Agent"
   - Submit a task (via the UI or `test_submit_task.py`)
   - The debugger will pause at your breakpoints
3. Common debugging scenarios:
   - CoreML export issues: breakpoint at `trainer.py:_export_coreml()`
   - Task execution: breakpoint at `agent.py:execute_task()`
   - Training config: breakpoint at `training_script.py:main()`
See DEBUG_GUIDE.md for comprehensive debugging documentation.
Combined Testing Workflow
The most powerful debugging approach:
# Terminal 1: Start agent in debug mode (VS Code/Cursor)
# Press F5 → "Debug Aegis Agent"
# Set breakpoints in agent.py or trainer.py
# Terminal 2: Submit test task
python test_submit_task.py --coreml
# Debugger will pause at breakpoints
# Inspect variables, step through code, fix issues
This enables rapid iteration without manual UI interaction.
Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Roadmap
- Support for additional YOLO architectures
- Integration with Hugging Face Hub
- Distributed training support
- Auto-hyperparameter tuning
- Model quantization utilities
- Segmentation and pose estimation models
- Real-time inference utilities
Citation
@software{aegis_vision,
  title  = {Aegis Vision: Cloud-native Computer Vision Training Toolkit},
  author = {Aegis AI Team},
  year   = {2025},
  url    = {https://github.com/your-org/aegis-vision}
}
Support
- 📧 Email: support@aegis-ai.com
- 💬 Discord: Join our community
- 📚 Documentation: https://aegis-vision.readthedocs.io
- 🐛 Issues: GitHub Issues