diagram-detector
Production-ready diagram detection for academic papers using YOLO11.
Features
- 🚀 Fast: roughly 50-800 images/second depending on hardware and model (see Speed Benchmarks)
- 🌐 Remote GPU: Process on remote server via SSH
- 📄 PDF Support: Automatically processes PDF pages
- 🎯 Accurate: F1 score 0.74-0.82 depending on model
- 🔧 Flexible: CPU, CUDA, and MPS (Apple Silicon) support
- 📊 Multiple Formats: JSON, CSV, cropped images, visualizations
- 🔄 Batch Optimized: Auto-detects optimal batch size
- 💾 Smart Caching: Downloads each model once and reuses the cached copy
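The F1 score quoted in the list above combines precision and recall into a single number. As a reminder of how such a score is computed, here is a minimal sketch; the counts are made-up illustrations, not measurements from this project.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct detections, 20 spurious, 20 missed.
print(round(f1_score(80, 20, 20), 2))  # 0.8
```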
Installation
pip install diagram-detector
From source:
git clone https://github.com/yourusername/diagram-detector.git
cd diagram-detector
pip install -e .
Quick Start
Python API
from diagram_detector import DiagramDetector
# Initialize detector
detector = DiagramDetector(model='yolo11m') # or 'yolo11n', 'yolo11l', etc.
# Detect diagrams in images
results = detector.detect('path/to/images/')
# Or from PDFs
results = detector.detect_pdf('paper.pdf')
# Access results
for result in results:
    print(f"{result.filename}: {result.has_diagram} ({result.confidence:.2f})")
# Save results
detector.save_results(results, 'output/', format='json') # or 'csv'
# Extract diagram crops
detector.save_crops(results, 'diagrams/')
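Once results are in hand, a quick summary is often the first thing you want. The sketch below does not import the package; `PageResult` is a hypothetical stand-in that exposes only the three attributes used in the loop above (`filename`, `has_diagram`, `confidence`), so the real result objects may carry more fields.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    """Hypothetical stand-in exposing the attributes used in the Quick Start."""
    filename: str
    has_diagram: bool
    confidence: float


def summarize(results):
    """Return (number of items with a diagram, best confidence among them)."""
    hits = [r for r in results if r.has_diagram]
    best = max((r.confidence for r in hits), default=0.0)
    return len(hits), best


sample = [
    PageResult("page_001.png", True, 0.89),
    PageResult("page_002.png", False, 0.0),
    PageResult("page_003.png", True, 0.41),
]
print(summarize(sample))  # (2, 0.89)
```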
Command Line
# Detect in images
diagram-detect --input images/ --output results/
# Process PDF
diagram-detect --input paper.pdf --output results/ --save-crops
# With visualization
diagram-detect --input paper.pdf --visualize --confidence 0.35
# Batch processing
diagram-detect --input papers/*.pdf --output results/ --batch-size 16
# Remote GPU inference (process 300K images on remote server!)
diagram-detect --input images/ --remote user@gpu-server:22 --output results/
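If you call the CLI from a Python script, building the argument list in one place keeps the flags consistent. This sketch uses only the flags shown above; the helper `build_command` is hypothetical, and it assumes the flags can be combined freely, which the examples above do not confirm for every pairing.

```python
import shlex


def build_command(input_path, output_dir, *, save_crops=False,
                  confidence=None, batch_size=None, remote=None):
    """Assemble a diagram-detect argument list from the flags shown above."""
    cmd = ["diagram-detect", "--input", str(input_path), "--output", str(output_dir)]
    if save_crops:
        cmd.append("--save-crops")
    if confidence is not None:
        cmd += ["--confidence", str(confidence)]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if remote is not None:
        cmd += ["--remote", remote]
    return cmd


cmd = build_command("paper.pdf", "results/", save_crops=True, confidence=0.35)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list rather than a single string to `subprocess.run` avoids shell quoting problems with paths that contain spaces.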
Models
| Model | Size | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| yolo11n | 6 MB | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Mobile, testing |
| yolo11s | 22 MB | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Edge devices |
| yolo11m | 49 MB | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Production (recommended) |
| yolo11l | 63 MB | ⚡⚡ | ⭐⭐⭐⭐⭐⭐ | High accuracy |
| yolo11x | 137 MB | ⚡ | ⭐⭐⭐⭐⭐⭐⭐ | Research |
Models are automatically downloaded on first use.
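When disk space or download size is the constraint, the table can be turned into a simple lookup. The sketch below copies the sizes from the table; `largest_model_within` is a hypothetical helper, and it assumes that a larger model is the more accurate choice, as the ratings in the table suggest.

```python
from typing import Optional

# Download sizes in MB, copied from the table above.
MODEL_SIZES_MB = {
    "yolo11n": 6,
    "yolo11s": 22,
    "yolo11m": 49,
    "yolo11l": 63,
    "yolo11x": 137,
}


def largest_model_within(budget_mb: float) -> Optional[str]:
    """Pick the largest listed model whose weights fit the size budget."""
    fitting = {name: size for name, size in MODEL_SIZES_MB.items() if size <= budget_mb}
    return max(fitting, key=fitting.get) if fitting else None


print(largest_model_within(50))  # yolo11m
print(largest_model_within(5))   # None
```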
Advanced Usage
Custom Configuration
detector = DiagramDetector(
    model='yolo11m',
    confidence=0.35,
    device='cuda',  # or 'cpu', 'mps'
    batch_size=32,  # or 'auto'
)
# Detect with options
results = detector.detect(
    'images/',
    save_crops=True,
    save_visualizations=True,
    crop_padding=10,  # pixels
)
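To make the effect of `crop_padding` concrete, here is a sketch of how padding a bounding box typically works. It is an illustration, not the package's implementation: it assumes boxes are `(x1, y1, x2, y2)` in pixels and that the padded box is clamped to the image bounds, and `pad_bbox` is a hypothetical name.

```python
def pad_bbox(bbox, padding, image_width, image_height):
    """Grow an (x1, y1, x2, y2) box by `padding` pixels, clamped to the image."""
    x1, y1, x2, y2 = bbox
    return (
        max(0, x1 - padding),
        max(0, y1 - padding),
        min(image_width, x2 + padding),
        min(image_height, y2 + padding),
    )


# A box near the bottom edge of an 800x505 image: the bottom side is clamped.
print(pad_bbox((100, 150, 400, 500), 10, 800, 505))  # (90, 140, 410, 505)
```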
Batch Processing
from pathlib import Path
# Process multiple PDFs
pdf_files = Path('papers/').glob('*.pdf')
for pdf in pdf_files:
    results = detector.detect_pdf(pdf)
    detector.save_results(results, f'results/{pdf.stem}/')
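In a long batch run, one unreadable PDF should not abort the whole job. The sketch below wraps the loop above with per-file error handling. It takes the detection function as a parameter, so it runs without the package; `process_all` and `fake_detect` are hypothetical names, and in real use you would pass `detector.detect_pdf` instead.

```python
from pathlib import Path


def process_all(pdf_paths, detect_pdf):
    """Run detect_pdf on every path; collect results and failures separately."""
    done, failed = {}, {}
    for pdf in sorted(Path(p) for p in pdf_paths):
        try:
            done[pdf.name] = detect_pdf(pdf)
        except Exception as exc:  # keep going and report the failure afterwards
            failed[pdf.name] = str(exc)
    return done, failed


def fake_detect(pdf):
    """Stand-in for detector.detect_pdf so the sketch runs on its own."""
    if pdf.stem == "broken":
        raise ValueError("could not read PDF")
    return [f"{pdf.stem}-page-1"]


done, failed = process_all(["b.pdf", "broken.pdf", "a.pdf"], fake_detect)
print(sorted(done))  # ['a.pdf', 'b.pdf']
print(failed)        # {'broken.pdf': 'could not read PDF'}
```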
Integration with Pipelines
# Use in document processing pipeline
def process_paper(pdf_path):
    detector = DiagramDetector()
    # Detect diagrams
    results = detector.detect_pdf(pdf_path)
    # Filter high-confidence detections
    diagrams = [r for r in results if r.confidence > 0.5]
    # Extract for further analysis
    for diagram in diagrams:
        crop = diagram.get_crop()
        # ... process crop
Performance
Speed Benchmarks
| Hardware | Model | Batch Size | Speed |
|---|---|---|---|
| CPU (Intel i7) | yolo11n | 8 | ~50 img/s |
| CPU (Apple M1) | yolo11n | 8 | ~80 img/s |
| GPU (RTX 3090) | yolo11m | 32 | ~500 img/s |
| GPU (RTX 4090) | yolo11m | 64 | ~800 img/s |
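These throughput figures translate directly into rough wall-clock estimates. The sketch below applies them to a 300,000-image job, the scale mentioned in the remote GPU example. It covers inference only and ignores PDF rendering, disk I/O, and network transfer; `estimated_minutes` is a hypothetical helper.

```python
# Approximate throughput in images/second, copied from the table above.
THROUGHPUT = {
    "CPU (Intel i7), yolo11n": 50,
    "CPU (Apple M1), yolo11n": 80,
    "GPU (RTX 3090), yolo11m": 500,
    "GPU (RTX 4090), yolo11m": 800,
}


def estimated_minutes(num_images: int, images_per_second: float) -> float:
    """Inference time in minutes at a constant throughput."""
    return num_images / images_per_second / 60


for setup, rate in THROUGHPUT.items():
    print(f"{setup}: {estimated_minutes(300_000, rate):.1f} min")
```

At these rates the job takes about 100 minutes on the listed Intel CPU and about 10 minutes on an RTX 3090.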
Memory Usage
| Model | CPU | GPU (batch=16) | GPU (batch=32) |
|---|---|---|---|
| yolo11n | 200 MB | 1 GB | 1.5 GB |
| yolo11m | 400 MB | 2 GB | 3 GB |
| yolo11l | 500 MB | 3 GB | 4.5 GB |
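The memory table can guide a manual batch-size choice when you know how much GPU memory is free. This is a sketch over the tabulated values only, not a description of how the package's automatic batch sizing works; `largest_listed_batch` and the 80% headroom factor are assumptions made for illustration.

```python
# GPU memory in GB per (model, batch size), copied from the table above.
GPU_MEMORY_GB = {
    "yolo11n": {16: 1.0, 32: 1.5},
    "yolo11m": {16: 2.0, 32: 3.0},
    "yolo11l": {16: 3.0, 32: 4.5},
}


def largest_listed_batch(model: str, free_gb: float, headroom: float = 0.8):
    """Largest tabulated batch size that fits in `headroom` of the free memory."""
    usable = free_gb * headroom
    fits = [batch for batch, gb in GPU_MEMORY_GB[model].items() if gb <= usable]
    return max(fits) if fits else None


print(largest_listed_batch("yolo11m", 4.0))  # 32
print(largest_listed_batch("yolo11l", 4.0))  # 16
```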
Output Formats
JSON
{
  "filename": "figure1.jpg",
  "has_diagram": true,
  "count": 2,
  "confidence": 0.89,
  "detections": [
    {
      "bbox": [100, 150, 400, 500],
      "confidence": 0.89,
      "class": "diagram"
    }
  ]
}
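Downstream code can read this record with the standard `json` module. The sketch below parses the example and derives box sizes. It assumes `bbox` is `[x1, y1, x2, y2]` in pixels, which the example values suggest but the text does not state, and `box_sizes` is a hypothetical helper.

```python
import json

record = json.loads("""
{
  "filename": "figure1.jpg",
  "has_diagram": true,
  "count": 2,
  "confidence": 0.89,
  "detections": [
    {"bbox": [100, 150, 400, 500], "confidence": 0.89, "class": "diagram"}
  ]
}
""")


def box_sizes(rec):
    """Yield (width, height, confidence), assuming bbox is [x1, y1, x2, y2]."""
    for det in rec["detections"]:
        x1, y1, x2, y2 = det["bbox"]
        yield x2 - x1, y2 - y1, det["confidence"]


print(list(box_sizes(record)))  # [(300, 350, 0.89)]
```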
CSV
filename,has_diagram,count,max_confidence
figure1.jpg,true,2,0.89
figure2.jpg,false,0,0.00
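The CSV output is convenient for quick aggregate checks. This sketch reads the two example rows with the standard `csv` module and converts each column to a Python type; `load_rows` is a hypothetical helper, and the lowercase `true`/`false` handling follows the example rows above.

```python
import csv
import io

CSV_TEXT = """filename,has_diagram,count,max_confidence
figure1.jpg,true,2,0.89
figure2.jpg,false,0,0.00
"""


def load_rows(text):
    """Parse the CSV into dicts with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "filename": row["filename"],
            "has_diagram": row["has_diagram"].lower() == "true",
            "count": int(row["count"]),
            "max_confidence": float(row["max_confidence"]),
        })
    return rows


rows = load_rows(CSV_TEXT)
print([r["filename"] for r in rows if r["has_diagram"]])  # ['figure1.jpg']
print(sum(r["count"] for r in rows))                      # 2
```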
Docker Support
# Build image
docker build -t diagram-detector .
# Run inference
docker run -v $(pwd)/data:/data diagram-detector \
    diagram-detect --input /data/images/ --output /data/results/
Development
# Clone repository
git clone https://github.com/yourusername/diagram-detector.git
cd diagram-detector
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
black .
flake8 .
mypy .
Citation
If you use this detector in your research, please cite:
@software{diagram_detector,
  title = {diagram-detector: Production-ready diagram detection for academic papers},
  author = {Your Name},
  year = {2024},
  url = {https://github.com/yourusername/diagram-detector}
}
License
MIT License - see LICENSE file for details.
Acknowledgments
- Built on YOLO11 by Ultralytics
- Trained on a curated dataset of academic diagrams
- Part of the dh4pmp Digital Humanities project
Changelog
See CHANGELOG.md for version history.