Production-ready diagram detection for academic papers using YOLO11

diagram-detector

Requires Python 3.9+. License: MIT.

Features

  • 🚀 Fast: roughly 50-800 images/second, depending on hardware and model
  • 🌐 Remote GPU: Process on a remote server over SSH
  • 📄 PDF Support: Automatically processes PDF pages
  • 🎯 Accurate: F1 score 0.74-0.82 depending on model
  • 🔧 Flexible: CPU, CUDA, and MPS (Apple Silicon) support
  • 📊 Multiple Formats: JSON, CSV, cropped images, visualizations
  • 🔄 Batch Optimized: Auto-detects optimal batch size
  • 💾 Smart Caching: Downloads each model once and reuses the cached copy

Installation

pip install diagram-detector

From source:

git clone https://github.com/yourusername/diagram-detector.git
cd diagram-detector
pip install -e .

Quick Start

Python API

from diagram_detector import DiagramDetector

# Initialize detector
detector = DiagramDetector(model='yolo11m')  # or 'yolo11n', 'yolo11l', etc.

# Detect diagrams in images
results = detector.detect('path/to/images/')

# Or from PDFs
results = detector.detect_pdf('paper.pdf')

# Access results
for result in results:
    print(f"{result.filename}: {result.has_diagram} ({result.confidence:.2f})")
    
# Save results
detector.save_results(results, 'output/', format='json')  # or 'csv'

# Extract diagram crops
detector.save_crops(results, 'diagrams/')

Command Line

# Detect in images
diagram-detect --input images/ --output results/

# Process PDF
diagram-detect --input paper.pdf --output results/ --save-crops

# With visualization
diagram-detect --input paper.pdf --visualize --confidence 0.35

# Batch processing
diagram-detect --input papers/*.pdf --output results/ --batch-size 16

# Remote GPU inference (process 300K images on remote server!)
diagram-detect --input images/ --remote user@gpu-server:22 --output results/

Models

Model | Size | Speed | Accuracy | Use Case
yolo11n | 6 MB | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Mobile, testing
yolo11s | 22 MB | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Edge devices
yolo11m | 49 MB | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Production (recommended)
yolo11l | 63 MB | ⚡⚡ | ⭐⭐⭐⭐⭐⭐ | High accuracy
yolo11x | 137 MB | ⚡ | ⭐⭐⭐⭐⭐⭐⭐ | Research

Models are automatically downloaded on first use.

Advanced Usage

Custom Configuration

detector = DiagramDetector(
    model='yolo11m',
    confidence=0.35,
    device='cuda',  # or 'cpu', 'mps'
    batch_size=32,  # or 'auto'
)

# Detect with options
results = detector.detect(
    'images/',
    save_crops=True,
    save_visualizations=True,
    crop_padding=10,  # pixels
)
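
The `crop_padding` option above is given in pixels. As a rough sketch of what such padding typically involves (not the package's actual implementation), a box can be grown on every side and clamped to the image bounds. The helper `pad_bbox` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2)


def pad_bbox(box: Box, padding: int, image_width: int, image_height: int) -> Box:
    """Grow a box by `padding` pixels per side, clamped to the image bounds."""
    x1, y1, x2, y2 = box
    return (
        max(0, x1 - padding),
        max(0, y1 - padding),
        min(image_width, x2 + padding),
        min(image_height, y2 + padding),
    )


# Hypothetical 420x505 image: the bottom edge is clamped, the other sides are not.
padded = pad_bbox((100, 150, 400, 500), padding=10, image_width=420, image_height=505)
print(padded)
```

Clamping matters for diagrams that touch the page margin, where a naive padded box would extend past the image.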

Batch Processing

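from pathlib import Path

from diagram_detector import DiagramDetector

detector = DiagramDetector()

# Process multiple PDFs (sorted for a reproducible order)
pdf_files = sorted(Path('papers/').glob('*.pdf'))

for pdf in pdf_files:
    results = detector.detect_pdf(pdf)
    detector.save_results(results, f'results/{pdf.stem}/')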

Integration with Pipelines

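# Use in document processing pipeline
from diagram_detector import DiagramDetector

# Create the detector once and reuse it across papers
detector = DiagramDetector()

def process_paper(pdf_path):
    """Return crops of the high-confidence diagrams found in one PDF."""
    # Detect diagrams
    results = detector.detect_pdf(pdf_path)

    # Filter high-confidence detections
    diagrams = [r for r in results if r.confidence > 0.5]

    # Extract for further analysis
    crops = []
    for diagram in diagrams:
        crop = diagram.get_crop()
        # ... process crop
        crops.append(crop)
    return crops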

Performance

Speed Benchmarks

Hardware | Model | Batch Size | Speed
CPU (Intel i7) | yolo11n | 8 | ~50 img/s
CPU (Apple M1) | yolo11n | 8 | ~80 img/s
GPU (RTX 3090) | yolo11m | 32 | ~500 img/s
GPU (RTX 4090) | yolo11m | 64 | ~800 img/s
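
To turn these throughput figures into a rough wall-clock estimate, the arithmetic can be sketched as below. The function name `estimated_seconds` is hypothetical, and the estimate covers inference only; it assumes the benchmark rate holds for the whole run and ignores PDF rendering, disk I/O, and any transfer to a remote server.

```python
def estimated_seconds(num_images: int, images_per_second: float) -> float:
    """Inference-only time estimate at a constant throughput."""
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    return num_images / images_per_second


# Approximate throughputs copied from the benchmark table above.
BENCHMARKS = {
    "CPU (Intel i7), yolo11n": 50,
    "CPU (Apple M1), yolo11n": 80,
    "GPU (RTX 3090), yolo11m": 500,
    "GPU (RTX 4090), yolo11m": 800,
}

for label, rate in BENCHMARKS.items():
    minutes = estimated_seconds(300_000, rate) / 60
    print(f"{label}: about {minutes:.1f} minutes for 300K images")
```

At ~500 img/s, for example, 300K images correspond to about 10 minutes of inference time.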

Memory Usage

Model | CPU | GPU (batch=16) | GPU (batch=32)
yolo11n | 200 MB | 1 GB | 1.5 GB
yolo11m | 400 MB | 2 GB | 3 GB
yolo11l | 500 MB | 3 GB | 4.5 GB
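
For batch sizes not listed in the table, one simple way to get a ballpark figure is to assume GPU memory grows linearly with batch size and extend the line through the two measured points. This is only a sketch under that assumption; real usage also depends on image resolution and framework overhead, and the helper names below are hypothetical.

```python
from typing import Optional, Sequence

# GPU memory in GB at batch sizes 16 and 32, copied from the table above.
GPU_MEMORY_GB = {
    "yolo11n": {16: 1.0, 32: 1.5},
    "yolo11m": {16: 2.0, 32: 3.0},
    "yolo11l": {16: 3.0, 32: 4.5},
}


def estimate_gpu_memory_gb(model: str, batch_size: int) -> float:
    """Linear estimate through the two table points (an assumption)."""
    low, high = GPU_MEMORY_GB[model][16], GPU_MEMORY_GB[model][32]
    per_image = (high - low) / (32 - 16)  # GB added per extra image in the batch
    fixed = low - 16 * per_image          # GB used regardless of batch size
    return fixed + per_image * batch_size


def largest_batch_that_fits(
    model: str,
    budget_gb: float,
    candidates: Sequence[int] = (8, 16, 32, 64, 128),
) -> Optional[int]:
    """Largest candidate batch size whose estimate stays within the budget."""
    fitting = [b for b in candidates if estimate_gpu_memory_gb(model, b) <= budget_gb]
    return max(fitting) if fitting else None


print(estimate_gpu_memory_gb("yolo11m", 64))
print(largest_batch_that_fits("yolo11m", budget_gb=8.0))
```

Treat the result as a starting point and leave headroom; measuring on the target GPU remains the reliable check.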

Output Formats

JSON

{
  "filename": "figure1.jpg",
  "has_diagram": true,
  "count": 2,
  "confidence": 0.89,
  "detections": [
    {
      "bbox": [100, 150, 400, 500],
      "confidence": 0.89,
      "class": "diagram"
    }
  ]
}
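
A record in this shape can be consumed with the standard library alone. The sketch below parses a trimmed record, keeps detections above a threshold, and computes each box's area. It assumes `bbox` is `[x1, y1, x2, y2]` in pixels, which the example does not state; if the layout is `[x, y, width, height]` instead, the area calculation needs adjusting.

```python
import json

# A trimmed record in the same shape as the example above.
result_json = """
{
  "filename": "figure1.jpg",
  "has_diagram": true,
  "confidence": 0.89,
  "detections": [
    {"bbox": [100, 150, 400, 500], "confidence": 0.89, "class": "diagram"}
  ]
}
"""


def bbox_area(bbox):
    """Area in square pixels, assuming [x1, y1, x2, y2] corner coordinates."""
    x1, y1, x2, y2 = bbox
    return max(0, x2 - x1) * max(0, y2 - y1)


result = json.loads(result_json)

# Keep only detections at or above a chosen confidence threshold.
confident = [d for d in result["detections"] if d["confidence"] >= 0.5]
areas = [bbox_area(d["bbox"]) for d in confident]

print(result["filename"], areas)
```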

CSV

filename,has_diagram,count,max_confidence
figure1.jpg,true,2,0.89
figure2.jpg,false,0,0.00
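
Because CSV stores every field as text, the `true`/`false` and numeric columns need converting after loading. The following sketch reads the rows shown above with the standard `csv` module; the helper name `load_summary` is hypothetical.

```python
import csv
import io

csv_text = """filename,has_diagram,count,max_confidence
figure1.jpg,true,2,0.89
figure2.jpg,false,0,0.00
"""


def load_summary(text):
    """Parse the summary CSV into dicts with typed values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "filename": row["filename"],
            # The file stores booleans as the strings "true" / "false".
            "has_diagram": row["has_diagram"].strip().lower() == "true",
            "count": int(row["count"]),
            "max_confidence": float(row["max_confidence"]),
        })
    return rows


rows = load_summary(csv_text)
with_diagrams = [r["filename"] for r in rows if r["has_diagram"]]
print(with_diagrams)
```

To read a saved file rather than a string, pass the contents of the file opened with `newline=''`, as the `csv` module documentation recommends.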

Docker Support

# Build image
docker build -t diagram-detector .

# Run inference
docker run -v "$(pwd)/data:/data" diagram-detector \
    diagram-detect --input /data/images/ --output /data/results/

Development

# Clone repository
git clone https://github.com/yourusername/diagram-detector.git
cd diagram-detector

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
black .
flake8 .
mypy .

Citation

If you use this detector in your research, please cite:

@software{diagram_detector,
  title = {diagram-detector: Production-ready diagram detection for academic papers},
  author = {Your Name},
  year = {2024},
  url = {https://github.com/yourusername/diagram-detector}
}

License

MIT License. See the LICENSE file for details.

Acknowledgments

  • Built on YOLO11 by Ultralytics
  • Trained on a curated dataset of academic diagrams
  • Part of the dh4pmp Digital Humanities project

Changelog

See CHANGELOG.md for version history.
