End-to-end document widget detection pipeline using YOLO11 on the CommonForms dataset

Project description

Widget Detection Pipeline

End-to-end document form widget detection using YOLO11m trained on the CommonForms dataset.

Detects 3 classes of form fields in scanned PDFs and document images:

| Class ID | Name | Description |
|---|---|---|
| 0 | `text_input` | Text boxes and input lines |
| 1 | `choice_button` | Checkboxes and radio buttons |
| 2 | `signature` | Signature fields |

Requirements

- Python 3.11+
- uv (`pip install uv`)
- CUDA GPU with ≥ 12 GB VRAM (e.g. RTX 3080 Ti, RTX A2000 12 GB) for training at 1024 px with batch=4

Setup

```bash
# 1. Install uv if not already installed
pip install uv

# 2. Create the virtual environment and install all dependencies
uv sync

# 3. (Optional) Install dev dependencies for testing/linting
uv sync --extra dev
```

Pipeline

Step 1 — Download Dataset (CommonForms subset)

Streams a 50,000-image subset from Hugging Face, so the full 163 GB dataset never has to be downloaded:

```bash
uv run scripts/download_dataset.py --max-images 50000
```

Options:

- `--max-images N`: number of images to stream (default: 50,000)
- `--token HF_TOKEN`: Hugging Face access token, if the dataset requires authentication
- `--seed 42`: random seed for reproducibility

Output: `data/raw/images/` and `data/raw/annotations/`

Step 2 — Convert to YOLO Format

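```bash
uv run scripts/convert_to_yolo.py
```

Options:

- `--val-ratio 0.1`: fraction of images held out for validation (default: 0.1, i.e. 10%)
- `--seed 42`: random seed for reproducibility

Output: `data/yolo/` containing `images/`, `labels/`, and `data.yaml`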
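YOLO label files hold one `class cx cy w h` line per box, with the center and size normalized by image width and height. A sketch of the conversion and the seeded split, assuming the raw annotations give pixel-space `x1, y1, x2, y2` boxes with a class name and assuming an `images/train` / `images/val` layout; `CLASS_IDS`, `xyxy_to_yolo_line`, `split_train_val`, and `data_yaml_text` are hypothetical names:

```python
import random

# Class map from the class table; the dict name is illustrative.
CLASS_IDS = {"text_input": 0, "choice_button": 1, "signature": 2}


def xyxy_to_yolo_line(class_name: str, x1: float, y1: float, x2: float, y2: float,
                      img_w: int, img_h: int) -> str:
    """Convert one pixel-space box to a YOLO label line `cls cx cy w h` (normalized)."""
    cls = CLASS_IDS[class_name]
    # Clamp to image bounds so slightly out-of-frame boxes stay valid.
    x1, x2 = max(0.0, min(x1, img_w)), max(0.0, min(x2, img_w))
    y1, y2 = max(0.0, min(y1, img_h)), max(0.0, min(y2, img_h))
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def split_train_val(stems: list[str], val_ratio: float = 0.1,
                    seed: int = 42) -> tuple[list[str], list[str]]:
    """Deterministically shuffle image stems and split them into (train, val)."""
    shuffled = sorted(stems)  # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]


def data_yaml_text(root: str) -> str:
    """Build a minimal Ultralytics data.yaml with the three class names."""
    names = "\n".join(f"  {i}: {n}" for n, i in sorted(CLASS_IDS.items(), key=lambda kv: kv[1]))
    return f"path: {root}\ntrain: images/train\nval: images/val\nnames:\n{names}\n"
```

Sorting before the seeded shuffle keeps the split identical across runs even if the filesystem lists files in a different order.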

Step 3 — Verify Dataset

```bash
# Check integrity
uv run scripts/verify_dataset.py

# Visual inspection (draws bounding boxes on 20 sample images)
uv run scripts/verify_dataset.py --draw-samples 20
```
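An integrity check on YOLO labels typically confirms five numeric fields per line, a class id within range, coordinates in [0, 1], and a matching image for every label file. A sketch of such checks, assuming the `images/` and `labels/` layout from Step 2; `check_label_line` and `check_label_dir` are hypothetical and not necessarily what `verify_dataset.py` does:

```python
from pathlib import Path

NUM_CLASSES = 3  # text_input, choice_button, signature


def check_label_line(line: str, num_classes: int = NUM_CLASSES) -> list[str]:
    """Return the problems found in one YOLO label line (empty list if valid)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        cls = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return ["unparseable field"]
    problems = []
    if not 0 <= cls < num_classes:
        problems.append(f"class id {cls} out of range")
    if any(not 0.0 <= c <= 1.0 for c in coords):
        problems.append("coordinate outside [0, 1]")
    if coords[2] <= 0 or coords[3] <= 0:
        problems.append("non-positive width or height")
    return problems


def check_label_dir(labels_dir: Path, images_dir: Path) -> dict[str, list[str]]:
    """Map each problematic label file name to its problems, including missing images."""
    report: dict[str, list[str]] = {}
    image_stems = {p.stem for p in images_dir.iterdir() if p.is_file()}
    for label in sorted(labels_dir.glob("*.txt")):
        issues = []
        if label.stem not in image_stems:
            issues.append("no matching image")
        for n, line in enumerate(label.read_text().splitlines(), start=1):
            if line.strip():
                issues += [f"line {n}: {p}" for p in check_label_line(line)]
        if issues:
            report[label.name] = issues
    return report
```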

Step 4 — Train

```bash
# Full training (100 epochs, batch=4, 1024px)
uv run train.py --config configs/train_config.yaml

# Smoke test (3 epochs, quick sanity check)
uv run train.py --config configs/train_config.yaml --smoke-test

# Resume from last checkpoint
uv run train.py --config configs/train_config.yaml --resume
```

Training output: `runs/detect/widget_yolo11m/`
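Ultralytics writes `weights/last.pt` and `weights/best.pt` into the run directory, and resuming means reloading `last.pt` with `resume=True`. A sketch under the assumption that `--resume` maps onto this built-in mechanism; `checkpoint_path` and `resume_training` are hypothetical helpers, not part of this package:

```python
from pathlib import Path

RUN_DIR = Path("runs/detect/widget_yolo11m")  # training output directory from Step 4


def checkpoint_path(run_dir: Path, kind: str = "last") -> Path:
    """Return weights/last.pt or weights/best.pt inside an Ultralytics run directory."""
    if kind not in ("last", "best"):
        raise ValueError("kind must be 'last' or 'best'")
    path = run_dir / "weights" / f"{kind}.pt"
    if not path.is_file():
        raise FileNotFoundError(f"no checkpoint at {path}")
    return path


def resume_training(run_dir: Path = RUN_DIR) -> None:
    """Continue an interrupted run using Ultralytics' resume support."""
    from ultralytics import YOLO  # lazy import: only needed when actually resuming

    model = YOLO(str(checkpoint_path(run_dir, "last")))
    # resume=True continues from the saved epoch with the checkpoint's own train args.
    model.train(resume=True)
```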

Step 5 — Run Inference

```bash
# PDF input → JSON output
uv run inference.py \
    --input form.pdf \
    --model runs/detect/widget_yolo11m/weights/best.pt

# Image input with lower confidence threshold
uv run inference.py \
    --input scan.jpg \
    --model best.pt \
    --conf 0.2

# Batch of PDFs with visual overlay
uv run inference.py \
    --input "forms/*.pdf" \
    --model best.pt \
    --visualize \
    --output-dir outputs/

# High DPI for dense forms
uv run inference.py --input form.pdf --model best.pt --dpi 300
```
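The `--dpi` flag controls rasterization. PDF page sizes are measured in points (72 per inch), so a rendered page is roughly `width_pt * dpi / 72` pixels wide; an A4 page (about 595 x 842 pt) at 200 DPI comes out near 1654 x 2339 px, the image size shown in the sample under Output Format. A sketch of the PDF-to-image step, assuming `pdf_utils.py` uses PyMuPDF's `Page.get_pixmap(dpi=...)`; the function names and the 200 DPI default are hypothetical:

```python
def page_pixel_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Approximate raster size of a PDF page (PDF user space is 72 points per inch)."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


def render_pdf(path: str, dpi: int = 200):
    """Render every page of a PDF to an RGB PIL image with PyMuPDF."""
    import fitz  # PyMuPDF
    from PIL import Image

    images = []
    with fitz.open(path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # RGB, no alpha channel by default
            images.append(Image.frombytes("RGB", (pix.width, pix.height), pix.samples))
    return images
```

Doubling the DPI quadruples the pixel count, so `--dpi 300` costs noticeably more inference time than lower settings; it pays off mainly when widgets are small.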

Output Format

```json
{
  "source": "form.pdf",
  "total_pages": 3,
  "total_widgets": 24,
  "pages": [
    {
      "source": "form.pdf",
      "page": 1,
      "image_width": 1654,
      "image_height": 2339,
      "processing_time_ms": 142.3,
      "widgets": [
        {
          "class_id": 0,
          "class_name": "text_input",
          "confidence": 0.913,
          "bbox": {
            "x1": 120.0, "y1": 340.0, "x2": 480.0, "y2": 380.0,
            "x1_norm": 0.073, "y1_norm": 0.145,
            "x2_norm": 0.290, "y2_norm": 0.162
          },
          "page": 1
        }
      ]
    }
  ]
}
```
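The `*_norm` fields in the sample are the pixel coordinates divided by `image_width` (x) or `image_height` (y); for example 120 / 1654 ≈ 0.073. A sketch for consuming a saved result with the standard library only, assuming the JSON has the shape shown above; `load_result`, `summarize`, `to_pixels`, and `high_confidence` are hypothetical helpers:

```python
import json
from collections import Counter


def load_result(path: str) -> dict:
    """Read one inference result file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(result: dict) -> Counter:
    """Count detected widgets per class name across all pages."""
    return Counter(w["class_name"] for page in result["pages"] for w in page["widgets"])


def to_pixels(bbox: dict, width: int, height: int) -> tuple[float, float, float, float]:
    """Recover pixel coordinates from the normalized fields."""
    return (bbox["x1_norm"] * width, bbox["y1_norm"] * height,
            bbox["x2_norm"] * width, bbox["y2_norm"] * height)


def high_confidence(result: dict, threshold: float = 0.5) -> list[dict]:
    """Flatten widgets from all pages, keeping those at or above `threshold`."""
    return [w for page in result["pages"] for w in page["widgets"]
            if w["confidence"] >= threshold]
```

The normalized values are rounded to three decimals, so `to_pixels` can be off by a pixel or two; prefer the absolute `x1`..`y2` fields when they are available.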

Run Tests

```bash
uv run pytest tests/ -v
```

Training Config Highlights (12 GB GPU)

| Parameter | Value | Reason |
|---|---|---|
| `imgsz` | 1024 | Small widgets need high input resolution |
| `batch` | 4 | Fits in 12 GB VRAM at 1024 px |
| `amp` | true | Mixed precision; reduces VRAM use by roughly 40% |
| `epochs` | 100 | With early stopping (`patience=20`) |
| `degrees` | 10.0 | Rotation augmentation for skewed scans |
| `perspective` | 0.0005 | Simulates real-world document distortion |
| `mosaic` | 1.0 | Key augmentation for small widgets |
| `albumentations` | auto | Blur and noise augmentations, applied when the package is installed |
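Most rows in this table correspond to standard Ultralytics `train()` arguments. A minimal sketch of how a training wrapper might pass them, assuming it starts from the COCO-pretrained `yolo11m.pt` checkpoint and sets `name="widget_yolo11m"` (which gives the `runs/detect/widget_yolo11m/` output directory). `BASE_ARGS`, `build_train_args`, and `train` are hypothetical names, and `albumentations` is left out because Ultralytics applies it automatically when the package is installed:

```python
from typing import Any

# Values from the table above; keys are standard Ultralytics train() arguments.
BASE_ARGS: dict[str, Any] = {
    "imgsz": 1024, "batch": 4, "amp": True, "epochs": 100, "patience": 20,
    "degrees": 10.0, "perspective": 0.0005, "mosaic": 1.0,
}


def build_train_args(data_yaml: str, smoke_test: bool = False, **overrides: Any) -> dict[str, Any]:
    """Merge base hyperparameters, optional smoke-test settings, and overrides."""
    args = {**BASE_ARGS, "data": data_yaml, "name": "widget_yolo11m"}  # copy, never mutate BASE_ARGS
    if smoke_test:
        args["epochs"] = 3  # quick sanity check, as in Step 4
    args.update(overrides)
    return args


def train(data_yaml: str = "data/yolo/data.yaml", smoke_test: bool = False):
    """Launch training; requires `ultralytics` and, in practice, a CUDA GPU."""
    from ultralytics import YOLO  # lazy import so build_train_args() has no heavy deps

    model = YOLO("yolo11m.pt")  # COCO-pretrained YOLO11m weights
    return model.train(**build_train_args(data_yaml, smoke_test))
```

If a GPU has less than 12 GB, passing `batch=2` as an override is the simplest first adjustment before lowering `imgsz`, since resolution matters most for small widgets.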

Project Structure

```text
Widget_detection1/
├── widget_detector/          # Core library
│   ├── config.py             # Paths, class maps, defaults
│   ├── dataset.py            # HF download + YOLO conversion
│   ├── detector.py           # WidgetDetector inference class
│   ├── output.py             # Pydantic result models
│   ├── pdf_utils.py          # PDF → PIL images (PyMuPDF)
│   └── trainer.py            # Training wrapper
├── scripts/
│   ├── download_dataset.py   # Step 1: Download
│   ├── convert_to_yolo.py    # Step 2: Convert
│   └── verify_dataset.py     # Step 3: Verify
├── configs/
│   └── train_config.yaml     # YOLO11m hyperparameters
├── train.py                  # Training entry point
├── inference.py              # Inference entry point
├── tests/                    # Unit tests
└── pyproject.toml            # uv project manifest
```

Notes

- In CommonForms, `choice_button` covers both checkboxes and radio buttons as a single class; the dataset does not distinguish them. If you need to split them, a heuristic post-processor based on bbox aspect ratio can be added.
- Training uses exactly the 3 CommonForms classes (`text_input`, `choice_button`, `signature`).
- The `data/` and `runs/` directories are gitignored; do not commit them.

Project details

Release history

- 0.1.2: May 18, 2026
- 0.1.1: May 11, 2026
- 0.1.0 (this version): May 11, 2026

Download files


Source Distribution

psynx_widget_detector-0.1.0.tar.gz (13.8 kB)

Uploaded May 11, 2026 Source

Built Distribution


psynx_widget_detector-0.1.0-py3-none-any.whl (17.1 kB)

Uploaded May 11, 2026 Python 3

File details

Details for the file psynx_widget_detector-0.1.0.tar.gz.

File metadata

Download URL: psynx_widget_detector-0.1.0.tar.gz
Upload date: May 11, 2026
Size: 13.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for psynx_widget_detector-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `ccdad364c38222152afc3b44b4ff5b538bc602637a14752fd2fc41f27cbb4b14` |
| MD5 | `fc78baec6175a3a28d8c28ec2f2ec5d2` |
| BLAKE2b-256 | `68301cf2f004db0972af6292af93c006f722c25fce58890668dc26c82048f9d5` |


File details

Details for the file psynx_widget_detector-0.1.0-py3-none-any.whl.

File metadata

Download URL: psynx_widget_detector-0.1.0-py3-none-any.whl
Upload date: May 11, 2026
Size: 17.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for psynx_widget_detector-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9c16760a0d6464e65361f517cb7d2378efb8b6974a1cba37e11f5def657c939d` |
| MD5 | `099ca0d953ee4c7e32710eebdbed50b8` |
| BLAKE2b-256 | `099b835315b8224cad3b0f6342ef2c2b0a307d54fb1e5e32f6c22e9805b0ad3b` |

