
DocLayout-YOLO: an efficient and robust document layout analysis method.

Project description

DocLayout-YOLO: Advancing Document Layout Analysis with Mesh-candidate Bestfit and Global-to-local perception

Official PyTorch implementation of DocLayout-YOLO.

Zhiyuan Zhao, Hengrui Kang, Bin Wang, Conghui He

Abstract

We introduce DocLayout-YOLO, which not only enhances accuracy but also preserves the speed advantage through optimization from pre-training and model perspectives in a document-tailored manner. In terms of robust document pretraining, we innovatively regard document synthesis as a 2D bin packing problem and introduce Mesh-candidate Bestfit, which enables the generation of large-scale, diverse document datasets. The model, pre-trained on the resulting DocSynth300K dataset, significantly enhances fine-tuning performance across a variety of document types. In terms of model enhancement for document understanding, we propose a Global-to-local Controllable Receptive Module which emulates the human visual process from global to local perspectives and features a controllable module for feature extraction and integration. Experimental results on extensive downstream datasets show that the proposed DocLayout-YOLO excels at both speed and accuracy.
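
To give intuition for framing document synthesis as 2D bin packing: a best-fit heuristic places each page element into the candidate region that wastes the least area. The sketch below is a deliberately simplified toy illustration of that idea, not the paper's Mesh-candidate Bestfit algorithm; the element and cell representations are hypothetical.

```python
def best_fit_place(elements, cells):
    """Toy best-fit: assign each (w, h) element to the free candidate cell
    that wastes the least area. Illustrative only, not the paper's method."""
    placements = {}
    free = list(cells)  # each cell is (cell_id, width, height)
    for idx, (w, h) in enumerate(elements):
        # Candidate cells the element fits into, ranked by wasted area.
        fits = [(cw * ch - w * h, cid) for cid, cw, ch in free if cw >= w and ch >= h]
        if not fits:
            continue  # element is skipped if nothing fits
        _, best = min(fits)
        placements[idx] = best
        free = [c for c in free if c[0] != best]  # cell is now occupied
    return placements

# Two elements, three candidate cells: each goes to its tightest fit.
print(best_fit_place([(2, 2), (5, 1)], [("a", 2, 2), ("b", 6, 2), ("c", 10, 10)]))
# → {0: 'a', 1: 'b'}
```

In this toy version each cell holds at most one element; the real pipeline additionally has to produce diverse, realistic page layouts at scale.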


Quick Start

1. Environment Setup

To set up your environment, follow these steps:

conda create -n doclayout_yolo python=3.10
conda activate doclayout_yolo
pip install -e .

Note: If you only need the package for inference, you can simply install it via pip:

pip install doclayout-yolo

2. Prediction

You can perform predictions using either a script or the SDK:

  • Script

    Run the following command to make a prediction using the script:

    python demo.py --model path/to/model --image-path path/to/image
    
  • SDK

    Here is an example of how to use the SDK for prediction:

    import cv2
    from doclayout_yolo import YOLOv10
    
    # Load the pre-trained model
    model = YOLOv10("path/to/provided/model")
    
    # Perform prediction
    det_res = model.predict(
        "path/to/image",   # Image to predict
        imgsz=1024,        # Prediction image size
        conf=0.2,          # Confidence threshold
        device="cuda:0"    # Device to use (e.g., 'cuda:0' or 'cpu')
    )
    
    # Annotate and save the result
    annotated_frame = det_res[0].plot(pil=True, line_width=5, font_size=20)
    cv2.imwrite("result.jpg", annotated_frame)
    

We provide a model fine-tuned on DocStructBench for prediction, which is capable of handling various document types. The model can be downloaded from here, and example images can be found under assets/example.


You can also use predict_single.py for prediction with custom inference settings. For batch processing, please refer to PDF-Extract-Kit.

Training and Evaluation on Public DLA Datasets

Data Preparation

  1. Specify the data root path

Find your ultralytics config file (for Linux users, $HOME/.config/Ultralytics/settings.yaml) and change datasets_dir to the project root path.
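
If you prefer to script this change, a minimal sketch is shown below. It assumes the settings file uses a flat `key: value` YAML layout with a single `datasets_dir:` line, which is how the ultralytics settings file is typically laid out; the function name is our own.

```python
from pathlib import Path

def set_datasets_dir(settings_path, new_root):
    """Rewrite the datasets_dir entry in a flat key: value YAML file.
    Sketch only; assumes exactly one top-level 'datasets_dir:' line."""
    p = Path(settings_path)
    out = []
    for line in p.read_text().splitlines():
        if line.startswith("datasets_dir:"):
            out.append(f"datasets_dir: {new_root}")  # replace the value
        else:
            out.append(line)  # keep every other setting untouched
    p.write_text("\n".join(out) + "\n")

# Example (point it at a scratch copy before touching the real file):
# set_datasets_dir("/tmp/settings.yaml", "/path/to/project/root")
```

Alternatively, just open settings.yaml in an editor and change the line by hand.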

  2. Download the prepared YOLO-format D4LA and DocLayNet data from below and put them under ./layout_data:
  • D4LA: link
  • DocLayNet: link

The file structure is as follows:

./layout_data
├── D4LA
│   ├── images
│   ├── labels
│   ├── test.txt
│   └── train.txt
└── doclaynet
    ├── images
    ├── labels
    ├── val.txt
    └── train.txt

Training and Evaluation

Training is conducted on 8 GPUs with a global batch size of 64 (8 images per device). Detailed settings and checkpoints are as follows:

Dataset    Model           DocSynth300K Pretrained?  imgsz  Learning rate  Finetune  Evaluation  AP50  mAP   Checkpoint
D4LA       DocLayout-YOLO  No                        1600   0.04           command   command     81.7  69.8  checkpoint
D4LA       DocLayout-YOLO  Yes                       1600   0.04           command   command     82.4  70.3  checkpoint
DocLayNet  DocLayout-YOLO  No                        1120   0.02           command   command     93.0  77.7  checkpoint
DocLayNet  DocLayout-YOLO  Yes                       1120   0.02           command   command     93.4  79.7  checkpoint

The DocSynth300K pretrained model can be downloaded from here. During evaluation, change checkpoint.pt to the path of the model to be evaluated.

Acknowledgement

The code base is built with ultralytics and YOLO-v10.

Thanks for their great work!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

doclayout_yolo-0.0.2.tar.gz (598.2 kB)

Uploaded Source

Built Distribution

doclayout_yolo-0.0.2-py3-none-any.whl (708.2 kB)

Uploaded Python 3

File details

Details for the file doclayout_yolo-0.0.2.tar.gz.

File metadata

  • Download URL: doclayout_yolo-0.0.2.tar.gz
  • Upload date:
  • Size: 598.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for doclayout_yolo-0.0.2.tar.gz
Algorithm Hash digest
SHA256 c7ab629a8fd45eff6fb6c361cce65362b8326203fae891e17a95d731ce5c94c1
MD5 a8d0313f9e665f5fe333df10a95d4055
BLAKE2b-256 19e8319217d955dfdca67846d24865a8114b23b98301832adfd540a2bb8f9150


File details

Details for the file doclayout_yolo-0.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for doclayout_yolo-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 9155d74be92c3a2441ac3dcd7263760637045480b8a4b71bde807976f9e47671
MD5 dd1be1e9b33c33d279b6720815db730e
BLAKE2b-256 32f7b6255e19d49a216af0d98d125eeec66e91821a20f2fe3d02456abb248309

