ocrd_detectron2

OCR-D wrapper for detectron2 based segmentation models

Introduction
Installation
Usage
- OCR-D processor interface ocrd-detectron2-segment
Models
Testing

Introduction

This package offers OCR-D-compliant workspace processors for document layout analysis with models trained with Detectron2, which implements Faster R-CNN, Mask R-CNN, Cascade R-CNN, Feature Pyramid Networks and Panoptic Segmentation, among others.

In trying to cover a broad range of third-party models, a few sacrifices have to be made: deploying models may be difficult and requires configuration, and class labels (really PAGE-XML region types) must be provided. The code itself tries to cope with both panoptic and instance segmentation models (with or without masks).

Only meant for (coarse) page segmentation into regions – no text lines, no reading order, no orientation.

Installation

Create and activate a virtual environment as usual.

To install Python dependencies:

make deps

This is equivalent to:

pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html # for CUDA 11.3
pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.10/index.html # for CPU only

Then, to install this module, do:

make install

This is equivalent to:

pip install .

Usage

OCR-D processor interface `ocrd-detectron2-segment`

To be used with PAGE-XML documents in an OCR-D annotation workflow.

Usage: ocrd-detectron2-segment [OPTIONS]

  Detect regions with Detectron2 models

  > Use detectron2 to segment each page into regions.

  > Open and deserialize PAGE input files and their respective images.
  > Fetch a raw and a binarized image for the page frame (possibly
  > cropped and deskewed).

  > Feed the raw image into the detectron2 predictor that has been used
  > to load the given model. Then, depending on the model capabilities
  > (whether it can do panoptic segmentation or only instance
  > segmentation, whether the latter can do masks or only bounding
  > boxes), post-process the predictions:

  > - panoptic segmentation: take the provided segment label map, and
  >   apply the segment to class label map,
  > - instance segmentation: find an optimal non-overlapping set (flat
  >   map) of instances via non-maximum suppression,
  > - both: avoid overlapping pre-existing top-level regions (incremental
  >   segmentation).

  > Then extend / shrink the surviving masks to fully include / exclude
  > connected components in the foreground that are on the boundary.

  > Finally, find the convex hull polygon for each region, and map its
  > class id to a new PAGE region type (and subtype).

  > (Does not annotate `ReadingOrder` or `TextLine`s or `@orientation`.)

  > Produce a new output file by serialising the resulting hierarchy.

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  --profile                       Enable profiling
  --profile-file                  Write cProfile stats to this file. Implies --profile
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME
  -L, --list-resources            List names of processor resources
  -J, --dump-json                 Dump tool description as JSON and exit
  -D, --dump-module-dir           Output the 'module' directory with resources for this processor
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "categories" [array - REQUIRED]
    maps each region category (position) of the model to a PAGE region
    type (and @type or @custom if separated by colon), e.g.
    ['TextRegion:paragraph', 'TextRegion:heading',
    'TextRegion:floating', 'TableRegion', 'ImageRegion'] for PubLayNet;
    categories with an empty string will be skipped during prediction
   "min_confidence" [number - 0.5]
    confidence threshold for detections
   "model_config" [string - REQUIRED]
    path name of model config
   "model_weights" [string - REQUIRED]
    path name of model weights
   "device" [string - "cuda"]
    select computing device for Torch (e.g. cpu or cuda:0); will fall
    back to CPU if no GPU is available
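The instance-segmentation post-processing described in the help text above can be illustrated with two textbook building blocks. The following is a minimal, dependency-free sketch, not the processor's actual code: it uses standard greedy non-maximum suppression over axis-aligned bounding boxes (the processor itself works on masks, and its overlap criterion may differ), then Andrew's monotone chain to compute a convex hull polygon from a region's pixel coordinates. The function names and the `iou_threshold` knob are hypothetical; only `min_confidence` corresponds to a documented parameter.

```python
from typing import List, Sequence, Tuple

# Hypothetical types for this sketch: boxes as (x0, y0, x1, y1), points as (x, y).
Box = Tuple[float, float, float, float]
Point = Tuple[int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               min_confidence: float = 0.5,
               iou_threshold: float = 0.3) -> List[int]:
    """Indices of the detections that survive, best score first.

    Detections below min_confidence are dropped first; then each remaining
    detection is kept only if its IoU with every already kept detection is
    at most iou_threshold (a made-up knob, not a processor parameter).
    """
    order = sorted((i for i, s in enumerate(scores) if s >= min_confidence),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def cross(o: Point, a: Point, b: Point) -> int:
    """z-component of (a - o) x (b - o); > 0 means a left turn o -> a -> b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; collinear points are dropped.

    Vertices come out counter-clockwise in a y-up frame (which looks
    clockwise in image coordinates, where y grows downwards).
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def mask_to_points(mask: Sequence[Sequence[int]]) -> List[Point]:
    """(x, y) coordinates of all non-zero pixels of a 2D mask."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
```

With `iou_threshold=0.0`, only detections that do not overlap at all are kept, which comes closer to the "flat map" of non-overlapping instances the help text describes. A hull built from pixel centres is degenerate for one-pixel-wide regions, which a real implementation would have to handle.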

Example:

ocrd resmgr download ocrd-detectron2-segment TableBank_X152.yaml
ocrd resmgr download ocrd-detectron2-segment TableBank_X152.pth
ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -P categories '["TableRegion"]' -P model_config TableBank_X152.yaml -P model_weights TableBank_X152.pth -P min_confidence 0.1
ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -p presets_TableBank_X152.json -P min_confidence 0.1 # equivalent, with presets file
ocrd resmgr download ocrd-detectron2-segment "*" # get all preconfigured models
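According to the help text, `-p` accepts either a verbatim JSON string or a JSON file path, and `-P KEY VAL` takes precedence over it. As a sketch, assuming a presets file is simply a flat JSON object keyed by the parameter names listed above, the following writes such a file, merges it with overrides the same way, and builds the argument list of the `-p presets_TableBank_X152.json` example. The helper names are hypothetical, and the registered `presets_TableBank_X152.json` resource may contain different values than the ones shown.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Assumed content of a presets file: a flat JSON object of processor parameters,
# here mirroring the -P options of the first example command (minus min_confidence).
presets: Dict[str, Any] = {
    "categories": ["TableRegion"],
    "model_config": "TableBank_X152.yaml",
    "model_weights": "TableBank_X152.pth",
}


def write_presets(path: Path, params: Dict[str, Any]) -> None:
    """Write the parameters as a JSON object usable with `-p PATH`."""
    path.write_text(json.dumps(params, indent=2), encoding="utf-8")


def effective_parameters(path: Path, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a presets file with -P overrides; overrides take precedence."""
    params = json.loads(path.read_text(encoding="utf-8"))
    params.update(overrides)
    return params


def build_command(path: Path, overrides: Dict[str, Any]) -> List[str]:
    """Argument list equivalent to the presets example command (not executed here)."""
    cmd = ["ocrd-detectron2-segment", "-I", "OCR-D-BIN", "-O", "OCR-D-SEG-TAB",
           "-p", str(path)]
    for key, value in overrides.items():
        # Strings are passed verbatim, everything else as JSON, as in the examples.
        cmd += ["-P", key, value if isinstance(value, str) else json.dumps(value)]
    return cmd
```

Inside an OCR-D workspace, the resulting list could be passed to `subprocess.run(cmd, check=True)`.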

Models

Some of the following models have already been registered as known file resources, along with parameter presets to use them.

To get a list of available registered models, do:

ocrd resmgr list-available -e ocrd-detectron2-segment

To get a list of already installed models and presets, do:

ocrd resmgr list-installed -e ocrd-detectron2-segment

To download a registered model (i.e. a config file and the respective weights file), do:

ocrd resmgr download ocrd-detectron2-segment NAME.yaml
ocrd resmgr download ocrd-detectron2-segment NAME.pth

To download more models (registered or other), see:

ocrd resmgr download --help

To use a model, do:

ocrd-detectron2-segment -P model_config NAME.yaml -P model_weights NAME.pth -P categories '[...]' ...
ocrd-detectron2-segment -p NAME.json ... # equivalent, with presets file

Note: These are just examples; no exhaustive search has been done yet!

Note: If the download link points to an archive, unpack it first. Also, the filename suffix of the weights file (.pth vs .pkl) matters!
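Both checks from the note can be scripted with the Python standard library alone. This is a minimal sketch: `shutil.unpack_archive` extracts zip and tar archives (which compressed tar variants work depends on the Python build), and a suffix filter then looks for `.pth` or `.pkl` weights. The helper name `prepare_weights` and the directory layout are hypothetical, and the sketch cannot tell which of the two suffixes a given model actually needs; it only ensures the file has one of them.

```python
import shutil
from pathlib import Path
from typing import List

WEIGHT_SUFFIXES = {".pth", ".pkl"}  # the two suffixes mentioned in the note
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")


def is_archive(path: Path) -> bool:
    """Guess from the file name whether a download needs unpacking."""
    return path.name.lower().endswith(ARCHIVE_SUFFIXES)


def prepare_weights(download: Path, target_dir: Path) -> List[Path]:
    """Unpack `download` if it is an archive, then return candidate weights files."""
    if is_archive(download):
        shutil.unpack_archive(str(download), str(target_dir))
        candidates = sorted(p for p in target_dir.rglob("*") if p.is_file())
    else:
        candidates = [download]
    # Keep only files whose suffix is one of the two accepted weight suffixes.
    weights = [p for p in candidates if p.suffix in WEIGHT_SUFFIXES]
    if not weights:
        raise ValueError(f"no .pth or .pkl file found in {download}")
    return weights
```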

TableBank

Model | Config | Weights | Categories
R152-FPN | config | weights | ["TableRegion"]

PubLayNet

Model | Config | Weights | Categories
R50-FPN | config | weights | ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R101-FPN | config | weights | ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
X101-FPN | config | weights | ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]

PubLayNet

Model | Config | Weights | Categories
R50-FPN | config | weights | ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R101-FPN | config | weights | ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]

LayoutParser

LayoutParser provides model variants of various depths for multiple datasets:

PubLayNet (Medical Research Papers)
TableBank (Tables Computer Typesetting)
PRImALayout (Various Computer Typesetting)
HJDataset (Historical Japanese Magazines)
NewspaperNavigator (Historical Newspapers)
Math Formula Detection

See the LayoutParser documentation for an overview. You will have to adapt the label map to conform to PAGE-XML region (sub)types accordingly.
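Adapting a label map amounts to listing, for each class index of the model and in index order, the PAGE-XML region (sub)type it should become, with an empty string for classes that should be skipped. A minimal sketch, assuming a LayoutParser-style label map of the form `{class index: name}`; the PubLayNet-like names and the name-to-PAGE choices below are illustrative assumptions, picked to reproduce the PubLayNet categories listed above, and the helper name is hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical example of a LayoutParser-style label map (class index -> name).
label_map: Dict[int, str] = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}

# Our own choice of PAGE-XML region (sub)type per label; unmapped labels are skipped.
to_page: Dict[str, str] = {
    "Text": "TextRegion:paragraph",
    "Title": "TextRegion:heading",
    "List": "TextRegion:floating",
    "Table": "TableRegion",
    "Figure": "ImageRegion",
}


def categories_from_label_map(label_map: Dict[int, str],
                              to_page: Dict[str, str]) -> List[str]:
    """Build the `categories` parameter: one entry per class index, in order.

    Indices missing from the label map, and labels missing from `to_page`,
    become "" so that every later category keeps its correct position.
    """
    size = max(label_map) + 1 if label_map else 0
    return [to_page.get(label_map.get(i, ""), "") for i in range(size)]


categories = categories_from_label_map(label_map, to_page)
print(json.dumps(categories))
# prints ["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
```

The printed JSON is what would go into `-P categories '...'`. If a label map starts at index 1, this helper fills index 0 with "" so that it is skipped.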

DocBank

X101-FPN archive

Proposed mappings:

["TextRegion:heading", "TextRegion:credit", "TextRegion:caption", "TextRegion:other", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:floating", "TextRegion:paragraph", "TextRegion:endnote", "TextRegion:heading", "TableRegion", "TextRegion:heading"] (using only predefined @type)
["TextRegion:abstract", "TextRegion:author", "TextRegion:caption", "TextRegion:date", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:list", "TextRegion:paragraph", "TextRegion:reference", "TextRegion:heading", "TableRegion", "TextRegion:title"] (using @custom as well)
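The parameter description says that the part after the colon goes into `@type` or `@custom`, and the two DocBank mappings differ exactly in that respect. The sketch below shows one way to tell which entries would need `@custom`. It assumes (our assumption, not documented behaviour of the processor) that a subtype becomes `@type` only when it is one of the predefined PAGE TextRegion types. The set is our transcription of the PAGE 2019 TextRegion type enumeration and should be checked against the schema. For other region types the sketch does not know the enumeration and conservatively reports `custom`. All names are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed transcription of PAGE 2019 TextRegion @type values (verify against the schema).
PAGE_TEXT_TYPES = {
    "paragraph", "heading", "caption", "header", "footer", "page-number",
    "drop-capital", "credit", "floating", "signature-mark", "catch-word",
    "marginalia", "footnote", "footnote-continued", "endnote", "TOC-entry",
    "list-label", "other",
}


def split_category(category: str) -> Tuple[str, Optional[str]]:
    """'TextRegion:paragraph' -> ('TextRegion', 'paragraph'); 'TableRegion' -> ('TableRegion', None)."""
    region, sep, subtype = category.partition(":")
    return region, (subtype if sep else None)


def subtype_attribute(category: str) -> Optional[str]:
    """'type' if the subtype is a predefined TextRegion type, 'custom' otherwise, None if absent."""
    region, subtype = split_category(category)
    if subtype is None:
        return None
    if region == "TextRegion" and subtype in PAGE_TEXT_TYPES:
        return "type"
    return "custom"  # unknown TextRegion subtype, or a region type this sketch does not cover


docbank_custom: List[str] = [
    "TextRegion:abstract", "TextRegion:author", "TextRegion:caption",
    "TextRegion:date", "MathsRegion", "GraphicRegion", "TextRegion:footer",
    "TextRegion:list", "TextRegion:paragraph", "TextRegion:reference",
    "TextRegion:heading", "TableRegion", "TextRegion:title",
]

needs_custom = [c for c in docbank_custom if subtype_attribute(c) == "custom"]
print(needs_custom)
# prints ['TextRegion:abstract', 'TextRegion:author', 'TextRegion:date', 'TextRegion:list', 'TextRegion:reference', 'TextRegion:title']
```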

Testing

None yet.

Project details

Release history

0.1.8 (Jun 29, 2023)
0.1.7 (Mar 20, 2023)
0.1.6 (Mar 10, 2023)
0.1.5 (Mar 8, 2023)
0.1.4 (Dec 3, 2022)
0.1.3 (Nov 2, 2022)
0.1.2 (Oct 28, 2022), this version
0.1.1 (Feb 2, 2022)
0.1.0 (Jan 21, 2022)

Download files

Source Distribution

ocrd_detectron2-0.1.2.tar.gz (20.7 kB)

Uploaded Oct 28, 2022 Source

Built Distribution

ocrd_detectron2-0.1.2-py2.py3-none-any.whl (20.9 kB)

Uploaded Oct 28, 2022 Python 2 Python 3

File details

Details for the file ocrd_detectron2-0.1.2.tar.gz.

File metadata

Download URL: ocrd_detectron2-0.1.2.tar.gz
Upload date: Oct 28, 2022
Size: 20.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.15

File hashes

Hashes for ocrd_detectron2-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`a940c7f36ac761d5512427eced137d986a02a9c87133318f6333b86eaa475497`
MD5	`aefc7cb95f614e2f343e125f7d52d9f2`
BLAKE2b-256	`fa8839df7fc2f09d2ad97d70f88e5930925d825e28fba838969b9a87f4271117`

File details

Details for the file ocrd_detectron2-0.1.2-py2.py3-none-any.whl.

File metadata

Download URL: ocrd_detectron2-0.1.2-py2.py3-none-any.whl
Upload date: Oct 28, 2022
Size: 20.9 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.15

File hashes

Hashes for ocrd_detectron2-0.1.2-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`761d57d7c3931ec855dcd7c799b6c1b3eef7bff53b0a1cfb5925b395406de723`
MD5	`ccc3799b03865ecc53be1553ce1f1ac5`
BLAKE2b-256	`3d8002266f07d3f5604f04e780651366d33a28f1e74f949053a7ace1eaefd114`
