OCR-D wrapper for detectron2 based segmentation models
ocrd_detectron2
Introduction
This offers OCR-D compliant workspace processors for document layout analysis with models trained on Detectron2, which implements Faster R-CNN, Mask R-CNN, Cascade R-CNN, Feature Pyramid Networks and Panoptic Segmentation, among others.
To cover a broad range of third-party models, a few sacrifices have to be made: deploying a model may be difficult and needs configuration, and class labels (really PAGE-XML region types) must be provided by the user. The code itself tries to cope with both panoptic and instance segmentation models (with or without masks).
The processor is only meant for (coarse) page segmentation into regions: it produces no text lines, no reading order and no orientation.
Installation
Create and activate a virtual environment as usual.
To install Python dependencies:
make deps
Which is the equivalent of one of the following, depending on your hardware:
pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html # for CUDA 11.3
pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.10/index.html # for CPU only
Then, to install this module itself:
make install
Which is the equivalent of:
pip install .
Usage
OCR-D processor interface ocrd-detectron2-segment
To be used with PAGE-XML documents in an OCR-D annotation workflow.
Usage: ocrd-detectron2-segment [OPTIONS]
Detect regions with Detectron2
> Use detectron2 to segment each page into regions.
> Open and deserialize PAGE input files and their respective images.
> Fetch a raw and a binarized image for the page frame (possibly
> cropped and deskewed).
> Feed the raw image into the detectron2 predictor that has been used
> to load the given model. Then, depending on the model capabilities
> (whether it can do panoptic segmentation or only instance
> segmentation, whether the latter can do masks or only bounding
> boxes), post-process the predictions:
> - panoptic segmentation: take the provided segment label map, and
> apply the segment to class label map
> - instance segmentation: find an optimal non-overlapping set (flat
> map) of instances via non-maximum suppression; then extend / shrink
> the surviving masks to fully include / exclude connected components
> in the foreground that are on the boundary
> Finally, find the convex hull polygon for each region, and map its
> class id to a new PAGE region type (and subtype).
> Produce a new output file by serialising the resulting hierarchy.
Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME
  -L, --list-resources            List names of processor resources
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version
Parameters:
  "categories" [array - REQUIRED]
    maps each region category (position) of the model to a PAGE region
    type (and subtype if separated by colon), e.g.
    ['TextRegion:paragraph', 'TextRegion:heading',
    'TextRegion:floating', 'TableRegion', 'ImageRegion'] for PubLayNet
  "min_confidence" [number - 0.5]
    confidence threshold for detections
  "model_config" [string - REQUIRED]
    path name of model config
  "model_weights" [string - REQUIRED]
    path name of model weights
  "device" [string - "cuda"]
    select computing device for Torch (e.g. cpu or cuda:0); will fall
    back to CPU if no GPU is available
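To illustrate the instance segmentation branch described in the help text, here is a minimal sketch in plain Python. It is not the processor's actual code. It only shows the general idea of thresholding by `min_confidence`, greedily building a flat (non-overlapping) set of instances, and splitting each `categories` entry into type and subtype. The names `select_instances`, `split_category` and `rows`, the `max_overlap` knob, and the representation of masks as sets of pixel coordinates are all assumptions made for illustration. The connected-component adjustment and the convex hull step are left out.

```python
def split_category(category):
    """Split 'TextRegion:heading' into ('TextRegion', 'heading'); subtype may be None."""
    region_type, _, subtype = category.partition(":")
    return region_type, subtype or None


def select_instances(instances, categories, min_confidence=0.5, max_overlap=0.5):
    """Greedy non-maximum suppression over instance masks.

    instances: list of (class_id, score, pixels), pixels being a set of (y, x).
    max_overlap is a hypothetical knob, not a parameter of the processor:
    a candidate is dropped if more than this share of its area is already taken.
    """
    occupied = set()
    kept = []
    # visit candidates from highest to lowest score
    for class_id, score, pixels in sorted(instances, key=lambda inst: inst[1], reverse=True):
        if score < min_confidence:
            break  # all remaining candidates score even lower
        if not pixels:
            continue
        if len(pixels & occupied) / len(pixels) > max_overlap:
            continue  # suppressed by a better-scoring instance
        free = pixels - occupied  # keep only unclaimed pixels, so the result is flat
        occupied |= free
        kept.append((split_category(categories[class_id]), score, free))
    return kept


def rows(first, last, width=4):
    """Toy mask: all pixels in rows first..last-1 of a page that is `width` pixels wide."""
    return {(y, x) for y in range(first, last) for x in range(width)}


categories = ["TextRegion:paragraph", "TableRegion"]
instances = [(0, 0.9, rows(0, 2)), (1, 0.8, rows(0, 3)), (1, 0.3, rows(3, 4))]
regions = select_instances(instances, categories)
for (region_type, subtype), score, pixels in regions:
    print(region_type, subtype, score, len(pixels))
```

With the default thresholds, only the first toy instance survives: the second overlaps it by two thirds of its own area, and the third falls below `min_confidence`.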
Example:
ocrd resmgr download -n ocrd-detectron2-segment https://layoutlm.blob.core.windows.net/tablebank/model_zoo/detection/All_X152/All_X152.yaml
ocrd resmgr download -n ocrd-detectron2-segment https://layoutlm.blob.core.windows.net/tablebank/model_zoo/detection/All_X152/model_final.pth
ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -P categories '["TableRegion"]' -P model_config All_X152.yaml -P model_weights model_final.pth -P min_confidence 0.1
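Since `-p` also accepts a JSON file path, the same settings can be kept in a parameter file instead of repeating `-P` overrides. The following sketch writes such a file and prints the matching command line; the file name `params-tablebank.json` is a hypothetical choice, and the values are simply those from the example above.

```python
import json
import shlex

# Same values as in the -P overrides of the example above.
params = {
    "categories": ["TableRegion"],
    "model_config": "All_X152.yaml",
    "model_weights": "model_final.pth",
    "min_confidence": 0.1,
}

# Hypothetical file name; any path can be passed to -p.
param_file = "params-tablebank.json"
with open(param_file, "w", encoding="utf-8") as f:
    json.dump(params, f, indent=2)

# Build the command line; it is only printed here, not executed.
command = [
    "ocrd-detectron2-segment",
    "-I", "OCR-D-BIN",
    "-O", "OCR-D-SEG-TAB",
    "-p", param_file,
]
print(shlex.join(command))
```

Individual values can still be overridden on top of the file with `-P`, which takes precedence over `--parameter`.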
Models
Note: These are just examples; no exhaustive search has been done yet!
Note: If the download link is an archive, make sure to unpack it first. Also, the filename suffix (.pth vs .pkl) of the weight file does matter!
TableBank
R152-FPN config|weights|["TableRegion"]
PubLayNet
R50-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R101-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
X101-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
PubLayNet
R50-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R101-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
LayoutParser
provides different model variants of various depths for multiple datasets:
- PubLayNet (Medical Research Papers)
- TableBank (Tables Computer Typesetting)
- PRImALayout (Various Computer Typesetting)
- HJDataset (Historical Japanese Magazines)
- NewspaperNavigator (Historical Newspapers)
- Math Formula Detection
See here for an overview. You will have to adapt the label map to conform to PAGE-XML region (sub)types accordingly.
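Adapting a label map can be sketched as follows. The label map shown is assumed to be the one published for a LayoutParser PubLayNet model (check the documentation of the model you actually use), and the translation table `to_page` is a hypothetical choice that reproduces the PubLayNet categories listed above. Because `categories` is positional, the sketch refuses label maps whose class ids are not contiguous from 0 rather than guessing what belongs at unused positions.

```python
import json

# Assumption: label map as documented for a LayoutParser PubLayNet model.
label_map = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}

# Hypothetical translation from model labels to PAGE-XML region (sub)types.
to_page = {
    "Text": "TextRegion:paragraph",
    "Title": "TextRegion:heading",
    "List": "TextRegion:floating",
    "Table": "TableRegion",
    "Figure": "ImageRegion",
}


def categories_from_label_map(label_map, to_page):
    """Return the list for "categories", where list position equals class id."""
    ids = sorted(label_map)
    if ids != list(range(len(ids))):
        raise ValueError("class ids must be contiguous and start at 0")
    return [to_page[label_map[i]] for i in ids]


categories = categories_from_label_map(label_map, to_page)
# JSON string suitable for: -P categories '...'
print(json.dumps(categories))
```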
DocBank
X101-FPN archive
Proposed mappings:
- ["TextRegion:heading", "TextRegion:credit", "TextRegion:caption", "TextRegion:other", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:floating", "TextRegion:paragraph", "TextRegion:endnote", "TextRegion:heading", "TableRegion", "TextRegion:heading"] (using only predefined @type)
- ["TextRegion:abstract", "TextRegion:author", "TextRegion:caption", "TextRegion:date", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:list", "TextRegion:paragraph", "TextRegion:reference", "TextRegion:heading", "TableRegion", "TextRegion:title"] (using @custom as well)
Testing
none yet