
pixel-classifier based page segmentation


Page segmentation module for OCR-D

Introduction

This module implements a page segmentation algorithm based on a Fully Convolutional Network (FCN). The FCN classifies each pixel of a binary image, and the resulting per-pixel classification is then segmented into regions, separately for each class, using XY cuts.

Requirements

  • For GPU support: CUDA and cuDNN
  • Other requirements are installed via the Makefile / pip; see requirements.txt in the repository root.

Installation

If you want GPU support, set the environment variable TENSORFLOW_GPU to a nonempty value; otherwise, leave it unset. Then run:

make deps

to install dependencies and

make install

to install the package.

Both the dependencies and the package are installed via pip, so you may want to activate a virtualenv before installing.

Usage

ocrd-pc-segmentation follows the OCR-D command-line interface (CLI) conventions.

It expects a binary page image and produces region entries in the PageXML file.
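Assuming the standard OCR-D CLI flags (`-m` for the METS file, `-I` and `-O` for the input and output file groups, `-p` for the JSON parameter file), an invocation can be assembled from Python as below. The file group names and `params.json` are hypothetical placeholders; substitute the groups from your own workspace.

```python
import shlex
from typing import List, Optional


def build_command(mets: str, input_grp: str, output_grp: str,
                  params: Optional[str] = None) -> List[str]:
    """Assemble an ocrd-pc-segmentation call using the OCR-D CLI flags."""
    cmd = ["ocrd-pc-segmentation", "-m", mets, "-I", input_grp, "-O", output_grp]
    if params is not None:
        cmd += ["-p", params]  # JSON parameter file (see Configuration)
    return cmd


# Hypothetical file groups: binarized page images in, segmented regions out.
cmd = build_command("mets.xml", "OCR-D-IMG-BIN", "OCR-D-SEG-REGION", "params.json")
print(shlex.join(cmd))
# To actually run the processor (requires the package to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list rather than a shell string avoids quoting problems when file group or path names contain spaces.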

Configuration

The following parameters are recognized in the JSON parameter file:

  • overwrite_regions: remove previously existing text regions
  • xheight: height of character "x" in pixels used during training.
  • model: pixel-classifier model path. The special values __DEFAULT__ and __LEGACY__ load the bundled default model and the previous default model, respectively.
  • gpu_allow_growth: required for GPU use with some graphics cards (set to true if you get CUDNN_INTERNAL_ERROR)
  • resize_height: scale down the pixel classifier output to this height before postprocessing. This setting is independent of the training and of the model used; it trades performance against quality (default: 300).
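A parameter file using the options above can be written from Python; the sketch below produces a hypothetical `params.json`. Only the parameter names, the special value `__DEFAULT__`, and the `resize_height` default of 300 come from this README. Treating `overwrite_regions` and `gpu_allow_growth` as JSON booleans is an assumption, and the `xheight` value is a placeholder that should match the x-height used when the model was trained.

```python
import json

# Parameter names are from the list above; values marked hypothetical are placeholders.
params = {
    "overwrite_regions": True,   # assumed boolean: drop existing text regions
    "xheight": 8,                # hypothetical: must match the training x-height
    "model": "__DEFAULT__",      # bundled default model
    "gpu_allow_growth": False,   # assumed boolean: set True on CUDNN_INTERNAL_ERROR
    "resize_height": 300,        # documented default
}

with open("params.json", "w", encoding="utf-8") as f:
    json.dump(params, f, indent=2)
```

Parameters you do not set are presumably left at the tool's own defaults, so a minimal file may contain only the values you want to change.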

Testing

There is a simple CLI test that runs the tool on a single image from the assets repository:

make test-cli

Training

To train models for the pixel classifier, see its README.



Download files


Files for ocrd-pc-segmentation, version 0.2.3:

  • ocrd_pc_segmentation-0.2.3-py3-none-any.whl (Wheel, Python py3, 15.0 MB)
  • ocrd_pc_segmentation-0.2.3.tar.gz (Source, 15.0 MB)
