
pixel-classifier based page segmentation


Page segmentation module for OCR-D


This module implements a page segmentation algorithm based on a Fully Convolutional Network (FCN). The FCN classifies each pixel of a binary page image. The resulting pixel labels are then segmented into regions, one class at a time, using XY cuts.
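To illustrate the second step, here is a minimal sketch of a recursive XY cut on the mask of a single class. It is not the module's actual implementation: the function name `xy_cut`, the `min_gap` parameter, and the list-of-lists mask format are assumptions made for this example only.

```python
def xy_cut(mask, min_gap=2):
    """Split a binary mask (list of rows of 0/1) into boxes.

    Returns boxes as (top, left, bottom, right), end-exclusive.
    Illustrative sketch only; names and parameters are hypothetical.
    """
    boxes = []

    def profile(top, left, bottom, right, by_rows):
        # Projection profile: True where a row (or column) has foreground.
        if by_rows:
            return [any(mask[y][left:right]) for y in range(top, bottom)]
        return [any(mask[y][x] for y in range(top, bottom))
                for x in range(left, right)]

    def trim(filled):
        # Range covering the first to the last filled entry, or None.
        idx = [i for i, f in enumerate(filled) if f]
        return (idx[0], idx[-1] + 1) if idx else None

    def widest_gap(filled):
        # Longest interior run of empty entries as (start, end), or None.
        best, start = None, None
        for i, f in enumerate(filled):
            if not f and start is None:
                start = i
            elif f and start is not None:
                if best is None or i - start > best[1] - best[0]:
                    best = (start, i)
                start = None
        return best

    def recurse(top, left, bottom, right):
        rows = trim(profile(top, left, bottom, right, True))
        if rows is None:
            return  # no foreground in this block
        top, bottom = top + rows[0], top + rows[1]
        cols = trim(profile(top, left, bottom, right, False))
        left, right = left + cols[0], left + cols[1]

        row_gap = widest_gap(profile(top, left, bottom, right, True))
        col_gap = widest_gap(profile(top, left, bottom, right, False))
        row_len = row_gap[1] - row_gap[0] if row_gap else 0
        col_len = col_gap[1] - col_gap[0] if col_gap else 0

        if max(row_len, col_len) < min_gap:
            boxes.append((top, left, bottom, right))  # cannot cut further
        elif row_len >= col_len:
            # Horizontal cut: split into an upper and a lower block.
            recurse(top, left, top + row_gap[0], right)
            recurse(top + row_gap[1], left, bottom, right)
        else:
            # Vertical cut: split into a left and a right block.
            recurse(top, left, bottom, left + col_gap[0])
            recurse(top, left + col_gap[1], bottom, right)

    recurse(0, 0, len(mask), len(mask[0]) if mask else 0)
    return boxes


example_mask = [
    [1, 1, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
]
regions = xy_cut(example_mask)
print(regions)
```

In this sketch the block is always cut along its widest empty gap, and cutting stops once no gap of at least `min_gap` pixels remains. Running it once per class mask would yield one list of boxes per class.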


  • For GPU support: CUDA and cuDNN
  • Other requirements are installed via the Makefile / pip; see requirements.txt in the repository root.


If you want to use GPU support, set the environment variable TENSORFLOW_GPU to a nonempty value, otherwise leave it unset. Then:

make deps

to install dependencies and

make install

to install the package.

Both targets install Python packages via pip, so you may want to activate a virtualenv before installing.


ocrd-pc-segmentation follows the OCR-D CLI conventions.

It expects a binary page image and produces region entries in the PageXML file.
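As a rough illustration, the sketch below assembles a command line for the tool. The options `-m`, `-I`, `-O` and `-p` are the general OCR-D processor options for the METS file, input file group, output file group and parameter file; they are not spelled out in this README. The helper `build_command`, the file group names and the file names are hypothetical placeholders.

```python
import shlex


def build_command(mets, input_grp, output_grp, params=None):
    """Assemble an ocrd-pc-segmentation call (hypothetical helper)."""
    cmd = ["ocrd-pc-segmentation", "-m", mets, "-I", input_grp, "-O", output_grp]
    if params is not None:
        # Optional JSON parameter file
        cmd += ["-p", params]
    return cmd


# File group and file names below are assumptions for illustration.
cmd = build_command("mets.xml", "OCR-D-IMG-BIN", "OCR-D-SEG-REGION", "params.json")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute the tool, assuming it is installed and the workspace exists.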


The following parameters are recognized in the JSON parameter file:

  • overwrite_regions: remove previously existing text regions
  • xheight: height of the character "x" in pixels used during training
  • model: pixel-classifier model path. The special values __DEFAULT__ and __LEGACY__ load the bundled default model or the previous default model, respectively.
  • gpu_allow_growth: required for GPU use with some graphics cards (set to true if you get CUDNN_INTERNAL_ERROR)
  • resize_height: scale the pixel classifier output down to this height before postprocessing. Independent of training and of the model used (performance / quality tradeoff, defaults to 300).
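A parameter file covering the options above might be generated as in the following sketch. Only the parameter names, the special model value `__DEFAULT__` and the default of 300 for `resize_height` come from the list; the value types and the remaining values (in particular `xheight`) are assumptions chosen for illustration.

```python
import json

params = {
    "overwrite_regions": True,   # assumed boolean: drop existing text regions
    "xheight": 8,                # assumed value; use the x-height of your model
    "model": "__DEFAULT__",      # bundled default model
    "gpu_allow_growth": False,   # set to True on CUDNN_INTERNAL_ERROR
    "resize_height": 300,        # documented default
}

params_json = json.dumps(params, indent=2)
print(params_json)
```

Writing `params_json` to a file such as `params.json` (a hypothetical name) gives a parameter file that can be handed to the tool.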


There is a simple CLI test that runs the tool on a single image from the assets repository:

make test-cli


To train models for the pixel classifier, see its README

