Pixelwise binarization with selectional auto-encoders in Keras

Binarization

Binarization for document images

Examples

Introduction

This tool performs document image binarization using a trained ResNet50-UNet model.

Installation

Clone the repository, enter it and run

pip install .

Models

Pre-trained models in HDF5 format can be downloaded from here:

https://qurator-data.de/sbb_binarization/

We also provide a TensorFlow SavedModel via Hugging Face:

https://huggingface.co/SBB/sbb_binarization

Usage

sbb_binarize \
  -m <path to directory containing model files> \
  <input image> \
  <output image>

Images containing a lot of border noise (black pixels) should be cropped beforehand to improve the quality of results.
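One way to do this cropping automatically is to trim rows and columns that are almost entirely dark. The following is a minimal sketch, not part of sbb_binarization: the function names and both thresholds (`dark_below`, `max_dark_fraction`) are assumptions chosen for illustration, and the image is represented as a plain list of grayscale rows so the logic stays independent of any image library.

```python
def content_box(pixels, dark_below=64, max_dark_fraction=0.9):
    """Return (left, top, right, bottom) after trimming dark border lines.

    pixels: list of equal-length rows of grayscale values (0 = black, 255 = white).
    right and bottom are exclusive.
    Both thresholds are illustrative assumptions, not values from the project.
    """
    height, width = len(pixels), len(pixels[0])

    def mostly_dark(values):
        # A line counts as border noise if nearly all of its pixels are dark.
        dark = sum(1 for v in values if v < dark_below)
        return dark / len(values) > max_dark_fraction

    row_is_noise = [mostly_dark(row) for row in pixels]
    col_is_noise = [mostly_dark([row[x] for row in pixels]) for x in range(width)]

    # Walk inwards from each edge until a line with real content is found.
    top = 0
    while top < height and row_is_noise[top]:
        top += 1
    bottom = height
    while bottom > top and row_is_noise[bottom - 1]:
        bottom -= 1
    left = 0
    while left < width and col_is_noise[left]:
        left += 1
    right = width
    while right > left and col_is_noise[right - 1]:
        right -= 1
    return left, top, right, bottom


def crop(pixels, box):
    """Cut the box returned by content_box out of the pixel grid."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]
```

The returned box uses the same (left, upper, right, lower) order as the box argument of Pillow's `Image.crop`, so it could be applied to a real image once the pixel rows have been read from it. Only borders that reach the image edge are trimmed; dark regions inside the page are left alone.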

Example

sbb_binarize -m /path/to/model/ myimage.tif myimage-bin.tif
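To binarize a whole directory, the same call can be repeated once per image. The sketch below only wraps the command line shown above. The helper names, the `*.tif` pattern, and the `-bin` output suffix are assumptions made for illustration and are not part of the tool.

```python
import subprocess
from pathlib import Path


def binarize_command(model_dir, src, dst):
    """Build the argument list for one sbb_binarize call."""
    return ["sbb_binarize", "-m", str(model_dir), str(src), str(dst)]


def plan_batch(model_dir, input_dir, output_dir, pattern="*.tif"):
    """Return one command per matching image in input_dir, sorted by name."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    commands = []
    for src in sorted(input_dir.glob(pattern)):
        # Hypothetical naming scheme: myimage.tif -> myimage-bin.tif
        dst = output_dir / f"{src.stem}-bin{src.suffix}"
        commands.append(binarize_command(model_dir, src, dst))
    return commands


def run_batch(commands):
    """Run the commands one after another and stop at the first failure."""
    for cmd in commands:
        # The last argument is the output image; make sure its folder exists.
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Calling `run_batch(plan_batch("/path/to/model/", "scans", "scans-bin"))` would process every TIFF in a hypothetical `scans` directory. Each call starts a new process and therefore loads the model again, so this approach favours simplicity over speed.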

To use the OCR-D interface:

ocrd-sbb-binarize --overwrite -I INPUT_FILE_GRP -O OCR-D-IMG-BIN -P model "/var/lib/sbb_binarization"

