
Document whitening (foreground separation)

This package tries to separate the text/line foreground from the background using a 2D median filter.

(Figure: original, foreground and background images)

Installation

Install from PyPI. Works on Python 3.

pip install whitening

Example usage

Python API

It works with images given either as a PIL.Image or as a NumPy array. Images can be either RGB or grayscale.

import numpy as np
import PIL.Image

from whitening import whiten

# a NumPy array can be used as input and output
image = np.asarray(PIL.Image.open('image.jpg'), dtype='uint8')
foreground, background = whiten(image, kernel_size=20, downsample=4)
PIL.Image.fromarray(foreground).save('foreground.jpg', 'jpeg')

# or pass a PIL image directly (the outputs are then PIL images too)
image = PIL.Image.open('image.jpg')
foreground, background = whiten(image, kernel_size=20, downsample=4)
foreground.save('foreground.jpg', 'jpeg')

CLI

It installs an entry point called whiten.

# help
$ whiten -h

# whiten an image and save the foreground output
$ whiten input.jpg foreground.jpg

# specify the kernel size
$ whiten input.jpg foreground.jpg -k 100

# work in grayscale instead of RGB (3x faster)
$ whiten input.jpg foreground.jpg -g

# downsample the image 4x (faster, but a bit less precise)
$ whiten input.jpg foreground.jpg -d 4

# also save the background
$ whiten input.jpg foreground.jpg -b background.jpg

We assume the original image is the product of the foreground and the background, so the foreground can be recovered by dividing the image by the background: I = F * B => F = I / B. We approximate the background by applying a 2D median filter to the original image, which suppresses sparse features such as text and lines.
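The model above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the package's implementation: `median_background` and `whiten_sketch` are hypothetical names, the median is computed by brute force over sliding windows with edge padding, and mapping F back to 0..255 by clipping at 1 is an assumption about how to turn the ratio into an image. RGB input is filtered per channel, which is also why grayscale is cheaper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_background(image, kernel_size):
    """Hypothetical helper: estimate the background B with a brute-force 2D median.

    Accepts grayscale (H, W) or RGB (H, W, 3) arrays; each channel is
    filtered independently over the two spatial axes.
    """
    img = np.asarray(image, dtype=np.float64)
    half = kernel_size // 2
    # Pad only the spatial axes, repeating edge pixels so every window is full.
    pad = [(half, kernel_size - 1 - half)] * 2 + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="edge")
    # Window axes are appended last: shape (H, W[, C], k, k).
    windows = sliding_window_view(padded, (kernel_size, kernel_size), axis=(0, 1))
    return np.median(windows, axis=(-2, -1))


def whiten_sketch(image, kernel_size=50, eps=1e-6):
    """Hypothetical helper: return (foreground, background) as uint8 arrays."""
    img = np.asarray(image, dtype=np.float64)
    background = median_background(img, kernel_size)
    # F = I / B; eps guards against division by zero in pure-black regions.
    ratio = img / np.maximum(background, eps)
    # Paper (I close to B) maps to ~1, darker ink to < 1; clip and rescale to 0..255.
    foreground = np.round(np.clip(ratio, 0.0, 1.0) * 255.0).astype(np.uint8)
    return foreground, np.round(background).astype(np.uint8)
```

Because `np.median` copies every window, memory grows with H·W·k², so this sketch only suits small images; for real scans, `scipy.ndimage.median_filter` (with `size=(k, k, 1)` for RGB) is the practical way to compute the same kind of background.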

Choose a kernel size large enough to avoid artifacts, yet small enough to keep computation fast. A good starting point is 50 pixels.
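Why the kernel must be "large enough": a median only ignores a feature that covers less than half of its window. For a straight stroke w pixels wide crossing the window, that means a kernel of at least 2w + 1 pixels; a smaller kernel lets the stroke leak into the background estimate, so F = I / B is about 1 there and the stroke fades from the foreground. A minimal 1D sketch on a hypothetical scanline (`median_1d` is an illustrative helper, not part of the package):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_1d(signal, kernel_size):
    """Running median with edge padding; output has the same length as input."""
    half = kernel_size // 2
    padded = np.pad(np.asarray(signal, dtype=np.float64),
                    (half, kernel_size - 1 - half), mode="edge")
    return np.median(sliding_window_view(padded, kernel_size), axis=-1)


# Hypothetical scanline: white paper (200) crossed by a dark stroke (50) 8 px wide.
scanline = np.full(60, 200.0)
scanline[26:34] = 50.0

for k in (5, 11, 21):
    background = median_1d(scanline, k)
    # If the stroke leaks into B, then F = I / B is ~1 there and the stroke is lost.
    leaked = background.min() < 200.0
    print(f"kernel {k:2d}: stroke leaks into background = {leaked}")
```

Here the stroke leaks for kernel sizes 5 and 11 and is removed at 21; 17 is the smallest kernel that removes an 8 px stroke completely.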

A 9.5 Mpx image can be processed on a MacBook in 15 s; with grayscale mode and 4x downsampling, the run time drops to 1 s! Quite good results can be obtained even with a kernel size of 10 and 16x downsampling.
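Downsampling helps because a brute-force median does work proportional to the number of pixels times the kernel area; shrinking the image by d leaves d² fewer pixels, and a kernel scaled down by d has d² less area. Since the background varies smoothly, a coarse estimate scaled back up is usually close enough, at the cost of some precision. The sketch below is one plausible way to do this for a grayscale image and is an assumption, not the package's code: `downsampled_background` and `median_filter_2d` are hypothetical helpers, using block averaging to shrink and pixel repetition to enlarge.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_2d(img, kernel_size):
    """Brute-force 2D median filter with edge padding (grayscale only)."""
    half = kernel_size // 2
    padded = np.pad(img, (half, kernel_size - 1 - half), mode="edge")
    windows = sliding_window_view(padded, (kernel_size, kernel_size))
    return np.median(windows, axis=(-2, -1))


def downsampled_background(gray, kernel_size, downsample):
    """Hypothetical helper: estimate the background at reduced resolution.

    Block-averages the image by `downsample`, median-filters it with a kernel
    shrunk by the same factor, then scales the result back up by pixel
    repetition. Assumes both image sides are at least `downsample` pixels.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    d = downsample
    # Crop to a multiple of d so the image splits evenly into d x d blocks.
    hc, wc = (h // d) * d, (w // d) * d
    small = gray[:hc, :wc].reshape(hc // d, d, wc // d, d).mean(axis=(1, 3))
    small_bg = median_filter_2d(small, max(1, kernel_size // d))
    # Nearest-neighbour upsampling back to the cropped size ...
    bg = np.repeat(np.repeat(small_bg, d, axis=0), d, axis=1)
    # ... then repeat the last row/column to cover any cropped remainder.
    return np.pad(bg, ((0, h - hc), (0, w - wc)), mode="edge")
```

The returned background has the original shape, so the foreground can then be computed at full resolution as I / B.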

More info: http://bohumirzamecnik.cz/blog/2015/image-whitening/

Development

See the Makefile for various development tasks.

License

Author: Bohumír Zámečník bohumir.zamecnik@gmail.com

Supported by Rossum, creating a world without manual data entry.

