boxdetect

boxdetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character boxes on scanned forms.

Project description

BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on scanned forms.

The main purpose of this library is to provide helper functions for processing document images such as bank forms and applications, and for extracting the regions where character boxes or tick/check boxes are present.

Getting Started

Check out the examples below and the get-started.ipynb notebook, which holds end-to-end examples of using BoxDetect.

Installation

BoxDetect can be installed directly from this repo using pip:

pip install git+https://github.com/karolzak/boxdetect

or through PyPI:

pip install boxdetect

Usage examples

You can use BoxDetect either by leveraging one of the pre-made pipelines or by treating it as a toolbox to compose your own pipelines that fit your needs.

Using existing pipelines:

Start by getting the default config and adjusting it to your requirements and data:

from boxdetect import config

file_name = 'bank_form1.png'
# it is important to adjust these values to match the size of the boxes in your image
config.min_w, config.max_w = (35, 48)
config.min_h, config.max_h = (30, 37)
# more scaling factors give more accurate results but take longer to process
# too small a scaling factor may cause false positives
# too big a scaling factor will take a lot of processing time
config.scaling_factors = [0.4, 0.5, 0.7]
# w/h ratio range used to filter boxes/rectangles
config.wh_ratio_range = (0.5, 1.5)
# number of iterations for the dilation transformation (used to enhance the image)
config.dilation_iterations = 1
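
To make the meaning of the size and ratio limits concrete, here is a minimal sketch of that kind of filtering. It is not BoxDetect's actual implementation: filter_rects and the sample candidates are hypothetical names introduced only for illustration, and the sketch ignores scaling factors entirely.

```python
def filter_rects(rects, min_w, max_w, min_h, max_h, wh_ratio_range):
    """Keep (x, y, w, h) rects whose size and w/h ratio fall inside the limits.

    Hypothetical helper for illustration only, not part of BoxDetect.
    """
    ratio_lo, ratio_hi = wh_ratio_range
    kept = []
    for x, y, w, h in rects:
        if h <= 0:
            continue  # skip degenerate rectangles to avoid dividing by zero
        size_ok = min_w <= w <= max_w and min_h <= h <= max_h
        ratio_ok = ratio_lo <= w / h <= ratio_hi
        if size_ok and ratio_ok:
            kept.append((x, y, w, h))
    return kept


# made-up candidate rectangles: two box-sized, one too wide, one too narrow
candidates = [
    (10, 10, 40, 33),
    (60, 10, 120, 33),
    (200, 10, 36, 31),
    (300, 10, 20, 35),
]
kept = filter_rects(candidates, 35, 48, 30, 37, (0.5, 1.5))
print(kept)  # [(10, 10, 40, 33), (200, 10, 36, 31)]
```

With the limits used above, only rectangles roughly the size of a single character box survive, which is why the width and height ranges should be measured on your own images.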

As a second step simply run:

from boxdetect.pipelines import get_boxes

rects, grouping_rects, image, output_image = get_boxes(
    file_name, config=config, plot=False)

Each element of grouping_rects is a rectangular bounding box (x, y, w, h) representing a group of character boxes:

print(grouping_rects)

OUT:
# (x, y, w, h)
[(276, 276, 1221, 33),
 (324, 466, 430, 33),
 (384, 884, 442, 33),
 (985, 952, 410, 32),
 (779, 1052, 156, 33),
 (253, 1256, 445, 33)]
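
Downstream code often expects corner coordinates rather than (x, y, w, h), and fields in reading order. A minimal sketch of that conversion follows, using the values printed above; to_corners and reading_order are hypothetical helpers introduced here, not BoxDetect functions.

```python
def to_corners(rect):
    """Convert (x, y, w, h) to (x1, y1, x2, y2) corner coordinates."""
    x, y, w, h = rect
    return (x, y, x + w, y + h)


def reading_order(rects):
    """Sort rects top to bottom, then left to right (simple y-then-x ordering)."""
    return sorted(rects, key=lambda r: (r[1], r[0]))


grouping_rects = [
    (276, 276, 1221, 33),
    (324, 466, 430, 33),
    (384, 884, 442, 33),
    (985, 952, 410, 32),
    (779, 1052, 156, 33),
    (253, 1256, 445, 33),
]

corners = [to_corners(r) for r in reading_order(grouping_rects)]
print(corners[0])   # (276, 276, 1497, 309)
print(corners[-1])  # (253, 1256, 698, 1289)
```

The plain y-then-x sort assumes one group per text line; forms with several groups on the same line at slightly different heights would need a tolerance on y.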

import matplotlib.pyplot as plt

# show the output image returned by get_boxes
plt.figure(figsize=(20, 20))
plt.imshow(output_image)
plt.show()

Project details

Release history

1.0.2: Jan 18, 2023
1.0.1: Dec 27, 2022
1.0.0: Jul 16, 2020
0.1.1: Jun 5, 2020
0.1.0: Jun 4, 2020 (this version)

Download files

Source Distribution

boxdetect-0.1.0.tar.gz (5.8 kB)

Uploaded Jun 4, 2020 Source

Built Distribution

boxdetect-0.1.0-py3-none-any.whl (7.3 kB)

Uploaded Jun 4, 2020 Python 3

Hashes for boxdetect-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`56eedaf6e6faca861f4d2accec877f60aa3be4d9033b1fd3a1a7a0d8672df899`
MD5	`e9ff528f7216c06b1a5fbcc2ef99dae3`
BLAKE2b-256	`6409ee937df78b0bb7bb5c113db6f4b4ffa2760a278ff9596097340cf3b88597`

Hashes for boxdetect-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aa1b592e953b9d09adb26987a53a4c049c01c0ff4ea7232b5b9e16107588d76e`
MD5	`6208797c9f3813b87bacc48b49db2e61`
BLAKE2b-256	`719043bb5f914350fab6d705ad9881fa39134e897b1b636cdd384325ab7eab10`