Python Run Length Smoothing Algorithm for Document Processing
Project Description
- The Run Length Smoothing Algorithm (RLSA) is a method mainly used for block segmentation and text discrimination.
- It is mainly used in document image processing to extract regions of interest (ROI), such as blocks of text, titles, or content, by applying heuristics.
Install requirements
- pip install -r requirements.txt
Install
- pip install pythonRLSA
Input & Output
Output of 3 cases with value "10" can be seen here
More results can be seen here
How it works
- A '255' (white pixel) will be converted to '0' (black pixel) in an image if the number of adjacent 255's is less than the predefined limit "value".
- The appropriate "value" varies from image to image.
Sample Test Case
- value = 3
- input - [0, 0, 255, 255, 255, 0, 0, 255, 0, 0, 255, 0, 255]
- output - [0, 0, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 255]
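The rule above can be sketched for a single row of pixels. This is a minimal pure-Python illustration, not the pythonRLSA source: the function name `smooth_run` is hypothetical, and it assumes that only runs of 255's enclosed by a 0 on both sides are filled, which is what the sample output suggests (the trailing 255 is left untouched).

```python
def smooth_run(row, value):
    """Fill runs of 255 shorter than `value` that lie between two 0 pixels.

    Hypothetical helper for illustration; assumes runs touching the
    start or end of the row are left unchanged.
    """
    out = list(row)
    last_zero = None  # index of the most recent 0 pixel seen so far
    for i, px in enumerate(row):
        if px == 0:
            if last_zero is not None:
                gap = i - last_zero - 1  # number of 255's between the two 0's
                if 0 < gap < value:
                    for j in range(last_zero + 1, i):
                        out[j] = 0
            last_zero = i
    return out


sample = [0, 0, 255, 255, 255, 0, 0, 255, 0, 0, 255, 0, 255]
print(smooth_run(sample, 3))
# [0, 0, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 255]
```

The run of three 255's survives because 3 is not less than the value 3, while the two single-pixel runs enclosed by 0's are filled.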
To test
- python pythonRLSA/test_rlsa_unittest.py -v
Unittest Results
- test_bool (__main__.TestRLSA) ... ok
- test_image (__main__.TestRLSA) ... Image must be an numpy ndarray and must be in binary ... ok
- test_rlsa_hori (__main__.TestRLSA) ... ok
- test_rlsa_hori_vert (__main__.TestRLSA) ... ok
- test_rlsa_vert (__main__.TestRLSA) ... ok
- test_value (__main__.TestRLSA) ... ok
Ran 6 tests in 0.003s
OK
Prerequisites
- Python 3.5+
- The image must be a binary ndarray (values of 255/1/0)
- A predefined limit must be passed as an integer "value"
Method
- rlsa
Parameters
- image - numpy.ndarray(required)
- horizontal - boolean(required)
- vertical - boolean(required)
- value - any positive integer(int)(required)
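To illustrate how the horizontal and vertical flags might interact, here is a small pure-Python sketch on a list-of-lists image. The names `smooth_row` and `smooth_image` are hypothetical and this is not the library's code. It assumes horizontal smoothing works along rows, vertical smoothing along columns, and that when both flags are set the horizontal pass runs first.

```python
def smooth_row(row, value):
    """Fill runs of 255 shorter than `value` enclosed by 0's (assumed rule)."""
    out = list(row)
    last_zero = None  # index of the most recent 0 pixel
    for i, px in enumerate(row):
        if px == 0:
            if last_zero is not None and 0 < i - last_zero - 1 < value:
                out[last_zero + 1:i] = [0] * (i - last_zero - 1)
            last_zero = i
    return out


def smooth_image(image, horizontal, vertical, value):
    """Apply the row rule along rows and/or columns of a 2D list."""
    result = [list(r) for r in image]
    if horizontal:
        result = [smooth_row(r, value) for r in result]
    if vertical:
        # Transpose, smooth each column as if it were a row, transpose back.
        columns = [smooth_row(list(c), value) for c in zip(*result)]
        result = [list(r) for r in zip(*columns)]
    return result


grid = [
    [0, 255, 0],
    [255, 255, 255],
    [0, 255, 0],
]
print(smooth_image(grid, True, False, 2))  # [[0, 0, 0], [255, 255, 255], [0, 0, 0]]
print(smooth_image(grid, False, True, 2))  # [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
print(smooth_image(grid, True, True, 2))   # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

With both flags set, the horizontal pass closes the top and bottom rows, which then lets the vertical pass close the middle row as well.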
IPython code to convert an image to binary and apply RLSA
- # convert the image to binary
- import cv2
- image = cv2.imread('test_images/image.jpg')
- gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
- # with THRESH_OTSU set, the threshold is computed automatically and the 150 passed here is ignored
- (thresh, image_binary) = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
- # function call: horizontal smoothing only, with value = 10
- from pythonRLSA import rlsa
- image_rlsa_horizontal = rlsa.rlsa(image_binary, True, False, 10)
Bugs/Errors
Please ensure that you have updated pip to the latest version before installing pythonRLSA.
If you find any bugs or errors when using the code above, please raise an issue on GitHub or send an email to vasista.1245@gmail.com with a clear example that reproduces the issue.