Wrapper around pytesseract that preserves spacing and formatting

Project description

OCR_with_format

How to

  • install: python -m pip install OCR_with_format
  • see usage: OCR_with_format --help (running it with python -m is not supported)
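Because running with python -m is not supported, a script that wants to use the tool has to call the OCR_with_format console script itself. The sketch below shows one possible way to do that from Python. The helper names build_command and run_ocr are hypothetical (they are not part of the package), and only flags listed in the Usage section are used.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(img_path: str,
                  thresholding_method: str = "otsu",
                  language: str = "eng",
                  output_path: Optional[str] = None,
                  quiet: bool = True) -> List[str]:
    """Assemble the argument list for the OCR_with_format console script.

    Hypothetical helper: it only mirrors flags documented in the Usage section.
    """
    cmd = [
        "OCR_with_format",
        img_path,
        f"--thresholding_method={thresholding_method}",
        f"--language={language}",
    ]
    if output_path is not None:
        cmd.append(f"--output_path={output_path}")
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_ocr(img_path: str, **kwargs) -> str:
    """Run the CLI and return whatever it printed to stdout."""
    if shutil.which("OCR_with_format") is None:
        raise FileNotFoundError("OCR_with_format is not on PATH; install it first")
    result = subprocess.run(build_command(img_path, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling run_ocr("./screenshot.png", thresholding_method="all") would then be roughly equivalent to the first command shown in the Example section.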

Usage

NAME
    OCR_with_format

SYNOPSIS
    OCR_with_format IMG_PATH <flags>

POSITIONAL ARGUMENTS
    IMG_PATH
        Type: str
        path to the image you want to do OCR on

FLAGS
    -m, --method=METHOD
        Type: str
        Default: 'with_format'
        if 'with_format', will use the author's code
        if 'none', will output tesseract's default output
        if 'stackoverflow', will output using another algorithm found on stackoverflow
    --thresholding_method=THRESHOLDING_METHOD
        Type: str
        Default: 'otsu'
        one of "otsu", "otsu_gaussian", "adaptative_gaussian", "all"

        If "all", the three methods will be tried and the final output will be the one that maximizes the mean and median confidences over all parsed words.
    -l, --language=LANGUAGE
        Type: str
        Default: 'eng'
        language to look for in the image
    -o, --output_path=OUTPUT_PATH
        Type: Optional[str]
        Default: None
        if not None, will write the output to this path, overwriting its previous content.
    --tesseract_args=TESSERACT_ARGS
        Type: str
        Default: '-...
        default arguments for tesseract
    -q, --quiet=QUIET
        Default: False
        if True, will only print the output and no logs

NOTES
    You can also use flags syntax for POSITIONAL ARGUMENTS
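
The "all" thresholding mode described above picks the run whose per-word confidences are best. The sketch below is one way such a selection could look, not the package's actual code. It assumes the mean and median are simply added into one score (the help text does not say how they are combined). The function names are hypothetical and the confidence values are made up for illustration.

```python
from statistics import mean, median
from typing import Dict, List


def confidence_score(confidences: List[float]) -> float:
    """Score one OCR run. Assumption: mean + median of per-word confidences."""
    if not confidences:
        return float("-inf")  # a run that parsed no words can never win
    return mean(confidences) + median(confidences)


def pick_best_method(runs: Dict[str, List[float]]) -> str:
    """Return the name of the thresholding method with the highest score."""
    return max(runs, key=lambda name: confidence_score(runs[name]))


# Illustrative (made-up) per-word confidences for each thresholding method.
runs = {
    "otsu": [91.0, 88.0, 40.0],
    "otsu_gaussian": [93.0, 90.0, 85.0],
    "adaptative_gaussian": [60.0, 55.0, 70.0],
}
best = pick_best_method(runs)
print(best)  # otsu_gaussian
```

Combining mean and median makes the score less sensitive to a single badly recognized word than the mean alone would be.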

Example

  • Image: (screenshot of the project's GitHub repository page, not reproduced here)
  • output from OCR_with_format ./screenshot.png --thresholding_method="all" --quiet
                                                                    @Unwateh (1) ~        Fork (3)                  (©
    OCR_with_format                          [     Pir ][                   | [  &            <) [     s        -]
                                                                                          About
                                                                                                                              &
                                                                                          Wrapper around pytesseract to
¥ Branches  © Tags                                                                     postprocess in a way that preserves
                                                                                          spacing and formattings.
  i  thiswillbeyourgithub addded license                        10 minutes ago  O 4
                                                                                          &5 GPL-3.0 license
  @  LICENSE            addded license                             10 minutes ago    - Activity
  [u]  __init__.py           minor                                      11 minutes ago     ¢ Ostars
  [u]  requirements.txt       added empty requirements                  11 minutes ago    <& 1 watching
                                                                                          Y  Oforks
  Help people interested in this repository understand your project by
  adding a README.                                                                      Releases
                                                                                          Create No releases a new published release
                                                                                          Packages
                                                                                          No packages published
                                                                                          Publish your first package
                                                                                          Languages
                                                                                          ———
                                                                                           ® Python 100.0%
  • output from OCR_with_format ./screenshot.png --quiet --comparison_run *
OCR_with_format

[ pin | [ @unwateh (@) ~ | [ & Fork (O)

-] [ ¢ s (0

¥ main ~

¥ Branches © Tags

Wrapper around pytesseract to
postprocess in a way that preserves

. . . spacing and formattings.
- thiswillbeyourgithub addded license 10 minutes ago 'O 4
&5 GPL-3.0 license
@ LICENSE addded license 10 minutes ago A Activity
0O _init__py minor 11 minutes ago ¢ Ostars
O requirements.txt added empty requirements 11 minutes ago | @ 1watching
% 0forks
Help people interested in this repository understand your project by
adding a README. Releases

No releases published
Create a new release

Packages

No packages published
Publish your first package

Languages

————
@ Python 100.0%
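
The difference between the two outputs above comes from keeping each word at roughly its original horizontal position. The following is a minimal sketch of that idea, assuming word boxes with a left and top pixel coordinate (pytesseract's image_to_data reports such fields). It is not the author's algorithm: the function name, the fixed char_width and line_height values, and the sample words are all hypothetical.

```python
from typing import Dict, List, Tuple

# (left_px, top_px, text): simplified stand-in for OCR word boxes.
Word = Tuple[int, int, str]


def render_with_layout(words: List[Word],
                       char_width: int = 10,
                       line_height: int = 20) -> str:
    """Place each word at a text column proportional to its x position."""
    # Group words into text rows based on their vertical position.
    rows: Dict[int, List[Tuple[int, str]]] = {}
    for left, top, text in words:
        rows.setdefault(top // line_height, []).append((left, text))

    out = []
    for row in sorted(rows):
        line = ""
        for left, text in sorted(rows[row]):
            column = left // char_width
            if line and len(line) >= column:
                line += " "  # words would collide: keep a single space
            else:
                line = line.ljust(column)  # pad with spaces up to the column
            line += text
        out.append(line)
    return "\n".join(out)


words = [(0, 0, "LICENSE"), (200, 0, "minor"), (50, 20, "README")]
print(render_with_layout(words))
```

This sketch skips empty rows and assumes a fixed character width, so it only approximates the layout of a real page.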

Download files

Source Distribution

ocr_with_format-0.12.tar.gz (19.8 kB)

Uploaded Source

Built Distribution

OCR_with_format-0.12-py3-none-any.whl (20.7 kB)

Uploaded Python 3

File details

Details for the file ocr_with_format-0.12.tar.gz.

File metadata

  • Download URL: ocr_with_format-0.12.tar.gz
  • Size: 19.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.19

File hashes

Hashes for ocr_with_format-0.12.tar.gz
Algorithm    Hash digest
SHA256       8fea6a1ae15aa723d2a8630d5ae6c90f40e32f1321cb2510fb53c9f441dc8921
MD5          662eb9fd71a9696632aeb558ad71cf1d
BLAKE2b-256  608ce716bfad6569a4d2dabb2d790017551de5ba2a9034e850f4c9f640a9fcbc

File details

Details for the file OCR_with_format-0.12-py3-none-any.whl.

File hashes

Hashes for OCR_with_format-0.12-py3-none-any.whl
Algorithm    Hash digest
SHA256       b968663df8c4bae8b25c49c773ed8b559d1be29f000ac6af7c5a99700d233df9
MD5          e785f92df98def4cade76f8bd37bb8fb
BLAKE2b-256  28779f14a4a005993b44e0bab6a67d1658162210ee94fdd934e56245054c31c5
