Wrapper around pytesseract that preserves spacing and formatting

Project description

OCR_with_format

How to

  • install: python -m pip install OCR_with_format
  • see usage: OCR_with_format --help (running it via python -m is not supported)

Usage

NAME
    OCR_with_format

SYNOPSIS
    OCR_with_format IMG_PATH <flags>

POSITIONAL ARGUMENTS
    IMG_PATH
        Type: str
        path to the image you want to do OCR on

FLAGS
    -m, --method=METHOD
        Type: str
        Default: 'with_format'
        if 'with_format', will use the author's code; if 'none', will output tesseract's default output; if 'stackoverflow', will output using another algorithm found on StackOverflow
    --thresholding_method=THRESHOLDING_METHOD
        Type: str
        Default: 'otsu'
        any from "otsu", "otsu_gaussian", "adaptative_gaussian", "all"

        If "all", the three methods will be tried and the final output will be the one that maximizes the mean and median confidences over all parsed words.
    -l, --language=LANGUAGE
        Type: str
        Default: 'eng'
        language to look for in the image
    -o, --output_path=OUTPUT_PATH
        Type: Optional[str]
        Default: None
        if not None, the output will be written to this path, overwriting its previous content.
    --tesseract_args=TESSERACT_ARGS
        Type: str
        Default: '-...
        default arguments for tesseract
    -q, --quiet=QUIET
        Default: False
        if True, will only print the output and no logs

NOTES
    You can also use flags syntax for POSITIONAL ARGUMENTS
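The thresholding options above can be illustrated with a small sketch. Assuming "otsu" refers to Otsu's classic global threshold (the name suggests it, but the project's actual preprocessing code is not shown here), `otsu_threshold` computes that threshold from 8-bit grayscale values. `pick_best_output` is a hypothetical illustration of the "all" rule; scoring each candidate by the sum of its mean and median word confidence is an assumption about how the two statistics are combined.

```python
import statistics
from typing import Dict, List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Otsu's method on 8-bit grayscale values (0-255).

    Pixels <= threshold form the background class, the rest the foreground;
    the threshold maximizing the between-class variance is returned.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_bg, weight_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is background: no split left to evaluate
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def pick_best_output(candidates: Dict[str, List[float]]) -> str:
    """Hypothetical "all" rule: return the method whose per-word confidences
    score highest. ASSUMPTION: score = mean + median of valid confidences."""
    def score(confs: List[float]) -> float:
        valid = [c for c in confs if c >= 0]  # tesseract uses -1 for non-word boxes
        if not valid:
            return float("-inf")
        return statistics.mean(valid) + statistics.median(valid)

    return max(candidates, key=lambda name: score(candidates[name]))
```

For example, `pick_best_output({"otsu": [90, 80, -1], "adaptative_gaussian": [95, 99]})` favors `"adaptative_gaussian"`, whose confidences are higher on both statistics.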

Example

  • Image:
  • output from OCR_with_format ./screenshot.png --thresholding_method="all" --quiet
                                                                    @Unwateh (1) ~        Fork (3)                  (©
    OCR_with_format                          [     Pir ][                   | [  &            <) [     s        -]
                                                                                          About
                                                                                                                              &
                                                                                          Wrapper around pytesseract to
¥ Branches  © Tags                                                                     postprocess in a way that preserves
                                                                                          spacing and formattings.
  i  thiswillbeyourgithub addded license                        10 minutes ago  O 4
                                                                                          &5 GPL-3.0 license
  @  LICENSE            addded license                             10 minutes ago    - Activity
  [u]  __init__.py           minor                                      11 minutes ago     ¢ Ostars
  [u]  requirements.txt       added empty requirements                  11 minutes ago    <& 1 watching
                                                                                          Y  Oforks
  Help people interested in this repository understand your project by
  adding a README.                                                                      Releases
                                                                                          Create No releases a new published release
                                                                                          Packages
                                                                                          No packages published
                                                                                          Publish your first package
                                                                                          Languages
                                                                                          ———
                                                                                           ® Python 100.0%
  • output from OCR_with_format ./screenshot.png --quiet --comparison_run *
OCR_with_format

[ pin | [ @unwateh (@) ~ | [ & Fork (O)

-] [ ¢ s (0

¥ main ~

¥ Branches © Tags

Wrapper around pytesseract to
postprocess in a way that preserves

. . . spacing and formattings.
- thiswillbeyourgithub addded license 10 minutes ago 'O 4
&5 GPL-3.0 license
@ LICENSE addded license 10 minutes ago A Activity
0O _init__py minor 11 minutes ago ¢ Ostars
O requirements.txt added empty requirements 11 minutes ago | @ 1watching
% 0forks
Help people interested in this repository understand your project by
adding a README. Releases

No releases published
Create a new release

Packages

No packages published
Publish your first package

Languages

————
@ Python 100.0%
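The first output above keeps words roughly where they appear in the screenshot, while the second loses that layout. One way to get the spaced-out effect, sketched here as an assumption rather than the project's actual algorithm, is to take the word boxes reported by `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` and map pixel coordinates onto a character grid. `layout_text` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def layout_text(data: Dict[str, list]) -> str:
    """Place recognized words on a character grid so spacing mirrors the image.

    Assumes `data` has the shape returned by pytesseract.image_to_data(...,
    output_type=pytesseract.Output.DICT): parallel lists keyed by "text",
    "left", "top", "width" and "height" (pixel units).
    """
    words = [
        (data["text"][i], data["left"][i], data["top"][i],
         data["width"][i], data["height"][i])
        for i in range(len(data["text"]))
        if data["text"][i].strip()  # skip empty boxes (blocks, lines, noise)
    ]
    if not words:
        return ""

    # Estimate the width of one character and the height of one text line.
    char_w = sum(w for _, _, _, w, _ in words) / sum(len(t) for t, *_ in words)
    line_h = sum(h for *_, h in words) / len(words)

    # Bucket words into rows by their vertical position.
    rows: Dict[int, List[Tuple[int, str]]] = {}
    for text, left, top, _, _ in words:
        rows.setdefault(round(top / line_h), []).append((left, text))

    out_lines = []
    for row in range(min(rows), max(rows) + 1):  # missing rows become blank lines
        line = ""
        for left, text in sorted(rows.get(row, [])):
            col = round(left / char_w)
            if len(line) < col:
                line += " " * (col - len(line))  # pad up to the word's column
            elif line:
                line += " "  # column already taken: keep one separating space
            line += text
        out_lines.append(line)
    return "\n".join(out_lines)
```

A real implementation would also have to cope with skewed lines and mixed font sizes, which this single global character width ignores.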


Download files

Download the file for your platform.

Source Distribution

ocr_with_format-0.14.tar.gz (19.6 kB)

Uploaded Source

Built Distribution

ocr_with_format-0.14-py3-none-any.whl (20.5 kB)

Uploaded Python 3

File details

Details for the file ocr_with_format-0.14.tar.gz.

File metadata

  • Download URL: ocr_with_format-0.14.tar.gz
  • Size: 19.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.14

File hashes

Hashes for ocr_with_format-0.14.tar.gz
Algorithm     Hash digest
SHA256        0c3caa52ae3eae598bbe5d958b977e113a888db0e6a1e6e1fd1e579f3a1b945a
MD5           ffd8c8c47210ef82d61a87fccbe8f0d4
BLAKE2b-256   ef5f18687fb1d2983794e0701dadacf0493db8e78f025403f50bfc2c9496e453
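These digests let you check a downloaded file before installing it. Below is a minimal sketch using only the standard library's `hashlib`. `sha256_of` is a hypothetical helper name, and the commented usage assumes the sdist was saved in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 published above for ocr_with_format-0.14.tar.gz
EXPECTED = "0c3caa52ae3eae598bbe5d958b977e113a888db0e6a1e6e1fd1e579f3a1b945a"

# Usage (assumes the file sits in the current directory):
# assert sha256_of("ocr_with_format-0.14.tar.gz") == EXPECTED
```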

File details

Details for the file ocr_with_format-0.14-py3-none-any.whl.

File metadata

  • Download URL: ocr_with_format-0.14-py3-none-any.whl
  • Size: 20.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.14

File hashes

Hashes for ocr_with_format-0.14-py3-none-any.whl
Algorithm     Hash digest
SHA256        91099aa095bf4ea5a6c07ad21758a8f6fb2e150127f0b7977f4ecf7e48187746
MD5           a32c3636fc00e00fa31fed24891067d7
BLAKE2b-256   9398fcefd89786355b5c97c29fb4faaa0540aa523db9dd598c3f60a9a1301ef5
