
Detect unwanted characters in images inside the Open Telekom Cloud helpcenter

Project description

HCDC - Helpcenter Character Detection Client

HCDC is a client that detects unwanted characters in files that are new or changed in a pull request (PR) or in a different Git branch. It analyzes text files directly and uses an OCR service to recognize characters in new or changed images.
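A minimal sketch of the first step, collecting the changed files and splitting them into text and image files, could look like the following. It assumes the changes are taken from `git diff --name-only --diff-filter=AM` in its three-dot form (changes since the merge base) and that extensions are compared case-insensitively; HCDC itself may do either differently. The function names `changed_files` and `split_by_type` are hypothetical; the extension sets are the documented defaults.

```python
import subprocess
from pathlib import Path

# Documented defaults of --text-file-extensions and --image-file-extensions.
TEXT_EXTENSIONS = {".txt", ".md", ".rst", ".ini", ".cfg", ".json", ".xml", ".yml", ".yaml", ".py"}
IMAGE_EXTENSIONS = {".jpg", ".png", ".jpeg", ".gif", ".webp", ".avif"}


def changed_files(repo_path: str, main_branch: str, branch: str) -> list[str]:
    """Hypothetical helper: files added (A) or modified (M) on `branch`
    since it diverged from `main_branch` (assumed diff strategy)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "diff", "--name-only", "--diff-filter=AM",
         f"{main_branch}...{branch}"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def split_by_type(paths: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split paths into (text_files, image_files).
    Files with other extensions are ignored; matching is case-insensitive here."""
    text_files, image_files = [], []
    for path in paths:
        suffix = Path(path).suffix.lower()
        if suffix in TEXT_EXTENSIONS:
            text_files.append(path)
        elif suffix in IMAGE_EXTENSIONS:
            image_files.append(path)
    return text_files, image_files
```

For example, `split_by_type(changed_files(".", "main", "my-feature"))` would return the text files to scan directly and the images to send to the OCR service.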

Installation

pip install .

Usage

hcdc --help

Options

  -h, --help            Show this help message and exit.
  --debug               Enable debug output.
  --processes <processes>
                        Number of parallel processes. Default: 4
  --repo-path <repo-path>
                        Path to the Git repository. Default: .
  --image-file-extensions <file-extensions> [<file-extensions> ...]
                        Image file extensions to be checked. Default: .jpg .png .jpeg .gif .webp .avif
  --text-file-extensions <file-extensions> [<file-extensions> ...]
                        Text file extensions to be checked. Default: .txt .md .rst .ini .cfg .json .xml .yml .yaml .py
  --branch <branch>     Branch to compare against the main branch. Default: main
  --main-branch <main-branch>
                        Name of the main branch. Default: main
  --ocr-url <ocr-url>   URL for the OCR Service. Default: https://ocr.eu-de.otc.t-systems.com/v2/project-id/ocr/general-text
  --regex-pattern <regex-pattern> [<regex-pattern> ...]
                        Regex pattern to check for unwanted characters. Default: (?![\u4e09\u767d\u76ee\u4e09\u8279\u53e3\u533a\u4e2a\u516b\u4e00\u4eba])[\u4e01-\u9fff]+
  --confidence <confidence>
                        Confidence threshold for image recognition. Default: 0.97
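The interplay of `--confidence` and `--regex-pattern` on image results can be sketched as below: text recognized with a confidence below the threshold is ignored, and the remaining text is checked against every pattern. The response shape `{"result": {"words_block_list": [{"words": ..., "confidence": ...}]}}` is an assumption based on the general-text OCR API and should be verified against its API reference; `flag_ocr_result` is a hypothetical name, not part of HCDC.

```python
import re


def flag_ocr_result(ocr_response: dict, patterns: list[str], threshold: float = 0.97) -> list[str]:
    """Hypothetical helper: return pattern matches found in confidently recognized text.

    Assumes the general-text response shape described above.
    """
    compiled = [re.compile(p) for p in patterns]
    blocks = ocr_response.get("result", {}).get("words_block_list", [])
    hits: list[str] = []
    for block in blocks:
        # Skip text the OCR service is not confident about (likely misreads).
        if block.get("confidence", 0.0) < threshold:
            continue
        for rx in compiled:
            hits.extend(rx.findall(block.get("words", "")))
    return hits
```

Raising the threshold trades missed detections for fewer false positives from noisy screenshots.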

Custom Regex Pattern

Use --regex-pattern to check files for your own set of unwanted characters; the option accepts one or more patterns. The default pattern checks for Chinese characters in the range U+4E01 to U+9FFF but excludes a few simple ones (三, 白, 目, 艹, 口, 区, 个, 八, 一, 人) that may cause false positives in OCR results.
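The behavior of the default pattern can be tried out with Python's `re` module, as in this sketch. `find_unwanted` is a hypothetical helper, and the Cyrillic range is only an illustrative custom pattern. Note that the negative lookahead only tests the first character of each match: an excluded character is skipped when it stands alone or starts a run, but it is still reported when it appears later in a longer run.

```python
import re

# Default value of --regex-pattern, copied from the options list.
DEFAULT_PATTERN = (
    r"(?![\u4e09\u767d\u76ee\u4e09\u8279\u53e3\u533a\u4e2a\u516b\u4e00\u4eba])"
    r"[\u4e01-\u9fff]+"
)


def find_unwanted(text: str, patterns: tuple[str, ...] = (DEFAULT_PATTERN,)) -> list[str]:
    """Hypothetical helper: collect all matches of all patterns in text."""
    matches: list[str] = []
    for pattern in patterns:
        matches.extend(re.findall(pattern, text))
    return matches


print(find_unwanted("Hello 世界"))   # ['世界']
print(find_unwanted("人口"))         # [] : both characters are excluded
print(find_unwanted("三中"))         # ['中'] : the excluded 三 starts the run
print(find_unwanted("中三"))         # ['中三'] : the lookahead only checks 中
print(find_unwanted("Привет hello", (r"[\u0400-\u04ff]+",)))  # ['Привет']
```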

Authentication

To use this tool, you must specify an AUTH_TOKEN for the OCR service. For details on obtaining a token, refer to the official T-Systems documentation.
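As an illustration of how such a token is typically used, the sketch below builds an OCR request for one image. It assumes that AUTH_TOKEN is read from an environment variable of that name, that the token is sent in an `X-Auth-Token` header, and that the image is posted as base64 in an `{"image": ...}` JSON body; check all three against HCDC and the OCR API reference. `build_ocr_request` is a hypothetical name.

```python
import base64
import json
import os
import urllib.request
from typing import Optional


def build_ocr_request(url: str, image_bytes: bytes, token: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST request to the OCR service.

    Falls back to the AUTH_TOKEN environment variable (an assumption) when no
    token is passed; raises KeyError if neither is available.
    """
    token = token or os.environ["AUTH_TOKEN"]
    body = json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
```

Sending it would then be a matter of `urllib.request.urlopen(build_ocr_request(ocr_url, Path("screenshot.png").read_bytes()))` followed by `json.load` on the response, where `ocr_url` is the value of --ocr-url with your project ID filled in.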


Download files


Source Distribution

hcdc-0.2.2.tar.gz (16.1 MB)

Uploaded Source

Built Distribution


hcdc-0.2.2-py3-none-any.whl (13.3 kB)

Uploaded Python 3

File details

Details for the file hcdc-0.2.2.tar.gz.

File metadata

  • Download URL: hcdc-0.2.2.tar.gz
  • Upload date:
  • Size: 16.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.9

File hashes

Hashes for hcdc-0.2.2.tar.gz
Algorithm Hash digest
SHA256 9c6f1570217c31b2f8af74236179554bdd4eb74e0902578525466269baa2a2ae
MD5 ffab4d0d34113c42f8eceb3153f05521
BLAKE2b-256 b575e9d59c8eaab52a5fbb6aaf9fd82bcc297293aeeaf6d7a0a249b6a2762f33


File details

Details for the file hcdc-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: hcdc-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 13.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.9

File hashes

Hashes for hcdc-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 0a1f38957c7c8062df95686fdb29f9128e4014f91b44c00609cbafdf5455f889
MD5 83bc5188398a38f0025a9be3d616acb6
BLAKE2b-256 107fe34d7907ef75631b302310b730c698a563ab8b1b6d74d84a73b9e2086c45

