Ocr_tools is a Python library that generates synthetic images containing Khmer text and provides other OCR utilities.

OCR toolkits

Introduction

A collection of functions for working with OCR, including a synthetic data generator.

Features

  • Generate synthetic images containing Khmer text
  • Customize text content from a file
  • Choose from multiple font styles
  • Optionally apply a random blur effect to images
  • Generate corresponding labels for each image

Installation

You can install the Khmer Text Image Generator using pip:

pip install ocr_toolkits

Usage

  • Create a text file for the word list, e.g. dict.txt, and put in it all the Khmer words you want to generate, or download the sample data here

  • Create a folder called font and download all the fonts from this link: font

  • Create a Python script to generate the data, e.g. test.py:

from khmerocr_tools import synthetic_data

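# Set parameters
image_height = 128                        # image height in pixels
output_folder = 'output'                  # folder for the generated images
output_labels_file = 'output/labels.txt'  # file that receives the labels
text_file_path = 'dict.txt'               # word list created in the first step
font_option = [1, 2]                      # font numbers; see Parameters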

# Generate images and labels
synthetic_data(
    text_file_path, 
    image_height, 
    output_folder, 
    output_labels_file, 
    font_option=font_option, 
    random_blur=True
)
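
Before running the script, the word list and the output folder need to exist. The following is a minimal sketch of that preparation step using only the standard library. The helper name prepare_inputs is hypothetical and not part of the package, and the one-word-per-line layout of dict.txt is an assumption; this page does not specify the expected file format, nor whether synthetic_data creates the output folder itself.

```python
from pathlib import Path


def prepare_inputs(words, text_file_path="dict.txt", output_folder="output"):
    """Write the word list and create the output folder.

    Hypothetical helper, not part of the library. Assumes the text file
    holds one word per line, which the project page does not confirm.
    """
    text_path = Path(text_file_path)
    # Write UTF-8 explicitly so Khmer characters survive on every platform.
    text_path.write_text("\n".join(words) + "\n", encoding="utf-8")

    out_dir = Path(output_folder)
    # Create the folder (and any parents) if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    return text_path, out_dir
```

Calling prepare_inputs(["សួស្តី", "កម្ពុជា"]) before the script above would leave a dict.txt and an output folder in the working directory, matching the paths used in the example.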

Parameters

  • image_height: Height of the generated images in pixels.
  • output_folder: Path to the folder where generated images will be saved.
  • output_labels_file: Path to the file where labels will be saved.
  • text_file_path: Path to the text file containing Khmer text for generation.
  • font_option: List of integers selecting which fonts to use.
    • 1 for AKbalthom KhmerLer Regular.
    • 2 for Khmer MEF1 Regular.
    • 3 for Khmer OS Battambang Regular.
    • 4 for Khmer OS Muol Light Regular.
    • 5 for Khmer OS Siemreap Regular.
    • Use an empty list [] to select all available fonts.
  • random_blur: Boolean flag indicating whether to apply a random blur effect to images.
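
The font_option numbering above can be mirrored in a small lookup table to check a selection before a long generation run. This is only an illustrative sketch: FONT_OPTIONS and describe_font_option are hypothetical names, not part of the package, and the table is copied from the list above rather than read from the library.

```python
# Mapping copied from the font_option list above (not read from the library).
FONT_OPTIONS = {
    1: "AKbalthom KhmerLer Regular",
    2: "Khmer MEF1 Regular",
    3: "Khmer OS Battambang Regular",
    4: "Khmer OS Muol Light Regular",
    5: "Khmer OS Siemreap Regular",
}


def describe_font_option(font_option):
    """Return the font names a font_option list selects.

    Hypothetical helper. An empty list means all fonts, as documented above.
    """
    if not font_option:
        return list(FONT_OPTIONS.values())
    unknown = [n for n in font_option if n not in FONT_OPTIONS]
    if unknown:
        raise ValueError(f"Unknown font option(s): {unknown}")
    return [FONT_OPTIONS[n] for n in font_option]
```

For the example script, describe_font_option([1, 2]) returns the names of the first two fonts, while a number outside 1 to 5 raises a ValueError.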

License

This project is licensed under the MIT License - see the LICENSE file for details.

