Khmerocr_tools is a Python library that generates synthetic images containing Khmer text

Khmerocr_tools | Synthetic Data Generator

Introduction

The Khmer Text Image Generator is a Python library that generates synthetic images containing Khmer text for use in training optical character recognition (OCR) models. It allows users to customize various aspects of the generated images, such as the text content, font style, background color, and blur effect.

Features

  • Generate synthetic images containing Khmer text
  • Customize text content from a file
  • Choose from multiple font styles
  • Optionally apply a random blur effect to images
  • Generate corresponding labels for each image

Installation

You can install the Khmer Text Image Generator using pip:

pip install khmerocr_tools
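
After installing, it can help to confirm which version actually landed in the active environment. The following is a minimal sketch using only the standard library. The helper name installed_version is hypothetical and is not part of khmerocr_tools, and the sketch assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "khmerocr_tools" is the distribution name taken from the pip command above.
    found = installed_version("khmerocr_tools")
    print(found if found is not None else "khmerocr_tools is not installed")
```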

Usage

  • Create a text file containing the word list, e.g. dict.txt, and add all the Khmer words you want to generate, or download the sample data here

  • Create a folder called font and download all the fonts from this link: font

  • Create a Python script to generate the data, e.g. test.py

from khmerocr_tools import synthetic_data

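# Set parameters
image_height = 128                         # height of each generated image, in pixels
output_folder = 'output'                   # folder that receives the generated images
output_labels_file = 'output/labels.txt'   # file that receives the labels
text_file_path = 'dict.txt'                # word list created in the first step
font_option = [1, 2]                       # font IDs, see Parameters below

# Generate images and labels
synthetic_data(
    text_file_path,
    image_height,
    output_folder,
    output_labels_file,
    font_option=font_option,
    random_blur=True
)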
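
Before running the script, the word list and the output folder need to exist. The sketch below prepares both using only the standard library. It rests on two assumptions that the description does not confirm: that the word list holds one entry per line, and that the output folder should be created ahead of time (the library may or may not create it itself). The helper name prepare_inputs is hypothetical and is not part of khmerocr_tools.

```python
from pathlib import Path


def prepare_inputs(words, text_file_path="dict.txt", output_folder="output"):
    """Write a word list (assumed: one entry per line) and create the output folder."""
    text_path = Path(text_file_path)
    # UTF-8 keeps Khmer characters intact regardless of the platform default encoding.
    text_path.write_text("\n".join(words) + "\n", encoding="utf-8")

    out_dir = Path(output_folder)
    # parents=True also creates missing parent folders; exist_ok avoids an error on reruns.
    out_dir.mkdir(parents=True, exist_ok=True)
    return text_path, out_dir
```

Calling prepare_inputs(["កម្ពុជា", "សួស្តី"]) would write dict.txt in the current directory and create the output folder next to it, matching the paths used in the script above.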

Parameters

  • image_height: Height of the generated images in pixels.
  • output_folder: Path to the folder where generated images will be saved.
  • output_labels_file: Path to the file where labels will be saved.
  • text_file_path: Path to the text file containing Khmer text for generation.
  • font_option: List of integers representing font options.
    • 1 for AKbalthom KhmerLer Regular.
    • 2 for Khmer MEF1 Regular.
    • 3 for Khmer OS Battambang Regular.
    • 4 for Khmer OS Muol Light Regular.
    • 5 for Khmer OS Siemreap Regular.
    • Use an empty list [] to select all available fonts.
  • random_blur: Boolean flag indicating whether to apply a random blur effect to the images.
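
The font_option values listed above can be checked before a generation run with a small lookup. This is a sketch only: the table simply restates the IDs and names from the list above, the helper describe_font_option is hypothetical and not part of khmerocr_tools, and how the library itself reacts to an unknown ID is not documented here.

```python
# Font IDs and names, copied from the Parameters list above.
FONT_NAMES = {
    1: "AKbalthom KhmerLer Regular",
    2: "Khmer MEF1 Regular",
    3: "Khmer OS Battambang Regular",
    4: "Khmer OS Muol Light Regular",
    5: "Khmer OS Siemreap Regular",
}


def describe_font_option(font_option):
    """Return the font names that a font_option list refers to (hypothetical helper)."""
    if not font_option:
        # An empty list selects every available font.
        return [FONT_NAMES[i] for i in sorted(FONT_NAMES)]
    unknown = [i for i in font_option if i not in FONT_NAMES]
    if unknown:
        # Raising here is this sketch's choice, not documented library behavior.
        raise ValueError(f"unknown font option(s): {unknown}")
    return [FONT_NAMES[i] for i in font_option]


if __name__ == "__main__":
    print(describe_font_option([1, 2]))
```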

License

This project is licensed under the MIT License - see the LICENSE file for details.

This version

0.13

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

khmerocr_tools-0.13-py3-none-any.whl (4.9 kB)

Uploaded Python 3

File details

Details for the file khmerocr_tools-0.13-py3-none-any.whl.

File hashes

Hashes for khmerocr_tools-0.13-py3-none-any.whl
Algorithm Hash digest
SHA256 b035ffdc84cdcd00a76511c99a2b3f1ef61a2d8fd923de59a9d4aeb6bff6e7b0
MD5 3d0fce113aa0cb4a935c9ed7e17bca0a
BLAKE2b-256 16a9b9bdcc03092c646a82a8baad8dfd7d93e6dafb531621fdee7422f1d5c2d5
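
A downloaded wheel can be checked against the SHA256 digest listed above. The sketch below uses only hashlib from the standard library. The helper names sha256_of_file and matches_expected are hypothetical, and the file path passed in is assumed to be wherever the wheel was saved locally.

```python
import hashlib

# SHA256 digest listed above for khmerocr_tools-0.13-py3-none-any.whl.
EXPECTED_SHA256 = "b035ffdc84cdcd00a76511c99a2b3f1ef61a2d8fd923de59a9d4aeb6bff6e7b0"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_expected("khmerocr_tools-0.13-py3-none-any.whl") should return True only for an unmodified copy of the file described above.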
