A simple, self-contained library for Khmer text processing, with optional OCR support.

Khmer Easy Tools

A simple, user-friendly, self-contained Python library for common Khmer Natural Language Processing (NLP) tasks. It provides easy-to-use functions for Khmer character validation, keyword extraction, and word and syllable segmentation; these core features do not require complex external dependencies. OCR is available as an optional extra.

Installation

Install the base package:

pip install khmereasytools

Installing Optional OCR Feature

To use the OCR functionality, install the package with the ocr extra (the quotes keep shells such as zsh from interpreting the square brackets):

pip install "khmereasytools[ocr]"

OCR also requires Google's Tesseract OCR engine to be installed on your system.
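
Before calling the OCR function, it can help to confirm that the Tesseract binary is reachable. The following is a minimal sketch using only the standard library. The helper name tesseract_available is hypothetical and not part of khmereasytools, and it assumes the executable is called tesseract and is on your PATH.

```python
import shutil
import subprocess


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the Tesseract executable is found on PATH and runs."""
    path = shutil.which(binary)
    if path is None:
        return False
    try:
        # A working install answers `--version` with exit status 0.
        subprocess.run([path, "--version"], capture_output=True, check=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return False
    return True


if __name__ == "__main__":
    print("Tesseract found:", tesseract_available())
```

If this prints False, install Tesseract with your system's package manager before using the OCR feature.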

How to Use

Khmer Character Validation (is_khmer)

import khmereasytools as ket
print(ket.is_khmer("សួស្តី"))  # True
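
How is_khmer makes its decision is not documented here. One common approach is to test each code point against the Khmer Unicode block (U+1780 to U+17FF), and the sketch below illustrates that idea. The function name looks_khmer and the choice to ignore whitespace are assumptions for illustration, not the library's behavior.

```python
KHMER_START, KHMER_END = 0x1780, 0x17FF  # Unicode "Khmer" block


def looks_khmer(text: str) -> bool:
    """Return True if every non-whitespace character is in the Khmer block."""
    chars = [ch for ch in text if not ch.isspace()]
    # An empty or whitespace-only string is treated as not Khmer.
    return bool(chars) and all(KHMER_START <= ord(ch) <= KHMER_END for ch in chars)


print(looks_khmer("សួស្តី"))  # True
print(looks_khmer("hello"))   # False
```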

Keyword Extraction (khfilter)

Uses a built-in segmentation algorithm to find words and remove stop words.

import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្តស្ថិតនៅក្នុងខេត្តសៀមរាប"
keywords = ket.khfilter(text)
print(f"Keywords: '{keywords}'")
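
Conceptually, keyword extraction of this kind is segmentation followed by stop-word removal. The sketch below shows only the filtering step on an already segmented list. The stop-word set and the helper name drop_stop_words are illustrative assumptions; the library's built-in list is likely different.

```python
# A tiny, hypothetical stop-word set; the library ships its own list.
STOP_WORDS = {"នេះ", "គឺជា", "នៅ", "ក្នុង"}


def drop_stop_words(words: list[str], stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Keep only the words that are not in the stop-word set, preserving order."""
    return [w for w in words if w not in stop_words]


segmented = ["នេះ", "គឺជា", "ប្រាសាទ", "អង្គរវត្ត"]
print(drop_stop_words(segmented))
```

With this sample input, the first two tokens are dropped and the last two are kept as keywords.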

Text Segmentation (khseg)

Uses a built-in segmentation algorithm to split text into words.

import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្ត"
words = ket.khseg(text)
print(f"Segmented Words: {words}")
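
The segmentation algorithm itself is not described here. As an illustration of one classic dictionary-based technique, forward maximum matching, the sketch below greedily takes the longest dictionary word at each position. The small dictionary and the name max_match are assumptions for demonstration; this is not claimed to be what khseg does internally.

```python
def max_match(text: str, dictionary: set[str]) -> list[str]:
    """Segment text by greedily matching the longest known word at each position."""
    longest = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary or size == 1:
                # Unknown single characters are emitted as-is so the loop always advances.
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical mini-dictionary covering the example sentence.
vocab = {"នេះ", "គឺជា", "ប្រាសាទ", "អង្គរវត្ត"}
print(max_match("នេះគឺជាប្រាសាទអង្គរវត្ត", vocab))
```

Greedy matching is simple and fast, but it can pick the wrong split when a long dictionary word overlaps the start of the next word, which is why practical segmenters often add statistical or rule-based corrections.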

Syllable Segmentation (khsyllable)

Uses a built-in rule-based method to split text into syllables.

import khmereasytools as ket
text = "សាលារៀន"
syllables = ket.khsyllable(text)
print(f"Syllables: {syllables}")
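
The rules used by khsyllable are not spelled out here. A simplified starting point for rule-based splitting is to break text into orthographic clusters: each base character opens a new cluster unless it follows the coeng sign (U+17D2), which marks a subscript consonant. The sketch below implements only that rule. It is an illustration under stated assumptions (the name split_clusters is hypothetical), and a real syllable splitter would additionally attach final consonants to the preceding cluster.

```python
COENG = "\u17d2"  # KHMER SIGN COENG: the next consonant is written as a subscript


def is_base(ch: str) -> bool:
    """Khmer consonants (U+1780-U+17A2) and independent vowels (U+17A3-U+17B3)."""
    return 0x1780 <= ord(ch) <= 0x17B3


def split_clusters(text: str) -> list[str]:
    """Split text into clusters, starting a new one at each non-subscript base character."""
    clusters: list[str] = []
    for i, ch in enumerate(text):
        starts_new = is_base(ch) and (i == 0 or text[i - 1] != COENG)
        if starts_new or not clusters:
            clusters.append(ch)
        else:
            # Vowel signs, diacritics, subscripts (and any other character)
            # are attached to the current cluster.
            clusters[-1] += ch
    return clusters


print(split_clusters("សាលារៀន"))
```

Because this rule never merges a trailing consonant into the previous cluster, its output is generally finer-grained than true syllables.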

OCR from Image (khocr)

Requires the ocr extra and the Tesseract OCR engine to be installed.

import khmereasytools as ket
# Requires the OCR extra: pip install "khmereasytools[ocr]"
# Make sure you have an image file, e.g. 'khmer_text.png'
# text_from_image = ket.khocr('khmer_text.png')
# print(f"Text from OCR: {text_from_image}")

Download files

Source Distribution

khmereasytools-0.3.6.tar.gz (6.7 kB)

Uploaded Source

Built Distribution

khmereasytools-0.3.6-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file khmereasytools-0.3.6.tar.gz.

File metadata

  • Download URL: khmereasytools-0.3.6.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for khmereasytools-0.3.6.tar.gz
Algorithm    Hash digest
SHA256       020b828a0321529494a10b7b73ee12bfc04fc11feeab6c661a26a35271394fc5
MD5          1a5732fde7d15db1aa141f72128cfe41
BLAKE2b-256  9e50e5cec030ccb14e66ee974716b31f58e6b04af2033c56d6831703237a2b85

File details

Details for the file khmereasytools-0.3.6-py3-none-any.whl.

File metadata

  • Download URL: khmereasytools-0.3.6-py3-none-any.whl
  • Upload date:
  • Size: 7.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for khmereasytools-0.3.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       c227a9907ba823621ec41d86548cccb32658025d202a640d60b3a69dee82c6ab
MD5          f41480f7dbd170fe8bb1970a6b0035b9
BLAKE2b-256  5735ff999085bd7e2d786414378b3036ad84e42818f379c68ee16e6b3b70f9ec
