
A simple library for Khmer text processing, including keyword extraction, segmentation, OCR, and validation.


Khmer Easy Tools

A simple, user-friendly Python library for common Khmer Natural Language Processing (NLP) tasks. This package provides easy-to-use functions for keyword extraction, segmentation, POS tagging, OCR, and character validation.

Installation

Install the package using pip:

pip install khmereasytools

For OCR functionality, you must also install Google's Tesseract OCR engine on your system.
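Before calling the OCR function, it can help to confirm that Tesseract is actually reachable. The sketch below is not part of khmereasytools; the helper names are hypothetical, and it assumes the Khmer language data is registered under Tesseract's `khm` code.

```python
import shutil
import subprocess


def tesseract_available() -> bool:
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


def tesseract_has_language(lang: str = "khm") -> bool:
    """Return True if `tesseract --list-langs` reports the given language code."""
    if not tesseract_available():
        return False
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some Tesseract builds print the list to stderr, so check both streams.
    listed = (result.stdout + result.stderr).split()
    return lang in listed


if __name__ == "__main__":
    print("Tesseract found:", tesseract_available())
    print("Khmer data found:", tesseract_has_language("khm"))
```

If either check reports False, installing Tesseract (and its Khmer language data) through your system's package manager is the likely fix.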

How to Use

Khmer Character Validation (is_khmer)

Checks if a string contains Khmer characters.

import khmereasytools as ket

print(ket.is_khmer("សួស្តី"))  # True
print(ket.is_khmer("Hello")) # False
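For intuition, a check like this can be built on Unicode code point ranges. The following is a minimal sketch, not the library's implementation: `contains_khmer` is a hypothetical name, and it assumes "contains Khmer" means at least one character in the Khmer (U+1780 to U+17FF) or Khmer Symbols (U+19E0 to U+19FF) blocks.

```python
def contains_khmer(text: str) -> bool:
    """Return True if any character falls in a Khmer Unicode block."""
    for ch in text:
        cp = ord(ch)
        # U+1780-U+17FF: Khmer; U+19E0-U+19FF: Khmer Symbols
        if 0x1780 <= cp <= 0x17FF or 0x19E0 <= cp <= 0x19FF:
            return True
    return False


print(contains_khmer("សួស្តី"))
print(contains_khmer("Hello"))
```

Whether `is_khmer` treats mixed strings (Khmer plus Latin text) the same way is not stated here, so test that case directly if it matters to you.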

Keyword Extraction (khfilter)

Segments text and removes stop words.

import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្តស្ថិតនៅក្នុងខេត្តសៀមរាប"
keywords = ket.khfilter(text)
print(f"Keywords: '{keywords}'")
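The stop-word removal step can be pictured as a simple filter over already-segmented tokens. This is an illustrative sketch only: the function name, the hand-segmented token list, and the tiny stop-word set are assumptions, and the library's actual stop-word list is not shown in this README.

```python
# Illustrative stop words only; not the list used by khmereasytools.
EXAMPLE_STOP_WORDS = frozenset({"នេះ", "គឺជា", "នៅ", "ក្នុង"})


def filter_stop_words(tokens, stop_words=EXAMPLE_STOP_WORDS):
    """Keep tokens that are non-blank and not in the stop-word set."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Hand-segmented version of the example sentence (an assumption, not khseg output).
tokens = ["នេះ", "គឺជា", "ប្រាសាទ", "អង្គរវត្ត", "ស្ថិត", "នៅ", "ក្នុង", "ខេត្ត", "សៀមរាប"]
keywords = filter_stop_words(tokens)
print(keywords)
```

Using a set for the stop words keeps each membership test constant-time, which matters once the list grows beyond a handful of entries.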

Text Segmentation (khseg)

Segments text into words using khmer-nltk.

import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្ត"
words = ket.khseg(text)
print(f"Segmented Words: {words}")

Syllable Segmentation (syllable_segment)

Segments text into syllables.

import khmereasytools as ket
text = "សាលារៀន"
syllables = ket.syllable_segment(text)
print(f"Syllables: {syllables}")

Part-of-Speech Tagging (pos_tag)

Tags words with their part of speech.

import khmereasytools as ket
text = "ខ្ញុំ ស្រឡាញ់ ភាសាខ្មែរ"
tags = ket.pos_tag(text)
print(f"POS Tags: {tags}")

OCR from Image (ocr_from_image)

Extracts Khmer text from an image.

import khmereasytools as ket
# Requires the Tesseract OCR engine and an image file, e.g. 'khmer_text.png'.
# Uncomment the lines below once both are available:
# text_from_image = ket.ocr_from_image('khmer_text.png')
# print(f"Text from OCR: {text_from_image}")

Download files

Source Distribution: khmereasytools-0.3.1.tar.gz (6.3 kB)

Built Distribution: khmereasytools-0.3.1-py3-none-any.whl (6.8 kB, Python 3)

