A simple library for Khmer text processing, including keyword extraction, segmentation, OCR, and validation.

Khmer Easy Tools

A simple, user-friendly Python library for common Khmer Natural Language Processing (NLP) tasks. This package provides easy-to-use functions for keyword extraction, segmentation, POS tagging, OCR, and character validation.

Installation

Install the package using pip:

pip install khmereasytools

For OCR functionality, you must also install Google's Tesseract OCR engine on your system.
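Before calling the OCR function, it can help to confirm that Tesseract is actually reachable. The sketch below is not part of khmereasytools: `tesseract_on_path` and `lists_khmer` are hypothetical helpers, and the check for the `khm` (Khmer) language data assumes the library relies on Tesseract's Khmer model, which this page does not state.

```python
import shutil
import subprocess


def tesseract_on_path() -> bool:
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


def lists_khmer(list_langs_output: str) -> bool:
    """Return True if `khm` appears as its own line in `tesseract --list-langs` output."""
    return "khm" in (line.strip() for line in list_langs_output.splitlines())


if tesseract_on_path():
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Some Tesseract versions print the listing to stderr, so check both streams.
    print("Khmer data installed:", lists_khmer(result.stdout + result.stderr))
else:
    print("Tesseract not found on PATH")
```

If the Khmer data is missing, install the Khmer language pack for Tesseract through your system's package manager.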

How to Use

Khmer Character Validation (is_khmer)

Checks if a string contains Khmer characters.

import khmereasytools as ket

print(ket.is_khmer("សួស្តី"))  # True
print(ket.is_khmer("Hello")) # False
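The exact rule is_khmer applies is not documented here (for example, how it treats mixed-script strings). As a rough illustration of what such a check can look like, the standalone sketch below tests for code points in the Unicode Khmer block (U+1780 to U+17FF). `contains_khmer` is a hypothetical helper, not the library's implementation.

```python
def contains_khmer(text: str) -> bool:
    """Return True if any character falls in the Unicode Khmer block."""
    return any("\u1780" <= ch <= "\u17ff" for ch in text)


print(contains_khmer("សួស្តី"))        # True
print(contains_khmer("Hello"))         # False
print(contains_khmer("Hello សួស្តី"))  # True: one Khmer character is enough
```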

Keyword Extraction (khfilter)

Segments the text into words and removes stop words, leaving the candidate keywords.

import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្តស្ថិតនៅក្នុងខេត្តសៀមរាប"
keywords = ket.khfilter(text)
print(f"Keywords: '{keywords}'")
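The stop-word step can be pictured as a simple filter over already segmented words. The sketch below is only an illustration under stated assumptions: `STOP_WORDS` is a small hypothetical list, `drop_stop_words` is a hypothetical helper, and the tokens are segmented by hand. The library's own stop-word list and return format may differ.

```python
# Hypothetical stop-word list for illustration; the library's own list may differ.
STOP_WORDS = {"នេះ", "គឺ", "ជា", "នៅ", "ក្នុង"}


def drop_stop_words(tokens):
    """Keep tokens that are not in STOP_WORDS, preserving their order."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


# Tokens as a word segmenter might return them (hand-segmented here).
tokens = [
    "នេះ", "គឺ", "ជា", "ប្រាសាទ", "អង្គរវត្ត",
    "ស្ថិត", "នៅ", "ក្នុង", "ខេត្ត", "សៀមរាប",
]
print(drop_stop_words(tokens))
```

With this list, the five function words are dropped and the five content words remain in their original order.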

Text Segmentation (khseg)

Segments text into words using khmer-nltk.

import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្ត"
words = ket.khseg(text)
print(f"Segmented Words: {words}")

Syllable Segmentation (syllable_segment)

Segments text into syllables.

import khmereasytools as ket
text = "សាលារៀន"
syllables = ket.syllable_segment(text)
print(f"Syllables: {syllables}")

Part-of-Speech Tagging (pos_tag)

Tags words with their part of speech.

import khmereasytools as ket
text = "ខ្ញុំ ស្រឡាញ់ ភាសាខ្មែរ"
tags = ket.pos_tag(text)
print(f"POS Tags: {tags}")

OCR from Image (ocr_from_image)

Extracts Khmer text from an image.

import khmereasytools as ket
# Uncomment the lines below once you have an image file, e.g. 'khmer_text.png'
# text_from_image = ket.ocr_from_image('khmer_text.png')
# print(f"Text from OCR: {text_from_image}")


