A simple, self-contained library for Khmer text processing, with optional OCR and POS tagging support.

Khmer Easy Tools

A simple, user-friendly, and self-contained Python library for common Khmer Natural Language Processing (NLP) tasks. The core features (keyword extraction, word segmentation, and syllable segmentation) work without complex external dependencies; POS tagging and OCR are available as optional extras.

Installation

Install the base package:

pip install khmereasytools

Installing Optional Features

Optional features are packaged as pip extras, so you can install only the ones you need.

# To install support for POS tagging (khpos)
pip install khmereasytools[khmernltk]

# To install support for OCR (khocr)
pip install khmereasytools[ocr]

# To install all optional features
pip install khmereasytools[all]

For OCR functionality, you must also install Google's Tesseract OCR engine on your system.
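
After installing, it can help to confirm which pieces are actually available. The following is a minimal sketch using only the Python standard library. It assumes the POS tagging dependency is importable as `khmernltk` and that the Tesseract executable is named `tesseract` on your PATH; the helper names `is_importable` and `check_environment` are hypothetical and not part of khmereasytools.

```python
import importlib.util
import shutil


def is_importable(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def check_environment():
    """Report which packages and system tools appear to be available.

    Assumption: the module and executable names below are illustrative
    guesses based on the install commands, not names confirmed by the library.
    """
    return {
        "khmereasytools": is_importable("khmereasytools"),
        "khmernltk": is_importable("khmernltk"),
        "tesseract binary": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A "missing" entry only means the name was not found in the current environment; it does not tell you whether the installed version is compatible.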

How to Use

Keyword Extraction (khfilter)

Segments the text with the built-in segmentation algorithm, then removes stop words using the library's Khmer stop-word and segmentation dictionaries.

import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្តស្ថិតនៅក្នុងខេត្តសៀមរាប"
keywords = ket.khfilter(text)
print(f"Keywords: '{keywords}'")
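
To make the idea of stop-word removal concrete, here is a small sketch of the filtering step applied to text that has already been segmented. It is not the library's implementation: the function name `filter_stop_words`, the token list, and the two-word stop list are all made up for illustration.

```python
def filter_stop_words(tokens, stop_words):
    """Keep tokens that are neither whitespace-only nor in the stop-word set.

    The original order of the remaining tokens is preserved.
    """
    return [t for t in tokens if t.strip() and t not in stop_words]


# Toy data (assumption): a hand-segmented sentence and a tiny stop list.
tokens = ["នេះ", "គឺជា", "ប្រាសាទ", "អង្គរវត្ត"]
stop_words = {"នេះ", "គឺជា"}

keywords = filter_stop_words(tokens, stop_words)
print(" ".join(keywords))
```

Using a set for the stop words keeps each membership check constant-time, which matters once the stop list grows beyond a handful of entries.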

Text Segmentation (khseg)

Uses a built-in segmentation algorithm to split text into words.

import khmereasytools as ket
text = "នេះគឺជាប្រាសាទអង្គរវត្ត"
words = ket.khseg(text)
print(f"Segmented Words: {words}")
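
Khmer is written without spaces between words, so segmentation has to find word boundaries itself. The README does not say which algorithm khseg uses. As an illustration only, the sketch below shows one common dictionary-based approach, forward longest matching, with a toy dictionary; `segment_longest_match` is a hypothetical helper and not part of khmereasytools.

```python
def segment_longest_match(text, dictionary):
    """Greedy forward segmentation: at each position take the longest
    dictionary entry that matches, or a single character if none does."""
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                break
        else:
            # No entry matched: emit the unknown character on its own.
            candidate = text[i]
        words.append(candidate)
        i += len(candidate)
    return words


# Toy dictionary (assumption): real dictionaries hold many thousands of entries.
toy_dictionary = {"នេះ", "គឺ", "គឺជា", "ប្រាសាទ", "អង្គរ", "អង្គរវត្ត"}
print(segment_longest_match("នេះគឺជាប្រាសាទអង្គរវត្ត", toy_dictionary))
```

Greedy matching is simple and fast but can pick the wrong boundary when a long entry overlaps the start of the next word, which is why production segmenters often add statistical or bidirectional checks.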

Syllable Segmentation (khsyllable)

Uses a built-in rule-based method to split text into syllables.

import khmereasytools as ket
text = "សាលារៀន"
syllables = ket.khsyllable(text)
print(f"Syllables: {syllables}")
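
The rules khsyllable applies are not documented here. As a rough illustration of what rule-based splitting can look like, the sketch below breaks text into Khmer orthographic clusters (a base character plus any subscript consonants, dependent vowels, and signs) using Unicode ranges. This is a simplified stand-in, not the library's rule set, and a cluster is a coarser approximation than a true syllable; `split_clusters` is a hypothetical name.

```python
import re

# Assumption: a simplified cluster pattern built from Khmer Unicode ranges.
KHMER_CLUSTER = re.compile(
    "[\u1780-\u17B3]"                  # base: consonant or independent vowel
    "(?:\u17D2[\u1780-\u17A2]"         # coeng sign followed by a subscript consonant
    "|[\u17B6-\u17D1\u17D3\u17DD])*"   # dependent vowels and diacritic signs
    "|.",                              # any other character stands alone
    re.DOTALL,
)


def split_clusters(text):
    """Split text into orthographic clusters; nothing is dropped."""
    return KHMER_CLUSTER.findall(text)


print(split_clusters("សាលារៀន"))
```

A pattern like this typically leaves a word-final consonant as its own cluster, so a real syllabifier needs further rules to attach such consonants to the preceding unit.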

Part-of-Speech Tagging (khpos)

Requires khmernltk to be installed.

import khmereasytools as ket
# pip install khmereasytools[khmernltk]
text = "ខ្ញុំស្រឡាញ់ភាសាខ្មែរ"
tags = ket.khpos(text)
print(f"POS Tags: {tags}")

OCR from Image (khocr)

Requires the OCR dependencies (pip install khmereasytools[ocr]) and a system installation of the Tesseract OCR engine.

import khmereasytools as ket
# pip install khmereasytools[ocr]
# text_from_image = ket.khocr('khmer_text.png')
# print(f"Text from OCR: {text_from_image}")

Download files

Source Distribution

khmereasytools-0.3.8.tar.gz (8.7 kB)

Built Distribution

khmereasytools-0.3.8-py3-none-any.whl (8.6 kB, Python 3)

File details

Details for the file khmereasytools-0.3.8.tar.gz.

File metadata

  • Download URL: khmereasytools-0.3.8.tar.gz
  • Upload date:
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for khmereasytools-0.3.8.tar.gz

  • SHA256: 5333e0b356be2f45ec54b7f91c1db56ad4b73d68a283452f85f078f77e54479e
  • MD5: fad7b216f58e3df33a5d7d3f5467fed0
  • BLAKE2b-256: b103a6d72cfdbb7bb031d8df47f2608ff1b9fb6c673fec0e255b3ae0a380baf5

File details

Details for the file khmereasytools-0.3.8-py3-none-any.whl.

File metadata

  • Download URL: khmereasytools-0.3.8-py3-none-any.whl
  • Upload date:
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for khmereasytools-0.3.8-py3-none-any.whl

  • SHA256: e56c572520e2c56ce026a9ce10dbb925103363f9b67fa4274d9b17368203ec3e
  • MD5: d48661c8d95e7663e7342ba7fca7eac5
  • BLAKE2b-256: 8fe3173df0956ff27a9bc63af5248f06e36b0012a339f2c59d8d93b88df93f7c
