khmercut

A (fast) Khmer word segmentation toolkit.

  • A single Python file
  • Depends only on pycrfsuite

pip install khmercut

Python

from khmercut import tokenize

tokenize("ឃាត់ខ្លួនជនសង្ស័យ០៤នាក់ ករណីលួចខ្សែភ្លើង នៅស្រុកព្រៃនប់")
# => ['ឃាត់ខ្លួន', 'ជនសង្ស័យ', '០៤', 'នាក់', ' ', 'ករណី', 'លួច', 'ខ្សែភ្លើង', ' ', 'នៅ', 'ស្រុក', 'ព្រៃនប់']
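The output above shows that whitespace in the input comes back as its own ' ' token. A minimal sketch of one way to post-process such a list follows; the helper name `content_tokens` is hypothetical and not part of khmercut, and the token list is copied from the example output rather than produced by calling the library.

```python
def content_tokens(tokens):
    """Return only the tokens that contain non-whitespace characters.

    Hypothetical helper for illustration; it is not part of khmercut.
    """
    return [t for t in tokens if t.strip()]


# Token list copied from the example output above
# (in practice this would be the return value of tokenize(...)).
tokens = ['ឃាត់ខ្លួន', 'ជនសង្ស័យ', '០៤', 'នាក់', ' ', 'ករណី', 'លួច',
          'ខ្សែភ្លើង', ' ', 'នៅ', 'ស្រុក', 'ព្រៃនប់']

words = content_tokens(tokens)

print(len(tokens), len(words))  # 12 10
print(" | ".join(words))        # words separated by a visible delimiter
```

Filtering after tokenization, rather than stripping spaces from the input first, keeps the original token boundaries visible until the point where they are no longer needed.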

Source Distribution

khmercut-0.1.0.tar.gz (5.9 MB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.19

File hashes

Algorithm    Hash digest
SHA256       ee22b088cf9af45c362a57175c607e09ea2dd1e17bf9d2b621e61f69bec851b4
MD5          631fa5607a928238d029ef44057c439b
BLAKE2b-256  c04d7766f724bd99f6cfbe246d8929c079dfe8d47ca0196fb0f991bdb4684be8
