khmercut
A (fast) Khmer word segmentation toolkit.
- A single Python file
- Uses `pycrfsuite` only
pip install khmercut
from khmercut import tokenize
tokenize("ឃាត់ខ្លួនជនសង្ស័យ០៤នាក់ ករណីលួចខ្សែភ្លើង នៅស្រុកព្រៃនប់")
# => ['ឃាត់ខ្លួន', 'ជនសង្ស័យ', '០៤', 'នាក់', ' ', 'ករណី', 'លួច', 'ខ្សែភ្លើង', ' ', 'នៅ', 'ស្រុក', 'ព្រៃនប់']
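As the example output shows, the spaces in the input come back as separate `' '` tokens. A minimal sketch of post-processing that output follows; the helpers `drop_whitespace` and `mark_boundaries` are hypothetical names introduced here for illustration and are not part of khmercut. The token list is copied from the output above so the snippet runs without the package installed.

```python
# Token list copied from the example output above.
# With khmercut installed, `tokens = tokenize(text)` would be expected to produce it.
tokens = ['ឃាត់ខ្លួន', 'ជនសង្ស័យ', '០៤', 'នាក់', ' ', 'ករណី', 'លួច', 'ខ្សែភ្លើង', ' ', 'នៅ', 'ស្រុក', 'ព្រៃនប់']


def drop_whitespace(tokens):
    """Hypothetical helper: remove tokens that consist only of whitespace."""
    return [tok for tok in tokens if tok.strip()]


def mark_boundaries(tokens, sep="|"):
    """Hypothetical helper: join the non-whitespace tokens with a visible separator."""
    return sep.join(drop_whitespace(tokens))


words = drop_whitespace(tokens)
print(len(tokens), len(words))  # 12 tokens in total, 10 after dropping the two spaces
print(mark_boundaries(tokens))
```

Whether to drop the whitespace tokens depends on the downstream task: keeping them makes it easier to relate tokens back to positions in the input text, while dropping them is convenient for word counts.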
Reference
Source Distribution
khmercut-0.1.0.tar.gz (5.9 MB)
File details
Details for the file khmercut-0.1.0.tar.gz.
File metadata
- Download URL: khmercut-0.1.0.tar.gz
- Upload date:
- Size: 5.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee22b088cf9af45c362a57175c607e09ea2dd1e17bf9d2b621e61f69bec851b4 |
| MD5 | 631fa5607a928238d029ef44057c439b |
| BLAKE2b-256 | c04d7766f724bd99f6cfbe246d8929c079dfe8d47ca0196fb0f991bdb4684be8 |
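
The SHA256 digest listed above can be used to check a downloaded copy of the source distribution. The following is a sketch using only the standard library `hashlib` module; the function name `sha256_of_file` and the assumption that the archive sits in the current directory are illustrative choices, not something the project prescribes.

```python
import hashlib
import os

# Digest copied from the file hashes listed above.
EXPECTED_SHA256 = "ee22b088cf9af45c362a57175c607e09ea2dd1e17bf9d2b621e61f69bec851b4"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed location: the archive was downloaded into the current directory.
    archive = "khmercut-0.1.0.tar.gz"
    if os.path.exists(archive):
        ok = sha256_of_file(archive) == EXPECTED_SHA256
        print("SHA256 matches" if ok else "SHA256 mismatch")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat, which matters little for a 5.9 MB archive but costs nothing.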