Khmer Segment
A Khmer word segmentation tool built for the NIPTICT (now CADT) Khmer Word Segmentation CRF model.
> [!IMPORTANT]
> The `km-5tag-seg-model` model file is required for this library to work. It is not included in this package and must be obtained separately.
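
Because the model file has to be obtained separately, a missing or misplaced file is the most likely first failure. The following is a minimal sketch of a fail-fast check. `build_segmenter_args` is a hypothetical helper, not part of khmersegment, and it assumes that the model is passed to `Segmenter` as a CRF++-style `-m <path>` option string, as in the usage example below.

```python
import os


def build_segmenter_args(model_path: str) -> str:
    """Hypothetical helper (not part of khmersegment).

    Checks that the separately obtained CRF model file exists, then returns
    a CRF++-style option string such as "-m km-5tag-seg-model".
    Assumes model_path contains no whitespace.
    """
    if not os.path.isfile(model_path):
        # Fail early with a clear message instead of relying on the
        # underlying CRF++ tagger to report the missing file
        raise FileNotFoundError(
            f"CRF model not found at {model_path!r}; "
            "km-5tag-seg-model must be obtained separately."
        )
    return f"-m {model_path}"
```

You would then construct the segmenter as `Segmenter(build_segmenter_args("km-5tag-seg-model"))`.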
Usage
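```shell
pip install khmersegment
```

```python
from khmersegment import Segmenter
# CRF++-style option string: -m gives the path to the model file
segmenter = Segmenter("-m km-5tag-seg-model")
# deep=False: coarser segmentation
print(segmenter("Hello មិនដឹងប្រាប់អ្នកណាទេ?", deep=False))
# => ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នកណា', 'ទេ', '?']
# deep=True: compounds such as អ្នកណា are split further
print(segmenter("Hello មិនដឹងប្រាប់អ្នកណាទេ?", deep=True))
# => ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នក', 'ណា', 'ទេ', '?']
```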
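The two outputs above suggest two things. First, `deep=True` only splits shallow tokens further: here `អ្នកណា` becomes `អ្នក` + `ណា`. Second, both token lists concatenate back to the original input, whitespace included. Both points are inferred from this single example and are not documented behavior. Assuming they hold, the hypothetical helper `align_deep_to_shallow` below shows which shallow tokens deep mode refines. It works on plain lists, so the sketch hard-codes the outputs shown above instead of loading the model.

```python
from typing import List


def align_deep_to_shallow(shallow: List[str], deep: List[str]) -> List[List[str]]:
    """Group deep-mode tokens under the shallow-mode token they refine.

    Hypothetical helper: assumes deep mode only splits shallow tokens further,
    so both lists concatenate to the same text.
    """
    if "".join(shallow) != "".join(deep):
        raise ValueError("token lists do not cover the same text")
    groups: List[List[str]] = []
    i = 0  # index into the deep token list
    for token in shallow:
        group: List[str] = []
        covered = ""
        # Consume deep tokens until they spell out the current shallow token
        while covered != token:
            if i >= len(deep) or len(covered) >= len(token):
                raise ValueError(f"deep tokens do not split {token!r} cleanly")
            covered += deep[i]
            group.append(deep[i])
            i += 1
        groups.append(group)
    return groups


# Outputs copied from the usage example above
shallow = ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នកណា', 'ទេ', '?']
deep = ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នក', 'ណា', 'ទេ', '?']
for token, parts in zip(shallow, align_deep_to_shallow(shallow, deep)):
    if len(parts) > 1:
        print(token, "->", parts)  # អ្នកណា -> ['អ្នក', 'ណា']
```

If the assumption fails for some input (for example, if deep mode merges tokens instead of splitting them), the helper raises `ValueError` instead of returning a misleading alignment.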
License
Apache-2.0
Related
- pycrfpp: Python binding for CRF++
Download files
Source Distribution
khmersegment-0.1.2.tar.gz (7.0 kB)
File details
Details for the file khmersegment-0.1.2.tar.gz.
File metadata
- Download URL: khmersegment-0.1.2.tar.gz
- Upload date:
- Size: 7.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.8.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | 66779cc5b220fe1099dc8a763431cf1f896f3609bf47a23a9370b39a1934291d
MD5 | 73a057a4783099d4d35587b898b048ad
BLAKE2b-256 | 1dc3974d7d091c78db5da3c1d4c63dbbedb977099ff023c971d9683c1db81535