Project description
Seema
A word segmentation (word breaking) library for Khmer, Lao, Myanmar, and Thai. It is a Python library implemented in Rust.
Example
>>> from seema import Seema
>>> s = Seema("words_th.txt", "thai_cluster_rules.txt")
>>> s.segment_into_strings("ไก่จิกเด็ก")
['ไก่', 'จิก', 'เด็ก']
words_th.txt and thai_cluster_rules.txt can be downloaded from https://codeberg.org/mekong-lang/chamkho/src/branch/main/data.
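As a minimal sketch of using the segmenter inside a larger script, the helper below joins the segments with a visible separator. Note that `mark_boundaries` is a hypothetical helper and not part of seema. It assumes only that the object passed in has the `segment_into_strings` method shown in the example above and that the method returns a list of strings.

```python
def mark_boundaries(segmenter, text, sep="|"):
    """Join the segments of `text` with `sep`.

    `segmenter` is any object with a `segment_into_strings(text)` method
    that returns a list of strings, such as the `Seema` instance from the
    example. This helper is hypothetical and not part of seema itself.
    """
    return sep.join(segmenter.segment_into_strings(text))
```

With `s` from the example, `mark_boundaries(s, "ไก่จิกเด็ก")` would be expected to return `ไก่|จิก|เด็ก`, given the segmentation shown above.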
Download files
Source Distribution
seema-0.2.1.tar.gz (5.1 kB)
Built Distribution
Hashes for seema-0.2.1-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | e5a855ec52540dd2cd21d2b09032a86d65ced3c5ead72fda627a1b0ea4c0116f
MD5 | d52fbab33de2cf9d5a51e2f5549050c2
BLAKE2b-256 | ef54836c42520efe74142877b9f21468c66acc5efb6c539f89d4f8bc2516d580
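A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib`. The function names and the local file path are illustrative assumptions, not part of seema or of any packaging tool.

```python
import hashlib

# SHA256 digest listed above for seema-0.2.1-cp310-cp310-manylinux_2_34_x86_64.whl
EXPECTED_SHA256 = "e5a855ec52540dd2cd21d2b09032a86d65ced3c5ead72fda627a1b0ea4c0116f"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected` (hypothetical helper)."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("seema-0.2.1-cp310-cp310-manylinux_2_34_x86_64.whl")` would return `True` only if the local file (assumed here to be in the current directory) has the listed digest.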