Basic NLP for Thai
Project description
Word Token to Pseudo-Morpheme Segmentation
- Not intended for long Thai sentences. Segment the text into words first, or use it together with TokenIdentification.
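Because segmentation is meant to run on individual words, a long sentence can be handled word by word. The sketch below assumes the words have already been produced by a separate Thai word tokenizer (not shown). `segment_words` and `make_pmseg_segmenter` are hypothetical helper names, not part of basicthainlp; the latter simply chains the three PmSeg calls used in the example code.

```python
from typing import Callable, List


def segment_words(words: List[str],
                  segment_word: Callable[[str], List[str]]) -> List[List[str]]:
    """Apply a per-word pseudo-morpheme segmenter to pre-tokenized words.

    Hypothetical helper. Whitespace-only tokens are skipped, and the result
    keeps one list per word so that word boundaries are preserved.
    """
    result: List[List[str]] = []
    for word in words:
        if not word.strip():
            continue  # skip spaces between words
        result.append(segment_word(word))
    return result


def make_pmseg_segmenter(ps) -> Callable[[str], List[str]]:
    """Hypothetical wrapper around the PmSeg calls from the example code."""
    def segment_word(word: str) -> List[str]:
        data_list = ps.word2DataList(word)       # [character, type] pairs
        pred = ps.dataList2pmSeg(data_list)      # predicted tags
        return ps.pmSeg2List(list(word), pred[0])  # merged pseudo-morphemes
    return segment_word
```

Typical use would be `segment_words(words, make_pmseg_segmenter(PmSeg()))`, where `words` comes from whatever word tokenizer you already use.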
Example code
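from basicthainlp import PmSeg

ps = PmSeg()
textTest = 'รัฐราชการ'  # a single Thai word

# Convert the word into a list of [character, character-type tag] pairs
data_list = ps.word2DataList(textTest)
print(data_list)

# Predict a segmentation tag for each character; pred[0] holds the tag sequence
pred = ps.dataList2pmSeg(data_list)
print(list(textTest))
print(pred[0])

# Merge the characters into pseudo-morphemes using the predicted tags
print(ps.pmSeg2List(list(textTest), pred[0]))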
Output:
[['ร', 'Ccc'], ['ั', 'Vu'], ['ฐ', 'C'], ['ร', 'Ccc'], ['า', 'Vm'], ['ช', 'C'], ['ก', 'C'], ['า', 'Vm'], ['ร', 'Ccc']]
['ร', 'ั', 'ฐ', 'ร', 'า', 'ช', 'ก', 'า', 'ร']
['B', 'I', 'C', 'B', 'I', 'C', 'B', 'I', 'I']
['รัฐ', 'ราช', 'การ']
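In the output above, every 'B' tag lines up with the first character of a pseudo-morpheme. The sketch below reproduces the final merging step under that assumption: a new segment starts at each 'B', while 'I' and 'C' extend the current one. `merge_bic_tags` is a hypothetical name and this is not taken from basicthainlp's source; it is only consistent with the printed example.

```python
from typing import List


def merge_bic_tags(chars: List[str], tags: List[str]) -> List[str]:
    """Join characters into segments, opening a new segment at every 'B' tag.

    Assumption: 'I' and 'C' both continue the current segment, which matches
    the example output but is not confirmed by the library.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    segments: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not segments:
            segments.append(ch)  # open a new segment
        else:
            segments[-1] += ch  # extend the current segment
    return segments


chars = ['ร', 'ั', 'ฐ', 'ร', 'า', 'ช', 'ก', 'า', 'ร']
tags = ['B', 'I', 'C', 'B', 'I', 'C', 'B', 'I', 'I']
print(merge_bic_tags(chars, tags))  # ['รัฐ', 'ราช', 'การ']
```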
Download files
Source Distribution
basicthainlp-0.1.17.tar.gz (1.5 MB)
Built Distribution
basicthainlp-0.1.17-py3-none-any.whl (2.9 MB)
File details
Details for the file basicthainlp-0.1.17.tar.gz.
File metadata
- Download URL: basicthainlp-0.1.17.tar.gz
- Upload date:
- Size: 1.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | c522cd401dcd69e72bf4b7c5cb0f7280d8ae26414b458ee8b49e12c69081e057
MD5 | c3d1223324c9f3c5a1cafc33baed4bde
BLAKE2b-256 | fe8334fece58daee2e513484eb8aaa47758ad9d97b241af12f7160f90a0f927a
File details
Details for the file basicthainlp-0.1.17-py3-none-any.whl.
File metadata
- Download URL: basicthainlp-0.1.17-py3-none-any.whl
- Upload date:
- Size: 2.9 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2552386c554262cfda5fd1a686d7dfb7ef937a2e2d8ac894e433b014416d8000
MD5 | 2141d27efd18c6f15878756db8375060
BLAKE2b-256 | cd6b6352d58fbafa3f2484da153ca6790bf74c79588cbac44945461264a18321