LEKCut
LEKCut (เล็ก คัด) is a Thai word tokenization library that runs deep learning models ported to the ONNX format.
Install
pip install lekcut
How to use
from lekcut import word_tokenize
word_tokenize("ทดสอบการตัดคำ")
# output: ['ทดสอบ', 'การ', 'ตัด', 'คำ']
API
word_tokenize(text: str, model: str="deepcut", path: str="default") -> List[str]
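As a minimal sketch of the signature above, the helper below tokenizes several strings while passing `model` and `path` explicitly. The name `tokenize_all` and its `tokenizer` parameter are hypothetical and not part of LEKCut; `model="deepcut"` and `path="default"` are simply the defaults shown in the signature.

```python
from typing import Callable, Iterable, List, Optional


def tokenize_all(
    texts: Iterable[str],
    model: str = "deepcut",
    path: str = "default",
    tokenizer: Optional[Callable[..., List[str]]] = None,
) -> List[List[str]]:
    """Tokenize each string in `texts` (hypothetical helper, not part of LEKCut)."""
    if tokenizer is None:
        # Imported lazily so the helper can also be exercised with a stand-in tokenizer.
        from lekcut import word_tokenize as tokenizer
    # Forward the same model/path arguments to every call.
    return [tokenizer(text, model=model, path=path) for text in texts]
```

Calling `tokenize_all(["ทดสอบการตัดคำ"])` with no `tokenizer` argument falls back to `lekcut.word_tokenize` with its default model and path.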
Model
deepcut
- We ported the deepcut model from tensorflow.keras to ONNX. The model and code come from Deepcut's GitHub repository. The model is here.
Load custom model
If you have trained a custom model with deepcut, or with another model that LEKCut supports, you can load it by passing its location to the path argument of word_tokenize after porting the model to ONNX.
- How to train a custom model with your dataset using deepcut - Notebook (you need to update deepcut/train.py before training the model)
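Loading a custom model might look like the sketch below. It assumes that `path` accepts a filesystem path to the ported ONNX file and that the custom model was trained from deepcut, so `model="deepcut"` is kept. The helper name `tokenize_with_custom_model` is hypothetical, and the file location is whatever you chose when porting.

```python
from pathlib import Path
from typing import List


def tokenize_with_custom_model(text: str, model_path: str) -> List[str]:
    """Tokenize `text` with a ported custom model (hypothetical helper)."""
    onnx_file = Path(model_path)
    # Fail early with a clear message if the ported model file is missing.
    if not onnx_file.is_file():
        raise FileNotFoundError(f"Custom model not found: {onnx_file}")
    # Imported lazily so the path check runs before lekcut is loaded.
    from lekcut import word_tokenize

    # Assumption: `path` takes the location of the ported ONNX model file.
    return word_tokenize(text, model="deepcut", path=str(onnx_file))
```

Checking the file first keeps a mistyped path from surfacing as a less readable error inside the model loader.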
How to port a model?
See notebooks/
Download files
Source Distribution
LEKCut-0.1.tar.gz (2.0 MB)
Built Distribution
LEKCut-0.1-py3-none-any.whl (2.0 MB)
File details
Details for the file LEKCut-0.1.tar.gz.
File metadata
- Download URL: LEKCut-0.1.tar.gz
- Upload date:
- Size: 2.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1fbb200e129c204252369001ffcadea0e37c9871a6aa71a7819e49f1cbb973ba
MD5 | 835e552eb25772900d7fa26420a94819
BLAKE2b-256 | fd895853548acf1f39ac4554caa27e19a547becb68b7a0aa39fc36cc91b3a5af
File details
Details for the file LEKCut-0.1-py3-none-any.whl.
File metadata
- Download URL: LEKCut-0.1-py3-none-any.whl
- Upload date:
- Size: 2.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | b7e503b2816486b5998e4f0024f04ba197af131f3afd604c294be6b30060579b
MD5 | d2c1c55b9fed0fe2972cdcd5391232c5
BLAKE2b-256 | 1c94d39131588e28e5e5c85554423d41dd7715438eb3dbd42bd92a91c52a0d7a
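To check a downloaded file against one of the SHA256 digests listed above, something like the following sketch can be used. It relies only on Python's standard `hashlib`; the function names `sha256_of_file` and `matches_digest` are illustrative, and the local file path depends on where you saved the download.

```python
import hashlib


def sha256_of_file(file_path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(file_path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(file_path) == expected_hex.strip().lower()
```

For example, `matches_digest("LEKCut-0.1.tar.gz", "<SHA256 from the table>")` should return `True` for an intact download of the source distribution.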