Multi-criteria Cantonese segmentation with dashes, intermediates, pipes, and spaces.
pydips
Note: This package is still in beta, so breaking changes may occur in future releases. It currently supports macOS (Apple Silicon) and Linux (x86_64 with AVX, AVX2, and FMA instructions).
Install
pip install pydips
Usage
>>> from pydips import BertModel
>>> model = BertModel()
>>> model.cut('阿張先生嗰時好nice㗎', mode='coarse')
['阿張先生', '嗰時', '好', 'nice', '㗎']
>>> model.cut('阿張先生嗰時好nice㗎', mode='fine')
['阿', '張', '先生', '嗰', '時', '好', 'nice', '㗎']
>>> model.cut('阿張先生嗰時好nice㗎', mode='dips_str')
'阿-張|先生 嗰-時 好 nice 㗎'
>>> model.cut('阿張先生嗰時好nice㗎', mode='dips')
['S', 'D', 'P', 'I', 'S', 'D', 'S', 'S', 'I', 'I', 'I', 'S']
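The four modes above appear to be different views of the same per-character tags. Judging only from the example output (this is an inference, not documented pydips behaviour), each tag describes the boundary in front of its character: D is a dash, I is an intermediate with no boundary, P is a pipe, and S is a space. The sketch below is pure Python and does not call pydips; the names DELIMITERS, render_dips, and split_on are hypothetical helpers introduced for illustration.

```python
# Delimiter placed in front of a character for each tag.
# Assumption: this mapping is inferred from the example output above,
# it is not taken from the pydips source.
DELIMITERS = {"D": "-", "I": "", "P": "|", "S": " "}


def render_dips(text, tags):
    """Rebuild a 'dips_str'-style string from per-character tags."""
    if len(text) != len(tags):
        raise ValueError("expected exactly one tag per character")
    pieces = []
    for index, (char, tag) in enumerate(zip(text, tags)):
        # The first character never gets a leading delimiter.
        if index > 0:
            pieces.append(DELIMITERS[tag])
        pieces.append(char)
    return "".join(pieces)


def split_on(text, tags, boundary_tags):
    """Split text wherever a character's tag is in boundary_tags."""
    if len(text) != len(tags):
        raise ValueError("expected exactly one tag per character")
    segments = []
    for index, (char, tag) in enumerate(zip(text, tags)):
        if index == 0 or tag in boundary_tags:
            segments.append(char)   # start a new segment
        else:
            segments[-1] += char    # extend the current segment
    return segments


text = "阿張先生嗰時好nice㗎"
tags = ["S", "D", "P", "I", "S", "D", "S", "S", "I", "I", "I", "S"]

dips_string = render_dips(text, tags)           # '阿-張|先生 嗰-時 好 nice 㗎'
coarse = split_on(text, tags, {"S"})            # matches the mode='coarse' example
fine = split_on(text, tags, {"D", "P", "S"})    # matches the mode='fine' example
```

Under this reading, the coarse segmentation splits only at spaces, while the fine segmentation also splits at dashes and pipes. Whether pydips derives its modes this way internally is not confirmed by the package description.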