Multi-criteria Cantonese segmentation with dashes, intermediates, pipes, and spaces.
pydips
Note: This package is still in beta, and future releases may introduce breaking changes. It currently supports macOS (Apple Silicon) and Linux (x86_64 with AVX, AVX2, and FMA instructions).
Install
pip install pydips
Usage
>>> from pydips import BertModel
>>> model = BertModel()
>>> model.cut('阿張先生嗰時好nice㗎', mode='coarse')
['阿張先生', '嗰時', '好', 'nice', '㗎']
>>> model.cut('阿張先生嗰時好nice㗎', mode='fine')
['阿', '張', '先生', '嗰', '時', '好', 'nice', '㗎']
>>> model.cut('阿張先生嗰時好nice㗎', mode='dips_str')
'阿-張|先生 嗰-時 好 nice 㗎'
>>> model.cut('阿張先生嗰時好nice㗎', mode='dips')
['S', 'D', 'P', 'I', 'S', 'D', 'S', 'S', 'I', 'I', 'I', 'S']
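The four outputs above appear to be different views of one per-character labelling: each character carries a D, I, P, or S label naming the separator written before it (dash, nothing, pipe, or space). The sketch below reproduces the 'dips_str', 'coarse', and 'fine' results from the 'dips' labels. It is inferred only from the example outputs, not from the pydips source, and the helper names labels_to_dips_str and split_on are hypothetical and not part of the package.

```python
# Separator written before a character carrying each label.
# Assumption: this mapping is inferred from the example outputs above.
SEPARATORS = {"D": "-", "I": "", "P": "|", "S": " "}


def labels_to_dips_str(text, labels):
    """Render per-character DIPS labels as a delimited string (hypothetical helper)."""
    if len(text) != len(labels):
        raise ValueError("expected exactly one label per character")
    pieces = []
    for index, (char, label) in enumerate(zip(text, labels)):
        if index > 0:  # the first character never gets a leading separator
            pieces.append(SEPARATORS[label])
        pieces.append(char)
    return "".join(pieces)


def split_on(text, labels, boundary_labels):
    """Start a new segment at every character whose label is in boundary_labels."""
    if len(text) != len(labels):
        raise ValueError("expected exactly one label per character")
    segments = []
    for index, (char, label) in enumerate(zip(text, labels)):
        if index == 0 or label in boundary_labels:
            segments.append(char)
        else:
            segments[-1] += char
    return segments


text = "阿張先生嗰時好nice㗎"
# Labels copied from the mode='dips' example above.
labels = ["S", "D", "P", "I", "S", "D", "S", "S", "I", "I", "I", "S"]

dips_str = labels_to_dips_str(text, labels)
coarse = split_on(text, labels, {"S"})          # break at spaces only
fine = split_on(text, labels, {"D", "P", "S"})  # break at dashes, pipes, and spaces
```

Under this reading, 'coarse' breaks only at S labels while 'fine' breaks at D, P, and S, which matches the two segmentations shown in the usage examples.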
Download files
Source distribution: pydips-0.0.4.tar.gz (3.8 MB)
Built distribution: pydips-0.0.4-py3-none-any.whl (3.8 MB)