Multi-criteria Cantonese segmentation with dashes, intermediates, pipes, and spaces.

pydips

Note: This package is still in beta, and there may be breaking changes in the future. It currently supports macOS (Apple Silicon) and Linux (x86_64 with AVX, AVX2, and FMA instructions).
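To check whether a machine matches the platforms listed above before installing, something like the following sketch may help. It is not part of pydips: `is_supported` and `read_linux_cpu_flags` are hypothetical helpers, and the check assumes that Apple Silicon reports `arm64` from `platform.machine()` and that Linux exposes CPU features on the `flags` line of `/proc/cpuinfo`.

```python
import platform

# CPU features the note above lists for Linux x86_64.
REQUIRED_FLAGS = {"avx", "avx2", "fma"}


def is_supported(system, machine, cpu_flags=frozenset()):
    """Return True if the platform matches the ones the note lists.

    Hypothetical helper, not a pydips API.
    """
    if system == "Darwin":
        # Assumption: Apple Silicon reports "arm64" when Python runs natively.
        return machine == "arm64"
    if system == "Linux":
        return machine == "x86_64" and REQUIRED_FLAGS <= set(cpu_flags)
    return False


def read_linux_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags from /proc/cpuinfo, or an empty set."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # Not Linux, or the file is unreadable.
    return set()


if __name__ == "__main__":
    ok = is_supported(platform.system(), platform.machine(), read_linux_cpu_flags())
    print("supported" if ok else "not supported")
```
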

Install

pip install pydips

Usage

>>> from pydips import BertModel
>>> model = BertModel()

>>> model.cut('阿張先生嗰時好nice㗎', mode='coarse')
['阿張先生', '嗰時', '好', 'nice', '㗎']

>>> model.cut('阿張先生嗰時好nice㗎', mode='fine')
['阿', '張', '先生', '嗰', '時', '好', 'nice', '㗎']

>>> model.cut('阿張先生嗰時好nice㗎', mode='dips_str')
'阿-張|先生 嗰-時 好 nice 㗎'

>>> model.cut('阿張先生嗰時好nice㗎', mode='dips')
['S', 'D', 'P', 'I', 'S', 'D', 'S', 'S', 'I', 'I', 'I', 'S']
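
The four modes appear to be views of the same per-character tagging. Judging from the outputs above, each character carries one tag naming the separator that precedes it: D (dash), I (intermediate, no separator), P (pipe), or S (space). The sketch below rebuilds the other three outputs from the 'dips' tags in plain Python. It is an illustration inferred from the example, not pydips's implementation; `to_dips_str` and `split_on` are hypothetical helper names.

```python
# Separator placed before a character for each tag (inferred from the example).
SEPARATORS = {"D": "-", "I": "", "P": "|", "S": " "}


def to_dips_str(text, tags):
    """Join characters, inserting each tag's separator before its character."""
    if len(text) != len(tags):
        raise ValueError("expected exactly one tag per character")
    out = []
    for i, (char, tag) in enumerate(zip(text, tags)):
        if i > 0:  # the first character never gets a leading separator
            out.append(SEPARATORS[tag])
        out.append(char)
    return "".join(out)


def split_on(text, tags, boundary_tags):
    """Start a new word at every character whose tag is in boundary_tags."""
    if len(text) != len(tags):
        raise ValueError("expected exactly one tag per character")
    words = []
    for char, tag in zip(text, tags):
        if not words or tag in boundary_tags:
            words.append(char)
        else:
            words[-1] += char
    return words


text = '阿張先生嗰時好nice㗎'
tags = ['S', 'D', 'P', 'I', 'S', 'D', 'S', 'S', 'I', 'I', 'I', 'S']

print(to_dips_str(text, tags))               # 阿-張|先生 嗰-時 好 nice 㗎
print(split_on(text, tags, {'S'}))           # matches mode='coarse' above
print(split_on(text, tags, {'S', 'D', 'P'})) # matches mode='fine' above
```

Under this reading, 'coarse' splits only at S, while 'fine' splits at S, D, and P.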
