Chinese Word Segmentation

cwsharp
========

A Python library for Chinese word segmentation, with support for custom dictionaries and multiple segmentation algorithms.

GitHub: https://github.com/zhengchun/cwsharp-python


Features
========

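- Multiple segmentation algorithms:

  - MMSegTokenizer: a dictionary-based segmentation algorithm, used by default.

  - BigramTokenizer: bigram segmentation, with support for English words and numbers.

- Custom dictionaries, including mixed Chinese and English entries.

- Compatible with Python 2.x and 3.x.

- MIT License.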
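
The BigramTokenizer item can be illustrated with a minimal sketch, assuming that bigram segmentation means splitting each run of Chinese characters into overlapping two-character tokens while keeping runs of English letters or digits whole. The function name ``bigram_tokenize`` and the character range (CJK Unified Ideographs U+4E00 to U+9FFF only) are assumptions for illustration; this is not cwsharp's ``BigramTokenizer`` API, and the sketch targets Python 3.

.. code-block:: python

    import re

    # Hypothetical sketch, not cwsharp's implementation: a token is a run of
    # CJK ideographs (U+4E00-U+9FFF only), a run of ASCII letters, or a run
    # of ASCII digits. Letters and digits are matched as separate runs.
    _TOKEN_RE = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z]+|[0-9]+")


    def bigram_tokenize(text):
        """Return CJK bigrams plus whole English and number runs, in order."""
        tokens = []
        for match in _TOKEN_RE.finditer(text):
            run = match.group()
            if "\u4e00" <= run[0] <= "\u9fff":
                if len(run) == 1:
                    # A lone Chinese character has no partner; emit it as-is.
                    tokens.append(run)
                else:
                    # Overlapping pairs: 中文分词 -> 中文, 文分, 分词
                    tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
            else:
                # English words and numbers are kept as single tokens.
                tokens.append(run)
        return tokens


    print(bigram_tokenize("Python中文分词2017"))
    # ['Python', '中文', '文分', '分词', '2017']

Overlapping bigrams need no dictionary, so every two-character word inside a run of Chinese characters is guaranteed to appear among the tokens; the cost is extra non-words such as 文分.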


Installation
============

- Automatic installation: ``easy_install cwsharp``, ``pip install cwsharp`` or ``pip3 install cwsharp``.

Changelog
=========

- [2017-11-10]

  - Made MMSegTokenizer segmentation 20x faster.

  - Fixed an exception in ``chunk.degree()`` when ``word.freq`` is 0.
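
The ``chunk.degree()`` entry refers to MMSeg's chunk scoring. MMSeg looks at chunks of up to three words and prefers, in order: the largest total length, the largest average word length, the smallest variance of word lengths, and the largest sum of the degree of morphemic freedom of one-character words, computed as the sum of log(frequency). One plausible cause of an exception at frequency 0 is that ``math.log(0)`` raises ``ValueError``. The sketch below is a simplified Python 3 illustration under that assumption, not cwsharp's implementation: the toy dictionary, its frequencies and the functions ``words_at``, ``chunks_at``, ``degree``, ``score`` and ``mmseg`` are hypothetical, and skipping zero-frequency characters is only one possible guard; the changelog does not say how cwsharp fixed it.

.. code-block:: python

    import math

    # Toy dictionary (hypothetical): word -> frequency. Characters missing from
    # it are still allowed as one-character words and count as frequency 0.
    DICT = {
        "研究": 50, "研究生": 10, "生命": 40, "起源": 30, "命": 5,
        "主要": 100, "要是": 20, "因为": 80, "是": 500, "主": 10, "要": 0,
    }
    MAX_WORD_LEN = max(len(w) for w in DICT)


    def words_at(text, pos):
        """The single character at pos plus every dictionary word starting there."""
        found = [text[pos]]
        for end in range(pos + 2, min(len(text), pos + MAX_WORD_LEN) + 1):
            if text[pos:end] in DICT:
                found.append(text[pos:end])
        return found


    def chunks_at(text, pos):
        """All chunks of up to three consecutive words starting at pos."""
        chunks = []
        for w1 in words_at(text, pos):
            p1 = pos + len(w1)
            if p1 >= len(text):
                chunks.append([w1])
                continue
            for w2 in words_at(text, p1):
                p2 = p1 + len(w2)
                if p2 >= len(text):
                    chunks.append([w1, w2])
                    continue
                for w3 in words_at(text, p2):
                    chunks.append([w1, w2, w3])
        return chunks


    def degree(chunk):
        """Rule 4: sum of log(freq) over the chunk's one-character words.

        math.log(0) raises ValueError, so zero-frequency characters are skipped.
        """
        return sum(math.log(DICT[w]) for w in chunk
                   if len(w) == 1 and DICT.get(w, 0) > 0)


    def score(chunk):
        """Rules 1-4 as a tuple; max() compares the entries left to right."""
        lengths = [len(w) for w in chunk]
        total = sum(lengths)
        avg = total / len(chunk)
        variance = sum((n - avg) ** 2 for n in lengths) / len(chunk)
        return (total, avg, -variance, degree(chunk))


    def mmseg(text):
        """Repeatedly keep the first word of the best-scoring chunk."""
        tokens, pos = [], 0
        while pos < len(text):
            best = max(chunks_at(text, pos), key=score)
            tokens.append(best[0])
            pos += len(best[0])
        return tokens


    print(mmseg("研究生命起源"))  # ['研究', '生命', '起源']
    print(mmseg("主要是因为"))    # ['主要', '是', '因为']

In the first call the variance rule rejects 研究生 / 命 / 起源 in favour of the evenly sized 研究 / 生命 / 起源; in the second, the two best chunks tie on rules 1 to 3 and rule 4 picks 主要 / 是 / 因为 because 是 is far more frequent than 主.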

