
Chinese Word Segmentation


cwsharp
========

A Python library for Chinese word segmentation, with support for custom dictionaries and multiple segmentation algorithms.

GitHub: https://github.com/zhengchun/cwsharp-python


Features
========

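- Multiple segmentation algorithms:

  - ``MMSegTokenizer`` - dictionary-based segmentation; the default algorithm.
  - ``BigramTokenizer`` - bigram segmentation; also handles English words and numbers.

- Custom dictionaries, including mixed Chinese/English entries.

- Compatible with Python 2.x and 3.x.

- MIT license.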
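To make the ``BigramTokenizer`` bullet concrete, here is a minimal Python 3 sketch of bigram segmentation. The function name ``bigram_tokenize`` and its rules are assumptions made for illustration, not cwsharp's API: runs of CJK ideographs (U+4E00 to U+9FFF) are split into overlapping two-character tokens, runs of ASCII letters and digits are kept whole, and all other characters are dropped. cwsharp's own tokenizer may differ in detail.

```python
import re
from typing import List

# Runs of CJK ideographs, or runs of ASCII letters/digits (assumed character classes).
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def bigram_tokenize(text: str) -> List[str]:
    """Hypothetical bigram tokenizer (illustration only, not the cwsharp API).

    Chinese runs become overlapping two-character tokens; English words
    and numbers are emitted whole; punctuation and spaces are skipped.
    """
    tokens: List[str] = []
    for match in _TOKEN_RE.finditer(text):
        run = match.group()
        if run[0].isascii():
            # English word or number: keep as a single token.
            tokens.append(run)
        elif len(run) == 1:
            # A lone Chinese character has no bigram partner.
            tokens.append(run)
        else:
            # Overlapping bigrams: ABCD -> AB, BC, CD.
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens
```

For example, ``bigram_tokenize("Python中文分词2017")`` returns ``['Python', '中文', '文分', '分词', '2017']``. Bigram output needs no dictionary, which is why it tolerates unknown words, at the cost of emitting non-words such as ``文分``.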


Installation
============

- Automatic installation: ``easy_install cwsharp``, ``pip install cwsharp``, or ``pip3 install cwsharp``.

Changelog
=========

- [2017-11-10]

  - Improved ``MMSegTokenizer`` segmentation performance by 20x.
  - Fixed an exception in ``chunk.degree()`` when ``word.freq`` is 0.
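Both changelog entries concern the chunk scoring in ``MMSegTokenizer``. For readers unfamiliar with MMSeg, the Python 3 sketch below implements complex maximum matching with the four rules from Chih-Hao Tsai's MMSeg description: largest total chunk length, largest average word length, smallest variance of word lengths, and largest sum of log(frequency) over one-character words (the "degree of morphemic freedom"). ``math.log(0)`` raises ``ValueError``, which is presumably the kind of failure the ``word.freq`` fix addresses. Here zero or unknown frequencies are simply skipped. That guard, the toy ``DICTIONARY``, and the helper names ``candidate_words``, ``chunks``, ``degree``, ``score``, and ``mmseg`` are all hypothetical and are not cwsharp's implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: word -> frequency (not cwsharp's data).
DICTIONARY: Dict[str, int] = {
    "研究": 500, "研究生": 200, "生命": 300, "起源": 150,
    "生": 800, "命": 0,  # "命" deliberately has frequency 0
}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)

Chunk = Tuple[str, ...]


def candidate_words(text: str, start: int) -> List[str]:
    """Dictionary words starting at `start`; a single character is always allowed."""
    words = [text[start]]
    for end in range(start + 2, min(len(text), start + MAX_WORD_LEN) + 1):
        if text[start:end] in DICTIONARY:
            words.append(text[start:end])
    return words


def chunks(text: str, start: int, depth: int = 3) -> List[Chunk]:
    """All chunks of up to `depth` consecutive words beginning at `start`."""
    if depth == 0 or start >= len(text):
        return [()]
    result: List[Chunk] = []
    for word in candidate_words(text, start):
        for rest in chunks(text, start + len(word), depth - 1):
            result.append((word,) + rest)
    return result


def degree(chunk: Chunk) -> float:
    """Rule 4: sum of log(freq) over one-character words.

    Zero or unknown frequencies are skipped so math.log never sees 0
    (an assumed guard, not necessarily what cwsharp does).
    """
    total = 0.0
    for word in chunk:
        freq = DICTIONARY.get(word, 0)
        if len(word) == 1 and freq > 0:
            total += math.log(freq)
    return total


def score(chunk: Chunk) -> Tuple[float, float, float, float]:
    """MMSeg rules 1-4 as a sort key; larger tuples are better."""
    lengths = [len(w) for w in chunk]
    total = sum(lengths)
    avg = total / len(lengths)
    variance = sum((n - avg) ** 2 for n in lengths) / len(lengths)
    return (total, avg, -variance, degree(chunk))


def mmseg(text: str) -> List[str]:
    """Take the first word of the best-scoring chunk, advance, repeat."""
    words: List[str] = []
    pos = 0
    while pos < len(text):
        best = max(chunks(text, pos), key=score)
        words.append(best[0])
        pos += len(best[0])
    return words
```

With this toy dictionary, ``mmseg("研究生命起源")`` returns ``['研究', '生命', '起源']``. The chunks ``研究/生命/起源`` and ``研究生/命/起源`` tie on rules 1 and 2, and rule 3 (word lengths 2, 2, 2 versus 3, 1, 2) selects the first. Comparing score tuples lexicographically applies the rules in order, so each rule only breaks ties left by the previous one.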

Download
========

- Source distribution: ``cwsharp-0.2.tar.gz`` (400.8 kB).
