
Chinese Word Segmentation


cwsharp
========

A Chinese word segmentation library for Python, with support for custom dictionaries and multiple segmentation algorithms.

GitHub: https://github.com/zhengchun/cwsharp-python


Features
========

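- Multiple segmentation algorithms:

  - MMSegTokenizer - dictionary-based segmentation; this is the default algorithm.

  - BigramTokenizer - bigram segmentation, with support for English words and digits.

- Custom dictionaries, including mixed Chinese and English entries.

- Compatible with Python 2.x and 3.x.

- MIT license.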
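
To make the two approaches concrete, here is a minimal sketch in plain Python. It does not use cwsharp's API, which this description does not show. The names ``forward_max_match`` and ``bigram_tokens`` are hypothetical, and the greedy longest-match routine is a simplified stand-in for dictionary-based segmentation, not the library's MMSeg implementation.

```python
import re

# ASCII letters/digits are kept as whole tokens; CJK runs are handled separately.
_RUN_RE = re.compile(r"(?P<ascii>[A-Za-z0-9]+)|(?P<cjk>[\u4e00-\u9fff]+)")


def forward_max_match(text, dictionary, max_len=4):
    """Greedy longest-match segmentation against a set of known words."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def bigram_tokens(text):
    """Split CJK runs into overlapping 2-character pairs; keep ASCII runs whole."""
    tokens = []
    for match in _RUN_RE.finditer(text):
        run = match.group(0)
        if match.lastgroup == "ascii" or len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


words = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", words))  # ['研究生', '命', '起源']
print(bigram_tokens("中文分词 cwsharp 2017"))    # ['中文', '文分', '分词', 'cwsharp', '2017']
```

The first result shows why plain greedy matching is not enough on its own: it commits to the longest word and strands a single character. The published MMSEG algorithm adds look-ahead rules to resolve ambiguities of this kind. Bigram segmentation needs no dictionary at all, at the cost of producing overlapping tokens.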


Installation
============

- Automatic installation: ``easy_install cwsharp``, ``pip install cwsharp``, or ``pip3 install cwsharp``.

Changelog
=========

- [2017-11-10]

  - MMSegTokenizer segmentation is now 20x faster.

  - Fixed an exception in ``chunk.degree()`` that occurred when ``word.freq`` is 0.
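
The source does not show how ``chunk.degree()`` is implemented. In the published MMSEG algorithm, one tie-breaking score is the sum of the logarithms of the frequencies of a chunk's one-character words, and a logarithm of 0 is undefined, so a zero frequency is a plausible cause of the exception. The sketch below illustrates that failure mode and one possible guard; the function name and the choice to skip zero frequencies are assumptions, not the library's actual fix.

```python
import math


def degree_of_morphemic_freedom(single_char_freqs):
    """Sum of log frequencies for the one-character words in a chunk.

    Zero frequencies are skipped, because math.log(0) raises ValueError.
    """
    return sum(math.log(freq) for freq in single_char_freqs if freq > 0)


try:
    math.log(0)
except ValueError as exc:
    print("unguarded:", exc)

print(degree_of_morphemic_freedom([0, 1]))  # 0.0
```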

