Traditional Chinese Word Segmentation Utilities

Project description

jieba

“Jieba” (Chinese for “to stutter”) Chinese text segmentation: built to be the best Python Chinese word segmentation module.

See README.md for the full documentation.

GitHub: https://github.com/fxsjy/jieba

Features

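  • Supports three segmentation modes:

    • Precise mode: tries to cut the sentence into the most accurate segmentation; suited to text analysis.

    • Full mode: scans out every substring of the sentence that can form a word. It is very fast but cannot resolve ambiguity.

    • Search engine mode: builds on precise mode by cutting long words a second time to improve recall; suited to segmentation for search engines.

  • Supports Traditional Chinese segmentation

  • Supports custom dictionaries

  • MIT license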
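To make the difference between the three modes listed above concrete, here is a toy sketch in pure Python. It is not jieba's implementation: the vocabulary `VOCAB` and the function names are hypothetical, and precise mode is approximated with simple forward maximum matching, chosen only because it is short.

```python
# Toy vocabulary: a hypothetical stand-in for a real dictionary file.
VOCAB = {"我", "來到", "北京", "清華", "清華大學", "華大", "大學"}


def full_mode(text, vocab=VOCAB):
    """Emit every vocabulary word found at any position; overlaps are kept."""
    max_len = max(map(len, vocab))
    return [
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, min(i + max_len, len(text)) + 1)
        if text[i:j] in vocab
    ]


def precise_mode(text, vocab=VOCAB):
    """Pick one non-overlapping segmentation (forward maximum matching here)."""
    max_len = max(map(len, vocab))
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def search_mode(text, vocab=VOCAB):
    """Start from precise mode, then also emit shorter words inside long ones."""
    words = []
    for word in precise_mode(text, vocab):
        for n in (2, 3):
            if len(word) > n:
                words.extend(
                    word[k:k + n]
                    for k in range(len(word) - n + 1)
                    if word[k:k + n] in vocab
                )
        words.append(word)
    return words


sentence = "我來到北京清華大學"
print("/".join(precise_mode(sentence)))  # 我/來到/北京/清華大學
print("/".join(full_mode(sentence)))     # 我/來到/北京/清華/清華大學/華大/大學
print("/".join(search_mode(sentence)))   # 我/來到/北京/清華/華大/大學/清華大學
```

Full mode keeps overlapping candidates such as 清華 and 華大, which is why it cannot resolve ambiguity; search mode keeps the precise result but adds the shorter words a search index would also want to match.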

Online demo: http://jiebademo.ap01.aws.af.cm/

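Installation

The code is compatible with both Python 2 and Python 3.

  • Fully automatic installation: easy_install jieba, or pip install jieba / pip3 install jieba

  • Semi-automatic installation: first download the package from https://pypi.python.org/pypi/jieba/ , unpack it, then run python setup.py install

  • Manual installation: place the jieba directory in the current directory or in the site-packages directory

  • Use it in code via import jieba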
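Once the package is installed by one of the routes above, a quick smoke test might look like the sketch below. It assumes Python 3 (for `importlib.util`) and that the package is importable under the name `jieba`, as the last bullet states; the helper name `segment_all_modes` is hypothetical. The calls `jieba.cut(..., cut_all=...)` and `jieba.cut_for_search(...)` correspond to the precise, full, and search engine modes.

```python
import importlib.util


def segment_all_modes(text):
    """Segment text in jieba's three modes, or return None if jieba is missing."""
    if importlib.util.find_spec("jieba") is None:
        return None  # package not installed or not on the import path
    import jieba

    return {
        "precise": list(jieba.cut(text, cut_all=False)),
        "full": list(jieba.cut(text, cut_all=True)),
        "search": list(jieba.cut_for_search(text)),
    }


result = segment_all_modes("我來到北京清華大學")
if result is None:
    print("jieba is not importable; install it by one of the routes above")
else:
    for mode, words in result.items():
        print(mode, "/".join(words))
```

The exact tokens printed depend on the dictionary shipped with the installed version, so no expected output is shown here.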

Download files

Source Distribution

jieba_hant-0.39.1.tar.gz (6.5 MB view details)

Uploaded Source

File details

Details for the file jieba_hant-0.39.1.tar.gz.

File metadata

  • Download URL: jieba_hant-0.39.1.tar.gz
  • Size: 6.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.6

File hashes

Hashes for jieba_hant-0.39.1.tar.gz

Algorithm     Hash digest
SHA256        d6d58a0fbccc3f77627c4cf29911ac8bbee359d11cde7b210a32f51ecb993ecf
MD5           dd3ca3c38fe232150e37d28eae1173fe
BLAKE2b-256   2647a7d2b8a749f93282e0c0a0d664575ba7cae35e63b19ecd3548885ecf88f7
