jiojio: a convenient Chinese word segmentation tool

Project description

jiojio

- A high-performance, continuously iterated, easy-to-use CPU-based Chinese word segmenter

a convenient Chinese word segmentation tool

<a alt="License">

    <img src="https://img.shields.io/github/license/dongrixinyu/jiojio?color=crimson" /></a>

<a alt="Size">

    <img src="https://img.shields.io/badge/size-30.1m-orange" /></a>

<a alt="Downloads">

    <img src="https://pepy.tech/badge/jiojio/month" /></a>

<a alt="Version">

    <img src="https://img.shields.io/badge/version-1.1.8-green" /></a>

<a href="https://github.com/dongrixinyu/jiojio/pulse" alt="Activity">

    <img src="https://img.shields.io/github/commit-activity/m/dongrixinyu/jiojio?color=blue" /></a>

Use cases

  • A high-performance, continuously optimized, CPU-based Chinese word segmenter.

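Features

  • A segmenter with a Python interface and C-optimized internals; single-process throughput reaches 52,000 characters per second (see the performance comparison of several segmentation tools)

  • Web version on the JioNLP site, where segmentation and part-of-speech tagging can be tried quickly

  • Based on the CRF algorithm, with carefully tuned character-level feature engineering (see the model feature description)

  • The model file is compressed as far as possible: 5 million feature parameters in a 30 MB model file, which keeps pip installation convenient

  • Dictionaries are integrated into the model and jointly predict the segmentation sequence, keeping the pipeline consistent (see the dictionary configuration guide)

  • Rules are integrated into the model, which effectively handles certain text types that the model alone processes poorly (see "Segmentation: adding regular expressions")

  • Supports part-of-speech tagging; combined with JioNLP, it enables keyphrase extraction, news location recognition, and other features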
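
As a rough illustration of the CRF-based, character-level approach named in the list above: a tagger of this kind typically assigns each character a position label and then joins the labels into words. The sketch below assumes the common B/M/E/S label scheme and a few generic n-gram feature templates. Both are assumptions for illustration, not jiojio's actual label set or feature templates, and `char_features` and `tags_to_words` are hypothetical helper names.

```python
from typing import Dict, List


def char_features(text: str, i: int) -> Dict[str, str]:
    """Generic n-gram features around position i (hypothetical templates)."""
    def at(j: int) -> str:
        # Positions outside the text map to a padding symbol.
        return text[j] if 0 <= j < len(text) else "<pad>"

    return {
        "c0": at(i),                  # current character
        "c-1": at(i - 1),             # previous character
        "c+1": at(i + 1),             # next character
        "c-1c0": at(i - 1) + at(i),   # bigram ending at i
        "c0c+1": at(i) + at(i + 1),   # bigram starting at i
    }


def tags_to_words(text: str, tags: List[str]) -> List[str]:
    """Join characters into words from B/M/E/S labels (one label per character)."""
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    words: List[str] = []
    start = 0
    for i, tag in enumerate(tags):
        # "E" ends a multi-character word, "S" is a single-character word.
        if tag in ("E", "S"):
            words.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        # Flush an unterminated trailing word instead of dropping it.
        words.append(text[start:])
    return words


text = "我爱北京天安门!"
tags = ["S", "S", "B", "E", "B", "M", "E", "S"]  # hand-written labels, not model output
print(tags_to_words(text, tags))
```

In a real CRF segmenter, the labels would come from decoding over learned feature weights; here they are written by hand only to show how labels map to word boundaries.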

Installation

  • Via pip (stable version)

$ pip install jiojio

  • Via Git (development version)

$ git clone https://github.com/dongrixinyu/jiojio

$ cd jiojio

$ pip install .

Usage

  • Basic usage

>>> import jiojio

>>> jiojio.init()

>>> print(jiojio.cut('我爱北京天安门!'))



# ['我', '爱', '北京', '天安门', '!']

# Call jiojio.help() for basic usage instructions

# Call print(jiojio.init.__doc__) to list the model initialization parameters
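
To check throughput on your own hardware, a small timing helper can wrap any segmentation callable. This is a minimal sketch: `chars_per_second` is a hypothetical helper, not part of jiojio, and the commented usage assumes only the `jiojio.init()` and `jiojio.cut()` calls shown above.

```python
import time
from typing import Callable, Iterable, List


def chars_per_second(cut: Callable[[str], List[str]], texts: Iterable[str]) -> float:
    """Return the number of characters processed per second by `cut` over `texts`."""
    total_chars = 0
    start = time.perf_counter()
    for text in texts:
        cut(text)
        total_chars += len(text)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small inputs.
    return total_chars / elapsed if elapsed > 0 else float("inf")


# Usage with jiojio (requires the package to be installed):
#   import jiojio
#   jiojio.init()
#   print(chars_per_second(jiojio.cut, ["我爱北京天安门!"] * 1000))
```

Results depend on the machine, text length, and initialization options, so a figure measured this way is only loosely comparable to a published benchmark.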

Q&A about the jiojio segmenter

TODO

  • Long-term optimization of segmentation quality


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.

Built Distributions

jiojio-1.1.8-py2.py3-none-any.whl (32.8 MB)

Uploaded Python 2 Python 3

jiojio-1.1.8-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl (32.9 MB)

Uploaded CPython 3.10 manylinux: glibc 2.5+ x86-64

jiojio-1.1.8-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (32.8 MB)

Uploaded CPython 3.9 manylinux: glibc 2.5+ x86-64

jiojio-1.1.8-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (32.9 MB)

Uploaded CPython 3.8 manylinux: glibc 2.5+ x86-64

jiojio-1.1.8-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (32.9 MB)

Uploaded CPython 3.7m manylinux: glibc 2.5+ x86-64

jiojio-1.1.8-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (32.9 MB)

Uploaded CPython 3.6m manylinux: glibc 2.5+ x86-64
