
jiojio: a convenient Chinese word segmentation tool

Project description

jiojio

- A CPU-based, high-performance, easy-to-use Chinese word segmenter with a continuously updated model

a convenient Chinese word segmentation tool

<a alt="License">

    <img src="https://img.shields.io/github/license/dongrixinyu/jiojio?color=crimson" /></a>

<a alt="Size">

    <img src="https://img.shields.io/badge/size-82.1m-orange" /></a>

<a alt="Downloads">

    <img src="https://pepy.tech/badge/jiojio/month" /></a>

<a alt="Version">

    <img src="https://img.shields.io/badge/version-1.2.8-green" /></a>

<a href="https://github.com/dongrixinyu/jiojio/pulse" alt="Activity">

    <img src="https://img.shields.io/github/commit-activity/m/dongrixinyu/jiojio?color=blue" /></a>

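Use cases

  • A CPU-based, high-performance, continuously optimized Chinese word segmenter.

Features

  • A word segmenter with a Python interface built on C. Single-process CPU throughput reaches 134,000 characters per second (see the performance comparison of several segmentation tools).

  • The JioNLP website offers a web version for quickly trying word segmentation and part-of-speech tagging.

  • Based on the CRF algorithm, with carefully tuned character-level feature engineering (see the model feature notes).

  • The model file is compressed as far as possible: parameters use the np.float16 precision type, so a model with 5 million feature parameters is about 30 MB, which keeps pip installation convenient.

  • Custom dictionaries can be added both statically and dynamically through a consistent workflow (see the dictionary configuration notes).

  • Rules can be added to the model, which handles certain text types that the model alone processes poorly (see Segmentation: adding regular expressions).

  • Supports part-of-speech tagging; combined with JioNLP, it enables features such as keyphrase extraction and news location recognition.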
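The feature list mentions storing model parameters in a reduced-precision float type to shrink the model file. The sketch below illustrates only the general idea and is not jiojio's actual serialization code: it uses the half-precision format `e` of Python's standard `struct` module as a dependency-free stand-in for a NumPy half-precision array, and the weight values and helper names are made up for illustration.

```python
import struct


def pack_weights(weights, fmt):
    """Serialize a list of floats using one struct format character.

    "f" is 32-bit single precision, "e" is 16-bit half precision.
    """
    return struct.pack(f"<{len(weights)}{fmt}", *weights)


def unpack_weights(blob, fmt):
    """Inverse of pack_weights: decode the bytes back into Python floats."""
    item_size = struct.calcsize(f"<{fmt}")
    return list(struct.unpack(f"<{len(blob) // item_size}{fmt}", blob))


# Hypothetical feature weights; a real model would hold millions of these.
weights = [0.5, -1.25, 0.1, 3.0, -0.003]

blob32 = pack_weights(weights, "f")  # 4 bytes per weight
blob16 = pack_weights(weights, "e")  # 2 bytes per weight

restored = unpack_weights(blob16, "e")
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(len(blob32), len(blob16))  # half precision needs half the bytes
print(max_error)                 # small rounding error introduced by the cast
```

Halving the bytes per parameter comes at the cost of a small rounding error per weight, which is usually tolerable for the feature weights of a linear model such as a CRF.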

Installation

  • Via pip (stable version)

$ pip install jiojio

  • Via Git (development version)

$ git clone https://github.com/dongrixinyu/jiojio

$ cd jiojio

$ pip install .

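  • Building the C library on non-Ubuntu environments

On Windows, macOS, or other operating systems and hardware, no prebuilt C library can be called directly, so the program falls back to pure Python segmentation by default and runs more slowly. The C library can be compiled and installed as follows. These steps are for reference only and may need debugging by someone familiar with C. See also the notes on compiling on Windows.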

$ git clone https://github.com/dongrixinyu/jiojio

$ cd jiojio/jiojio/jiojio_cpp

$ ./compiler.sh
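The fallback behaviour described above (use the compiled C library when it is available, otherwise pure Python) follows a common pattern that can be sketched with `ctypes`. This is an illustrative sketch only, not jiojio's actual loading code; the library path and the `load_backend` helper are hypothetical.

```python
import ctypes


def load_backend(lib_path):
    """Try to load a compiled shared library; fall back to pure Python.

    Returns a (backend_name, handle) pair. The handle is None when the
    library cannot be loaded and the pure Python path must be used.
    """
    try:
        return "c", ctypes.CDLL(lib_path)
    except OSError:
        # Raised when the file is missing or was built for another platform.
        return "python", None


# Hypothetical path: this file does not exist, so the fallback is taken.
backend, handle = load_backend("./libhypothetical_segmenter.so")
print(backend)
```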

Usage

  • Basic usage

>>> import jiojio

>>> jiojio.init()

>>> print(jiojio.cut('开源软件应秉持全人类共享的精神,搞封闭式是行不通的。'))



# ['开源', '软件', '应', '秉持', '全人类', '共享', '的', '精神', ',', '搞', '封闭式', '是', '行', '不通', '的', '。']

# Call jiojio.help() for a summary of basic usage

# Call print(jiojio.init.__doc__) to see the parameters available for model initialization
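The value returned by `jiojio.cut` in the example above is an ordinary Python list of strings, so it can be post-processed with standard tools. The sketch below does not call jiojio; it copies the output shown above into a variable and counts word frequencies. The `PUNCTUATION` set is an assumption made for this example.

```python
from collections import Counter

# Segmentation result copied from the example above.
words = ['开源', '软件', '应', '秉持', '全人类', '共享', '的', '精神', ',',
         '搞', '封闭式', '是', '行', '不通', '的', '。']

# Segmentation only inserts boundaries, so joining the tokens
# reproduces the original sentence.
sentence = ''.join(words)

# Punctuation tokens to ignore when counting (an assumption for this sketch).
PUNCTUATION = {',', '。'}
content_words = [w for w in words if w not in PUNCTUATION]

freq = Counter(content_words)
print(sentence)
print(freq.most_common(1))
```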

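Q&A about the jiojio segmenter

  • Had this segmenter been written ten years earlier, jiojio might be popular by now. Today, with ChatGPT dominating NLP, I wrote this tool and made it faster purely to improve my C programming skills. Building ChatGPT took idealism, and so does writing this tool.

  • Q&A about jiojio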

TODO

  • Update the annotated data behind the segmenter and keep optimizing the model over the long term

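Discussion group

  • You are welcome to join the natural language processing (NLP) discussion group: search for the WeChat official account "JioNLP", or scan the QR code below to join.

[image: QR code for the discussion group]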

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • jiojio-1.2.8-py2.py3-none-any.whl (85.5 MB): Python 2, Python 3

  • jiojio-1.2.8-cp313-cp313-manylinux_2_5_x86_64.manylinux1_x86_64.whl (85.6 MB): CPython 3.13, manylinux: glibc 2.5+, x86-64

  • jiojio-1.2.8-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.whl (85.6 MB): CPython 3.12, manylinux: glibc 2.5+, x86-64

  • jiojio-1.2.8-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.whl (85.6 MB): CPython 3.11, manylinux: glibc 2.5+, x86-64

  • jiojio-1.2.8-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl (85.6 MB): CPython 3.10, manylinux: glibc 2.5+, x86-64

  • jiojio-1.2.8-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (85.6 MB): CPython 3.9, manylinux: glibc 2.5+, x86-64

  • jiojio-1.2.8-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (85.7 MB): CPython 3.8, manylinux: glibc 2.5+, x86-64

  • jiojio-1.2.8-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (85.7 MB): CPython 3.7m, manylinux: glibc 2.5+, x86-64

  • jiojio-1.2.8-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (85.7 MB): CPython 3.6m, manylinux: glibc 2.5+, x86-64

