jionlp

Chinese NLP Preprocessing & Parsing

Project description

<a alt="jionlp logo">

    <img src="../../blob/master/image/jionlp_logo.jpg" style="width:300px;height:100px">

</a>

<a alt="License">

    <img src="https://img.shields.io/github/license/dongrixinyu/JioNLP?color=crimson" /></a>

<a alt="Size">

    <img src="https://img.shields.io/badge/size-15.6m-orange" /></a>

<a alt="Downloads">

    <img src="https://pepy.tech/badge/jionlp/month" /></a>

<a alt="Version">

    <img src="https://img.shields.io/badge/version-1.5.29-green" /></a>

<a href="https://github.com/dongrixinyu/JioNLP/pulse" alt="Activity">

    <img src="https://img.shields.io/github/commit-activity/m/dongrixinyu/JioNLP?color=blue" /></a>

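JioNLP: A Python Library for Chinese NLP Preprocessing & Parsing

Install: `pip install jionlp`

JioNLP is a toolkit for NLP developers. It provides preprocessing and parsing functions for NLP tasks, and aims to be accurate, efficient, and usable with no setup effort. Scroll down this page for details on each feature, and press Ctrl+F to search. The JioNLP online version lets you quickly try some of the features. Follow the WeChat official account of the same name, JioNLP, for the latest AI news and data resources.

Time semantic parsing is currently the feature used by the most developers. If you are considering a customized version with better results, contact me on WeChat: shanzhuiyancheng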

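2025-02-22 Updated the large language model (LLM) evaluation dataset

JioNLP provides a test dataset for LLMs and uses the MELLM algorithm to evaluate them automatically.
For the evaluation results, follow the WeChat official account JioNLP and view the PDF of evaluation screenshots for each vendor.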


>>> import jionlp as jio

>>> llm_test = jio.llm_test_dataset_loader(version='1.2')

>>> print(llm_test[15])

>>> llm_test = jio.llm_test_dataset_loader(field='math')

>>> print(llm_test[5])

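2025-04-01 Updated a function and removed part of the dictionary content

jio.chinese_idiom_loader

This function loads the Chinese idiom dictionary. It currently returns each idiom's definition, source, example, and frequency of occurrence across the whole Chinese corpus.

Because this dictionary takes up 2.9 MB of disk space and probably has very few users, it will be trimmed. The plan is to keep only the idioms and their text frequencies, and to delete the definitions, sources, and examples.

This reduces the size of the jionlp package.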
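The trimming described above can be pictured with a small sketch. The record layout and the field names (`explanation`, `source`, `example`, `freq`) are assumptions made for illustration only; they are not taken from the jionlp source, and the real dictionary file may be organized differently.

```python
def trim_idiom_records(records):
    """Keep only each idiom and its frequency; drop every other field."""
    return {idiom: fields['freq'] for idiom, fields in records.items()}


# Hypothetical full records: the keys below are illustrative assumptions.
full_records = {
    '一举两得': {
        'explanation': 'definition text',
        'source': 'source text',
        'example': 'example text',
        'freq': 120,
    },
    '画蛇添足': {
        'explanation': 'definition text',
        'source': 'source text',
        'example': 'example text',
        'freq': 45,
    },
}

trimmed = trim_idiom_records(full_records)
print(trimmed)
```

Dropping the long free-text fields is where the space saving would come from, since the idiom itself and one integer are short by comparison.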

2023-12-12 Add MELLM

MELLM, short for Mutual Evaluation of Large Language Models, is an algorithm that evaluates LLMs automatically, without human supervision. MELLM has been tested on several LLMs and datasets; see the test results and analysis. You can try it with the example code below.
Before running this code, download norm_score.json and max_score.json from the test data (password: jmbo).
If you encounter any error, read test_mellm.py to see how to download the *.json files.


$ git clone https://github.com/dongrixinyu/JioNLP

$ cd JioNLP/test/

$ python test_mellm.py

Installation

python>=3.6. The GitHub version is slightly ahead of the pip release.


$ git clone https://github.com/dongrixinyu/JioNLP

$ cd ./JioNLP

$ pip install .

Install with pip


$ pip install jionlp

Features

Import the toolkit, then view its main features and the function docstrings


>>> import jionlp as jio

>>> print(jio.__version__)  # check the jionlp version

>>> dir(jio)

>>> print(jio.extract_parentheses.__doc__)

A star ⭐ marks a high-quality, distinctive feature

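1. Gadgets

| Feature | Function | Description | Rating |
|--------|-------|-------|-------|
| Sentence splitting | split_sentence | Splits text into sentences by punctuation | ⭐ |
| Phone number location and carrier parsing | phone_location, cell_phone_location, landline_phone_location | Given a phone number string (mobile or landline), identifies its province, city, and carrier | |
| Solar and lunar calendar date conversion | lunar2solar, solar2lunar | Given a solar (lunar) calendar date, converts it to the lunar (solar) calendar | |
| Idiom solitaire | idiom_solitaire | Idiom chain game: the last character of one idiom and the first character of the next share the same pronunciation | |
| Pornographic data filtering | - | - | |
| Reactionary data filtering | - | - | |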
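To make "splitting by punctuation" concrete, here is a minimal pure-Python sketch. It is not the implementation behind `split_sentence`; the punctuation set and the helper name `split_by_punctuation` are assumptions chosen for illustration.

```python
import re

# Sentence-ending punctuation, Chinese and ASCII. This set is an assumption
# for the sketch, not the set jionlp uses.
_SENTENCE_END = re.compile(r'[。！？!?…]+')


def split_by_punctuation(text):
    """Split text into sentences, keeping end punctuation with each sentence."""
    sentences = []
    start = 0
    for match in _SENTENCE_END.finditer(text):
        # Cut right after the run of end punctuation.
        sentences.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        # Trailing text without closing punctuation is kept as a sentence.
        sentences.append(text[start:])
    return [s for s in sentences if s.strip()]


print(split_by_punctuation('今天天气很好。你去哪里？我不知道'))
```

Runs of consecutive end marks (for example `！！`) stay attached to the sentence they close, so no empty sentences are produced.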

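2. Data augmentation

Description of each text data augmentation method

| Feature | Function | Description | Rating |
|--------|--------|-------|------|
| Back translation | BackTranslation | Given a text, augments the data by calling the machine translation APIs of major cloud platforms | ⭐ |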
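Back translation means translating a text into a pivot language and back again, so the round trip yields a paraphrase. The sketch below shows only that control flow. The translator callables are toy stand-ins backed by lookup tables, not real cloud APIs, and none of these names come from the `BackTranslation` class.

```python
def back_translate(text, to_pivot, from_pivot):
    """Translate text into a pivot language and back again."""
    return from_pivot(to_pivot(text))


def augment(text, translator_pairs):
    """Collect distinct round-trip variants that differ from the input."""
    variants = []
    for to_pivot, from_pivot in translator_pairs:
        candidate = back_translate(text, to_pivot, from_pivot)
        if candidate != text and candidate not in variants:
            variants.append(candidate)
    return variants


def make_toy_translator(table):
    """Hypothetical stand-in for a translation API: a plain lookup table."""
    return lambda text: table.get(text, text)


zh_to_en = make_toy_translator({'我很高兴': 'I am glad'})
en_to_zh = make_toy_translator({'I am glad': '我很开心'})

print(augment('我很高兴', [(zh_to_en, en_to_zh)]))
```

With real services, each pair would wrap one provider's API, and using several providers gives several paraphrases of the same sentence.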

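3. Regex extraction and parsing

| Feature | Function | Description | Rating |
|--------|--------|-------|-------|
| Extract bracketed content | extract_parentheses | Extracts the content inside brackets, including {}「」[]【】()（）<>《》 | ⭐ |
| Remove bracketed content | remove_parentheses | Removes the content inside brackets, including {}「」[]【】()（）<>《》 | |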
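One way to extract bracketed content is a stack that pairs each closing bracket with the most recent matching opener. The sketch below takes that approach for the bracket types listed in the table. It is an illustration under assumptions, not the code behind `extract_parentheses`; the function name and the tuple output format are hypothetical.

```python
# Opening bracket -> closing bracket, for the pairs listed in the table.
PAIRS = {'{': '}', '「': '」', '[': ']', '【': '】',
         '(': ')', '（': '）', '<': '>', '《': '》'}
CLOSERS = set(PAIRS.values())


def extract_bracketed(text):
    """Return (content, start, end) for each matched bracket pair.

    Nested pairs are reported innermost first. Unmatched or mismatched
    brackets are ignored.
    """
    results = []
    stack = []  # (opening character, index) of brackets still open
    for index, char in enumerate(text):
        if char in PAIRS:
            stack.append((char, index))
        elif char in CLOSERS and stack and PAIRS[stack[-1][0]] == char:
            _, start = stack.pop()
            results.append((text[start + 1:index], start, index + 1))
    return results


print(extract_bracketed('今天（周一）开会【重要】'))
```

Removal can reuse the same offsets: cutting the `start:end` spans out of the text, from the last span to the first, deletes the brackets together with their content.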

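4. File read/write tools

| Feature | Function | Description | Rating |
|--------|--------|-------|-------|
| Read a file line by line | read_file_by_iter | Reads a file line by line as an iterator to save memory; supports limiting the number of lines and skipping blank lines | |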
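The idea behind iterator-based reading is that only one line is held in memory at a time. Below is a minimal generator sketch of that idea. The function name `read_lines` and its parameters (`line_num`, `skip_empty`) are assumptions for illustration and may not match the signature of `read_file_by_iter`.

```python
def read_lines(path, line_num=None, skip_empty=True, encoding='utf-8'):
    """Yield lines from a file lazily.

    line_num: stop after this many yielded lines (None means no limit).
    skip_empty: when True, blank lines are skipped and not counted.
    """
    count = 0
    with open(path, 'r', encoding=encoding) as handle:
        for line in handle:
            if line_num is not None and count >= line_num:
                break
            line = line.rstrip('\n')
            if skip_empty and not line.strip():
                continue
            yield line
            count += 1
```

Because this is a generator, `for line in read_lines('corpus.txt', line_num=1000): ...` (with a hypothetical file name) processes a large file without loading it whole.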

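5. Dictionary loading and usage

| Feature | Function | Description | Rating |
|-----|-----|------|------|

6. Named entity recognition (NER) auxiliary tools

NER data conventions used by this toolkit

| Feature | Function | Description | Rating |
|--------|--------|-------|-------|

7. Text classification

| Feature | Function | Description | Rating |
|--------|--------|-------|------|

8. Sentiment analysis

| Feature | Function | Description | Rating |
|--------|--------|-------|-------|

9. Word segmentation

| Feature | Function | Description | Rating |
|--------|--------|-------|-------|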

Citation

If you need to cite this toolkit in a paper, copy the following citation:

Chengyu Cui, JioNLP, (2020), GitHub repository, https://github.com/dongrixinyu/JioNLP

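Motivation

NLP preprocessing and parsing are essential and very time-consuming. This library helps you quickly finish all kinds of tedious preprocessing and parsing work, speeding up development so that your limited energy goes into thinking rather than coding.
Feature suggestions and bug reports can be submitted as issues using the template.
NLP developers and researchers are very welcome to collaborate on improving this toolkit and adding new features.

If this tool helps you, please click the star ⭐ in the upper right corner

Or scan the code to buy the author a coffee (●'◡'●). This open-source project runs entirely on enthusiasm, so thank you! Alipay is preferred.

Thanks to the sponsors in the acknowledgments list; your tips give me more motivation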

<a alt="jionlp logo">

    <img src="../../blob/master/image/payment_code.jpg" style="width:500px;height:380px">

</a>

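NLP work is not easy; you are welcome to join the natural language processing WeChat discussion group

Scan the code below, or search WeChat for the official account "JioNLP", follow it, and reply 【进群】 (join group)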

<a alt="jionlp logo">

    <img src="../../blob/master/image/qrcode_for_gh.jpg" style="width:200px;height:200px">

</a>

Project details

Release history

| Version | Release date |
|--------|--------|
| 1.5.29 (this version) | Jun 5, 2026 |
| 1.5.28 | Apr 25, 2026 |
| 1.5.27 | Oct 30, 2025 |
| 1.5.26 | Oct 9, 2025 |
| 1.5.25 | Sep 8, 2025 |
| 1.5.24 | Jul 9, 2025 |
| 1.5.23 | May 15, 2025 |
| 1.5.22 | Apr 23, 2025 |
| 1.5.20 | Mar 22, 2025 |
| 1.5.19 | Jan 9, 2025 |
| 1.5.17 | Sep 26, 2024 |
| 1.5.15 | Jul 5, 2024 |
| 1.5.14 | May 22, 2024 |
| 1.5.11 | Apr 26, 2024 |
| 1.5.9 | Apr 1, 2024 |
| 1.5.7 | Feb 5, 2024 |
| 1.5.6 | Dec 12, 2023 |
| 1.5.5 | Nov 14, 2023 |
| 1.5.4 | Oct 13, 2023 |
| 1.5.2 | Jul 26, 2023 |
| 1.4.41 | Jun 12, 2023 |
| 1.4.40 | May 11, 2023 |
| 1.4.39 | May 1, 2023 |
| 1.4.38 | Apr 29, 2023 |
| 1.4.35 | Feb 20, 2023 |
| 1.4.33 | Jan 16, 2023 |
| 1.4.30 | Dec 28, 2022 |
| 1.4.28 | Dec 2, 2022 |
| 1.4.27 | Nov 28, 2022 |
| 1.4.25 | Nov 7, 2022 |
| 1.4.21 | Sep 28, 2022 |
| 1.4.19 | Sep 19, 2022 |
| 1.4.18 | Sep 3, 2022 |
| 1.4.17 | Aug 16, 2022 |
| 1.4.14 | Jul 24, 2022 |
| 1.4.7 | Jun 17, 2022 |
| 1.3.58 | May 13, 2022 |
| 1.3.53 | Mar 9, 2022 |
| 1.3.47 | Dec 29, 2021 |
| 1.3.34 | Sep 15, 2021 |
| 1.3.27 | Aug 4, 2021 |
| 1.3.16 | Mar 18, 2021 |
| 1.3.14 | Feb 10, 2021 |

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


jionlp-1.5.29-py2.py3-none-any.whl (16.5 MB)

Uploaded Jun 5, 2026. Python 2, Python 3

File details

Details for the file jionlp-1.5.29-py2.py3-none-any.whl.

File metadata

Download URL: jionlp-1.5.29-py2.py3-none-any.whl
Upload date: Jun 5, 2026
Size: 16.5 MB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for jionlp-1.5.29-py2.py3-none-any.whl

| Algorithm | Hash digest |
|--------|--------|
| SHA256 | `28ec74bbec6555b9651c45d195a9d5ea5455d7891502c9a33837b64c7e0f9340` |
| MD5 | `47f34aaacf1e6fe4a4fa91fe28bdb9ca` |
| BLAKE2b-256 | `e29e6ef4fe0dbd34e16bf8de7d3961d47ccc916bd03cf34365d58ade308e894f` |
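A published digest is only useful if it is compared against the file you actually downloaded. The sketch below uses Python's standard `hashlib` to compute a SHA256 digest in chunks and compare it with an expected value. The helper names and the local file path in the usage note are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash('jionlp-1.5.29-py2.py3-none-any.whl', '28ec74bbec6555b9651c45d195a9d5ea5455d7891502c9a33837b64c7e0f9340')` should return `True` for an intact download, assuming the wheel was saved under that name in the current directory.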

