Chinese Words Segmentation Utilities
Project description
jieba-py Project Description
Much of my work over the years has involved Chinese word segmentation. jieba is a module I have had to install for almost every project, and it has always been very stable. However, its codebase has gone unmaintained for a long time, which has been a lingering risk; along the way I made small code changes to silence some warnings. Two years ago I considered modifying and updating the source, but I did not know enough about natural language processing to actually start.
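This time, with the help of AI tools, I was able to get familiar with each module of the project quickly, and found that the project is actually fairly simple.
So I forked the module and made some modifications myself.
I merged in a bit of my earlier work, and AI tools were used heavily along the way.
The overall workload was not large, and the result has worked without major problems. The latest version has been published on PyPI.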
Continuing to maintain this pure-Python version is still worthwhile. jieba has been re-implemented in many other languages, and quite a few of those projects can be used directly from Python.
- It is only guaranteed to run on Python 3.10 or above. Python 2 compatibility is no longer considered, which allowed some code to be simplified.
- The code has been formatted for easier reading. No major changes are planned for the time being.
- More up-to-date techniques are used in the project structure to keep development stable.
- Unit tests are being revised gradually. They do not touch the core logic, so the impact of these changes is small.
- Packaging now uses a `pyproject.toml` configuration file, replacing the original `setup.py`.
- The package is published to PyPI under the name `jieba-py`. Usage is currently identical to the original `jieba`: install `jieba-py` in place of `jieba`. To stay compatible, the module name is still `jieba`, so it conflicts with the old version at install time. If `jieba` is already installed, uninstall it first: `python3 -m pip uninstall jieba`.
- Support for `paddle` mode has been removed. `paddle` mode is no longer maintained, and it cannot run on Python 3.10 or above.
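Because the module name stays `jieba`, existing code should keep working after switching distributions. The following is a minimal sketch, assuming `jieba-py` exposes the same `jieba.lcut` function as the original library (the description above says usage is identical). The helper name `segment` is hypothetical, and the import is guarded so the snippet still runs when the package is absent.

```python
from typing import List, Optional

try:
    import jieba  # assumed to be provided by the jieba-py distribution
except ImportError:  # the package is not installed in this environment
    jieba = None


def segment(text: str) -> Optional[List[str]]:
    """Split text into words with jieba.lcut, or return None if jieba is missing."""
    if jieba is None:
        return None
    return jieba.lcut(text)


if __name__ == "__main__":
    words = segment("我来到北京清华大学")
    print(words if words is not None else "jieba is not installed")
```

No import changes should be needed in existing projects; only the installed distribution differs.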
NOTE
The jieba-py module has been published on PyPI; install it with pip.
If you use another Python package manager, adapt the command to that tool.
Latest version: 0.46.12
python3 -m pip install jieba-py
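After installing, it can help to confirm that the environment meets the requirements stated above: Python 3.10 or newer, and no leftover `jieba` distribution alongside `jieba-py`. This is a rough sketch using only the standard library; the function names `dist_version` and `check_install` are hypothetical and not part of the package.

```python
import sys
from importlib import metadata


def dist_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_install(old, new, py=sys.version_info[:2]):
    """Collect human-readable problems; an empty list means nothing was found."""
    problems = []
    if py < (3, 10):
        problems.append("Python 3.10 or above is required")
    if old is not None and new is not None:
        problems.append("both jieba and jieba-py are installed; uninstall jieba first")
    if new is None:
        problems.append("jieba-py is not installed")
    return problems


if __name__ == "__main__":
    found = check_install(dist_version("jieba"), dist_version("jieba-py"))
    for line in found or ["environment looks fine"]:
        print(line)
```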
WARNING
The project's ReadMe.md file is generated and maintained automatically; do not edit it directly.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file jieba_py-0.46.12.tar.gz.
File metadata
- Download URL: jieba_py-0.46.12.tar.gz
- Upload date:
- Size: 5.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.6 {"installer":{"name":"uv","version":"0.11.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"13","id":"trixie","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c74a4ecf7e579c00915feeefd40e54450c1616ddfa18497cb0d42563f7e4eb97 |
| MD5 | 0a8836ab8b1f97a12ed3c13bb8faa930 |
| BLAKE2b-256 | 6a03f940a9f8fc2441ef1175369c01726add656dc55fea9edf1421733467050d |
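The published digests can be used to check a downloaded file before installing it. The sketch below assumes the archive was saved to the current directory under its published file name; the helper names `file_sha256` and `matches` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return file_sha256(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local path; the digest is the SHA256 value listed for the sdist.
    archive = Path("jieba_py-0.46.12.tar.gz")
    expected = "c74a4ecf7e579c00915feeefd40e54450c1616ddfa18497cb0d42563f7e4eb97"
    if archive.exists():
        print("OK" if matches(archive, expected) else "digest mismatch")
    else:
        print("archive not found")
```

The same approach works for the wheel by swapping in its file name and SHA256 digest.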
File details
Details for the file jieba_py-0.46.12-py3-none-any.whl.
File metadata
- Download URL: jieba_py-0.46.12-py3-none-any.whl
- Upload date:
- Size: 5.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.6 {"installer":{"name":"uv","version":"0.11.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"13","id":"trixie","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a4f616b571cbb2f3668608973bcd882a04e2faa94b95338acbbddf3cbe111930 |
| MD5 | 78253e68e8220f6011478108a9d97feb |
| BLAKE2b-256 | f326b7bbcedc1bfa49fe44d71233819deb95026f13ed10936a175c6b2f8a0f05 |