Chinese Words Segmentation Utilities

Project description

jieba-py Project Description

Much of my work over the years has involved Chinese word segmentation, and jieba has been a required module in almost every project, where it has always run reliably. However, its codebase has gone unmaintained for a long time, which has been a lingering risk; along the way I made small code changes to silence some warnings. Two years ago I considered modifying and updating its source code, but I did not know enough about natural language processing to actually do it.

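This time, with the help of AI tools, I was able to get familiar with each module of the project quickly and found that it is fairly simple. So I forked the module and made some actual changes, merging in the small fixes I had made earlier; AI tools were used heavily along the way. The overall workload was modest, and the result has run without major problems. The latest version has been published to PyPI.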

Continuing to maintain this pure-Python version is still worthwhile. Jieba segmentation has in fact been re-implemented in many other languages, and quite a few of those projects can be used directly from Python.

  • It is only guaranteed to run on Python 3.10 or above. Python 2 compatibility is no longer considered, which allowed some code to be simplified.
  • The code has been formatted for easier reading. No major changes are planned for the time being.
  • Newer tooling and practices are used in the project structure to keep development stable.
  • Unit tests are being revised gradually. This does not touch the core logic, so the impact of these changes is small.
  • Packaging now uses a pyproject.toml configuration file in place of the original setup.py.
  • The package is published to PyPI under the name jieba-py. Usage is currently identical to the original jieba: install jieba-py in place of jieba. For compatibility the module is still named jieba, so it conflicts with an existing installation of the old package. If jieba is already installed, be sure to uninstall it first: python3 -m pip uninstall jieba
  • Support for paddle mode has been removed: paddle mode is no longer maintained, and it cannot run on Python 3.10 and above.

NOTE

The jieba-py module is now published on PyPI; please install it with pip. If you use another Python package manager, adjust the command accordingly. Latest version: 0.46.12

python3 -m pip install jieba-py
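
Since usage is stated to be the same as the original jieba, a quick check after installation might look like the sketch below. It assumes jieba-py keeps jieba's lcut function unchanged; the segment wrapper is a hypothetical helper added here only for illustration.

```python
# jieba-py installs under the module name `jieba`, so the import is unchanged.
try:
    import jieba
except ImportError:  # not installed in this environment
    jieba = None


def segment(text: str) -> list[str]:
    """Segment text with jieba's default mode and return a list of tokens."""
    if jieba is None:
        raise RuntimeError("run: python3 -m pip install jieba-py")
    return jieba.lcut(text)


if __name__ == "__main__":
    if jieba is not None:
        print(segment("我来到北京清华大学"))
```

The first call loads the dictionary, so it is noticeably slower than later calls.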

WARNING

The project's ReadMe.md file is generated and maintained automatically; please do not edit it directly.

Download files

Source Distribution

jieba_py-0.46.12.tar.gz (5.3 MB)

Uploaded Source

Built Distribution

jieba_py-0.46.12-py3-none-any.whl (5.4 MB)

Uploaded Python 3

File details

Details for the file jieba_py-0.46.12.tar.gz.

File metadata

  • Download URL: jieba_py-0.46.12.tar.gz
  • Size: 5.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.6 (uv publish, Debian GNU/Linux 13 "trixie")

File hashes

Hashes for jieba_py-0.46.12.tar.gz
Algorithm Hash digest
SHA256 c74a4ecf7e579c00915feeefd40e54450c1616ddfa18497cb0d42563f7e4eb97
MD5 0a8836ab8b1f97a12ed3c13bb8faa930
BLAKE2b-256 6a03f940a9f8fc2441ef1175369c01726add656dc55fea9edf1421733467050d

File details

Details for the file jieba_py-0.46.12-py3-none-any.whl.

File metadata

  • Download URL: jieba_py-0.46.12-py3-none-any.whl
  • Size: 5.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.6 (uv publish, Debian GNU/Linux 13 "trixie")

File hashes

Hashes for jieba_py-0.46.12-py3-none-any.whl
Algorithm Hash digest
SHA256 a4f616b571cbb2f3668608973bcd882a04e2faa94b95338acbbddf3cbe111930
MD5 78253e68e8220f6011478108a9d97feb
BLAKE2b-256 f326b7bbcedc1bfa49fe44d71233819deb95026f13ed10936a175c6b2f8a0f05
