Modernized fork of jieba_fast, with Python 3.9+ support and Cython speedups.

Project description

jieba-next

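jieba-next is a modernized fork of jieba_fast that supports Python 3.9+ and uses Cython to optimize and speed up the code.

jieba_fast is itself a CPython-accelerated version of jieba, the classic Chinese word segmentation library. Building on jieba_fast, this project updates the build system, reimplements some of the core algorithms in Cython, fixes memory leaks, and improves maintainability.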

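Features

  • Modern: supports Python 3.9 and later; Python 2 is no longer supported.
  • Performance: the algorithms that build the DAG (directed acyclic graph) and compute the best path are reimplemented in Cython to speed up segmentation.
  • Compatibility: aims to produce the same segmentation results as the original jieba and jieba_fast.
  • Easy to install: uses modern build tooling and ships prebuilt binary packages (wheels) for multiple platforms, which simplifies installation.
  • Easy to use: works as a drop-in replacement for jieba; just write import jieba_next as jieba.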
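Because the package is meant to be a drop-in replacement, code that should still run where jieba-next is not installed can fall back to the original jieba. The following is a minimal sketch; the helper name load_segmenter is hypothetical and not part of either library.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_segmenter(candidates=("jieba_next", "jieba")) -> Optional[ModuleType]:
    """Return the first importable module among `candidates`, or None.

    `load_segmenter` is an illustrative helper, not an API of jieba-next.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    return None


jieba = load_segmenter()
if jieba is None:
    print("Neither jieba_next nor jieba is installed")
else:
    print("Using", jieba.__name__)
```

Preferring jieba_next first keeps the faster implementation where it is available, while the rest of the code keeps referring to the module as jieba.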

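Current status

This project is currently at an early stage of development:

  • Basic functional tests are complete, and segmentation runs correctly.
  • Segmentation results are consistent with those of the original jieba_fast repository.
  • Performance is slightly below the original jieba_fast repository but still far better than the original jieba; optimization will continue.
  • Test coverage is still incomplete; contributions of test cases are welcome.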

Installation

You can install it from PyPI with pip:

pip install jieba-next

Or install from source:

git clone https://github.com/mxcoras/jieba-next.git
cd jieba-next
pip install .
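
After either install route, one way to confirm which version ended up in the environment is to query the installed distribution metadata with the standard library. This is only a sketch; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "jieba-next" is the distribution name used with pip above.
print(installed_version("jieba-next") or "jieba-next is not installed")
```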

Usage example

You can use jieba-next just as you would jieba or jieba_fast:

import jieba_next as jieba

text = "在输出层后再增加CRF层,加强了文本间信息的相关性,针对序列标注问题,每个句子的每个词都有一个标注结果,对句子中第i个词进行高维特征的抽取,通过学习特征到标注结果的映射,可以得到特征到任意标签的概率,通过这些概率,得到最优序列结果"

print("-".join(jieba.lcut(text, HMM=True)))
print('-'.join(jieba.lcut(text, HMM=False)))

Output:

在-输出-层后-再-增加-CRF-层-,-加强-了-文本-间-信息-的-相关性-,-针对-序列-标注-问题-,-每个-句子-的-每个-词-都-有-一个-标注-结果-,-对-句子-中-第-i-个-词-进行-高维-特征-的-抽取-,-通过-学习-特征-到-标注-结果-的-映射-,-可以-得到-特征-到-任意-标签-的-概率-,-通过-这些-概率-,-得到-最优-序列-结果
在-输出-层-后-再-增加-CRF-层-,-加强-了-文本-间-信息-的-相关性-,-针对-序列-标注-问题-,-每个-句子-的-每个-词-都-有-一个-标注-结果-,-对-句子-中-第-i-个-词-进行-高维-特征-的-抽取-,-通过-学习-特征-到-标注-结果-的-映射-,-可以-得到-特征-到-任意-标签-的-概率-,-通过-这些-概率-,-得到-最优-序列-结果

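Algorithm

  • Performs efficient word-graph scanning based on a prefix dictionary, producing a directed acyclic graph (DAG) of all the possible words that the Chinese characters in a sentence can form.
  • Uses dynamic programming to find the maximum-probability path, yielding the best segmentation based on word frequency.
  • For out-of-vocabulary words, uses an HMM based on the word-forming capability of Chinese characters, decoded with the Viterbi algorithm.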
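
The first two steps above can be sketched in pure Python. This is an illustration of the general technique only, not the project's Cython code: the dictionary FREQ and its counts are made up, and the function names build_dag, best_route, and cut are hypothetical. The HMM step for out-of-vocabulary words is left out.

```python
import math

# Toy word-frequency dictionary (made-up counts, for illustration only).
FREQ = {
    "北京": 50, "大学": 40, "北京大学": 30, "大学生": 20,
    "北": 3, "京": 2, "大": 8, "学": 6, "生": 5,
}
TOTAL = sum(FREQ.values())


def build_prefixes(freq):
    """Collect every prefix of every dictionary word."""
    return {word[:i] for word in freq for i in range(1, len(word) + 1)}


def build_dag(sentence, freq):
    """Map each start index to the end indices of dictionary words starting there."""
    prefixes = build_prefixes(freq)
    dag = {}
    for start in range(len(sentence)):
        ends = []
        for end in range(start, len(sentence)):
            fragment = sentence[start:end + 1]
            if fragment not in prefixes:
                break  # no dictionary word can extend this fragment
            if fragment in freq:
                ends.append(end)
        dag[start] = ends or [start]  # unknown character: single-character word
    return dag


def best_route(sentence, dag, freq, total):
    """Right-to-left dynamic programming over the DAG using log probabilities."""
    n = len(sentence)
    log_total = math.log(total)
    route = {n: (0.0, 0)}
    for start in range(n - 1, -1, -1):
        route[start] = max(
            (math.log(freq.get(sentence[start:end + 1], 0) or 1) - log_total
             + route[end + 1][0], end)
            for end in dag[start]
        )
    return route


def cut(sentence, freq=FREQ, total=TOTAL):
    """Follow the best route from the start of the sentence and emit words."""
    route = best_route(sentence, build_dag(sentence, freq), freq, total)
    words, i = [], 0
    while i < len(sentence):
        end = route[i][1]
        words.append(sentence[i:end + 1])
        i = end + 1
    return words


print("/".join(cut("北京大学生")))  # prints 北京/大学生
```

Working in log space turns the product of word probabilities into a sum, which avoids underflow on long sentences. Scanning from the end of the sentence lets each position reuse the best score already computed for every later position.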

Acknowledgements

Original author of the "Jieba" Chinese word segmentation library: SunJunyi
Author of the jieba_fast repository: deepcs233

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

jieba_next-1.0.0a3-cp313-cp313-win_amd64.whl (5.4 MB)

Uploaded CPython 3.13, Windows x86-64

jieba_next-1.0.0a3-cp313-cp313-win32.whl (5.4 MB)

Uploaded CPython 3.13, Windows x86

jieba_next-1.0.0a3-cp312-cp312-win_amd64.whl (5.4 MB)

Uploaded CPython 3.12, Windows x86-64

jieba_next-1.0.0a3-cp312-cp312-win32.whl (5.4 MB)

Uploaded CPython 3.12, Windows x86

jieba_next-1.0.0a3-cp311-cp311-win_amd64.whl (5.4 MB)

Uploaded CPython 3.11, Windows x86-64

jieba_next-1.0.0a3-cp311-cp311-win32.whl (5.4 MB)

Uploaded CPython 3.11, Windows x86

jieba_next-1.0.0a3-cp310-cp310-win_amd64.whl (5.4 MB)

Uploaded CPython 3.10, Windows x86-64

jieba_next-1.0.0a3-cp310-cp310-win32.whl (5.4 MB)

Uploaded CPython 3.10, Windows x86

File details

Details for the file jieba_next-1.0.0a3-cp313-cp313-win_amd64.whl.

File hashes

Hashes for jieba_next-1.0.0a3-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 bab7465adb93a4a4ea90897af2d34e790cd0c6e6ec19334f786d0e531d8c8247
MD5 44a83e007554d7b1d49b0e3b257d0267
BLAKE2b-256 3d5fcae98be55bc230ec9f23de2a4e411ed6c29fc8e3aca1b1f84d6817513c31

See more details on using hashes here.
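
A downloaded wheel can be checked against the SHA256 digests listed on this page with the standard library. The sketch below assumes the wheel sits in the current directory under its published file name; the helper names sha256_of_file and matches_sha256 are hypothetical.

```python
import hashlib
import hmac
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare the file's digest with the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Assumed local path; the digest is the SHA256 listed for this wheel.
wheel = "jieba_next-1.0.0a3-cp313-cp313-win_amd64.whl"
expected = "bab7465adb93a4a4ea90897af2d34e790cd0c6e6ec19334f786d0e531d8c8247"
if os.path.exists(wheel):
    print("hash OK" if matches_sha256(wheel, expected) else "hash MISMATCH")
```

pip can enforce the same check at install time through its hash-checking mode, using a requirements file that pins each digest.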

File details

Details for the file jieba_next-1.0.0a3-cp313-cp313-win32.whl.

File metadata

  • Download URL: jieba_next-1.0.0a3-cp313-cp313-win32.whl
  • Upload date:
  • Size: 5.4 MB
  • Tags: CPython 3.13, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for jieba_next-1.0.0a3-cp313-cp313-win32.whl
Algorithm Hash digest
SHA256 43fd37a0f2283742b7330ce0e523e71b6c898297444e529868148135f2a9a446
MD5 167b284a577385a8d12e0a1c7cc4b619
BLAKE2b-256 1f8fef2fc39907e4ce2f1274ab240587c35e7b062bff7175f7e062647566c6e4

File details

Details for the file jieba_next-1.0.0a3-cp312-cp312-win_amd64.whl.

File hashes

Hashes for jieba_next-1.0.0a3-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 55ae1a55b027cf8bfc0fc525e8d7e6de92851d1dcccffd64c7f3ccd4db160f60
MD5 31088519a2505ac2ea948259e9dca09a
BLAKE2b-256 1743cb00410c3b2bffe483ec3b8b0898341bcc58b67e422b2740cb560d277d2e

File details

Details for the file jieba_next-1.0.0a3-cp312-cp312-win32.whl.

File metadata

  • Download URL: jieba_next-1.0.0a3-cp312-cp312-win32.whl
  • Upload date:
  • Size: 5.4 MB
  • Tags: CPython 3.12, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for jieba_next-1.0.0a3-cp312-cp312-win32.whl
Algorithm Hash digest
SHA256 3669455f7e21a01c2f139ddb358618a599ad75ff3803928ce883c8433a635c57
MD5 df52f8f1a151a3d25969a560e8973a74
BLAKE2b-256 71bb495e5fa2eaa0b7ea48eb179cd13a8d7645b22659c9e890caf3350f45a694

File details

Details for the file jieba_next-1.0.0a3-cp311-cp311-win_amd64.whl.

File hashes

Hashes for jieba_next-1.0.0a3-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 af95ba25daf3433cc6af30e09186492c8e455ebe18a4c7837fdab86e649536d7
MD5 18099ae113652c71ab428fd523b3cd22
BLAKE2b-256 7f47bcf2581036dc5f973da58d3d55cf686c4bb89d1cf06dc552490d9c845882

File details

Details for the file jieba_next-1.0.0a3-cp311-cp311-win32.whl.

File metadata

  • Download URL: jieba_next-1.0.0a3-cp311-cp311-win32.whl
  • Upload date:
  • Size: 5.4 MB
  • Tags: CPython 3.11, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for jieba_next-1.0.0a3-cp311-cp311-win32.whl
Algorithm Hash digest
SHA256 3ad93ba3f921df1e066b7d3823d29709007b401f9d1c6ee447bf09daa98754ca
MD5 8c15171a64360444a5599453e2d37d00
BLAKE2b-256 2e5e7551e7af1c35584c68d0196451e9848f65a88e45e6517d3ca5314bce33f9

File details

Details for the file jieba_next-1.0.0a3-cp310-cp310-win_amd64.whl.

File hashes

Hashes for jieba_next-1.0.0a3-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 205e34a59365fa1aaaaf7436f423d43d5802f9c8e79ab89ccc019303c41c1757
MD5 b8712d16e013d0d71282339bf4c7dd7e
BLAKE2b-256 c73be2ebb7bd0c490090b5e04810365abf83b6d6a4e16a85fc0a2eb5f927d97f

File details

Details for the file jieba_next-1.0.0a3-cp310-cp310-win32.whl.

File metadata

  • Download URL: jieba_next-1.0.0a3-cp310-cp310-win32.whl
  • Upload date:
  • Size: 5.4 MB
  • Tags: CPython 3.10, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for jieba_next-1.0.0a3-cp310-cp310-win32.whl
Algorithm Hash digest
SHA256 389086ed55b32fa56c1d7fc8f1b445986a112f8d1cc0c662c80ee92d6216e10f
MD5 1a0f9879a174a7d59b77a3d85dd242d9
BLAKE2b-256 a4fc10576907d3d43ec87360e92205e4f331385d06968c065bb6e88dded0ad74
