Rust Extension for Language Technology Platform (Python).

| Language | Version |
| --- | --- |
| Python | LTP, LTP-Core, LTP-Extension |
| Rust | LTP |

LTP extension For Python

Python bindings for LTP for Rust, intended to speed up LTP and to add Chinese language processing tools implemented with traditional machine learning algorithms.

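| method | ltp 3.0 (c++) | ap(1) | ap(8) | pa | pa-i(0.5) | pa-ii(0.5) |
| --- | --- | --- | --- | --- | --- | --- |
| cws | 97.83 | 97.93 | 97.67 | 97.90 | 97.90 | 97.93 |
| pos | 98.35 | 98.41 | 98.30 | 98.39 | 98.39 | 98.38 |
| ner | 94.17 | 94.28 | 93.42 | 94.02 | 94.06 | 93.95 |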
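The pa, pa-i(0.5), and pa-ii(0.5) columns presumably correspond to the three passive-aggressive variants described by Crammer et al. (2006), with 0.5 as the aggressiveness parameter C. The README does not state this, so treat that reading as an assumption. Under it, the variants differ only in how the step size for one update is computed, which can be sketched as follows. The function name `pa_step_size` is hypothetical and is not part of ltp_extension.

```python
def pa_step_size(loss: float, sq_norm: float, variant: str = "pa", c: float = 0.5) -> float:
    """Step size tau for one passive-aggressive update.

    loss    -- hinge loss of the current prediction (0 means no update)
    sq_norm -- squared norm of the feature difference vector
    variant -- "pa", "pa-i", or "pa-ii"
    c       -- aggressiveness parameter (assumed to be the 0.5 in the table)
    """
    if loss <= 0.0 or sq_norm <= 0.0:
        return 0.0  # passive: the example is already handled with enough margin
    if variant == "pa":
        return loss / sq_norm                      # unbounded step
    if variant == "pa-i":
        return min(c, loss / sq_norm)              # step capped at c
    if variant == "pa-ii":
        return loss / (sq_norm + 1.0 / (2.0 * c))  # step damped by c
    raise ValueError(f"unknown variant: {variant}")


for name in ("pa", "pa-i", "pa-ii"):
    print(name, pa_step_size(loss=2.0, sq_norm=2.0, variant=name))
```

The weight update itself is then the feature difference vector scaled by this step size; a smaller C makes PA-I and PA-II less sensitive to noisy examples.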

Building and installing from source

maturin build --release -m python/extension/Cargo.toml --out dist --no-default-features --features="malloc"
# or, optimized for the local CPU
maturin build --release -m python/extension/Cargo.toml --out dist --no-default-features --features="malloc" -- -C target-cpu=native

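Features

  • Sentence splitting
  • Tasks
    • Chinese word segmentation (cws)
      • Handling of numbers, English text, URLs, and email addresses
      • Custom dictionary support
    • Part-of-speech tagging (pos)
      • Custom dictionary support
    • Named entity recognition (ner)
  • Algorithms
    • Averaged perceptron (ap)
      • Single-threaded averaged perceptron
      • Multi-threaded averaged perceptron
    • Passive-aggressive algorithm (pa)
  • Model quantization
  • Online learning
  • Incremental learning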
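For readers unfamiliar with the averaged perceptron listed above, here is a minimal sketch of a plain multi-class version over sparse string features. It is an illustration of the general technique only, not the ltp_extension implementation; the class name `AveragedPerceptron` and its methods are hypothetical. It accumulates weight totals naively after every step, whereas practical implementations usually track timestamps to avoid touching every weight.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Toy multi-class averaged perceptron (illustrative, not LTP's code)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # (feature, label) -> current weight
        self.totals = defaultdict(float)   # (feature, label) -> sum of weights over steps
        self.steps = 0

    def score(self, features, label, weights=None):
        w = self.weights if weights is None else weights
        return sum(w.get((f, label), 0.0) for f in features)

    def predict(self, features, weights=None):
        # Ties are broken in favour of the label listed first.
        return max(self.labels, key=lambda y: self.score(features, y, weights))

    def update(self, features, gold):
        guess = self.predict(features)
        if guess != gold:
            for f in features:
                self.weights[(f, gold)] += 1.0   # reward the correct label
                self.weights[(f, guess)] -= 1.0  # penalise the wrong guess
        # Naive averaging: add the current weights to the totals every step.
        for key, w in self.weights.items():
            self.totals[key] += w
        self.steps += 1

    def averaged(self):
        """Return the weights averaged over all steps seen so far."""
        n = max(self.steps, 1)
        return {key: total / n for key, total in self.totals.items()}


model = AveragedPerceptron(labels=["X", "Y"])
data = [(["a"], "X"), (["b"], "Y")]
for _ in range(3):
    for features, gold in data:
        model.update(features, gold)
avg = model.averaged()
print(model.predict(["b"], avg))
```

Predicting with the averaged weights rather than the final ones is what distinguishes this from the ordinary perceptron; it damps the effect of late, possibly noisy, updates.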

Performance benchmarks

Test environment

  • Python 3.10
  • MacBook Pro (16-inch, 2019)
  • Processor: 2.6 GHz 6-core Intel Core i7
  • Memory: 16 GB 2667 MHz DDR4

Note: the file used for the speed tests is 33.85 MB / 305,041 lines.
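The README does not describe how the KB/s figures were measured. One plausible way to obtain a comparable number is to divide the UTF-8 size of the input by the wall-clock time spent processing it, as in the sketch below. The helper name `throughput_kb_per_s`, the choice of 1 KB = 1024 bytes, and the use of `str.split` as a stand-in for a real segmenter are all assumptions made for illustration.

```python
import time


def throughput_kb_per_s(process, lines):
    """Run `process` on every line and return the throughput in KB/s.

    Assumes 1 KB = 1024 bytes and counts the UTF-8 encoded size of the input.
    """
    n_bytes = sum(len(line.encode("utf-8")) for line in lines)
    start = time.perf_counter()
    for line in lines:
        process(line)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very small inputs.
    return n_bytes / 1024 / max(elapsed, 1e-9)


# str.split is only a placeholder for the function under test.
sample = ["他叫汤姆去拿外衣。"] * 10000
print(f"{throughput_kb_per_s(str.split, sample):.2f} KB/s")
```

Timing only the processing loop, and not model loading or file reading, keeps the figure focused on the tool itself.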

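Word segmentation

We compare LTP with representative Chinese word segmenters such as Jieba, Pkuseg, and Thulac, measuring the speed and accuracy of each according to the evaluation standard published by the Second International Chinese Word Segmentation Bakeoff.

The second bakeoff used test corpora supplied by four organizations: Academia Sinica, City University, Peking University (PKU), and Microsoft Research (MSR). The evaluation resource icwb2-data contains the training sets (icwb2-data/training) and test sets (icwb2-data/testing) from these four organizations, along with the gold-standard answers for each test set, prepared according to each organization's own segmentation standard (icwb2-data/gold). The icwb2-data/scripts directory contains score, a Perl script that scores segmentation output automatically.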
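To make the F1 figures concrete, the sketch below computes word-level precision, recall, and F1 for one sentence by comparing character spans of predicted and gold words. It illustrates the standard metric only and is not a port of the score Perl script, which may differ in detail; the function names `to_spans` and `segmentation_f1` are hypothetical.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold_words, pred_words):
    """Return (precision, recall, f1) for one segmented sentence."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)  # words whose boundaries both match
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = ["我们", "在", "北京"]
pred = ["我们", "在北", "京"]
print(segmentation_f1(gold, pred))
```

A predicted word counts as correct only when both of its boundaries agree with the gold segmentation, so a single misplaced boundary costs two words. For a corpus-level score, the span counts would be summed over all sentences before computing the ratios.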

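We tested several popular word segmenters and LTP in the same test environment, each with the model it ships with. Results on the PKU and MSR test sets are as follows: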

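| Algorithm | Speed (KB/s) | PKU (F1) | MSR (F1) |
| --- | --- | --- | --- |
| Jieba | 982.49 | 81.8 | 81.3 |
| Pkuseg | 109.72 | 93.4 | 87.3 |
| Thulac | 48.13 | 94.0 | 87.9 |
| Thulac[Fast] | 1133.21 | same as above | same as above |
| LTP 3(pyltp) | 451.20 | 95.3 | 88.3 |
| LTP legacy(1) | 1603.63 | 95.2 | 87.7 |
| LTP legacy(2) | 2869.42 | same as above | same as above |
| LTP legacy(4) | 4949.38 | same as above | same as above |
| LTP legacy(8) | 6803.88 | same as above | same as above |
| LTP legacy(16) | 7745.16 | same as above | same as above |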

Note: the number in parentheses is the thread count.

Note 2: Jieba's vocabulary was compiled from the People's Daily dataset.
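The multi-threaded rows are easier to read as speedup over the single-threaded run and as parallel efficiency (speedup divided by thread count). The sketch below derives both from the LTP legacy speeds in the segmentation table; the helper name `scaling` is hypothetical, and the derived ratios are not stated in the README.

```python
# LTP legacy segmentation speeds in KB/s, keyed by thread count (from the table).
LEGACY_SPEEDS = {1: 1603.63, 2: 2869.42, 4: 4949.38, 8: 6803.88, 16: 7745.16}


def scaling(speeds):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    base = speeds[1]
    result = {}
    for threads, kb_per_s in sorted(speeds.items()):
        speedup = kb_per_s / base
        result[threads] = (speedup, speedup / threads)
    return result


for threads, (speedup, efficiency) in scaling(LEGACY_SPEEDS).items():
    print(f"{threads:>2} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Since the test machine has six cores, gains that flatten out beyond 8 threads are not surprising.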

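Pipeline

Besides word segmentation, we also measured the speed of the LTP pipeline running all three tasks (word segmentation, part-of-speech tagging, and named entity recognition):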

| Algorithm | Speed (KB/s) |
| --- | --- |
| LTP 3(pyltp) | 153.10 |
| LTP legacy(1) | 508.74 |
| LTP legacy(2) | 899.25 |
| LTP legacy(4) | 1598.03 |
| LTP legacy(8) | 2267.48 |
| LTP legacy(16) | 2452.34 |

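Note: the number in parentheses is the thread count.

Note 2: Speed was measured on the People's Daily named entity test data; each figure is for all tasks executed sequentially.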
