Skip to main content

Chinese Chemical Named Entity Recognition

Project description

ChemNer

Chinese Chemical Named Entity Recognition

本文针对化学术语构成的领域特征,从语素的角度入手,构建了化学领域语素分类表,并进行了有无语素特征的对比实验。

CRF-baseline-model

该实验选择当前字的上下文窗口为5,当前输出标签与上一输出标签的关系作为CRF的特征输入,进行模型训练和预测。

基于N-gram特征的CRF识别结果

术语 长度 术语 总数 术语识别数 正确识别数 正确率 召回率 F值
1 9 1 1 100.00% 11.11% 20.00%
2 34 26 23 88.46% 67.65% 76.67%
3 129 125 116 92.80% 89.92% 91.34%
4 110 111 103 92.79% 93.64% 93.21%
5 50 46 44 95.65% 88.00% 91.67%
6 31 27 23 85.19% 74.19% 79.31%
7 21 21 20 95.24% 95.24% 95.24%
8 16 17 15 88.24% 93.75% 90.91%
9 14 13 12 92.31% 85.71% 88.89%
>=10 34 36 29 80.56% 85.29% 82.86%
all 448 423 386 91.25% 86.16% 88.63%

使用工具CRF++: Yet Another CRF toolkit

BiLSTM-CRF-baseline-model

依赖模块

  • pytorch=1.13.0
  • python3.7+

运行方式

  1. 运行下列命令,进行模型训练:

    python run_lstm_crf.py --do_train
    
  2. 运行下列命令,进行模型预测

    python run_lstm_crf.py --do_predict
    
Acc Recall F1
0.9062 0.8897 0.8979

修改自CLUENER2020

Hmm-model

把语素和语素类建模为简单稳定的HMM模型,利用改进的前向算法规避术语过长的问题,最终达到了91.58%的较好效果。

CRF-model

实验选择当前字的上下文窗口为5,当前输出标签与上一输出标签的关系为特征的基础上,加入当前化学语素类的上下文窗口为5作为CRF的特征输入

基于上下文N-gram特征 + 语素类特征的CRF识别结果

术语长度 术语总数 识别出的术语个数 正确识别的术语个数 正确率 召回率 F值
1 9 2 1 50.00% 11.11% 18.18%
2 34 35 31 88.57% 91.18% 89.86%
3 129 131 123 93.89% 95.35% 94.62%
4 110 111 108 97.30% 98.18% 97.74%
5 50 46 44 95.65% 88.00% 91.67%
6 31 31 28 90.32% 90.32% 90.32%
7 21 21 20 95.24% 95.24% 95.24%
8 16 15 15 100.00% 93.75% 96.77%
9 14 14 13 92.86% 92.86% 92.86%
>=10 34 37 34 91.89% 100.00% 95.77%
all 448 443 417 94.13% 93.08% 93.60%

使用工具CRF++: Yet Another CRF toolkit

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chemner-0.0.2.tar.gz (58.6 kB view details)

Uploaded Source

Built Distribution

chemner-0.0.2-py3-none-any.whl (54.2 kB view details)

Uploaded Python 3

File details

Details for the file chemner-0.0.2.tar.gz.

File metadata

  • Download URL: chemner-0.0.2.tar.gz
  • Upload date:
  • Size: 58.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.0

File hashes

Hashes for chemner-0.0.2.tar.gz
Algorithm Hash digest
SHA256 761eebbcdcf03dd595c2336a50e7d3f7a7756a3ad22f09def676290b5ced2b4f
MD5 8f972e756544aebd5e4f314ba2a3f82f
BLAKE2b-256 b5da305a934da55afd7ec764d846fc8b652b4f91f8e37b46741baa70c1ef3527

See more details on using hashes here.

File details

Details for the file chemner-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: chemner-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 54.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.0

File hashes

Hashes for chemner-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 0dd6158a1e82e6ac3f2bec4b96b9c6ce47c80c4b3a4cf76d3a52b1581962b570
MD5 82d93e36347979f78832c1219b255cfe
BLAKE2b-256 cc6b54cd8c2628e2fbc978a1a561eb637655aab2cdba5918a4bf9d9e7e6a3ca2

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page