Chinese Chemical Named Entity Recognition
ChemNer
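This work targets the domain-specific way in which chemical terms are composed. Starting from the morpheme level, it builds a morpheme classification table for the chemical domain and runs comparison experiments with and without morpheme features.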
CRF-baseline-model
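This experiment trains and evaluates a CRF whose input features are a context window of 5 around the current character, plus the relationship between the current output label and the previous output label.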
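The character-window features described above can be sketched as follows. This is a minimal illustration, not the project's actual feature template: it assumes the window of 5 means the current character plus two characters on each side, and the function names and the `<PAD>` token are hypothetical. In most linear-chain CRF toolkits the dependency between the current and previous label is handled by the model's transition weights, so it does not appear as an explicit feature here.

```python
def char_window_features(sentence, i, window=5):
    """Features for the character at index i: the characters in a centered window.

    Assumption: window=5 covers offsets -2..+2 around the current character.
    """
    half = window // 2
    feats = {"bias": 1.0}
    for offset in range(-half, half + 1):
        j = i + offset
        # Positions outside the sentence get a padding token (hypothetical name).
        feats[f"char[{offset:+d}]"] = sentence[j] if 0 <= j < len(sentence) else "<PAD>"
    return feats


def sentence_features(sentence, window=5):
    """One feature dict per character, in order."""
    return [char_window_features(sentence, i, window) for i in range(len(sentence))]


features = sentence_features("氯化钠溶液")
```

Each element of `features` is a plain dict, which is the input shape that dict-based CRF wrappers typically expect. A template-based toolkit would express the same window as unigram templates instead.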
CRF recognition results with N-gram features
| Term length | Total terms | Terms recognized | Correctly recognized | Precision | Recall | F-score |
|---|---|---|---|---|---|---|
| 1 | 9 | 1 | 1 | 100.00% | 11.11% | 20.00% |
| 2 | 34 | 26 | 23 | 88.46% | 67.65% | 76.67% |
| 3 | 129 | 125 | 116 | 92.80% | 89.92% | 91.34% |
| 4 | 110 | 111 | 103 | 92.79% | 93.64% | 93.21% |
| 5 | 50 | 46 | 44 | 95.65% | 88.00% | 91.67% |
| 6 | 31 | 27 | 23 | 85.19% | 74.19% | 79.31% |
| 7 | 21 | 21 | 20 | 95.24% | 95.24% | 95.24% |
| 8 | 16 | 17 | 15 | 88.24% | 93.75% | 90.91% |
| 9 | 14 | 13 | 12 | 92.31% | 85.71% | 88.89% |
| >=10 | 34 | 36 | 29 | 80.56% | 85.29% | 82.86% |
| all | 448 | 423 | 386 | 91.25% | 86.16% | 88.63% |
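The percentage columns follow from the three count columns: precision is correct / recognized, recall is correct / total, and the F-score is their harmonic mean. A small sketch that recomputes the "all" row of the table above (the function name `prf` is hypothetical):

```python
def prf(total, recognized, correct):
    """Precision, recall and F-score from raw term counts."""
    precision = correct / recognized if recognized else 0.0
    recall = correct / total if total else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Counts taken from the "all" row: 448 terms, 423 recognized, 386 correct.
p, r, f = prf(total=448, recognized=423, correct=386)
print(f"{p:.2%} {r:.2%} {f:.2%}")  # prints "91.25% 86.16% 88.63%"
```

The same function applies to any per-length row, which makes it easy to check individual entries.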
BiLSTM-CRF-baseline-model
Dependencies
- pytorch=1.13.0
- python3.7+
Usage
- Run the following command to train the model:
  python run_lstm_crf.py --do_train
- Run the following command to run prediction:
  python run_lstm_crf.py --do_predict
| Acc | Recall | F1 |
|---|---|---|
| 0.9062 | 0.8897 | 0.8979 |
Adapted from CLUENER2020.
Hmm-model
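Morphemes and morpheme classes are modeled as a simple, stable HMM, and a modified forward algorithm is used to avoid the problems caused by overly long terms. This approach reached a good result of 91.58%.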
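For reference, the standard HMM forward algorithm can be sketched as below. This is not the modified algorithm used in this project, whose details are not given here. It is the textbook recursion with per-step rescaling, a common way to keep probabilities from underflowing on long sequences. The states, observations, and probabilities are toy values invented for illustration.

```python
import math


def forward_log_likelihood(obs, states, start_p, trans_p, emit_p):
    """Log-likelihood of an observation sequence under an HMM.

    Standard forward recursion; alpha is renormalized at every step and the
    scaling factors are accumulated in log space.
    """
    alpha = {s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = {
                s: emit_p[s].get(obs[t], 0.0)
                * sum(alpha[r] * trans_p[r][s] for r in states)
                for s in states
            }
        scale = sum(alpha.values())
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        alpha = {s: a / scale for s, a in alpha.items()}
        log_lik += math.log(scale)
    return log_lik


# Toy model (hypothetical): two hidden states, two observable symbols.
states = ["X", "Y"]
start_p = {"X": 0.6, "Y": 0.4}
trans_p = {"X": {"X": 0.7, "Y": 0.3}, "Y": {"X": 0.4, "Y": 0.6}}
emit_p = {"X": {"a": 0.5, "b": 0.5}, "Y": {"a": 0.1, "b": 0.9}}

log_lik = forward_log_likelihood(["a", "b"], states, start_p, trans_p, emit_p)
```

In a morpheme-based setup the observations would be morphemes or morpheme classes and the hidden states would be term-boundary labels, but that mapping is an assumption here.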
CRF-model
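This experiment keeps the baseline features (a context window of 5 around the current character, plus the relationship between the current output label and the previous output label) and adds a context window of 5 over the chemical morpheme class of the current character as additional CRF input features.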
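A minimal sketch of how morpheme-class features might be layered on top of the character window. The `MORPHEME_CLASS` entries and class names below are invented for illustration and are not taken from the project's morpheme classification table. The window is again assumed to span offsets -2 to +2.

```python
# Hypothetical morpheme classes, for illustration only.
MORPHEME_CLASS = {"氯": "ELEMENT", "钠": "ELEMENT", "化": "LINK", "酸": "ACID"}


def features_with_morpheme_class(sentence, i, window=5):
    """Character-window features plus the morpheme class at each window position."""
    half = window // 2
    feats = {"bias": 1.0}
    for offset in range(-half, half + 1):
        j = i + offset
        if 0 <= j < len(sentence):
            ch = sentence[j]
            mclass = MORPHEME_CLASS.get(ch, "OTHER")  # characters outside the table
        else:
            ch = mclass = "<PAD>"  # positions outside the sentence
        feats[f"char[{offset:+d}]"] = ch
        feats[f"mclass[{offset:+d}]"] = mclass
    return feats


example = features_with_morpheme_class("氯化钠溶液", 2)
```

Because the class features generalize across characters that share a class, they may help the model with terms whose exact characters were rare in training, which would be consistent with the improved recall on short and very long terms in the results.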
CRF recognition results with context N-gram features + morpheme-class features
| Term length | Total terms | Terms recognized | Correctly recognized | Precision | Recall | F-score |
|---|---|---|---|---|---|---|
| 1 | 9 | 2 | 1 | 50.00% | 11.11% | 18.18% |
| 2 | 34 | 35 | 31 | 88.57% | 91.18% | 89.86% |
| 3 | 129 | 131 | 123 | 93.89% | 95.35% | 94.62% |
| 4 | 110 | 111 | 108 | 97.30% | 98.18% | 97.74% |
| 5 | 50 | 46 | 44 | 95.65% | 88.00% | 91.67% |
| 6 | 31 | 31 | 28 | 90.32% | 90.32% | 90.32% |
| 7 | 21 | 21 | 20 | 95.24% | 95.24% | 95.24% |
| 8 | 16 | 15 | 15 | 100.00% | 93.75% | 96.77% |
| 9 | 14 | 14 | 13 | 92.86% | 92.86% | 92.86% |
| >=10 | 34 | 37 | 34 | 91.89% | 100.00% | 95.77% |
| all | 448 | 443 | 417 | 94.13% | 93.08% | 93.60% |