Chinese Chemical Named Entity Recognition
ChemNer
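This work targets the domain-specific way in which chemical terms are formed. Starting from morphemes, it builds a morpheme classification table for the chemistry domain and runs comparison experiments with and without morpheme features.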
CRF-baseline-model
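This experiment uses a context window of 5 around the current character, together with the relation between the current output label and the previous output label, as the CRF feature input for model training and prediction.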
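The character-window features described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names, the feature keys, and the `<BOS>`/`<EOS>` padding symbols are hypothetical, and it assumes the window of 5 is centered on the current character (two characters on each side), which the description does not state. The label-to-previous-label relation is not built here, since a linear-chain CRF models it through its transition weights.

```python
def char_window_features(sentence, i, window=5):
    """Features for the character at position i: every character inside a
    window of `window` positions centered on i (offsets -2..+2 for 5)."""
    half = window // 2
    feats = {"bias": 1.0}
    for offset in range(-half, half + 1):
        j = i + offset
        if j < 0:
            ch = "<BOS>"  # hypothetical padding before the sentence start
        elif j >= len(sentence):
            ch = "<EOS>"  # hypothetical padding after the sentence end
        else:
            ch = sentence[j]
        feats[f"char[{offset}]"] = ch
    return feats


def sentence_features(sentence):
    """One feature dict per character, in sentence order."""
    return [char_window_features(sentence, i) for i in range(len(sentence))]


features = sentence_features("氯化钠溶液")
print(features[2])
```

A list of per-character dictionaries like this is the kind of input that CRF toolkits such as sklearn-crfsuite accept, although the description does not say which toolkit was used.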
CRF recognition results with N-gram features
Term length | Total terms | Terms identified | Correctly identified | Precision | Recall | F-score |
---|---|---|---|---|---|---|
1 | 9 | 1 | 1 | 100.00% | 11.11% | 20.00% |
2 | 34 | 26 | 23 | 88.46% | 67.65% | 76.67% |
3 | 129 | 125 | 116 | 92.80% | 89.92% | 91.34% |
4 | 110 | 111 | 103 | 92.79% | 93.64% | 93.21% |
5 | 50 | 46 | 44 | 95.65% | 88.00% | 91.67% |
6 | 31 | 27 | 23 | 85.19% | 74.19% | 79.31% |
7 | 21 | 21 | 20 | 95.24% | 95.24% | 95.24% |
8 | 16 | 17 | 15 | 88.24% | 93.75% | 90.91% |
9 | 14 | 13 | 12 | 92.31% | 85.71% | 88.89% |
>=10 | 34 | 36 | 29 | 80.56% | 85.29% | 82.86% |
all | 448 | 423 | 386 | 91.25% | 86.16% | 88.63% |
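The percentage columns in the table follow from the three count columns. A minimal sketch of that arithmetic, assuming the usual definitions (precision = correct / identified, recall = correct / total, F = harmonic mean of the two); the function name `prf` is hypothetical:

```python
def prf(total, identified, correct):
    """Precision, recall and F-score from raw term counts."""
    precision = correct / identified if identified else 0.0
    recall = correct / total if total else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Counts from the "all" row of the table above.
p, r, f = prf(total=448, identified=423, correct=386)
print(f"{p:.2%} {r:.2%} {f:.2%}")  # 91.25% 86.16% 88.63%
```

Applied to the row for length-1 terms (9 total, 1 identified, 1 correct), the same function gives 100.00% precision but only 11.11% recall, which is why the F-score for that row drops to 20.00%.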
BiLSTM-CRF-baseline-model
Dependencies
- pytorch=1.13.0
- python3.7+
Usage
- Run the following command to train the model:
  python run_lstm_crf.py --do_train
- Run the following command to run prediction:
  python run_lstm_crf.py --do_predict
Precision (Acc) | Recall | F1 |
---|---|---|
0.9062 | 0.8897 | 0.8979 |
Adapted from CLUENER2020.
Hmm-model
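Morphemes and morpheme classes are modeled as a simple, stable HMM, and an improved forward algorithm is used to sidestep the problem of overly long terms, ultimately reaching a good result of 91.58%.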
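For orientation, the textbook forward algorithm for an HMM can be sketched as below. This is not the improved variant mentioned above, whose details are not given here. The sketch assumes hidden states are morpheme classes and observations are morphemes, and it normalizes the forward variables at every step, a standard trick that keeps probabilities from underflowing on long sequences. The class names (`ELEMENT`, `LINK`) and all probabilities are made up for illustration.

```python
import math


def forward_log_likelihood(obs, states, start_p, trans_p, emit_p):
    """Standard forward algorithm with per-step normalization.
    Returns log P(obs) under the HMM, or -inf if obs is impossible."""
    log_likelihood = 0.0
    alpha = {s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}
    for t in range(len(obs)):
        if t > 0:
            alpha = {
                s: emit_p[s].get(obs[t], 0.0)
                * sum(alpha[prev] * trans_p[prev][s] for prev in states)
                for s in states
            }
        scale = sum(alpha.values())
        if scale == 0.0:
            return float("-inf")
        # Rescale so alpha sums to 1; accumulate the scale in log space.
        alpha = {s: a / scale for s, a in alpha.items()}
        log_likelihood += math.log(scale)
    return log_likelihood


# Toy model with hypothetical morpheme classes and probabilities.
states = ["ELEMENT", "LINK"]
start_p = {"ELEMENT": 0.6, "LINK": 0.4}
trans_p = {
    "ELEMENT": {"ELEMENT": 0.7, "LINK": 0.3},
    "LINK": {"ELEMENT": 0.4, "LINK": 0.6},
}
emit_p = {
    "ELEMENT": {"氯": 0.9, "化": 0.1},
    "LINK": {"氯": 0.1, "化": 0.9},
}

ll = forward_log_likelihood(["氯", "化"], states, start_p, trans_p, emit_p)
print(math.exp(ll))
```

Because the running scale is accumulated as a sum of logarithms, the same function stays finite on sequences of thousands of morphemes, where a naive product of probabilities would round to zero.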
CRF-model
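On top of the features above (a context window of 5 around the current character, plus the relation between the current output label and the previous output label), this experiment adds a context window of 5 over the chemical morpheme class of the current character as CRF feature input.
CRF recognition results with contextual N-gram features + morpheme class features
Term length | Total terms | Terms identified | Correctly identified | Precision | Recall | F-score |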
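A rough sketch of how a morpheme-class window might be added alongside the character window. The `MORPHEME_CLASS` table, its class names, the `OTHER` fallback, and the `<PAD>` symbol are all hypothetical stand-ins, since the project's actual morpheme classification table is not reproduced in this description. The window is again assumed to be centered on the current character.

```python
# Hypothetical morpheme-class lookup; NOT the project's real table.
MORPHEME_CLASS = {
    "氯": "ELEMENT",
    "钠": "ELEMENT",
    "化": "LINK",
    "酸": "ACID",
}


def window_features(sentence, i, window=5):
    """Character and morpheme-class features for position i, taken from a
    window of `window` positions centered on i."""
    half = window // 2
    feats = {"bias": 1.0}
    for offset in range(-half, half + 1):
        j = i + offset
        if 0 <= j < len(sentence):
            ch = sentence[j]
            cls = MORPHEME_CLASS.get(ch, "OTHER")  # unknown characters
        else:
            ch = cls = "<PAD>"  # position falls outside the sentence
        feats[f"char[{offset}]"] = ch
        feats[f"mclass[{offset}]"] = cls
    return feats


feats = window_features("氯化钠溶液", 1)
print(feats)
```

The class features let the model generalize across characters that play the same role in a term, which is one plausible reading of why recall improves for most term lengths in the results for this model.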
---|---|---|---|---|---|---|
1 | 9 | 2 | 1 | 50.00% | 11.11% | 18.18% |
2 | 34 | 35 | 31 | 88.57% | 91.18% | 89.86% |
3 | 129 | 131 | 123 | 93.89% | 95.35% | 94.62% |
4 | 110 | 111 | 108 | 97.30% | 98.18% | 97.74% |
5 | 50 | 46 | 44 | 95.65% | 88.00% | 91.67% |
6 | 31 | 31 | 28 | 90.32% | 90.32% | 90.32% |
7 | 21 | 21 | 20 | 95.24% | 95.24% | 95.24% |
8 | 16 | 15 | 15 | 100.00% | 93.75% | 96.77% |
9 | 14 | 14 | 13 | 92.86% | 92.86% | 92.86% |
>=10 | 34 | 37 | 34 | 91.89% | 100.00% | 95.77% |
all | 448 | 443 | 417 | 94.13% | 93.08% | 93.60% |