Chinese Information Extraction
- 1. MultiLayerResCNN (cnn4ie/mlrescnn): multi-layer residual CNN (+CRF). Paper: "Convolutional Sequence to Sequence Learning".
- 2. MultiLayerResDSCNN (cnn4ie/dscnn): multi-layer residual CNN with depthwise separable convolutions (+CRF). Paper: "Xception: Deep Learning with Depthwise Separable Convolutions".
- 3. MultiLayerAugmentedCNN (cnn4ie/attention_augmented_cnn): multi-layer residual attention-augmented CNN (+CRF). Paper: "Attention Augmented Convolutional Networks".
- 4. MultiLayerLambdaCNN (cnn4ie/lambda_cnn): multi-layer residual Lambda CNN (+CRF). Paper: "LambdaNetworks: Modeling long-range Interactions without Attention".
- 5. MultiLayerResLWCNN (cnn4ie/lcnn): multi-layer residual lightweight CNN (+CRF). Paper: "Pay Less Attention with Lightweight and Dynamic Convolutions".
- 6. MultiLayerResDYCNN (cnn4ie/dcnn): multi-layer residual dynamic CNN (+CRF). Paper: "Pay Less Attention with Lightweight and Dynamic Convolutions".
- 7. MultiLayerStdAttnCNN (cnn4ie/stand_alone_self_attention_cnn): multi-layer residual stand-alone self-attention CNN (+CRF). Paper: "Stand-Alone Self-Attention in Vision Models".
- 8. MultiLayerCSAttCNN (cnn4ie/channel_spatial_attention_cnn): multi-layer residual CNN with joint channel and spatial attention (+CRF). Paper: "CBAM: Convolutional Block Attention Module".
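As a rough illustration of what "multi-layer residual CNN" means for sequence labeling, the sketch below stacks 1D convolutions with zero padding and a skip connection, so every layer keeps exactly one value per input character. It is a toy in plain Python with a single channel and hand-picked kernels, and all function names are hypothetical. It is not the code in cnn4ie/mlrescnn.

```python
from typing import List


def conv1d_same(xs: List[float], kernel: List[float]) -> List[float]:
    """1D cross-correlation with zero padding; output length equals input length."""
    assert len(kernel) % 2 == 1, "use an odd kernel width so the window is centred"
    half = len(kernel) // 2
    padded = [0.0] * half + xs + [0.0] * half
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(xs))
    ]


def residual_block(xs: List[float], kernel: List[float]) -> List[float]:
    """y = x + relu(conv(x)): the skip connection adds the input back."""
    conv = conv1d_same(xs, kernel)
    return [x + max(0.0, c) for x, c in zip(xs, conv)]


def multi_layer_res_cnn(xs: List[float], kernels: List[List[float]]) -> List[float]:
    """Apply one residual block per kernel, in order."""
    for kernel in kernels:
        xs = residual_block(xs, kernel)
    return xs


tokens = [1.0, 2.0, 3.0, 4.0]                  # one toy feature per character
kernels = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]   # hand-picked, not learned
out = multi_layer_res_cnn(tokens, kernels)
print(out)
```

Because the sequence length never changes, the final layer can emit one label score per character, which is what a CRF or softmax tagging head needs.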
    # Train MultiLayerResCNN using the settings in config.cfg
    from cnn4ie.mlrescnn.train import Train
    train = Train()
    train.train_model('config.cfg')
    Epoch: 199 | Time: 0m 4s
    Train Loss: 228.545 | Train PPL: 1.802960293422957e+99
    Val. Loss: 433.577 | Val. PPL: 1.9966207577208172e+188
    Val. report:
                 precision    recall  f1-score   support
               1      1.00      1.00      1.00      4539
               2      0.98      0.99      0.99      4926
               3      0.90      0.83      0.86       166
               4      0.74      0.98      0.84        52
               5      0.94      0.77      0.84       120
               6      0.76      0.97      0.85        39
               7      0.82      0.87      0.85        54
               8      0.93      0.74      0.82        68
               9      0.95      0.77      0.85        26
              10      1.00      0.80      0.89        10
        accuracy                          0.98     10000
       macro avg      0.90      0.87      0.88     10000
    weighted avg      0.99      0.98      0.98     10000
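The perplexity (PPL) columns in this log appear to be the exponential of the reported loss: exp(228.545) and exp(433.577) agree with the printed values to within the rounding of the loss. The check below reproduces that arithmetic. That the values are this large suggests the loss is summed rather than averaged per token, but this is an inference from the numbers, not something the source states.

```python
import math

# Loss values copied from the epoch 199 log above (rounded to 3 decimals there).
train_loss = 228.545
val_loss = 433.577

# Perplexity as the exponential of the loss.
train_ppl = math.exp(train_loss)
val_ppl = math.exp(val_loss)

print(f"Train PPL: {train_ppl:.3e}")
print(f"Val. PPL:  {val_ppl:.3e}")
```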
    # Load the model and vocabulary via config.cfg, then extract entities from one sentence
    from cnn4ie.mlrescnn.predict import Predict
    predict = Predict()
    predict.load_model_vocab('config.cfg')
    result = predict.predict('据新华社报道,安徽省六安市被评上十大易居城市!')
    print(result)
[{'start': 7, 'stop': 13, 'word': '安徽省六安市', 'type': 'LOC'}, {'start': 1, 'stop': 4, 'word': '新华社', 'type': 'ORG'}]
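The start and stop fields in the result behave like a Python slice over the input string: start is the index of the first character and stop is one past the last. The check below uses only the sentence and output shown above. The result list is copied by hand rather than produced by calling the model.

```python
text = '据新华社报道,安徽省六安市被评上十大易居城市!'

# Output copied from the prediction example above.
result = [
    {'start': 7, 'stop': 13, 'word': '安徽省六安市', 'type': 'LOC'},
    {'start': 1, 'stop': 4, 'word': '新华社', 'type': 'ORG'},
]

for entity in result:
    # text[start:stop] recovers the entity string, so stop is exclusive.
    span = text[entity['start']:entity['stop']]
    assert span == entity['word']
    print(entity['type'], span)
```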
    # Train MultiLayerResDSCNN using the settings in config.cfg
    from cnn4ie.dscnn.train import Train
    train = Train()
    train.train_model('config.cfg')
    Epoch: 192 | Time: 0m 3s
    Train Loss: 191.273 | Train PPL: 1.172960293422957e+99
    Val. Loss: 533.260 | Val. PPL: 5.2866207577208172e+188
    Val. report:
                 precision    recall  f1-score   support
               1      0.99      1.00      1.00      4539
               2      0.98      0.98      0.98      4926
               3      0.92      0.82      0.87       166
               4      0.82      0.88      0.85        52
               5      0.84      0.76      0.80       120
               6      0.90      0.95      0.92        39
               7      0.90      0.85      0.88        54
               8      0.84      0.71      0.77        68
               9      0.85      0.65      0.74        26
              10      1.00      0.70      0.82        10
        accuracy                          0.98     10000
       macro avg      0.91      0.83      0.86     10000
    weighted avg      0.98      0.98      0.98     10000
    # Load the model and vocabulary via config.cfg, then extract entities from one sentence
    from cnn4ie.dscnn.predict import Predict
    predict = Predict()
    predict.load_model_vocab('config.cfg')
    result = predict.predict('本报北京2月28日讯记者苏宁报道:八届全国人大常委会第三十次会议今天下午在京闭幕。')
    print(result)
[{'start': 2, 'stop': 4, 'word': '北京', 'type': 'LOC'}, {'start': 12, 'stop': 14, 'word': '苏宁', 'type': 'LOC'}, {'start': 32, 'stop': 36, 'word': '今天下午', 'type': 'T'}]
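For context on why MultiLayerResDSCNN uses depthwise separable convolutions, the sketch below compares weight counts for a 1D convolution. A standard convolution with kernel width k needs k * C_in * C_out weights. A depthwise separable one splits this into a per-channel depthwise filter (k * C_in) followed by a 1x1 pointwise projection (C_in * C_out). The channel sizes are illustrative and not taken from the cnn4ie configuration, and biases are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard 1D convolution (no bias)."""
    return kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise conv plus a 1x1 pointwise conv (no bias)."""
    depthwise = kernel_size * in_channels    # one width-k filter per input channel
    pointwise = in_channels * out_channels   # 1x1 conv that mixes channels
    return depthwise + pointwise


k, c_in, c_out = 3, 256, 256  # hypothetical sizes
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 2))
```

With these sizes the separable layer needs roughly a third of the weights, and the saving grows with the kernel width.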
    # Train MultiLayerAugmentedCNN using the settings in config.cfg
    from cnn4ie.attention_augmented_cnn.train import Train
    train = Train()
    train.train_model('config.cfg')
    Epoch: 192 | Time: 0m 3s
    Train Loss: 185.204 | Train PPL: 2.711303579086953e+80
    Val. Loss: 561.592 | Val. PPL: 7.877783034926193e+243
    Val. report:
                 precision    recall  f1-score   support
               1      0.99      1.00      1.00      4539
               2      0.98      0.99      0.98      4926
               3      0.96      0.77      0.85       166
               4      0.81      0.85      0.83        52
               5      0.88      0.71      0.78       120
               6      0.90      0.90      0.90        39
               7      0.90      0.85      0.88        54
               8      0.85      0.69      0.76        68
               9      1.00      0.42      0.59        26
              10      1.00      0.50      0.67        10
        accuracy                          0.98     10000
       macro avg      0.93      0.77      0.82     10000
    weighted avg      0.98      0.98      0.98     10000
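The gap between the macro and weighted averages in this report comes from class imbalance: classes 1 and 2 account for 9,465 of the 10,000 validation samples, so the weighted average is dominated by them, while the macro average treats every class equally and is pulled down by the rare classes 9 and 10. The arithmetic below recomputes both recall averages from the per-class rows above. It assumes the usual definitions (unweighted mean versus support-weighted mean), which the report layout suggests but the source does not state.

```python
# Per-class recall and support copied from the validation report above (classes 1..10).
recall = [1.00, 0.99, 0.77, 0.85, 0.71, 0.90, 0.85, 0.69, 0.42, 0.50]
support = [4539, 4926, 166, 52, 120, 39, 54, 68, 26, 10]

# Macro average: every class counts equally.
macro_recall = sum(recall) / len(recall)

# Weighted average: every class counts in proportion to its support.
weighted_recall = sum(r * s for r, s in zip(recall, support)) / sum(support)

print(round(macro_recall, 2), round(weighted_recall, 2))  # prints: 0.77 0.98
```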
    # Load the model and vocabulary via config.cfg, then extract entities from one sentence
    from cnn4ie.attention_augmented_cnn.predict import Predict
    predict = Predict()
    predict.load_model_vocab('config.cfg')
    result = predict.predict('本报北京2月28日讯记者苏宁报道:八届全国人大常委会第三十次会议今天下午在京闭幕。')
    print(result)
[{'start': 2, 'stop': 4, 'word': '北京', 'type': 'LOC'}, {'start': 12, 'stop': 14, 'word': '苏宁', 'type': 'LOC'}, {'start': 32, 'stop': 36, 'word': '今天下午', 'type': 'T'}]
    # Train MultiLayerLambdaCNN using the settings in config.cfg
    from cnn4ie.lambda_cnn.train import Train
    train = Train()
    train.train_model('config.cfg')
    Epoch: 197 | Time: 0m 2s
    Train Loss: 198.344 | Train PPL: 1.3800537707438322e+86
    Val. Loss: 668.780 | Val. PPL: 2.8022239331403918e+290
    Val. report:
                 precision    recall  f1-score   support
               1      0.99      1.00      1.00      4539
               2      0.98      0.98      0.98      4926
               3      0.80      0.78      0.79       166
               4      0.89      0.90      0.90        52
               5      0.86      0.77      0.81       120
               6      0.90      0.92      0.91        39
               7      0.81      0.87      0.84        54
               8      0.88      0.75      0.81        68
               9      0.93      0.54      0.68        26
              10      1.00      0.70      0.82        10
        accuracy                          0.98     10000
       macro avg      0.90      0.82      0.85     10000
    weighted avg      0.98      0.98      0.98     10000
    # Load the model and vocabulary via config.cfg, then extract entities from one sentence
    from cnn4ie.lambda_cnn.predict import Predict
    predict = Predict()
    predict.load_model_vocab('config.cfg')
    result = predict.predict('本报北京2月28日讯记者苏宁报道:八届全国人大常委会第三十次会议今天下午在京闭幕。')
    print(result)
[{'start': 2, 'stop': 4, 'word': '北京', 'type': 'LOC'}, {'start': 12, 'stop': 14, 'word': '苏宁', 'type': 'LOC'}, {'start': 32, 'stop': 36, 'word': '今天下午', 'type': 'T'}]
    # Train MultiLayerResLWCNN using the settings in config.cfg
    from cnn4ie.lcnn.train import Train
    train = Train()
    train.train_model('config.cfg')
    Epoch: 190 | Time: 0m 4s
    Train Loss: 195.472 | Train PPL: 7.807223255192846e+84
    Val. Loss: 453.642 | Val. PPL: 1.0328983269312897e+197
    Val. report:
                 precision    recall  f1-score   support
               1      0.99      1.00      1.00      5925
               2      0.99      0.98      0.98      5501
               3      0.90      0.85      0.87       174
               4      0.72      0.93      0.81        57
               5      0.92      0.81      0.86       122
               6      0.82      0.91      0.86        44
               7      0.84      0.85      0.85        62
               8      0.92      0.77      0.84        71
               9      0.66      0.81      0.72        31
              10      0.91      0.77      0.83        13
        accuracy                          0.98     12000
       macro avg      0.86      0.87      0.86     12000
    weighted avg      0.98      0.98      0.98     12000
    # Load the model and vocabulary via config.cfg, then extract entities from one sentence
    from cnn4ie.lcnn.predict import Predict
    predict = Predict()
    predict.load_model_vocab('config.cfg')
    result = predict.predict('本报北京2月28日讯记者苏宁报道:八届全国人大常委会第三十次会议今天下午在京闭幕。')
    print(result)
[{'start': 2, 'stop': 4, 'word': '北京', 'type': 'LOC'}, {'start': 12, 'stop': 14, 'word': '苏宁', 'type': 'LOC'}, {'start': 32, 'stop': 36, 'word': '今天下午', 'type': 'T'}]
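The core idea of the lightweight convolution in "Pay Less Attention with Lightweight and Dynamic Convolutions" is a depthwise convolution whose kernel weights are softmax-normalized over the kernel width, so each output is a weighted average of its window. The single-channel sketch below illustrates only that normalization step. Weight sharing across channels and everything specific to cnn4ie/lcnn are left out, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(ws: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw weights."""
    m = max(ws)
    exps = [math.exp(w - m) for w in ws]
    total = sum(exps)
    return [e / total for e in exps]


def lightweight_conv(xs: List[float], raw_kernel: List[float]) -> List[float]:
    """1D conv with a softmax-normalized kernel; zero padding keeps the length."""
    kernel = softmax(raw_kernel)
    half = len(kernel) // 2
    padded = [0.0] * half + xs + [0.0] * half
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(xs))
    ]


xs = [2.0, 2.0, 2.0, 2.0, 2.0]
ys = lightweight_conv(xs, raw_kernel=[0.3, -1.2, 0.8])
# Away from the zero-padded edges a constant input stays constant,
# because the normalized weights sum to 1.
print(ys[1:-1])
```

In the paper, the dynamic convolution variant predicts the kernel weights from the current time step instead of learning one fixed kernel. That step is not shown here.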
    # Train MultiLayerResDYCNN using the settings in config.cfg
    from cnn4ie.dcnn.train import Train
    train = Train()
    train.train_model('config.cfg')
    Epoch: 192 | Time: 0m 4s
    Train Loss: 182.916 | Train PPL: 2.7491663642617552e+79
    Val. Loss: 463.782 | Val. PPL: 2.618555606950152e+201
    Val. report:
                 precision    recall  f1-score   support
               1      1.00      1.00      1.00      5925
               2      0.99      0.98      0.98      5501
               3      0.86      0.86      0.86       174
               4      0.80      0.93      0.86        57
               5      0.84      0.79      0.81       122
               6      0.83      0.89      0.86        44
               7      0.83      0.87      0.85        62
               8      0.88      0.75      0.81        71
               9      0.92      0.71      0.80        31
              10      1.00      0.85      0.92        13
        accuracy                          0.98     12000
       macro avg      0.89      0.86      0.88     12000
    weighted avg      0.98      0.98      0.98     12000
    # Load the model and vocabulary via config.cfg, then extract entities from one sentence
    from cnn4ie.dcnn.predict import Predict
    predict = Predict()
    predict.load_model_vocab('config.cfg')
    result = predict.predict('本报北京2月28日讯记者苏宁报道:八届全国人大常委会第三十次会议今天下午在京闭幕。')
    print(result)
[{'start': 2, 'stop': 4, 'word': '北京', 'type': 'LOC'}, {'start': 12, 'stop': 14, 'word': '苏宁', 'type': 'LOC'}, {'start': 32, 'stop': 36, 'word': '今天下午', 'type': 'T'}]
    # Train MultiLayerStdAttnCNN using the settings in config.cfg
    from cnn4ie.stand_alone_self_attention_cnn.train import Train
    train = Train()
    train.train_model('config.cfg')
    Epoch: 195 | Time: 0m 3s
    Train Loss: 247.570 | Train PPL: 3.29768182789317e+107
    Val. Loss: 681.482 | Val. PPL: 9.20623044303632e+295
    Val. report:
                 precision    recall  f1-score   support
               1      0.99      1.00      1.00      4539
               2      0.99      0.99      0.99      4926
               3      0.95      0.86      0.90       166
               4      0.93      0.96      0.94        52
               5      0.91      0.78      0.84       120
               6      0.93      0.97      0.95        39
               7      0.80      0.89      0.84        54
               8      0.91      0.72      0.80        68
               9      1.00      0.69      0.82        26
              10      1.00      0.90      0.95        10
        accuracy                          0.98     10000
       macro avg      0.94      0.88      0.90     10000
    weighted avg      0.98      0.98      0.98     10000
    # Load the model and vocabulary via config.cfg, then extract entities from one sentence
    from cnn4ie.stand_alone_self_attention_cnn.predict import Predict
    predict = Predict()
    predict.load_model_vocab('config.cfg')
    result = predict.predict('本报北京2月28日讯记者苏宁报道:八届全国人大常委会第三十次会议今天下午在京闭幕。')
    print(result)
[{'start': 19, 'stop': 26, 'word': '全国人大常委会', 'type': 'ORG'}, {'start': 32, 'stop': 36, 'word': '今天下午', 'type': 'T'}, {'start': 2, 'stop': 4, 'word': '北京', 'type': 'LOC'}, {'start': 12, 'stop': 14, 'word': '苏宁', 'type': 'LOC'}]
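The entities in this output are not ordered by position, so downstream code that expects reading order may want to sort them first. A small sketch using the list shown above, copied by hand rather than produced by the model:

```python
from collections import defaultdict

result = [
    {'start': 19, 'stop': 26, 'word': '全国人大常委会', 'type': 'ORG'},
    {'start': 32, 'stop': 36, 'word': '今天下午', 'type': 'T'},
    {'start': 2, 'stop': 4, 'word': '北京', 'type': 'LOC'},
    {'start': 12, 'stop': 14, 'word': '苏宁', 'type': 'LOC'},
]

# Reading order: sort by the start offset.
ordered = sorted(result, key=lambda e: e['start'])
print([e['word'] for e in ordered])

# Group the words by entity type, keeping reading order within each type.
by_type = defaultdict(list)
for e in ordered:
    by_type[e['type']].append(e['word'])
print(dict(by_type))
```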
(1). Training
    # Train MultiLayerCSAttCNN using the settings in config.cfg
    from cnn4ie.channel_spatial_attention_cnn.train import Train
    train = Train()
    train.train_model('config.cfg')
    Epoch: 181 | Time: 0m 3s
    Train Loss: 112.922 | Train PPL: 1.1001029953413096e+49
    Val. Loss: 493.448 | Val. PPL: 2.002428912702234e+214
    Val. report:
                 precision    recall  f1-score   support
               1      0.99      1.00      1.00      4539
               2      0.98      0.98      0.98      4926
               3      0.89      0.81      0.85       166
               4      0.77      0.88      0.82        52
               5      0.90      0.73      0.81       120
               6      0.84      0.92      0.88        39
               7      0.81      0.89      0.85        54
               8      0.90      0.69      0.78        68
               9      0.85      0.85      0.85        26
              10      0.82      0.90      0.86        10
        accuracy                          0.98     10000
       macro avg      0.88      0.87      0.87     10000
    weighted avg      0.98      0.98      0.98     10000
    # Load the model and vocabulary via config.cfg, then extract entities from one sentence
    from cnn4ie.channel_spatial_attention_cnn.predict import Predict
    predict = Predict()
    predict.load_model_vocab('config.cfg')
    result = predict.predict('本报北京2月28日讯记者苏宁报道:八届全国人大常委会第三十次会议今天下午在京闭幕。')
    print(result)
[{'start': 2, 'stop': 4, 'word': '北京', 'type': 'LOC'}, {'start': 12, 'stop': 14, 'word': '苏宁', 'type': 'LOC'}, {'start': 32, 'stop': 36, 'word': '今天下午', 'type': 'T'}]
- Install: pip install CNN4IE
- Download the source:
git clone
python setup.py install
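Training and evaluation here use the data in data (from People's Daily; the labels recognized are [ORG, PER, LOC, T, O]). The training and evaluation results for model 1, with and without pretrained vectors, are in examples/mlrescnn. The other models can be run and evaluated in the same way.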
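The data description lists entity types [ORG, PER, LOC, T, O], while the prediction examples return character spans. One common way to bridge the two is BIO tagging, where each character is labeled B-X, I-X, or O and contiguous runs are merged into spans. The source does not say which tagging scheme CNN4IE uses, so the BIO scheme and the function name bio_to_spans below are assumptions. Only the output dictionary layout follows the examples in this README.

```python
from typing import Dict, List, Optional, Union

Entity = Dict[str, Union[int, str]]


def bio_to_spans(text: str, tags: List[str]) -> List[Entity]:
    """Merge B-X/I-X runs into {'start', 'stop', 'word', 'type'} dicts (stop exclusive)."""
    spans: List[Entity] = []
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel flushes a span that runs to the end of the text.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label is not None and tag[2:] == label:
            continue  # still inside the current entity
        if label is not None:
            spans.append({"start": start, "stop": i, "word": text[start:i], "type": label})
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # An I-X tag with no matching open entity is treated like "O" and dropped.
    return spans


# Hypothetical tag sequence for a fragment of the first example sentence.
text = "据新华社报道"
tags = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "O"]
print(bio_to_spans(text, tags))
```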
author = {Shi Yan},
title = {CNN4IE: Chinese Information Extraction Tool},
year = {2021},
url = {},
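CNN4IE is licensed under the Apache License 2.0 and may be used commercially free of charge. Please include a link to CNN4IE and its license in your product documentation. CNN4IE is protected by copyright law, and infringement will be pursued.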
(1). CNN4IE 0.1.0 init commit
(2). CNN4IE 0.1.1 update self.max_len
(3). CNN4IE 0.1.2 update new model -> [MultiLayerResDSCNN]
(4). CNN4IE 0.1.3 update new model -> [MultiLayerAugmentedCNN], [MultiLayerLambdaCNN]
(5). CNN4IE 0.1.4 update new model -> [MultiLayerResLWCNN], [MultiLayerResDYCNN]
(6). CNN4IE 0.1.5 update new model -> [MultiLayerStdAttnCNN]
(7). CNN4IE 0.1.6 update new model -> [MultiLayerCSAttCNN]
- fairseq
- allennlp
- Convolutional Sequence to Sequence Learning
- Deep Residual Learning for Image Recognition
- Xception: Deep Learning with Depthwise Separable Convolutions
- Attention Augmented Convolutional Networks
- LambdaNetworks: Modeling long-range Interactions without Attention
- Pay Less Attention with Lightweight and Dynamic Convolutions
- Stand-Alone Self-Attention in Vision Models
- CBAM: Convolutional Block Attention Module