A package for various Transformer-based NLP models.
simplebert
A lightweight wrapper around various Transformer models, built on tensorflow.keras. The project started as a personal learning exercise and aims to offer the simplest possible API; anyone who finds it useful is welcome to download and use it. It draws on Huggingface Transformers, The Annotated Transformer, and bert4keras, among other references and code.
Features
- Load Google's original BERT model weights
- Load Huggingface's BERT model weights
Installation
```shell
pip install simplebert
```
Usage examples
The simplest usage is as follows.
```python
from simplebert.tokenizers import tokenizer_from_pretrained
from simplebert.models import model_from_pretrained

# Choose the name of the model to load
model_name = 'bert-base-chinese'
# Create and load the tokenizer
tokenizer = tokenizer_from_pretrained(model_name)
# Create and load the model,
# selecting two model heads: lm (LanguageModelHead) and pooler
model = model_from_pretrained(model_name, model_head = ['lm', 'pooler'])

# Run the tokenizer to produce model inputs
inputs = tokenizer([u'为啥科技公司都想养只机器狗?', u'一些公司已经将四足机器人应用在了业务中。'])
# Run the model, returning the outputs of all layers
output = model(inputs, output_hidden_states = True)

# Inspect the results
print(output['sequence_output'].shape)   # output of the last layer
print(output['logits'].shape)            # output produced by the 'lm' head
print(output['pooler_output'].shape)     # output produced by the 'pooler' head
print(output['hidden_states'][-2].shape) # output of the second-to-last layer
```
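To show what the `'lm'` head's output is good for, here is a hedged, self-contained sketch of picking the most likely vocabulary id at one position. The nested lists stand in for the real `output['logits']` tensor (assumed shape `[batch, seq_len, vocab_size]`); `top_token_id` is a hypothetical helper, not part of simplebert.

```python
# Sketch: given LM-head logits of shape [batch, seq_len, vocab_size]
# (toy nested lists here, in place of the model's tensor output),
# return the highest-scoring vocabulary id at one position.
def top_token_id(logits, batch_idx, pos):
    scores = logits[batch_idx][pos]
    return max(range(len(scores)), key=lambda i: scores[i])

# Toy logits: 1 sentence, 3 positions, a vocabulary of 4 tokens.
toy_logits = [[[0.1, 0.2, 0.3, 0.4],
               [2.0, 0.1, 0.5, 0.3],
               [0.0, 1.5, 0.2, 0.1]]]
print(top_token_id(toy_logits, 0, 1))  # 0
```

With the real model you would pass `output['logits']` instead of the toy lists and map the resulting id back to a token with the tokenizer.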
The available models are configured in the pretrained_models.json file.
If the weight files have already been downloaded locally, they can be loaded as follows.
```python
from simplebert.tokenizers import Tokenizer
from simplebert.models import ModelConfig, BertModel

config_file = '/path/to/bert_config.json'
vocab_file = '/path/to/vocab.txt'
checkpoint_file = '/path/to/checkpoint.ckp'

tokenizer = Tokenizer(vocab_file, cased = True)
config = ModelConfig(config_file)
model = BertModel(config, model_head = 'lm')
#...
```
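For intuition about what the vocab.txt file drives, here is a minimal greedy longest-match WordPiece sketch. It is an illustration only, not simplebert's actual tokenizer implementation; `wordpiece` and the toy vocabulary are made up for the example.

```python
# Greedy longest-match-first WordPiece (illustrative sketch):
# split a word into the longest vocabulary pieces, marking
# non-initial pieces with the "##" continuation prefix.
def wordpiece(word, vocab, unk="[UNK]"):
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:          # no piece matches: the whole word is unknown
            return [unk]
        pieces.append(cur)
        start = end
    return pieces

toy_vocab = {"play", "##ing", "##ed", "un", "##affable"}
print(wordpiece("playing", toy_vocab))  # ['play', '##ing']
print(wordpiece("played", toy_vocab))   # ['play', '##ed']
```

A real vocab.txt holds tens of thousands of such pieces, which is why the same tokenizer file must be paired with the checkpoint it was trained with.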
Supported model weights
- Google's original BERT: https://github.com/google-research/bert. Weight names include: bert-base-uncased, bert-base-cased, bert-base-chinese, bert-base-cased-multi-lang, bert-large-uncased, bert-large-cased, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking
- Huggingface BERT models: https://huggingface.co/transformers/model_doc/bert.html. Weight names include: huggingface-bert-base-cased, huggingface-bert-base-uncased, huggingface-bert-large-uncased, huggingface-bert-large-cased, huggingface-bert-base-chinese