Terry toolkit tkitAutoMask
Project description
tkitAutoMask
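Automatically builds masks. It bundles a collection of dynamic masks: upper and lower triangular masks, dynamic spans, and the default probability-based masking.
- Upper triangular: gives left-to-right prediction, i.e. unidirectional attention; used for text continuation.
- Span: several consecutive masked positions; better suited to infilling.
Planned for the future: template-prediction masks.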
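As a rough illustration of the two mask shapes described above, here is a minimal sketch in plain Python. The helper names `causal_mask` and `span_mask` are hypothetical and are not part of tkitAutoMask. The sketch marks visible positions with 1, which is an assumption; under the opposite convention (1 marks blocked positions) the same pattern appears as the other triangle. With PyTorch, `torch.tril` or `torch.triu` applied to a matrix of ones yields the equivalent triangular tensors.

```python
import random

def causal_mask(n):
    """n x n matrix where row i may see columns j <= i (left-to-right attention)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def span_mask(length, span_len, rng):
    """0/1 vector containing one contiguous run of `span_len` masked positions."""
    start = rng.randrange(0, length - span_len + 1)  # random start that keeps the span in range
    return [1 if start <= k < start + span_len else 0 for k in range(length)]

rng = random.Random(0)  # seeded only to make the demo repeatable
for row in causal_mask(4):
    print(row)
print(span_mask(8, 3, rng))
```

A triangular mask of this kind suits continuation, because each position only depends on what came before it, while a span mask hides a whole run of neighbouring tokens at once, which is closer to what an infilling task asks the model to recover.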
pip install tkitAutoMask
import torch
from tkitAutoMask import autoMask
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("uer/chinese_roberta_L-2_H-128")
# dir(tokenizer)
tomask = autoMask(
    # transformer,
    mask_token_id = tokenizer.mask_token_id,  # the token id reserved for masking
    pad_token_id = tokenizer.pad_token_id,  # the token id for padding
    mask_prob = 0.05,  # the ordinary masking probability for masked language modeling
    replace_prob = 0.90,  # ~10% probability that a token will not be masked but is still included in the loss, as detailed in the paper
    mask_ignore_token_ids = [tokenizer.cls_token_id, tokenizer.sep_token_id]  # other tokens to exclude from masking; include [CLS] and [SEP] here
)

x = torch.ones(5, 5)
for i in range(100):
    a, b = tomask(x)
    # a, b
    print(b)
tensor([[1., 1., 1., 0., 1.],
[0., 1., 1., 1., 0.],
[0., 0., 1., 1., 1.],
[0., 0., 0., 1., 1.],
[0., 1., 0., 0., 1.]])
tensor([[1., 1., 0., 0., 0.],
[0., 1., 1., 0., 0.],
[1., 0., 1., 1., 0.],
[0., 1., 0., 1., 1.],
[0., 0., 0., 0., 1.]])
tensor([[1., 1., 1., 0., 1.],
[0., 1., 1., 1., 0.],
[0., 1., 1., 1., 1.],
[1., 0., 0., 1., 1.],
[0., 0., 0., 1., 1.]])
tensor([[0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0.]])
tensor([[1., 1., 1., 0., 0.],
[0., 1., 1., 1., 0.],
[1., 0., 1., 1., 1.],
[0., 1., 0., 1., 1.],
[0., 0., 0., 1., 1.]])
tensor([[0., 0., 0., 1., 0.],
[0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 1., 0., 0.]])
tensor([[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.],
[1., 0., 0., 0., 1.],
[1., 1., 0., 0., 0.],
[1., 1., 1., 0., 1.]])
tensor([[1., 0., 0., 0., 0.],
[0., 0., 0., 1., 0.],
[1., 0., 0., 0., 0.],
[1., 1., 0., 0., 0.],
[1., 1., 1., 0., 0.]])
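To make the roles of `mask_prob` and `replace_prob` concrete, here is a minimal sketch of the standard BERT-style token masking scheme in plain Python. It is not the package's code: the function `mlm_mask`, the convention of using the pad id as the "ignored" label, and the token ids in the usage lines are all assumptions chosen for illustration.

```python
import random

def mlm_mask(token_ids, mask_token_id, pad_token_id, ignore_ids,
             mask_prob=0.05, replace_prob=0.90, rng=None):
    """Return (inputs, labels) for masked language modeling.

    Each eligible token is selected with probability `mask_prob`. A selected
    token is replaced by `mask_token_id` with probability `replace_prob`;
    otherwise it is left unchanged but still appears in `labels`.
    """
    rng = rng or random.Random()
    protected = set(ignore_ids) | {pad_token_id}  # never selected for masking
    inputs, labels = [], []
    for tok in token_ids:
        if tok not in protected and rng.random() < mask_prob:
            labels.append(tok)  # selected: counts toward the loss
            inputs.append(mask_token_id if rng.random() < replace_prob else tok)
        else:
            labels.append(pad_token_id)  # not selected: ignored by the loss (assumed convention)
            inputs.append(tok)
    return inputs, labels

# Illustrative ids only: 101 and 102 stand in for [CLS] and [SEP], 103 for [MASK], 0 for padding.
ids = [101, 7, 8, 9, 102, 0, 0]
inputs, labels = mlm_mask(ids, mask_token_id=103, pad_token_id=0,
                          ignore_ids=[101, 102], mask_prob=0.5,
                          rng=random.Random(0))
print(inputs)
print(labels)
```

With `replace_prob = 0.90`, roughly one selected token in ten keeps its original id in the input while still being predicted, which is the behaviour the comment on `replace_prob` describes.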
Other tests
https://colab.research.google.com/drive/1CvkoJ1pZQDRWGPA-5IzJufvocBM-RVT2#scrollTo=UwkociF5ZF-d
For details, see
dev.md
Hashes for tkitAutoMask-0.0.0.316350799.tar.gz
Algorithm | Hash digest
---|---
SHA256 | b02a9e4d6b193d7f40d095ccdde9c0ad3c12dda479337547c0fad06d9437c1fb
MD5 | 2077e37151877cc2d5df08c830aa4fd5
BLAKE2b-256 | 569324aabf55e7213e06f6a9109a08a543decd699b1c109e957ca0e8ca4a3a99
Hashes for tkitAutoMask-0.0.0.316350799-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 69e9c89671c419d1e239a0a4a45bd274af1a0eb49500ff7da4eecaf144e6c659
MD5 | 800a90083e180e10ce2e1310db4bc971
BLAKE2b-256 | 1db6b1f3364718f79aea3398b587eb4cc24d55ae3927e06ccf470b8fc15a6de0