Terry toolkit tkitAutoMask
tkitAutoMask
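Automatically builds masks. It includes a collection of dynamic masks: upper and lower triangular masks, dynamic spans, and the default probability-based masking.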
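The README does not show how each mask type is constructed. As a rough illustration only, the sketch below builds the three kinds of 0/1 masks listed above (lower/upper triangular, one random contiguous span, and independent per-position probability masking) in plain Python. The function names `tri_mask`, `span_mask` and `prob_mask` are hypothetical and are not part of the tkitAutoMask API; the library itself works on torch tensors and may construct its masks differently.

```python
import random
from typing import List

Mask = List[List[int]]


def tri_mask(n: int, lower: bool = True) -> Mask:
    """Hypothetical helper: n x n triangular mask, 1 on and below (lower) or on and above (upper) the diagonal."""
    return [[1 if (j <= i if lower else j >= i) else 0 for j in range(n)] for i in range(n)]


def span_mask(n: int, rng: random.Random) -> List[int]:
    """Hypothetical helper: mask one random contiguous span [start, end) of a length-n sequence."""
    start = rng.randrange(n)           # 0 <= start < n
    end = rng.randint(start + 1, n)    # span has length >= 1
    return [1 if start <= k < end else 0 for k in range(n)]


def prob_mask(n: int, p: float, rng: random.Random) -> List[int]:
    """Hypothetical helper: mask each position independently with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for row in tri_mask(4):
        print(row)
    print(span_mask(8, rng))
    print(prob_mask(8, 0.15, rng))
```

A triangular mask is deterministic for a given size, while the span and probability masks change on every call, which is what makes them "dynamic".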
pip install tkitAutoMask
import torch
from tkitAutoMask import autoMask
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("uer/chinese_roberta_L-2_H-128")

tomask = autoMask(
    mask_token_id = tokenizer.mask_token_id,  # the token id reserved for masking
    pad_token_id = tokenizer.pad_token_id,    # the token id for padding
    mask_prob = 0.05,     # masking probability for masked language modeling
    replace_prob = 0.90,  # ~10% probability that a token will not be masked, but included in the loss, as detailed in the paper
    # other tokens to exclude from masking: [CLS] and [SEP]
    mask_ignore_token_ids = [tokenizer.cls_token_id, tokenizer.sep_token_id]
)

x = torch.ones(5, 5)
for i in range(100):
    a, b = tomask(x)
    print(b)
Excerpt of the printed output:
tensor([[1., 1., 1., 0., 1.],
        [0., 1., 1., 1., 0.],
        [0., 0., 1., 1., 1.],
        [0., 0., 0., 1., 1.],
        [0., 1., 0., 0., 1.]])
tensor([[1., 1., 0., 0., 0.],
        [0., 1., 1., 0., 0.],
        [1., 0., 1., 1., 0.],
        [0., 1., 0., 1., 1.],
        [0., 0., 0., 0., 1.]])
tensor([[1., 1., 1., 0., 1.],
        [0., 1., 1., 1., 0.],
        [0., 1., 1., 1., 1.],
        [1., 0., 0., 1., 1.],
        [0., 0., 0., 1., 1.]])
tensor([[0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0.]])
tensor([[1., 1., 1., 0., 0.],
        [0., 1., 1., 1., 0.],
        [1., 0., 1., 1., 1.],
        [0., 1., 0., 1., 1.],
        [0., 0., 0., 1., 1.]])
tensor([[0., 0., 0., 1., 0.],
        [0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 0.]])
tensor([[0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.],
        [1., 0., 0., 0., 1.],
        [1., 1., 0., 0., 0.],
        [1., 1., 1., 0., 1.]])
tensor([[1., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0.],
        [1., 0., 0., 0., 0.],
        [1., 1., 0., 0., 0.],
        [1., 1., 1., 0., 0.]])
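The parameter comments in the example (`mask_prob`, `replace_prob`, `mask_ignore_token_ids`) follow the usual BERT-style masked-language-modeling recipe. Assuming that recipe, the following pure-Python sketch shows what those parameters control on a single sequence of token ids. `mlm_mask` is a hypothetical helper, not tkitAutoMask's implementation, and its two return values are not necessarily what `autoMask` returns as `a, b`.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mlm_mask(
    ids: Sequence[int],
    mask_token_id: int,
    pad_token_id: int,
    mask_prob: float = 0.15,
    replace_prob: float = 0.9,
    ignore_ids: Sequence[int] = (),
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper. Returns (masked_ids, selected);
    selected[k] == 1 marks positions that would be included in the loss."""
    rng = rng or random.Random()
    excluded = set(ignore_ids) | {pad_token_id}
    masked, selected = [], []
    for tok in ids:
        # Padding and ignored tokens (e.g. [CLS]/[SEP]) are never selected.
        pick = tok not in excluded and rng.random() < mask_prob
        selected.append(1 if pick else 0)
        # A selected token becomes [MASK] with probability replace_prob;
        # otherwise it is left unchanged but still counted in the loss.
        masked.append(mask_token_id if pick and rng.random() < replace_prob else tok)
    return masked, selected


if __name__ == "__main__":
    ids = [101, 5, 6, 7, 8, 102, 0, 0]  # [CLS] ... [SEP] [PAD] [PAD] (illustrative ids)
    print(mlm_mask(ids, mask_token_id=103, pad_token_id=0,
                   mask_prob=0.5, ignore_ids=[101, 102], rng=random.Random(0)))
```

With a low `mask_prob` such as 0.05 and short sequences, many rows will have no selected positions at all, so in practice the probability is usually applied over long sequences or whole batches.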
Other tests: https://colab.research.google.com/drive/1CvkoJ1pZQDRWGPA-5IzJufvocBM-RVT2#scrollTo=UwkociF5ZF-d
For more details, see dev.md.
Hashes for tkitAutoMask-0.0.0.116347453.tar.gz (source distribution)
Algorithm | Hash digest
---|---
SHA256 | daf7e88fbdc83b3c7d359f8fe76c445c298e2823950ce3a7c4bb83c6b98a4509
MD5 | d86cbb51319d3283c8d078024725fffa
BLAKE2b-256 | ee2414d2bcb6b66c756a9306b170c395406d5dbac219b0fc9865ac061e400981
Hashes for tkitAutoMask-0.0.0.116347453-py3-none-any.whl (built distribution)
Algorithm | Hash digest
---|---
SHA256 | 852dbddc4ac102c176ab7b69d7b3e03b969e7f27e149095dd4e39615f14eef86
MD5 | 6050164a2baf9bff9fc3344cdcc7ed5b
BLAKE2b-256 | 260b67f65d1d1d1cf206108ad68025ec98938d531dbaf8569622aba7d81117ad
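To check a downloaded file against the SHA256 digests listed above, a short standard-library sketch is enough. `sha256_of` and `sha256_matches` are hypothetical helper names, and the wheel path in the usage block assumes the file sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks and return its hex SHA256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sha256_matches(path: Path, expected: str) -> bool:
    """Compare the file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    wheel = Path("tkitAutoMask-0.0.0.116347453-py3-none-any.whl")
    expected = "852dbddc4ac102c176ab7b69d7b3e03b969e7f27e149095dd4e39615f14eef86"
    if wheel.exists():
        print("OK" if sha256_matches(wheel, expected) else "MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size.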