Terry toolkit tkitAutoMask

Project description

tkitAutoMask

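Builds masks automatically. It bundles a collection of dynamic masking schemes: upper and lower triangular masks, dynamic span masks, and the default probability-based masking.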
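
As an illustration of the mask families named above, the following minimal sketch builds upper and lower triangular masks and a random contiguous span mask. It uses plain Python lists so that it runs without PyTorch, and the helper names `triangular_mask` and `span_mask` are hypothetical. It is not tkitAutoMask's implementation. With tensors, `torch.tril` and `torch.triu` produce the same triangular patterns.

```python
import random


def triangular_mask(size, upper=False):
    """size x size 0/1 mask: lower triangular by default, upper if requested."""
    if upper:
        return [[1 if col >= row else 0 for col in range(size)] for row in range(size)]
    return [[1 if col <= row else 0 for col in range(size)] for row in range(size)]


def span_mask(length, span_len, rng=random):
    """0/1 mask with one contiguous run of ones starting at a random offset."""
    span_len = min(span_len, length)
    start = rng.randrange(length - span_len + 1)
    return [1 if start <= i < start + span_len else 0 for i in range(length)]


# Lower triangular mask, one row per line.
for row in triangular_mask(4):
    print(row)

# A span of three masked positions somewhere in a sequence of eight.
print(span_mask(8, span_len=3, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the span position reproducible, which helps when comparing runs.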

pip install tkitAutoMask


import torch
from transformers import BertTokenizer
from tkitAutoMask import autoMask

tokenizer = BertTokenizer.from_pretrained("uer/chinese_roberta_L-2_H-128")

tomask = autoMask(
    mask_token_id=tokenizer.mask_token_id,  # the token id reserved for masking
    pad_token_id=tokenizer.pad_token_id,    # the token id for padding
    mask_prob=0.05,                         # masking probability for masked language modeling
    replace_prob=0.90,                      # ~10% probability that a token will not be masked but is still included in the loss, as detailed in the paper
    # other tokens to exclude from masking: [CLS] and [SEP]
    # (BertTokenizer normally defines no EOS token, so eos_token_id would be None)
    mask_ignore_token_ids=[tokenizer.cls_token_id, tokenizer.sep_token_id],
)


x = torch.ones(5, 5)
for i in range(100):
    a, b = tomask(x)
    print(b)  # the mask pattern differs from call to call

Sample output:
tensor([[1., 1., 1., 0., 1.],
        [0., 1., 1., 1., 0.],
        [0., 0., 1., 1., 1.],
        [0., 0., 0., 1., 1.],
        [0., 1., 0., 0., 1.]])
tensor([[1., 1., 0., 0., 0.],
        [0., 1., 1., 0., 0.],
        [1., 0., 1., 1., 0.],
        [0., 1., 0., 1., 1.],
        [0., 0., 0., 0., 1.]])
tensor([[1., 1., 1., 0., 1.],
        [0., 1., 1., 1., 0.],
        [0., 1., 1., 1., 1.],
        [1., 0., 0., 1., 1.],
        [0., 0., 0., 1., 1.]])
tensor([[0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0.]])
tensor([[1., 1., 1., 0., 0.],
        [0., 1., 1., 1., 0.],
        [1., 0., 1., 1., 1.],
        [0., 1., 0., 1., 1.],
        [0., 0., 0., 1., 1.]])
tensor([[0., 0., 0., 1., 0.],
        [0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 0.]])
tensor([[0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.],
        [1., 0., 0., 0., 1.],
        [1., 1., 0., 0., 0.],
        [1., 1., 1., 0., 1.]])
tensor([[1., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0.],
        [1., 0., 0., 0., 0.],
        [1., 1., 0., 0., 0.],
        [1., 1., 1., 0., 0.]])
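
The comments on `mask_prob`, `replace_prob`, and `mask_ignore_token_ids` in the example describe BERT-style probability masking. The sketch below shows one way those three parameters can interact, written in plain Python for a single sequence. The function `mlm_mask` and the token ids in the usage lines are hypothetical, and the library's own logic may differ.

```python
import random


def mlm_mask(token_ids, mask_token_id, pad_token_id, mask_prob=0.15,
             replace_prob=0.9, ignore_token_ids=(), rng=random):
    """Return (inputs, labels) for one sequence of token ids.

    labels keeps the original id at every selected position and holds
    pad_token_id elsewhere, so a loss function can skip unselected positions.
    """
    excluded = set(ignore_token_ids) | {pad_token_id}
    inputs, labels = [], []
    for tok in token_ids:
        # Select each eligible token independently with probability mask_prob.
        selected = tok not in excluded and rng.random() < mask_prob
        if not selected:
            inputs.append(tok)
            labels.append(pad_token_id)
            continue
        labels.append(tok)
        # A selected token becomes the mask token with probability replace_prob;
        # otherwise it stays unchanged but still counts toward the loss.
        inputs.append(mask_token_id if rng.random() < replace_prob else tok)
    return inputs, labels


# Hypothetical ids: 101 = [CLS], 102 = [SEP], 103 = [MASK], 0 = [PAD].
ids = [101, 7, 8, 9, 10, 102, 0, 0]
inputs, labels = mlm_mask(ids, mask_token_id=103, pad_token_id=0, mask_prob=0.5,
                          ignore_token_ids=(101, 102), rng=random.Random(0))
print(inputs)
print(labels)
```

With `replace_prob = 0.90`, roughly one selected token in ten is left as it is, which matches the "~10%" note in the configuration comment.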

Other tests

https://colab.research.google.com/drive/1CvkoJ1pZQDRWGPA-5IzJufvocBM-RVT2#scrollTo=UwkociF5ZF-d

Detailed reference

dev.md

Download files

Source Distribution

tkitAutoMask-0.0.0.116347453.tar.gz (9.6 kB view details)

Uploaded Source

Built Distribution

tkitAutoMask-0.0.0.116347453-py3-none-any.whl (9.0 kB view details)

Uploaded Python 3

File details

Details for the file tkitAutoMask-0.0.0.116347453.tar.gz.

File metadata

  • Download URL: tkitAutoMask-0.0.0.116347453.tar.gz
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for tkitAutoMask-0.0.0.116347453.tar.gz
Algorithm    Hash digest
SHA256       daf7e88fbdc83b3c7d359f8fe76c445c298e2823950ce3a7c4bb83c6b98a4509
MD5          d86cbb51319d3283c8d078024725fffa
BLAKE2b-256  ee2414d2bcb6b66c756a9306b170c395406d5dbac219b0fc9865ac061e400981
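
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; it assumes the source distribution has been saved in the current directory under its published file name, and the helper name `sha256_of` is hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the listing for the source distribution.
EXPECTED_SHA256 = "daf7e88fbdc83b3c7d359f8fe76c445c298e2823950ce3a7c4bb83c6b98a4509"
ARCHIVE = "tkitAutoMask-0.0.0.116347453.tar.gz"  # assumed local path

if os.path.exists(ARCHIVE):
    print("match" if sha256_of(ARCHIVE) == EXPECTED_SHA256 else "MISMATCH")
else:
    print(f"{ARCHIVE} not found; download it first")
```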

File details

Details for the file tkitAutoMask-0.0.0.116347453-py3-none-any.whl.

File metadata

  • Download URL: tkitAutoMask-0.0.0.116347453-py3-none-any.whl
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for tkitAutoMask-0.0.0.116347453-py3-none-any.whl
Algorithm    Hash digest
SHA256       852dbddc4ac102c176ab7b69d7b3e03b969e7f27e149095dd4e39615f14eef86
MD5          6050164a2baf9bff9fc3344cdcc7ed5b
BLAKE2b-256  260b67f65d1d1d1cf206108ad68025ec98938d531dbaf8569622aba7d81117ad
