Terry toolkit: tkitAutoMask

Project description

tkitAutoMask

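Automatically builds masks. The package bundles several dynamic masking strategies: upper and lower triangular masks, dynamic span masks, and the default probability-based masking.

  • Upper triangular: gives left-to-right prediction, i.e. unidirectional attention, and is used for text continuation.

  • Span: masks several consecutive tokens, which is better suited to infilling.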
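
To make the two mask shapes above concrete, here is a minimal, dependency-free sketch. It is not the package's implementation: the function names `upper_triangular_mask` and `span_mask` are hypothetical, and the convention that 1 means "blocked" or "masked" is an assumption made for illustration.

```python
import random


def upper_triangular_mask(n):
    """Return an n x n matrix with 1 where position i must NOT see position j.

    Blocking every j > i (the strict upper triangle) leaves each position
    with access only to itself and the positions to its left.
    Assumption: 1 = blocked; tkitAutoMask may use a different convention.
    """
    return [[1 if j > i else 0 for j in range(n)] for i in range(n)]


def span_mask(n, span_len, rng=random):
    """Return a length-n list with one contiguous run of span_len ones.

    The run marks the consecutive positions to mask, as in infilling.
    """
    if not 0 < span_len <= n:
        raise ValueError("span_len must be between 1 and n")
    start = rng.randrange(n - span_len + 1)  # first masked position
    return [1 if start <= k < start + span_len else 0 for k in range(n)]


if __name__ == "__main__":
    for row in upper_triangular_mask(4):
        print(row)
    print(span_mask(10, 3))
```

The triangular mask is fixed for a given length, while the span mask changes on every call because its start position is sampled at random.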

Template-based prediction masks may be added in the future.

pip install tkitAutoMask


import torch
from tkitAutoMask import autoMask
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("uer/chinese_roberta_L-2_H-128")

tomask = autoMask(
    mask_token_id=tokenizer.mask_token_id,  # the token id reserved for masking
    pad_token_id=tokenizer.pad_token_id,    # the token id for padding
    mask_prob=0.05,      # the regular masking probability for masked language modeling
    replace_prob=0.90,   # ~10% of selected tokens are left unmasked but still included in the loss, as detailed in the paper
    # other tokens to exclude from masking; include [CLS] and [SEP] here
    # (BertTokenizer defines sep_token_id; its eos_token_id is None)
    mask_ignore_token_ids=[tokenizer.cls_token_id, tokenizer.sep_token_id],
)


x = torch.ones(5, 5)
for i in range(100):
    a, b = tomask(x)
    print(b)  # print the second returned tensor

Sample output (a few of the printed tensors):
tensor([[1., 1., 1., 0., 1.],
        [0., 1., 1., 1., 0.],
        [0., 0., 1., 1., 1.],
        [0., 0., 0., 1., 1.],
        [0., 1., 0., 0., 1.]])
tensor([[1., 1., 0., 0., 0.],
        [0., 1., 1., 0., 0.],
        [1., 0., 1., 1., 0.],
        [0., 1., 0., 1., 1.],
        [0., 0., 0., 0., 1.]])
tensor([[1., 1., 1., 0., 1.],
        [0., 1., 1., 1., 0.],
        [0., 1., 1., 1., 1.],
        [1., 0., 0., 1., 1.],
        [0., 0., 0., 1., 1.]])
tensor([[0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0.]])
tensor([[1., 1., 1., 0., 0.],
        [0., 1., 1., 1., 0.],
        [1., 0., 1., 1., 1.],
        [0., 1., 0., 1., 1.],
        [0., 0., 0., 1., 1.]])
tensor([[0., 0., 0., 1., 0.],
        [0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 0.]])
tensor([[0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.],
        [1., 0., 0., 0., 1.],
        [1., 1., 0., 0., 0.],
        [1., 1., 1., 0., 1.]])
tensor([[1., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0.],
        [1., 0., 0., 0., 0.],
        [1., 1., 0., 0., 0.],
        [1., 1., 1., 0., 0.]])
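
The `mask_prob` and `replace_prob` arguments in the configuration follow the usual masked language modeling recipe: each eligible token is selected with probability `mask_prob`, and a selected token is swapped for the mask token with probability `replace_prob` while still counting toward the loss either way. The sketch below illustrates that recipe on a plain list of token ids. It is an illustration under stated assumptions, not the code inside tkitAutoMask: the function name `mlm_mask` and the use of -100 as the "ignore" label (the default `ignore_index` of PyTorch's cross-entropy loss) are choices made here.

```python
import random


def mlm_mask(token_ids, mask_token_id, mask_prob=0.05, replace_prob=0.9,
             ignore_ids=(), rng=random):
    """Return (inputs, labels) for masked language modeling.

    Hypothetical helper for illustration only.
    labels holds the original id at selected positions and -100 elsewhere.
    """
    inputs, labels = [], []
    for tok in token_ids:
        # Special tokens listed in ignore_ids are never selected.
        selected = tok not in ignore_ids and rng.random() < mask_prob
        if not selected:
            inputs.append(tok)
            labels.append(-100)  # position does not contribute to the loss
            continue
        labels.append(tok)  # the model must predict the original token
        # Replace with the mask token most of the time, otherwise keep as is.
        inputs.append(mask_token_id if rng.random() < replace_prob else tok)
    return inputs, labels


if __name__ == "__main__":
    ids = [101, 2769, 4263, 872, 102]  # example ids; 101/102 stand in for [CLS]/[SEP]
    print(mlm_mask(ids, mask_token_id=103, mask_prob=0.5, ignore_ids={101, 102}))
```

With `replace_prob = 0.90`, roughly one selected token in ten keeps its original id in the input, which is the "not masked, but included in loss" case mentioned in the configuration comments.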

Other tests

https://colab.research.google.com/drive/1CvkoJ1pZQDRWGPA-5IzJufvocBM-RVT2#scrollTo=UwkociF5ZF-d

For details, see

dev.md
