Terry toolkit tkitAutoMask
tkitAutoMask
Automatically builds masks for masked language modeling.
pip install tkitAutoMask
from tkitAutoMask import autoMask
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("uer/chinese_roberta_L-2_H-128")
# dir(tokenizer)
tomask = autoMask(
    # transformer,
    mask_token_id = tokenizer.mask_token_id,  # the token id reserved for masking
    pad_token_id = tokenizer.pad_token_id,  # the token id used for padding
    mask_prob = 0.05,  # probability that a token is selected for masked language modeling
    replace_prob = 0.90,  # a selected token is replaced by the mask token with this probability; the remaining ~10% are left unmasked but still included in the loss, as detailed in the paper
    mask_ignore_token_ids = [tokenizer.cls_token_id, tokenizer.sep_token_id]  # other tokens to exclude from masking; include [CLS] and [SEP] here (BertTokenizer defines no EOS token, so eos_token_id would be None)
)
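To make the roles of mask_prob, replace_prob and mask_ignore_token_ids concrete, here is a minimal pure-Python sketch of the masking scheme those comments describe. It is an illustration only and is not taken from tkitAutoMask: the function name sketch_mask, its signature, the use of pad_token_id as the "ignore" label, and the example token ids are all assumptions made for this example.

```python
import random


def sketch_mask(token_ids, mask_token_id, pad_token_id,
                mask_prob=0.05, replace_prob=0.90, ignore_ids=(), seed=None):
    """Hypothetical helper (not part of tkitAutoMask).

    Returns (inputs, labels) for one sequence of token ids.
    """
    rng = random.Random(seed)
    # Padding and explicitly ignored tokens are never selected.
    skip = set(ignore_ids) | {pad_token_id}
    inputs, labels = [], []
    for tok in token_ids:
        selected = tok not in skip and rng.random() < mask_prob
        if not selected:
            inputs.append(tok)
            # Assumption: pad_token_id marks positions excluded from the loss.
            labels.append(pad_token_id)
            continue
        # Selected positions always keep their original id as the label.
        labels.append(tok)
        # With probability replace_prob the input shows the mask token;
        # otherwise the original token stays visible but is still predicted.
        inputs.append(mask_token_id if rng.random() < replace_prob else tok)
    return inputs, labels


# Illustrative ids only: 101 and 102 stand in for [CLS] and [SEP],
# 103 for [MASK], 0 for [PAD].
example = [101, 2769, 4263, 872, 102, 0, 0]
all_masked = sketch_mask(example, mask_token_id=103, pad_token_id=0,
                         mask_prob=1.0, replace_prob=1.0,
                         ignore_ids=(101, 102))
none_masked = sketch_mask(example, mask_token_id=103, pad_token_id=0,
                          mask_prob=0.0, ignore_ids=(101, 102))
```

Setting mask_prob to 1.0 or 0.0, as above, removes the randomness and makes it easy to check that special tokens and padding are never touched.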
For details, see dev.md.