Custom pretokenizers for Japanese language models
japanese_pretokenizers (japre)
Installation
pip install japre
Usage
from japre.pretokenizer import IpadicPreTokenizer
from transformers import PreTrainedTokenizerFast
from tokenizers import Tokenizer

# Load an existing tokenizer definition and swap in the IPAdic-based pre-tokenizer.
tokenizer_object = Tokenizer.from_file("your-awesome-tokenizer.json")
tokenizer_object.pre_tokenizer = IpadicPreTokenizer.make()

# Wrap the modified tokenizer for use with transformers.
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer_object,
    unk_token='[UNK]',
    mask_token='[MASK]',
    cls_token='[CLS]',
    pad_token='[PAD]',
    sep_token='[SEP]',
)
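Japanese text has no spaces between words, so a pre-tokenizer has to split a sentence into word-like pieces before the subword model runs. The sketch below illustrates that idea with the standard library only. It is not how japre or IpadicPreTokenizer is implemented: the greedy longest-match strategy, the function name toy_pre_tokenize, and the tiny TOY_LEXICON are all assumptions made for illustration, standing in for a real dictionary-based morphological analysis.

```python
from typing import List, Set, Tuple

# Hypothetical mini-lexicon for illustration only; a real dictionary such as
# IPAdic is far larger and also carries part-of-speech information.
TOY_LEXICON: Set[str] = {"東京", "都", "に", "住ん", "で", "いる"}


def toy_pre_tokenize(
    text: str, lexicon: Set[str] = TOY_LEXICON
) -> List[Tuple[str, Tuple[int, int]]]:
    """Split text by greedy longest match, returning (piece, (start, end)) pairs."""
    max_len = max((len(word) for word in lexicon), default=1)
    pieces: List[Tuple[str, Tuple[int, int]]] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        # when nothing in the lexicon matches at this position.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                pieces.append((candidate, (i, i + size)))
                i += size
                break
    return pieces


if __name__ == "__main__":
    for piece, span in toy_pre_tokenize("東京都に住んでいる"):
        print(piece, span)
```

Each piece is returned together with its character offsets into the original string, so later stages can map tokens back to the input text. A dictionary-backed analyzer would replace the greedy lookup with a proper lattice search, which this sketch does not attempt.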