Custom pretokenizers for Japanese language models
japanese_pretokenizers (japre)
Installation
```shell
pip install japre
```
Usage
IpadicPreTokenizer
```python
from japre.ipadic import IpadicPreTokenizer
from tokenizers import Tokenizer
from transformers import PreTrainedTokenizerFast

# Load an existing tokenizer and replace its pretokenizer with IpadicPreTokenizer
tokenizer_object = Tokenizer.from_file("your-awesome-tokenizer.json")
tokenizer_object.pre_tokenizer = IpadicPreTokenizer.make()

# Wrap it for use with transformers, registering its special tokens
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer_object,
    unk_token='[UNK]',
    mask_token='[MASK]',
    cls_token='[CLS]',
    pad_token='[PAD]',
    sep_token='[SEP]',
)
```
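A pretokenizer cuts raw text into words, each tagged with its character span, before the subword model runs; because Japanese is written without spaces, those words usually come from a morphological analyzer. The sketch below is not japre's implementation. It assumes a hypothetical word list standing in for analyzer output and shows how word surfaces can be aligned back to `(start, end)` offsets, the same `(token, (start, end))` shape that `PreTokenizer.pre_tokenize_str` in `tokenizers` returns. The function name `align_words` is illustrative.

```python
from typing import List, Tuple

Span = Tuple[str, Tuple[int, int]]


def align_words(text: str, words: List[str]) -> List[Span]:
    """Pair each word surface with its (start, end) character offsets in text.

    `words` is assumed to be analyzer output in reading order; characters the
    analyzer skipped (e.g. whitespace) are simply not covered by any span.
    """
    spans: List[Span] = []
    cursor = 0
    for word in words:
        # Search from the end of the previous word so repeated surfaces
        # map to successive occurrences rather than the first one.
        start = text.find(word, cursor)
        if start < 0:
            raise ValueError(f"{word!r} not found in text after offset {cursor}")
        end = start + len(word)
        spans.append((word, (start, end)))
        cursor = end
    return spans


# Hypothetical segmentation; a real analyzer may split this text differently.
text = "東京都に住む"
words = ["東京", "都", "に", "住む"]
for surface, (start, end) in align_words(text, words):
    print(surface, start, end)
```

Searching forward from a moving cursor keeps offsets correct when a surface repeats and tolerates whitespace the analyzer drops. In a custom pretokenizer for `tokenizers`, spans like these would typically be used to slice the `NormalizedString` being split.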
ManbyoDictPreTokenizer
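Set `MANBYO_DICT_PATH` to the location of the MANBYO dictionary file:
```shell
export MANBYO_DICT_PATH=/path/to/MANBYO_201907_Dic-utf8.dic
```
Then, in Python:
```python
from japre.manbyo import ManbyoDictPreTokenizer
from tokenizers import Tokenizer
from transformers import PreTrainedTokenizerFast

# Load an existing tokenizer and replace its pretokenizer with ManbyoDictPreTokenizer
tokenizer_object = Tokenizer.from_file("your-awesome-tokenizer.json")
tokenizer_object.pre_tokenizer = ManbyoDictPreTokenizer.make()

# Wrap it for use with transformers, registering its special tokens
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer_object,
    unk_token='[UNK]',
    mask_token='[MASK]',
    cls_token='[CLS]',
    pad_token='[PAD]',
    sep_token='[SEP]',
)
```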
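The MANBYO example exports `MANBYO_DICT_PATH` before calling `ManbyoDictPreTokenizer.make()`, which suggests the pretokenizer locates its dictionary through that variable. A small preflight check can report a missing or mistyped path clearly before any tokenizer is built. This is a sketch: `resolve_manbyo_dict_path` is a hypothetical helper, not part of japre, and it only assumes the variable should name an existing file.

```python
import os
from pathlib import Path


def resolve_manbyo_dict_path(env_var: str = "MANBYO_DICT_PATH") -> Path:
    """Return the dictionary path stored in `env_var`, failing early if unusable.

    Hypothetical helper (not part of japre): it only checks that the variable
    is set and points at an existing regular file.
    """
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; export it before building the tokenizer")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{env_var} points to {path}, which is not a file")
    return path
```

Calling it just before `ManbyoDictPreTokenizer.make()` turns a misconfigured environment into an immediate, explicit error.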
Download files
Source Distribution
japre-0.1.4.tar.gz (3.1 kB)
Built Distribution
japre-0.1.4-py3-none-any.whl (3.4 kB)
File details
Details for the file japre-0.1.4.tar.gz.
File metadata
- Download URL: japre-0.1.4.tar.gz
- Upload date:
- Size: 3.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.2 CPython/3.8.0 Darwin/21.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ce7ceacb35f4fa97c554612849b27a3e0a515c37271fc9f69b5c7de718cbed9 |
| MD5 | c2348302c7676c9e8dc2e7ad9388ec59 |
| BLAKE2b-256 | 81c39d9914a7872a608fc81f3020135558f8d6ce9f0c49036c5780230d63f422 |
File details
Details for the file japre-0.1.4-py3-none-any.whl.
File metadata
- Download URL: japre-0.1.4-py3-none-any.whl
- Upload date:
- Size: 3.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.2 CPython/3.8.0 Darwin/21.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf2af416aad09b683983784a9c329d0265d0bee3a99ad29ea18db20cb045f714 |
| MD5 | 00a3d758c3d232a35b6a31918b1a5722 |
| BLAKE2b-256 | 98e20be90734be878a789e08fcd6e11940c83dc3835686be7cf5c5f015c1c197 |
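The published digests let you confirm that a downloaded file is intact. A minimal check with Python's standard `hashlib`, reading the file in chunks so it never has to fit in memory at once; the names `sha256_of`, `matches`, and `EXPECTED_WHEEL_SHA256` are illustrative:

```python
import hashlib
from pathlib import Path

# SHA256 digest published for japre-0.1.4-py3-none-any.whl on this page.
EXPECTED_WHEEL_SHA256 = "cf2af416aad09b683983784a9c329d0265d0bee3a99ad29ea18db20cb045f714"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an unmodified download, `matches(Path("japre-0.1.4-py3-none-any.whl"), EXPECTED_WHEEL_SHA256)` should return `True`. pip can enforce the same check at install time with `--hash=sha256:...` entries in a requirements file and `--require-hashes`.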