Custom pretokenizers for Japanese language models

japanese_pretokenizers (japre)

Installation

pip install japre

Usage

IpadicPreTokenizer

from japre.ipadic import IpadicPreTokenizer

from transformers import PreTrainedTokenizerFast
from tokenizers import Tokenizer

tokenizer_object = Tokenizer.from_file("your-awesome-tokenizer.json")
tokenizer_object.pre_tokenizer = IpadicPreTokenizer.make()
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer_object,
    unk_token='[UNK]',
    mask_token='[MASK]',
    cls_token='[CLS]',
    pad_token='[PAD]',
    sep_token='[SEP]'
)
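For readers new to the term, the sketch below illustrates what a pre-tokenizer produces in general: the input text cut into word-level pieces, each paired with its character span, before any subword model runs. It does not show japre's internals, which this README does not describe. The function name `attach_offsets` and the hand-written segmentation are hypothetical stand-ins for the output of a real morphological analyzer.

```python
from typing import List, Tuple

Span = Tuple[str, Tuple[int, int]]


def attach_offsets(text: str, pieces: List[str]) -> List[Span]:
    """Pair each piece with its (start, end) character span in `text`.

    Hypothetical helper for illustration only; not part of japre.
    """
    spans: List[Span] = []
    cursor = 0
    for piece in pieces:
        # Search from the end of the previous piece so repeated words
        # map to successive positions. Raises ValueError if not found.
        start = text.index(piece, cursor)
        end = start + len(piece)
        spans.append((piece, (start, end)))
        cursor = end
    return spans


text = "私は学生です"
# Hand-written segmentation, standing in for a dictionary-based analyzer.
pieces = ["私", "は", "学生", "です"]
result = attach_offsets(text, pieces)
for piece, span in result:
    print(piece, span)
```

Japanese text has no spaces between words, so whitespace splitting cannot find these boundaries; that is why a dictionary-backed pre-tokenizer is assigned to `tokenizer_object.pre_tokenizer` in the snippet above.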

ManbyoDictPreTokenizer

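Set the MANBYO_DICT_PATH environment variable to the location of the Manbyo dictionary file (shell command):

export MANBYO_DICT_PATH=/path/to/MANBYO_201907_Dic-utf8.dic

Then, in Python:

from japre.manbyo import ManbyoDictPreTokenizer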

from transformers import PreTrainedTokenizerFast
from tokenizers import Tokenizer

tokenizer_object = Tokenizer.from_file("your-awesome-tokenizer.json")
tokenizer_object.pre_tokenizer = ManbyoDictPreTokenizer.make()
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer_object,
    unk_token='[UNK]',
    mask_token='[MASK]',
    cls_token='[CLS]',
    pad_token='[PAD]',
    sep_token='[SEP]'
)
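If a shell `export` is inconvenient (for example in a notebook), the same variable can be set from Python itself. This is a minimal sketch using only the standard library. The README does not say whether the variable is read at import time or when `make()` is called, so the safe assumption is to set it before importing `japre.manbyo`. The path is the placeholder from the example above, not a real location.

```python
import os

# Placeholder path; substitute the real location of your dictionary file.
dict_path = "/path/to/MANBYO_201907_Dic-utf8.dic"

# Equivalent to the shell `export`, but scoped to this Python process
# and any child processes it starts.
os.environ["MANBYO_DICT_PATH"] = dict_path

# Assumption: do this before `from japre.manbyo import ManbyoDictPreTokenizer`,
# in case the package reads the variable at import time.
print(os.environ["MANBYO_DICT_PATH"])
```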
