Japanese tokenizer with transformers library

Project description

jptranstokenizer: Japanese Tokenizer for transformers

This repository provides a Japanese tokenizer for the HuggingFace transformers library.
You can use JapaneseTransformerTokenizer in the same way as transformers.BertJapaneseTokenizer.
Issues may also be written in Japanese.

Documentation

Documentation is available on Read the Docs.

Install

pip install jptranstokenizer

Quickstart

This example uses jptranstokenizer.JapaneseTransformerTokenizer with the sentencepiece model of nlp-waseda/roberta-base-japanese and Juman++.
Before running it, install pyknp and Juman++.

>>> from jptranstokenizer import JapaneseTransformerTokenizer
>>> tokenizer = JapaneseTransformerTokenizer.from_pretrained("nlp-waseda/roberta-base-japanese")
>>> tokens = tokenizer.tokenize("外国人参政権")
# tokens: ['▁外国', '▁人', '▁参政', '▁権']
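
Each piece in the output above starts with "▁" (U+2581), the marker sentencepiece uses for a word boundary. The sketch below shows one way to recover the surface string from such pieces. The helper name pieces_to_text is hypothetical and not part of jptranstokenizer, and the token list is copied from the example above rather than produced by running the tokenizer.

```python
# "▁" (U+2581) is the sentencepiece word-boundary marker.
WORD_BOUNDARY = "\u2581"

# Pieces copied from the Quickstart output above (not computed here).
tokens = ["▁外国", "▁人", "▁参政", "▁権"]


def pieces_to_text(pieces):
    """Hypothetical helper: join pieces and drop the boundary marker.

    Dropping the marker suits Japanese text written without spaces;
    for space-delimited text, replace it with " " instead.
    """
    return "".join(pieces).replace(WORD_BOUNDARY, "")


surface = pieces_to_text(tokens)
print(surface)  # 外国人参政権
```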

Note that different dependencies are required depending on the type of tokenizer you use.
See also the Quickstart on Read the Docs.
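
Because the required dependencies vary, it can help to check them before loading a tokenizer. The following is a minimal sketch for the Quickstart setup only, using the Python standard library. It assumes the Juman++ executable is named jumanpp and is on PATH; the function name missing_quickstart_dependencies is hypothetical and not part of jptranstokenizer.

```python
import importlib.util
import shutil


def missing_quickstart_dependencies(
    modules=("jptranstokenizer", "pyknp"),
    executables=("jumanpp",),  # assumption: Juman++ is installed as "jumanpp"
):
    """Return the names of modules and executables that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not importable.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(exe) is None:
            missing.append(exe)
    return missing


if __name__ == "__main__":
    missing = missing_quickstart_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Quickstart dependencies were found.")
```

Other tokenizer types would need a different list of modules and executables; consult the documentation for the exact requirements.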

Citation

Another paper is forthcoming, so check here again before you cite.

This Implementation

@misc{suzuki-2022-github,
  author = {Masahiro Suzuki},
  title = {jptranstokenizer: Japanese Tokenzier for transformers},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/retarfi/jptranstokenizer}}}

Related Work

Download files

Source Distribution

jptranstokenizer-0.1.2.tar.gz (11.9 kB view details)

Uploaded Source

Built Distribution

jptranstokenizer-0.1.2-py3-none-any.whl (15.1 kB view details)

Uploaded Python 3

File details

Details for the file jptranstokenizer-0.1.2.tar.gz.

File metadata

  • Download URL: jptranstokenizer-0.1.2.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.9.14 Linux/5.15.0-1020-azure

File hashes

Hashes for jptranstokenizer-0.1.2.tar.gz
Algorithm Hash digest
SHA256 2aa69bd1e9bedd58e95d437b61c7ba0d647281eff1ec6751e21299cef3a41e77
MD5 b3943067970c0ff6cc325709aeb07b71
BLAKE2b-256 410c2c12f9b6d973765c9e4efb89e95b164bd4bbbe7f0a3b168b173e1f2144cf

File details

Details for the file jptranstokenizer-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: jptranstokenizer-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 15.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.9.14 Linux/5.15.0-1020-azure

File hashes

Hashes for jptranstokenizer-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 0c327c233031ed2b616fb9fc0df6784e16a3efbc15fe70e09c82e1e55361a4e7
MD5 6932a81ad3b59222f953f4710c859dbb
BLAKE2b-256 80583465494e84b9720cacf79003e9af4f19012c891443c2d0ca32ca7455c0ce
