Japanese tokenizer with transformers library

jptranstokenizer: Japanese Tokenizer for transformers

This repository provides a Japanese tokenizer for the HuggingFace transformers library.
You can use JapaneseTransformerTokenizer in the same way as transformers.BertJapaneseTokenizer.
Issues may also be written in Japanese.

Documentation

Documentation is available on Read the Docs.

Install

pip install jptranstokenizer

Quickstart

This example uses jptranstokenizer.JapaneseTransformerTokenizer in sentencepiece mode with nlp-waseda/roberta-base-japanese and Juman++.
Before running the following steps, install pyknp and Juman++.

>>> from jptranstokenizer import JapaneseTransformerTokenizer
>>> tokenizer = JapaneseTransformerTokenizer.from_pretrained("nlp-waseda/roberta-base-japanese")
>>> tokens = tokenizer.tokenize("外国人参政権")
# tokens: ['▁外国', '▁人', '▁参政', '▁権']
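The leading "▁" (U+2581) in each token above is the SentencePiece word-boundary marker. As a rough sketch of how such output can be post-processed, the snippet below groups pieces into words and restores the original string. It works on the token list copied from the example and does not call jptranstokenizer; the helper names group_into_words and detokenize are hypothetical and not part of the library.

```python
# Token list copied from the Quickstart output above.
tokens = ["▁外国", "▁人", "▁参政", "▁権"]

WORD_START = "\u2581"  # "▁", the SentencePiece word-boundary marker


def group_into_words(pieces):
    """Merge subword pieces into words.

    A piece that starts with WORD_START opens a new word; any other
    piece is appended to the word currently being built.
    """
    words = []
    for piece in pieces:
        if piece.startswith(WORD_START):
            words.append(piece[len(WORD_START):])
        elif words:
            words[-1] += piece
        else:
            # First piece without a marker: treat it as the first word.
            words.append(piece)
    return words


def detokenize(pieces):
    """Rebuild the text; Japanese has no spaces, so the markers are simply dropped."""
    return "".join(pieces).replace(WORD_START, "")


print(group_into_words(tokens))
print(detokenize(tokens))
```

In this example every piece carries the marker, so each token is its own word and the rebuilt string equals the input text.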

Note that the required dependencies differ depending on the type of tokenizer you use.
See also the Quickstart on Read the Docs.
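Because the dependencies vary by tokenizer type, it can help to check which Python modules are importable before loading a tokenizer. The following is a minimal sketch using only the standard library. The helper missing_modules is hypothetical, and the module list is an assumption based on the Quickstart (pyknp) plus transformers; adjust it to the tokenizer you intend to use.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed list: pyknp is named in the Quickstart; change it for other tokenizer types.
required = ["transformers", "pyknp"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

This only checks Python modules. It does not detect external programs such as Juman++, which must be installed separately.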

Citation

Another paper is forthcoming. Check here again before you cite.

This Implementation

@misc{suzuki-2022-github,
  author = {Masahiro Suzuki},
  title = {jptranstokenizer: Japanese Tokenzier for transformers},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/retarfi/jptranstokenizer}}}

Related Work
