Japanese tokenizer with transformers library
Project description
jptranstokenizer: Japanese Tokenizer for transformers
This is a repository for a Japanese tokenizer that works with the HuggingFace Transformers library.
Issues may also be filed in Japanese.
Table of Contents
Usage
To be added
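Until the official usage documentation is added, the general two-stage pipeline that Japanese tokenizers of this kind implement (word-level segmentation with a morphological analyzer such as MeCab, Juman++, or Sudachi, followed by subword splitting) can be sketched in pure Python. The toy longest-match segmenter and WordPiece-style splitter below are illustrative stand-ins, not the jptranstokenizer API:

```python
def segment_words(text, lexicon):
    """Greedy longest-match word segmentation over a toy lexicon.

    Stands in for a morphological analyzer (MeCab, Juman++, ...);
    unknown single characters fall through as their own words.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single char always matches.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


def wordpiece(word, vocab):
    """Greedy longest-match subword split, '##' marking continuations.

    Stands in for a WordPiece/SentencePiece model; out-of-vocabulary
    single characters pass through unchanged.
    """
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            cand = word[i:j] if i == 0 else "##" + word[i:j]
            if cand in vocab or j == i + 1:
                pieces.append(cand)
                i = j
                break
    return pieces


# Stage 1: word segmentation, Stage 2: subword splitting per word.
lexicon = {"自然", "言語", "処理"}
vocab = {"自然", "言", "##語", "処", "##理"}
words = segment_words("自然言語処理", lexicon)   # ['自然', '言語', '処理']
tokens = [p for w in words for p in wordpiece(w, vocab)]
# tokens == ['自然', '言', '##語', '処', '##理']
```

In a real pipeline the lexicon and subword vocabulary come from the trained analyzer dictionary and tokenizer model rather than hand-written sets; the two-stage structure is the same.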
Roadmap
See the open issues for a full list of proposed features (and known issues).
Citation
Another paper covering this pretrained model is in preparation; check back here before citing.
This Implementation
@misc{suzuki-2022-github,
  author = {Masahiro Suzuki},
  title = {jptranstokenizer: Japanese Tokenizer for transformers},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/retarfi/jptranstokenizer}}
}
Licenses
The code in this repository is distributed under the Apache License 2.0.
Related Work
- Pretrained Japanese BERT models (including a Japanese tokenizer)
  - Author: NLP Lab. at Tohoku University
  - https://github.com/cl-tohoku/bert-japanese
- SudachiTra
  - Author: Works Applications
  - https://github.com/WorksApplications/SudachiTra
- UD_Japanese-GSD
  - Author: megagonlabs
  - https://github.com/megagonlabs/UD_Japanese-GSD
- Juman++
  - Author: Kurohashi Lab. at Kyoto University
  - https://github.com/ku-nlp/jumanpp
Download files
Source Distribution
jptranstokenizer-0.0.4.tar.gz (11.2 kB)
Built Distribution
Hashes for jptranstokenizer-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 71005a937016b1618165934e385ca3ef87b820d01fe393da669e5f37ddca4bfe
MD5 | 9508e7d2d152c065195b61c661570791
BLAKE2b-256 | 07e2949e04bc03cc4c39c81d8aa2bb0064383e4ab93566ad1438e534e8a8270c