Japanese tokenizer with transformers library
Project description
jptranstokenizer: Japanese Tokenizer for transformers
This is a repository for a Japanese tokenizer that works with the HuggingFace Transformers library.
Issues may also be filed in Japanese.
Table of Contents
Usage
To be added
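Until the official usage documentation is added, the general two-stage pipeline that Japanese tokenizers of this kind implement (word-level segmentation with a morphological analyzer such as MeCab, Juman++, or Sudachi, followed by subword splitting) can be sketched in pure Python. The toy longest-match segmenter and WordPiece-style splitter below are illustrative stand-ins, not the jptranstokenizer API:

```python
def segment_words(text, lexicon):
    """Greedy longest-match word segmentation over a toy lexicon.

    Stands in for a morphological analyzer (MeCab, Juman++, ...);
    unknown single characters fall through as their own words.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single char always matches.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


def wordpiece(word, vocab):
    """Greedy longest-match subword split, '##' marking continuations.

    Stands in for a WordPiece/SentencePiece model; out-of-vocabulary
    single characters pass through unchanged.
    """
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            cand = word[i:j] if i == 0 else "##" + word[i:j]
            if cand in vocab or j == i + 1:
                pieces.append(cand)
                i = j
                break
    return pieces


# Stage 1: word segmentation, Stage 2: subword splitting per word.
lexicon = {"自然", "言語", "処理"}
vocab = {"自然", "言", "##語", "処", "##理"}
words = segment_words("自然言語処理", lexicon)   # ['自然', '言語', '処理']
tokens = [p for w in words for p in wordpiece(w, vocab)]
# tokens == ['自然', '言', '##語', '処', '##理']
```

In a real pipeline the lexicon and subword vocabulary come from the trained analyzer dictionary and tokenizer model rather than hand-written sets; the two-stage structure is the same.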
Roadmap
See the open issues for a full list of proposed features (and known issues).
Citation
Another paper covering this pretrained model is in preparation; check back here before citing.
This Implementation
@misc{suzuki-2022-github,
  author = {Masahiro Suzuki},
  title = {jptranstokenizer: Japanese Tokenizer for transformers},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/retarfi/jptranstokenizer}}
}
Licenses
The code in this repository is distributed under the Apache License 2.0.
Related Work
- Pretrained Japanese BERT models (including a Japanese tokenizer)
  - Author: NLP Lab. at Tohoku University
  - https://github.com/cl-tohoku/bert-japanese
- SudachiTra
  - Author: Works Applications
  - https://github.com/WorksApplications/SudachiTra
- UD_Japanese-GSD
  - Author: megagonlabs
  - https://github.com/megagonlabs/UD_Japanese-GSD
- Juman++
  - Author: Kurohashi Lab. at Kyoto University
  - https://github.com/ku-nlp/jumanpp
Download files
Source Distribution
jptranstokenizer-0.0.4.tar.gz (11.2 kB)
Built Distribution
Hashes for jptranstokenizer-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 71005a937016b1618165934e385ca3ef87b820d01fe393da669e5f37ddca4bfe
MD5 | 9508e7d2d152c065195b61c661570791
BLAKE2b-256 | 07e2949e04bc03cc4c39c81d8aa2bb0064383e4ab93566ad1438e534e8a8270c