A BERT model for nagisa, designed to be robust against typos and colloquial expressions in Japanese.
nagisa_bert
This library provides a tokenizer for using the Japanese BERT model for nagisa. The nagisa BERT model is designed to be robust against typos and colloquial expressions in Japanese.
The model is trained on both character and word units with Hugging Face's Transformers; unknown words are trained as character units. The model can be used with Transformers 🤗.
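As a rough illustration of the word-and-character idea described above, the following sketch keeps in-vocabulary words whole and splits everything else into characters. It is not the library's implementation: `tokenize_with_char_fallback`, the toy vocabulary, and the pre-segmented input are all hypothetical and exist only for this example.

```python
def tokenize_with_char_fallback(words, vocab):
    """Keep in-vocabulary words whole; split every other word into characters."""
    tokens = []
    for word in words:
        if word in vocab:
            tokens.append(word)
        else:
            # Unknown word: fall back to character units.
            tokens.extend(word)
    return tokens


# Hypothetical toy vocabulary and pre-segmented input (assumptions, not the real model's data).
toy_vocab = {"で", "[MASK]", "できる", "モデル", "です"}
words = ["nagisa", "で", "[MASK]", "できる", "モデル", "です"]

tokens = tokenize_with_char_fallback(words, toy_vocab)
print(tokens)
```

Because "nagisa" is not in the toy vocabulary, it is emitted one character at a time, while the remaining words stay intact.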
Install
Python 3.7+ on Linux or macOS is required. You can install nagisa_bert by using the pip command.
$ pip install nagisa_bert
Usage
The model can be used with the pipeline function from Transformers.
>>> from transformers import pipeline
>>> from nagisa_bert import NagisaBertTokenizer
>>> text = "nagisaで[MASK]できるモデルです"
>>> tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
>>> fill_mask = pipeline("fill-mask", model='taishi-i/nagisa_bert', tokenizer=tokenizer)
>>> print(fill_mask(text))
[{'score': 0.1437765508890152,
'sequence': 'n a g i s a で 使用 できる モデル です',
'token': 1104,
'token_str': '使 用'},
{'score': 0.08369122445583344,
'sequence': 'n a g i s a で 購入 できる モデル です',
'token': 1821,
'token_str': '購 入'},
{'score': 0.07685843855142593,
'sequence': 'n a g i s a で 利用 できる モデル です',
'token': 548,
'token_str': '利 用'},
{'score': 0.07316956669092178,
'sequence': 'n a g i s a で 閲覧 できる モデル です',
'token': 13270,
'token_str': '閲 覧'},
{'score': 0.05647417902946472,
'sequence': 'n a g i s a で 確認 できる モデル です',
'token': 1368,
'token_str': '確 認'}]
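The pipeline returns a list of dictionaries, and `token_str` contains spaces between characters. A minimal sketch of post-processing that output follows; the helper name `summarize_predictions` is hypothetical, and the two entries are copied from the output above so the snippet runs without downloading the model.

```python
def summarize_predictions(results):
    """Return (token, score) pairs with the spaces inside token_str removed."""
    return [(r["token_str"].replace(" ", ""), r["score"]) for r in results]


# First two entries copied from the fill-mask output shown above.
results = [
    {"score": 0.1437765508890152,
     "sequence": "n a g i s a で 使用 できる モデル です",
     "token": 1104,
     "token_str": "使 用"},
    {"score": 0.08369122445583344,
     "sequence": "n a g i s a で 購入 できる モデル です",
     "token": 1821,
     "token_str": "購 入"},
]

summary = summarize_predictions(results)
# Pick the candidate with the highest score.
best_token, best_score = max(summary, key=lambda pair: pair[1])
print(summary)
print(best_token)
```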
The tokenizer and model can also be used directly for tokenization and vectorization.
>>> from transformers import BertModel
>>> from nagisa_bert import NagisaBertTokenizer
>>> text = "nagisaで[MASK]できるモデルです"
>>> tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
>>> tokens = tokenizer.tokenize(text)
>>> print(tokens)
['n', 'a', 'g', 'i', 's', 'a', 'で', '[MASK]', 'できる', 'モデル', 'です']
>>> model = BertModel.from_pretrained("taishi-i/nagisa_bert")
>>> h = model(**tokenizer(text, return_tensors="pt")).last_hidden_state
>>> print(h)
tensor([[[-1.1636, -0.5645, 0.4484, ..., -0.2207, -0.1540, 0.1051],
[-1.0394, 0.8815, -0.8070, ..., 1.0930, 0.2069, 0.9613],
[-0.2068, -0.1445, -0.6113, ..., -1.2920, 0.0725, -0.2164],
...,
[-1.2590, 0.0118, 0.4998, ..., -0.5212, -0.8015, -0.1050],
[ 0.7925, -0.7628, 0.1016, ..., 0.2233, 0.0164, 0.0102],
[-0.7847, -0.1375, 0.4475, ..., -0.4014, 0.0346, 0.3157]]],
grad_fn=<NativeLayerNormBackward0>)
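The model returns one vector per token. One common way to reduce these to a single sentence vector is masked mean pooling. The source does not prescribe this, so treat the sketch below as an assumption. It uses plain Python lists as a stand-in for the tensor so that it runs without PyTorch; `mean_pool` and the toy values are hypothetical.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors whose mask value is 1, dimension by dimension."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy stand-in for one sentence's hidden states: three tokens, two dimensions.
# The last token is treated as padding and excluded by the mask.
toy_hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
toy_mask = [1, 1, 0]

sentence_vector = mean_pool(toy_hidden, toy_mask)
print(sentence_vector)  # [2.0, 3.0]
```

For a single unpadded sentence such as the one above, the equivalent on the real tensor would be `h[0].mean(dim=0)`.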
File details
Details for the file nagisa_bert-0.0.1.tar.gz.
File metadata
- Download URL: nagisa_bert-0.0.1.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5441cccc0d134ec85aaccbb48bc913bbf096b38aff778beb33870198b18c7b2e
MD5 | b3a2c34221bc3fa898397edc6bd5e030
BLAKE2b-256 | 7ce8c5fce470c44e1f45678f37ddb2f7f08b300869f22b78d76a0529b552e77f
File details
Details for the file nagisa_bert-0.0.1-py3-none-any.whl.
File metadata
- Download URL: nagisa_bert-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | dd668c737d0e059e883b79fc65fb5953ed57a34c1c4d2ee6305eb9e7a0c1203a
MD5 | 4cef6d473e02d5c2a0d5274ad88e8461
BLAKE2b-256 | a0dd3b76d0ea04aafd1a04317540ff707b3e69ed45ff76d460c4dd89a8e6cd19