Project description
NLP JP Gears
Overview
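A collection of preprocessing steps that come up frequently in Japanese natural language processing. By setting up a pipeline, you can chain several of these steps together.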
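The pipeline idea can be pictured as plain function composition. The sketch below assumes only that each step is a callable that takes a `str` and returns a `str`; `compose` is a hypothetical helper written for illustration, not the library's `Composer`.

```python
from typing import Callable

Step = Callable[[str], str]

def compose(*steps: Step) -> Step:
    """Hypothetical helper: return a callable that applies each step left to right."""
    def pipeline(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return pipeline

# Two toy steps: trim surrounding whitespace, then replace the
# ideographic space (U+3000) with an ASCII space.
normalize = compose(str.strip, lambda s: s.replace("\u3000", " "))
print(normalize("  自然\u3000言語  "))  # prints: 自然 言語
```

Because steps run in the order given, swapping them can change the result, so put removals before conversions when a conversion would otherwise alter the characters a removal looks for.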
API
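- Create a pipeline: composer.Composer
- Convert full-width alphanumerics and symbols to half-width: zenhan.ZenToHanConverter
- Convert half-width alphanumerics and symbols to full-width: zenhan.HanToZenConverter
- Remove brackets and the text between them: remover.TextBtwBracketsRemover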
Requirements
Python 3.6+
Installation
pip install nlp-jp-gears
Example
from nlp_jp_gears import Composer
from nlp_jp_gears import (
    ZenToHanConverter,
    TextBtwBracketsRemover,
)
txt_btw_brackets_remover = TextBtwBracketsRemover()
zenhan_converter = ZenToHanConverter()
composer = Composer(txt_btw_brackets_remover, zenhan_converter)
text = "Python(パイソン)で自然言語処理?"
out = composer(text)
print(out)
This prints the preprocessed text:
Pythonで自然言語処理?
You can also inspect which characters each step removes or converts:
print(txt_btw_brackets_remover.removes)
print(zenhan_converter.converts)
<{[(「『([〈《〔{«‹
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
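If you write your own steps and want the same kind of introspection, one option is to keep the target characters in a public attribute. The sketch below is hypothetical: the class name `ZenToHanStep` is invented, and this is not necessarily how the library implements `converts`. Building the attribute from `string.punctuation`, `string.digits`, `string.ascii_uppercase` and `string.ascii_lowercase`, in that order, happens to reproduce the character string printed above.

```python
import string

class ZenToHanStep:
    """Hypothetical callable step that exposes the characters it targets."""

    def __init__(self) -> None:
        # Target set: ASCII punctuation, digits, then upper- and lower-case letters.
        self.converts = (string.punctuation + string.digits
                         + string.ascii_uppercase + string.ascii_lowercase)
        # Map each full-width form (ASCII code point + 0xFEE0) to its ASCII char.
        self._table = str.maketrans(
            {chr(ord(ch) + 0xFEE0): ch for ch in self.converts}
        )

    def __call__(self, text: str) -> str:
        return text.translate(self._table)

step = ZenToHanStep()
print(step("ABC123"))  # prints: ABC123
print(step.converts)
```

Precomputing a translation table with `str.maketrans` lets `str.translate` do the per-character lookup, which keeps the call itself short and avoids branching on code-point ranges.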
Download files
Source Distribution
nlp-jp-gears-0.1.1.tar.gz (4.3 kB)
Built Distribution
Hashes for nlp_jp_gears-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | be4dcb260c68c1a56d571192423183583a8ce64755fe132a4af4522ff5dabd28
MD5 | 5fb2ec4d2cf2eedc549a76df765aa429
BLAKE2b-256 | 7010f01b7982852a2f6338cecdf79c61bb45dba94d016351d69ed3eef7eb11ad