Project description
NLP JP Gears
Overview
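A collection of preprocessing steps that come up frequently in Japanese natural language processing. You can set up a pipeline to combine multiple steps into one.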
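The pipeline idea can be sketched in plain Python as a callable that runs each step in turn. This is an illustration under assumptions, not the library's `Composer`: the class name `Pipeline`, the left-to-right ordering, and the two toy steps are all hypothetical.

```python
from typing import Callable


class Pipeline:
    """Hypothetical stand-in that chains text-to-text callables."""

    def __init__(self, *steps: Callable[[str], str]) -> None:
        self.steps = steps

    def __call__(self, text: str) -> str:
        # Feed the output of each step into the next one (left to right).
        for step in self.steps:
            text = step(text)
        return text


pipeline = Pipeline(str.strip, str.lower)
print(pipeline("  NLP JP Gears  "))  # prints "nlp jp gears"
```

Any callable that takes a string and returns a string can serve as a step in this sketch, which is what makes such a pipeline easy to extend.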
API
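- Create a pipeline: composer.Composer
- Convert full-width alphanumerics and symbols to half-width: zenhan.ZenToHanConverter
- Convert half-width alphanumerics and symbols to full-width: zenhan.HanToZenConverter
- Remove brackets and the text between them: remover.TextBtwBracketsRemover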
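To make the full-width/half-width conversion concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the function names `zen_to_han` and `han_to_zen` are hypothetical, and the sketch relies only on the fact that the full-width ASCII variants (U+FF01 to U+FF5E) sit at a fixed offset of 0xFEE0 from their half-width counterparts (U+0021 to U+007E). The exact set of characters the library converts may differ.

```python
# Illustrative sketch only; not how nlp_jp_gears is implemented.
OFFSET = 0xFEE0  # distance between U+0021 and its full-width form U+FF01

# Translation tables for printable ASCII and its full-width variants.
ZEN_TO_HAN = {code + OFFSET: code for code in range(0x21, 0x7F)}
HAN_TO_ZEN = {code: code + OFFSET for code in range(0x21, 0x7F)}


def zen_to_han(text: str) -> str:
    """Replace full-width ASCII variants with their half-width forms."""
    return text.translate(ZEN_TO_HAN)


def han_to_zen(text: str) -> str:
    """Replace half-width ASCII characters with their full-width forms."""
    return text.translate(HAN_TO_ZEN)


# The input is written with escapes so the full-width characters are unambiguous.
full_width = "\uff30\uff59\uff54\uff48\uff4f\uff4e\uff13"  # full-width "Python3"
print(zen_to_han(full_width))  # prints "Python3"
```

Characters outside the mapped range, such as kana and kanji, pass through `str.translate` untouched.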
Requirements
Python 3.6+
Installation
pip install nlp-jp-gears
Example
from nlp_jp_gears import Composer
from nlp_jp_gears import (
    ZenToHanConverter,
    TextBtwBracketsRemover,
)

# Build each preprocessing step, then chain them into one pipeline.
txt_btw_brackets_remover = TextBtwBracketsRemover()
zenhan_converter = ZenToHanConverter()
composer = Composer(txt_btw_brackets_remover, zenhan_converter)

# Calling the composer runs the text through the pipeline.
text = "Python(パイソン)で自然言語処理?"
out = composer(text)
print(out)
The input text is then preprocessed:
Pythonで自然言語処理?
You can also inspect what each step removes and converts:
print(txt_btw_brackets_remover.removes)
print(zenhan_converter.converts)
<{[(「『([〈《〔{«‹
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
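For intuition about the bracket-removal step in the example above, the following is a minimal sketch using the standard `re` module. It is an assumption-laden illustration, not the library's `TextBtwBracketsRemover`: the function name `remove_bracketed` and the short `PAIRS` table are hypothetical, and the `removes` output above lists more opening brackets than this sketch handles.

```python
import re

# Hypothetical bracket pairs, far fewer than the library reports in `removes`.
# U+300C and U+300D are the Japanese corner brackets.
PAIRS = {"(": ")", "[": "]", "\u300c": "\u300d"}


def remove_bracketed(text: str) -> str:
    """Delete each bracket pair together with the text between them."""
    for opening, closing in PAIRS.items():
        # Non-greedy match from an opening bracket to the nearest closing one.
        pattern = re.escape(opening) + r".*?" + re.escape(closing)
        text = re.sub(pattern, "", text)
    return text


print(remove_bracketed("Python(パイソン)で自然言語処理?"))  # prints "Pythonで自然言語処理?"
```

The non-greedy pattern keeps the sketch short but does not handle nested brackets of the same kind; a stack-based scan would be needed for that.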
Download files
Source Distribution
nlp-jp-gears-0.1.1.tar.gz (4.3 kB)
Built Distribution
nlp_jp_gears-0.1.1-py3-none-any.whl (5.5 kB)
File details
Details for the file nlp-jp-gears-0.1.1.tar.gz.
File metadata
- Download URL: nlp-jp-gears-0.1.1.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.3 CPython/3.6.8 Darwin/18.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 097584b2d5d5f0e8d1fc7feaf77161ed7bde24ec6ea028cec2a4102fbfb03ed5
MD5 | 6e5760e144bc0546d448b3d22061b0c7
BLAKE2b-256 | 4afa8b625135dda7287afd3c4963c5bc97e8cd51e892ab99acdcb7dc5ff41ed3
File details
Details for the file nlp_jp_gears-0.1.1-py3-none-any.whl.
File metadata
- Download URL: nlp_jp_gears-0.1.1-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.3 CPython/3.6.8 Darwin/18.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | be4dcb260c68c1a56d571192423183583a8ce64755fe132a4af4522ff5dabd28
MD5 | 5fb2ec4d2cf2eedc549a76df765aa429
BLAKE2b-256 | 7010f01b7982852a2f6338cecdf79c61bb45dba94d016351d69ed3eef7eb11ad