NLP JP Gears
Overview
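A collection of preprocessing steps that come up frequently in Japanese natural language processing. You can set up a pipeline to chain multiple steps together.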
API
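- Create a pipeline: composer.Composer
- Convert full-width alphanumerics and symbols to half-width: zenhan.ZenToHanConverter
- Convert half-width alphanumerics and symbols to full-width: zenhan.HanToZenConverter
- Remove brackets and the text between them: remover.TextBtwBracketsRemover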
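As a rough illustration of what full-width/half-width conversion involves, the sketch below uses only the Python standard library. It is not taken from nlp_jp_gears: the names `zen_to_han` and `han_to_zen` are hypothetical, and it assumes conversion is limited to the ASCII-equivalent range U+FF01 to U+FF5E, which may differ from what the library's converters actually cover.

```python
# Full-width forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII U+0021..U+007E,
# so both directions can be expressed as str.translate tables.
OFFSET = 0xFEE0
ZEN_TO_HAN = {cp + OFFSET: cp for cp in range(0x21, 0x7F)}
HAN_TO_ZEN = {cp: cp + OFFSET for cp in range(0x21, 0x7F)}


def zen_to_han(text: str) -> str:
    """Map full-width ASCII variants to half-width (hypothetical helper)."""
    return text.translate(ZEN_TO_HAN)


def han_to_zen(text: str) -> str:
    """Map half-width ASCII to full-width variants (hypothetical helper)."""
    return text.translate(HAN_TO_ZEN)


# "\uff21\uff22\uff23" is the full-width spelling of "ABC".
print(zen_to_han("\uff21\uff22\uff23"))  # prints ABC
```

Characters outside the mapped range, such as kana and kanji, pass through unchanged in this sketch.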
Example
>>> from nlp_jp_gears import Composer
>>> from nlp_jp_gears import (
... ZenToHanConverter,
... TextBtwBracketsRemover
... )
>>>
>>> txt_btw_brackets_remover = TextBtwBracketsRemover()
>>> print(txt_btw_brackets_remover.removes)
<{[(「『([〈《〔{«‹
>>>
>>> zenhan_converter = ZenToHanConverter()
>>> print(zenhan_converter.converts)
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
>>>
>>> composer = Composer(txt_btw_brackets_remover, zenhan_converter)
>>> text = "Ｐｙｔｈｏｎ（パイソン）で自然言語処理？"
>>> out = composer(text)
>>> print(out)
Pythonで自然言語処理?
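The pipeline idea in the example above can be sketched with the standard library alone. This is an assumption-laden illustration, not the package's implementation: `compose`, `remove_between_brackets`, `to_half_width`, and the `PAIRS` list are all hypothetical, and the sketch assumes steps run left to right in the order they are passed.

```python
import re
from functools import reduce
from typing import Callable

# Hypothetical bracket pairs for illustration; the library's own set may differ.
# "\uff08" and "\uff09" are the full-width parentheses.
PAIRS = [("(", ")"), ("\uff08", "\uff09"), ("「", "」"), ("『", "』")]


def remove_between_brackets(text: str) -> str:
    """Delete each bracket pair and the text inside it (no nesting support)."""
    for open_, close in PAIRS:
        pattern = re.escape(open_) + "[^" + re.escape(close) + "]*" + re.escape(close)
        text = re.sub(pattern, "", text)
    return text


def to_half_width(text: str) -> str:
    """Map full-width ASCII variants (U+FF01..U+FF5E) to half-width."""
    return text.translate({cp + 0xFEE0: cp for cp in range(0x21, 0x7F)})


def compose(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Return a callable that applies the steps in order, left to right."""
    def run(text: str) -> str:
        return reduce(lambda acc, step: step(acc), steps, text)
    return run


pipeline = compose(remove_between_brackets, to_half_width)
```

Treating every step as a plain `str -> str` callable keeps the steps independent, so they can be reordered or reused outside a pipeline.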
Source Distribution
nlp-jp-gears-0.1.0.tar.gz (4.1 kB)
Built Distribution
Hashes for nlp_jp_gears-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 983b3f2374bf63b897a3b0fc14a18c4aa29d9e16d2ed9fe839a774d8b6b0a284
MD5 | d4f5a528af832529bcabb23998ea4ad1
BLAKE2b-256 | 6b078453d7daa1a8efa4b2f4f3a36e7d9c818c77e95d0a3ca3f7c569969a530a