A clone of the morphological analyzer awabi
pure-pyawabi
pure-pyawabi is a pure Python implementation of awabi (https://github.com/nakagami/awabi).
If you have a Rust development environment, see also https://github.com/nakagami/pyawabi.
Requirements
Python 3.8+
MeCab dictionary
Example (Ubuntu):
$ sudo apt install mecab mecab-ipadic-utf8
Install the Python package
$ pip install pure-pyawabi
How to use
pyawabi command
$ echo 'すもももももももものうち' | pyawabi
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
$ echo 'すもももももももものうち' | pyawabi -N 2
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
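If you capture the command output from another program, it can be turned back into token lists. The following is a minimal sketch, not part of pure-pyawabi: `parse_pyawabi_output` is a hypothetical helper, and it assumes each token line holds the surface form and the feature string separated by a tab or a single space, with `EOS` closing each analysis as shown above.

```python
def parse_pyawabi_output(text):
    """Group `surface feature` lines into analyses, one per EOS marker."""
    analyses = []
    current = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line == "EOS":
            # EOS closes the current analysis.
            analyses.append(current)
            current = []
            continue
        # Assumption: the separator is a tab if present, otherwise a space.
        sep = "\t" if "\t" in line else " "
        surface, _, feature = line.partition(sep)
        current.append((surface, feature))
    return analyses


sample = (
    "すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も 助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "EOS\n"
)
for surface, feature in parse_pyawabi_output(sample)[0]:
    print(surface, feature.split(",")[0])
```

With `-N 2`, the same helper would return two lists, one per `EOS` block.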
Use as a package
Use the functions
>>> import pyawabi
>>> import pprint
>>> pp = pprint.PrettyPrinter()
>>> pp.pprint(pyawabi.tokenize("すもももももももものうち"))
[('すもも', '名詞,一般,*,*,*,*,すもも,スモモ,スモモ'),
 ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
 ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
 ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
 ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
 ('の', '助詞,連体化,*,*,*,*,の,ノ,ノ'),
 ('うち', '名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ')]
>>> pp.pprint(pyawabi.tokenize_n_best("すもももももももものうち", 2))
[[('すもも', '名詞,一般,*,*,*,*,すもも,スモモ,スモモ'),
  ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
  ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
  ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
  ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
  ('の', '助詞,連体化,*,*,*,*,の,ノ,ノ'),
  ('うち', '名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ')],
 [('すもも', '名詞,一般,*,*,*,*,すもも,スモモ,スモモ'),
  ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
  ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
  ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
  ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
  ('の', '助詞,連体化,*,*,*,*,の,ノ,ノ'),
  ('うち', '名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ')]]
>>>
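The two analyses returned by `tokenize_n_best` differ only in where one `も` falls, which is easier to see when the surface forms are joined on one line. The sketch below uses data copied from the output above instead of calling pyawabi, so it runs without a dictionary; `segmentation` is a hypothetical helper, not part of the package.

```python
def segmentation(path):
    """Join the surface forms of one analysis with '/' for easy comparison."""
    return "/".join(surface for surface, _feature in path)


NOUN_SUMOMO = ("すもも", "名詞,一般,*,*,*,*,すもも,スモモ,スモモ")
NOUN_MOMO = ("もも", "名詞,一般,*,*,*,*,もも,モモ,モモ")
PART_MO = ("も", "助詞,係助詞,*,*,*,*,も,モ,モ")
PART_NO = ("の", "助詞,連体化,*,*,*,*,の,ノ,ノ")
NOUN_UCHI = ("うち", "名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ")

# Same shape as the tokenize_n_best result shown above: a list of analyses,
# each a list of (surface, feature) tuples.
n_best = [
    [NOUN_SUMOMO, PART_MO, NOUN_MOMO, PART_MO, NOUN_MOMO, PART_NO, NOUN_UCHI],
    [NOUN_SUMOMO, PART_MO, NOUN_MOMO, NOUN_MOMO, PART_MO, PART_NO, NOUN_UCHI],
]

for rank, path in enumerate(n_best, start=1):
    print(rank, segmentation(path))
# 1 すもも/も/もも/も/もも/の/うち
# 2 すもも/も/もも/もも/も/の/うち
```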
Use a Tokenizer object
>>> tok = pyawabi.Tokenizer()
>>> pp.pprint(tok.tokenize("すもももももももものうち"))
[('すもも', '名詞,一般,*,*,*,*,すもも,スモモ,スモモ'),
 ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
 ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
 ('も', '助詞,係助詞,*,*,*,*,も,モ,モ'),
 ('もも', '名詞,一般,*,*,*,*,もも,モモ,モモ'),
 ('の', '助詞,連体化,*,*,*,*,の,ノ,ノ'),
 ('うち', '名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ')]
>>>
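Each token is a `(surface, feature)` tuple whose feature string is comma-separated. A minimal sketch of splitting it into named fields follows. The `Feature` type, its field names, and `parse_feature` are illustrative assumptions based on the nine-column layout of mecab-ipadic seen in the output above; they are not part of pure-pyawabi, and other dictionaries may use a different layout.

```python
from collections import namedtuple

# Assumption: hypothetical labels for the nine mecab-ipadic feature columns.
Feature = namedtuple(
    "Feature",
    [
        "pos", "pos_detail1", "pos_detail2", "pos_detail3",
        "conj_type", "conj_form", "base_form", "reading", "pronunciation",
    ],
)


def parse_feature(feature):
    """Split a feature string into a Feature, padding missing columns with '*'."""
    fields = feature.split(",")
    width = len(Feature._fields)
    # Some entries may carry fewer columns; pad them so unpacking never fails.
    fields += ["*"] * (width - len(fields))
    return Feature(*fields[:width])


# Tokens in the shape returned by tokenize(), copied from the output above.
tokens = [
    ("すもも", "名詞,一般,*,*,*,*,すもも,スモモ,スモモ"),
    ("も", "助詞,係助詞,*,*,*,*,も,モ,モ"),
    ("うち", "名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ"),
]

nouns = [surface for surface, feature in tokens if parse_feature(feature).pos == "名詞"]
print(nouns)  # ['すもも', 'うち']
```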