Feature annotator based on KNP rule files
Desuwa
Feature annotator for morphemes and phrases based on KNP rule files (pure Python)
Quick Start
Desuwa takes Juman++ output as its input.
$ pip install desuwa
$ echo '歌うのは楽しいですわ' | jumanpp | desuwa
+ ["&表層:付与", "連体修飾", "用言:動"]
歌う うたう 歌う 動詞 2 * 0 子音動詞ワ行 12 基本形 2 "代表表記:歌う/うたう ドメイン:文化・芸術;レクリエーション" ["タグ単位始", "形態素連結-数詞", "固有修飾", "活用語", "文頭", "文節始", "T連体修飾", "ドメイン:文化・芸術;レクリエーション", "T固有付属", "内容語", "T固有末尾", "自立"]
+ ["受けNONE", "外の関係", "形副名詞", "助詞", "T連用", "ハ", "タグ単位受:-1"]
の の の 名詞 6 形式名詞 8 * 0 * 0 NIL ["タグ単位始", "T動連用名詞化前文脈", "形態素連結-数詞", "固有修飾", "形副名詞", "特殊非見出語", "名詞相当語", "T固有付属", "付属", "内容語", "T固有末尾"]
は は は 助詞 9 副助詞 2 * 0 * 0 NIL ["形態素連結-数詞", "固有修飾", "T固有付属", "付属", "T固有末尾"]
+ ["&表層:付与", "用言:形", "連体修飾", "助詞"]
楽しい たのしい 楽しい 形容詞 3 * 0 イ形容詞イ段 19 基本形 2 "代表表記:楽しい/たのしい" ["タグ単位始", "形態素連結-数詞", "固有修飾", "活用語", "文節始", "T連体修飾", "T固有付属", "内容語", "T固有末尾", "自立"]
です です です 助動詞 5 * 0 無活用型 26 基本形 2 NIL ["形態素連結-数詞", "固有修飾", "活用語", "T連体修飾", "T固有付属", "付属", "T固有末尾"]
わ わ わ 助詞 9 終助詞 4 * 0 * 0 NIL ["形態素連結-数詞", "固有修飾", "文末", "表現文末", "T固有付属", "付属", "T固有末尾"]
EOS
$ echo '歌うのは楽しいですわ' | jumanpp | desuwa | desuwa --predicate
歌う 歌う/うたう 1 動
楽しいですわ 楽しい/たのしい 1 形
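In this example, the `--predicate` output appears to line up with the annotated output. The two predicates are the tag units carrying a `用言:<type>` feature (`用言:動` and `用言:形`). The second column matches the `代表表記` entry of their content word, and the last column matches the type. The sketch below is a hypothetical reconstruction of that correspondence for illustration only, not how desuwa computes it. It takes already-parsed tag units as plain tuples. It does not reproduce the third column (`1`), whose meaning is not documented here. `representative_form` and `extract_predicates` are hypothetical names.

```python
from typing import List, Optional, Tuple

# Hypothetical minimal representation of one tag unit ("+" line):
# (tag features, [(morpheme surface, semantic info or None), ...])
TagUnit = Tuple[List[str], List[Tuple[str, Optional[str]]]]


def representative_form(semantic_info: Optional[str]) -> Optional[str]:
    """Return the value of the 代表表記 entry in a Juman++ semantic-info string."""
    if semantic_info is None:
        return None
    for item in semantic_info.split(" "):
        if item.startswith("代表表記:"):
            return item[len("代表表記:"):]
    return None


def extract_predicates(tags: List[TagUnit]) -> List[Tuple[str, Optional[str], str]]:
    """List (surface, representative form, type) for tag units carrying 用言:<type>."""
    predicates = []
    for features, morphemes in tags:
        kinds = [f.split(":", 1)[1] for f in features if f.startswith("用言:")]
        if not kinds:
            continue  # not a predicate tag unit
        surface = "".join(s for s, _ in morphemes)
        # Use the first morpheme that has a 代表表記 entry (the content word here).
        rep = None
        for _, info in morphemes:
            rep = representative_form(info)
            if rep is not None:
                break
        predicates.append((surface, rep, kinds[0]))
    return predicates
```

Fed the three tag units of the sample sentence, it returns `[("歌う", "歌う/うたう", "動"), ("楽しいですわ", "楽しい/たのしい", "形")]`.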
$ echo '歌うのは楽しいですわ' | jumanpp | desuwa --segment
歌う│のは│楽しいですわ
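The `--segment` output separates segments with a `│` character. In this example, the segments coincide with the three `+` tag units of the annotated output. One way to use this from Python is to call the two command-line tools through `subprocess`, as sketched below. The sketch assumes that `jumanpp` and `desuwa` are on `PATH`, that they exchange UTF-8 text on stdin and stdout, and that `--segment` prints one line per sentence. `run_pipeline`, `split_segments` and `segment` are hypothetical helper names.

```python
import subprocess
from typing import List

SEPARATOR = "│"  # the bar character printed in the --segment example


def run_pipeline(text: str, *desuwa_args: str) -> str:
    """Run `jumanpp | desuwa [args]` on text and return desuwa's stdout.

    Assumes both commands are on PATH and read UTF-8 from stdin.
    """
    if not text.endswith("\n"):
        text += "\n"  # mimic `echo`, which terminates the input line
    analysis = subprocess.run(
        ["jumanpp"], input=text, capture_output=True, encoding="utf-8", check=True
    )
    result = subprocess.run(
        ["desuwa", *desuwa_args], input=analysis.stdout,
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout


def split_segments(line: str) -> List[str]:
    """Split one line of `desuwa --segment` output into its segments."""
    return line.rstrip("\n").split(SEPARATOR)


def segment(text: str) -> List[List[str]]:
    """Segment text via the CLI pipeline; one list of segments per output line."""
    output = run_pipeline(text, "--segment")
    return [split_segments(line) for line in output.splitlines() if line]
```

Calling `split_segments` on the example line above returns `["歌う", "のは", "楽しいですわ"]`.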
Note
Desuwa has currently been confirmed to work with the following KNP rule files:
mrph_filter.rule
mrph_basic.rule
bnst_basic.rule
License
Apache License 2.0, except for the rule files in desuwa/knp_rules, which are imported from KNP.
Download files
Source Distribution
desuwa-1.1.0.tar.gz (54.0 kB)
Built Distribution
desuwa-1.1.0-py3-none-any.whl (58.0 kB)