
Feature annotator based on KNP rule files


Desuwa


A pure-Python feature annotator for morphemes and phrases, based on KNP rule files

Quick Start

Desuwa takes Juman++ output as its input.

$ pip install desuwa
$ echo '歌うのは楽しいですわ' | jumanpp | desuwa
+	["&表層:付与", "連体修飾", "用言:動"]
歌う うたう 歌う 動詞 2 * 0 子音動詞ワ行 12 基本形 2 "代表表記:歌う/うたう ドメイン:文化・芸術;レクリエーション"	["タグ単位始", "形態素連結-数詞", "固有修飾", "活用語", "文頭", "文節始", "T連体修飾", "ドメイン:文化・芸術;レクリエーション", "T固有付属", "内容語", "T固有末尾", "自立"]
+	["受けNONE", "外の関係", "形副名詞", "助詞", "T連用", "ハ", "タグ単位受:-1"]
の の の 名詞 6 形式名詞 8 * 0 * 0 NIL	["タグ単位始", "T動連用名詞化前文脈", "形態素連結-数詞", "固有修飾", "形副名詞", "特殊非見出語", "名詞相当語", "T固有付属", "付属", "内容語", "T固有末尾"]
は は は 助詞 9 副助詞 2 * 0 * 0 NIL	["形態素連結-数詞", "固有修飾", "T固有付属", "付属", "T固有末尾"]
+	["&表層:付与", "用言:形", "連体修飾", "助詞"]
楽しい たのしい 楽しい 形容詞 3 * 0 イ形容詞イ段 19 基本形 2 "代表表記:楽しい/たのしい"	["タグ単位始", "形態素連結-数詞", "固有修飾", "活用語", "文節始", "T連体修飾", "T固有付属", "内容語", "T固有末尾", "自立"]
です です です 助動詞 5 * 0 無活用型 26 基本形 2 NIL	["形態素連結-数詞", "固有修飾", "活用語", "T連体修飾", "T固有付属", "付属", "T固有末尾"]
わ わ わ 助詞 9 終助詞 4 * 0 * 0 NIL	["形態素連結-数詞", "固有修飾", "文末", "表現文末", "T固有付属", "付属", "T固有末尾"]
EOS
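
The annotated output above can be post-processed in Python. The following is a minimal sketch, not part of Desuwa itself. It assumes the layout shown above: each line is a `+` marker (phrase-level features) or a Juman++ morpheme line, followed by a tab and a JSON list of features, with `EOS` closing the sentence. The function name `parse_desuwa_output` and the dictionary keys are hypothetical.

```python
import json
from typing import Dict, List

# Lines copied from the second phrase of the output above.
SAMPLE = "\n".join([
    '+\t["受けNONE", "外の関係", "形副名詞", "助詞", "T連用", "ハ", "タグ単位受:-1"]',
    'の の の 名詞 6 形式名詞 8 * 0 * 0 NIL\t["タグ単位始", "T動連用名詞化前文脈", '
    '"形態素連結-数詞", "固有修飾", "形副名詞", "特殊非見出語", "名詞相当語", '
    '"T固有付属", "付属", "内容語", "T固有末尾"]',
    'は は は 助詞 9 副助詞 2 * 0 * 0 NIL\t["形態素連結-数詞", "固有修飾", '
    '"T固有付属", "付属", "T固有末尾"]',
    "EOS",
])


def parse_desuwa_output(text: str) -> List[Dict]:
    """Group morpheme lines under the '+' (phrase) line that precedes them.

    Assumes every morpheme line is preceded by at least one '+' line and
    that the feature list is the last tab-separated field of each line.
    """
    phrases: List[Dict] = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue
        # Split on the last tab: left is the Juman++ part, right is JSON.
        head, _, feats = line.rpartition("\t")
        features = json.loads(feats)
        if head == "+":
            phrases.append({"features": features, "morphemes": []})
        else:
            surface = head.split(" ", 1)[0]  # first Juman++ column
            phrases[-1]["morphemes"].append(
                {"surface": surface, "juman": head, "features": features}
            )
    return phrases


for phrase in parse_desuwa_output(SAMPLE):
    print([m["surface"] for m in phrase["morphemes"]], phrase["features"])
```

Splitting on the last tab keeps the Juman++ columns intact in `juman`, so they can be parsed separately if needed.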

$ echo '歌うのは楽しいですわ' | jumanpp | desuwa | desuwa --predicate
歌う	歌う/うたう	1	動
楽しいですわ	楽しい/たのしい	1	形
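
The `--predicate` output can be read as tab-separated rows. The sketch below assumes the four-column layout shown above. The field names are hypothetical: the second column looks like the 代表表記 value from Juman++ and the fourth like the value of the 用言 feature, but neither is documented here, and the meaning of the third column is not stated, so it is kept as a raw string.

```python
from typing import List, NamedTuple


class PredicateRow(NamedTuple):
    surface: str    # predicate as it appears in the sentence
    rep: str        # looks like the 代表表記 value (assumption)
    col3: str       # meaning not documented here; kept verbatim
    pred_type: str  # looks like the 用言 value, e.g. 動 or 形 (assumption)


def parse_predicates(text: str) -> List[PredicateRow]:
    """Parse `desuwa --predicate` output, skipping lines without 4 columns."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 4:
            rows.append(PredicateRow(*fields))
    return rows


sample = "歌う\t歌う/うたう\t1\t動\n楽しいですわ\t楽しい/たのしい\t1\t形\n"
for row in parse_predicates(sample):
    print(row.surface, row.rep, row.pred_type)
```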

$ echo '歌うのは楽しいですわ' | jumanpp | desuwa --segment
歌う│のは│楽しいですわ
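
The `--segment` output can be split back into a list of segments. This sketch assumes the delimiter is the box-drawing character `│` (U+2502) seen in the output above; `split_segments` is a hypothetical helper name.

```python
from typing import List

SEGMENT_DELIMITER = "\u2502"  # "│", as it appears in the output above


def split_segments(line: str) -> List[str]:
    """Split one line of `desuwa --segment` output into its segments."""
    return line.rstrip("\n").split(SEGMENT_DELIMITER)


print(split_segments("歌う\u2502のは\u2502楽しいですわ"))
```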

Note

Desuwa is currently confirmed to work with the following rule files.

  • mrph_filter.rule
  • mrph_basic.rule
  • bnst_basic.rule

License

Apache License 2.0, except for the rule files in desuwa/knp_rules, which are imported from KNP
