


Negima is a Python package that extracts phrases from Japanese text using part-of-speech-based rules that you define.



Install and update using pip:

$ pip install -U negima

Install from source using setup.py:

$ python setup.py install


A Simple Example

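from negima import MorphemeMerger

mm = MorphemeMerger()

# Load rules from a CSV file (path assumed by analogy with the TSV example below).
mm.set_rule_from_csv('rules/1_noun.csv')
# Or from a TSV file:
# mm.set_rule_from_csv('rules/1_noun.tsv', sep='\t')
# Or from an Excel sheet:
# mm.set_rule_from_excel('rules/rules.xlsx', sheet_name='1_noun')

# Extract the phrases that match the loaded rules.
words, _ = mm.get_rule_pattern('今日はいい天気')
print(words)
# ['今日', '天気']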


You can define rules in CSV, TSV, or Excel format.
A rule file requires the following 9 columns.
Each row defines the part of speech of one morpheme.

  • id
    • A rule starts at a row with a non-empty id column.
    • id has to be unique.
    • Rules are applied in ascending order of id (ids are compared as UTF-8 strings, not as byte arrays).
      ex: id:000_XXX has priority over id:999_ZZZ
  • min
    • Minimum repeat count. 0 means the morpheme is optional.
    • default=1
  • max
    • Maximum repeat count.
    • default=1
  • pos0, pos1, pos2, pos3, pos4, pos5
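    • Parts of speech of each morpheme as parsed by MeCab.
      • pos0: part of speech (ex: 名詞)
      • pos1: part-of-speech subcategory 1 (ex: 副詞可能)
      • pos2: part-of-speech subcategory 2
      • pos3: part-of-speech subcategory 3
      • pos4: conjugation 1
      • pos5: conjugation 2
    • To express an OR condition, join the alternative parts of speech with | as a separator.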
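
The OR condition and the id ordering described in the list can be illustrated with a short sketch. This is not negima's own code: cell_matches is a hypothetical helper, and it assumes that an empty cell places no constraint on the morpheme and that ids are ordered by plain string comparison.

```python
def cell_matches(cell: str, tag: str) -> bool:
    """Return True if a morpheme's tag satisfies one rule cell.

    Assumptions (not confirmed by the package): an empty cell matches
    any tag, and '|' separates the accepted alternatives.
    """
    if cell == "":
        return True
    return tag in cell.split("|")


print(cell_matches("一般|サ変接続|数", "数"))    # True
print(cell_matches("一般|サ変接続|数", "接尾"))  # False

# String comparison puts "10_a" before "9_b", so zero-padded ids
# such as "009_b" and "010_a" keep the intended priority.
print(sorted(["9_b", "10_a", "000_XXX"]))  # ['000_XXX', '10_a', '9_b']
```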

You can add arbitrary columns to your rule file; columns other than the 9 above are ignored. An example is available at rule/3_independent_phrase.csv, which has an extra column, example, that holds an example sentence for the rule.
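
To make the file layout concrete, here is a minimal sketch of how such a rule file could be read with the standard library. It is not how negima itself loads rules: load_rules and the inline RULE_CSV text are hypothetical, and the sketch assumes that a row with an empty id continues the rule started above it and that empty min and max cells fall back to the default of 1.

```python
import csv
import io

# Hypothetical rule file content with one extra column ("example").
RULE_CSV = """id,min,max,pos0,pos1,pos2,pos3,pos4,pos5,example
1,0,2,接頭詞,,,,,,
,1,4,名詞,一般|サ変接続|数,,,,,国立競技場
,0,2,名詞,接尾,,,,,
"""

POS_COLUMNS = ["pos0", "pos1", "pos2", "pos3", "pos4", "pos5"]


def load_rules(text):
    """Group rows into rules; a non-empty id starts a new rule."""
    rules = {}
    current = None
    for row in csv.DictReader(io.StringIO(text)):
        if row["id"]:
            current = row["id"]
            rules[current] = []
        if current is None:
            raise ValueError("the first row must have a non-empty id")
        rules[current].append({
            "min": int(row["min"] or 1),  # empty cell -> default 1
            "max": int(row["max"] or 1),
            "pos": [row[c] for c in POS_COLUMNS],  # extra columns are dropped
        })
    return rules


rules = load_rules(RULE_CSV)
print(len(rules["1"]))  # 3
```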


Simple rule (csv)

Defining a rule like the following lets you extract compound nouns.

id,min,max,pos0,pos1,pos2,pos3,pos4,pos5
1,0,2,接頭詞,,,,,
,1,4,名詞,一般|サ変接続|数,,,,
,0,2,名詞,接尾,,,,

Caution: Don't insert empty rows between rules.
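
The following sketch shows how the compound-noun rule above might be applied to a tagged sentence. It is a simplified greedy matcher written for illustration, not negima's implementation. The names RULE, MORPHEMES, row_matches, match_at and extract are hypothetical, the morpheme tags are written by hand rather than produced by MeCab, and only pos0 and pos1 are checked.

```python
# Compound-noun rule from the table above, limited to pos0 and pos1.
RULE = [
    {"min": 0, "max": 2, "pos": ["接頭詞", ""]},
    {"min": 1, "max": 4, "pos": ["名詞", "一般|サ変接続|数"]},
    {"min": 0, "max": 2, "pos": ["名詞", "接尾"]},
]

# Hand-tagged morphemes (surface, pos0, pos1); an assumption, not MeCab output.
MORPHEMES = [
    ("約", "接頭詞", "数接続"), ("5000", "名詞", "数"), ("人", "名詞", "接尾"),
    ("が", "助詞", "格助詞"),
    ("国立", "名詞", "一般"), ("競技", "名詞", "サ変接続"), ("場", "名詞", "接尾"),
    ("に", "助詞", "格助詞"), ("駆けつけ", "動詞", "自立"), ("た", "助動詞", "*"),
]


def row_matches(row, morpheme):
    """An empty cell matches anything; '|' separates alternatives."""
    tags = morpheme[1:]
    return all(cell == "" or tag in cell.split("|")
               for cell, tag in zip(row["pos"], tags))


def match_at(rule, morphemes, start):
    """Greedily match every row of the rule from `start`; return the end index or None."""
    i = start
    for row in rule:
        count = 0
        while (count < row["max"] and i < len(morphemes)
               and row_matches(row, morphemes[i])):
            i += 1
            count += 1
        if count < row["min"]:
            return None
    return i if i > start else None


def extract(rule, morphemes):
    """Scan left to right and join the surfaces of each matched span."""
    phrases, i = [], 0
    while i < len(morphemes):
        end = match_at(rule, morphemes, i)
        if end is None:
            i += 1
        else:
            phrases.append("".join(m[0] for m in morphemes[i:end]))
            i = end
    return phrases


print(extract(RULE, MORPHEMES))  # ['約5000人', '国立競技場']
```

The matcher does not backtrack, which is enough for this rule but may not reproduce the package's behavior for rules whose adjacent rows overlap.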

Rule samples


Extract nouns.

  • 約5000人が国立競技場に駆けつけた -> 5000 国立 競技
  • 場所がわかりにくいのでたどり着けなかった -> 場所


Extract compound nouns.

  • 約5000人が国立競技場に駆けつけた -> 約5000人 国立競技場
  • 場所がわかりにくいのでたどり着けなかった -> 場所


Extract slightly more complex phrases.

  • 新人研修のレベルは高い -> 新人研修 レベルは高い
  • あのサイトはホテルの比較がしやすくないので好きではない -> サイト ホテル 比較がしやすくない 好きではない

Files for negima, version 0.1.3

  • negima-0.1.3.tar.gz (6.7 kB): source distribution
  • negima-0.1.3-py3-none-any.whl (7.4 kB): wheel, Python version py3
