Skip to main content

Cotoha API, created by NTT Communications Corporation, for Python

Project description

CotohapPy: Cotoha for Python

Downloads Downloads Downloads

CotohapPy (Japanese: コトハッピー) is for connecting to Cotoha API, one of the Japanese morphological analysis engines, and is for reshaping the response more readably.

Installation

The easiest way to install the latest version is by using pip/easy_install to pull it from PyPI:

pip install cotohappy

You may also use Git to clone the repository from GitHub and install it manually:

git clone https://github.com/278Mt/cotohappy.git
cd cotohappy
python setup.py install

Python 3.7 and 3.8 are supported (frequently updated).

Requirements

  • json
  • requests

Usage

This is one of the examples.

├── payload.json
└── sjgyoen.py

JSON preperation (payload.json):

{
  "AccessTokenPublishURL": "https://api.ce-cotoha.com/v1/oauth/accesstokens",
  "APIBaseURL"           : "https://api.ce-cotoha.com/api/dev/",
  "ClientId"             : "ABCDEFGHIJKLMNOPQRSTUVWXYZ123456",
  "ClientSecret"         : "7890abcdefghijkl"
}

Programme (sjgyoen.py)

import cotohappy
import requests
from bs4 import BeautifulSoup


def get_kotonoha_story():

    url = 'https://www.kotonohanoniwa.jp/page/product.html'
    res = requests.get(url)

    soup = BeautifulSoup(res.content, 'html.parser')
    p = soup.find_all('p', class_='mb24')[-1]

    return p.text


if __name__ == '__main__':

    coy = cotohappy.API()

    """ getting parse """
    print('\n#### parse origin ####')
    sentence = get_kotonoha_story()
    kuzure   = False
    parse_li = coy.parse(sentence, kuzure)
    for parse in parse_li:
        print(parse)

    print(parse.key_name)

    """ getting tokens; it is a little more difficult than MeCab Janome """
    print('\n#### parse tokens ####')
    for parse in parse_li:
        for token in parse.tokens:
            print(token)

    print(token.key_name)

    """ if you extract just nouns, you write: """
    print('\n#### extract nouns ####')
    nouns: [str] = []
    for parse in parse_li:
        for token in parse.tokens:
            if token.pos == '名詞':
                nouns.append(token.form)

    print(nouns)

Output:

#### parse origin ####
靴職人を	 0,1,D,1,2
目指す	 1,2,D,0,1
高校生・タカオは、	 2,51,D,2,3
雨の	 3,4,D,0,1
朝は	 4,7,D,0,1
...
互いの	 47,48,D,0,1
思いを	 48,51,D,0,1
よそに	 49,51,D,0,1
梅雨は	 50,51,D,0,1
明けようとしていた。	 51,-1,O,0,6
form	 id,head,dep,chunk_head,chunk_func

#### parse tokens ####
靴	 0,クツ,靴,名詞,*,*,*,*,*
職人	 1,ショクニン,職人,名詞,*,*,*,*,*
を	 2,ヲ,を,格助詞,連用,*,*,*,*
目指	 3,メザ,目指す,動詞語幹,S,*,*,*,*
す	 4,ス,す,動詞接尾辞,連体,*,*,*,*
...
し	 148,シ,し,動詞活用語尾,*,*,*,*,*
て	 149,テ,て,動詞接尾辞,接続,連用,*,*,*
い	 150,イ,いる,動詞語幹,A,Lて連用,*,*,*
た	 151,タ,た,動詞接尾辞,終止,*,*,*,*
。	 152,,。,句点,*,*,*,*,*
form	 id,kana,lemma,pos,features[:5]

#### extract nouns ####
['靴', '職人', '高校生', 'タカオ', '雨', '朝', '学校', '公園', '日本', '庭園', '靴', 'スケッチ', 'ある日', 'タカオ', 'ひとり', '缶', 'ビール', '年上', '女性', 'ユキノ', 'ふたり', '約束', '雨', '日', '逢瀬', '心', '居場所', 'ユキノ', '彼女', '靴', 'タカオ', '六月', '空', '揺れ', '互い', '思い', 'よそ', '梅雨']

Please check details on examples.

Whats's new?

0.4.1, 0.4.2

Partial errors elimination.

0.4.0

kuzure and default become kuzure=True and kuzure=False

0.3.6, 0.3.7

Partial errors elimination.

0.3.5

In version 0.3.5, you can choose translating mode: for example, "information-seeking" in sentence type, to "情報獲得".

0.3.4

In version 0.3.4, you can use technical term dictionaries on parse, named entity extraction, keyword extraction and similarity calculation. However, I, origin master of CotohapPy, cannot use nor examine the mode because I use Cotoha API for Developer, not for Enterprise. I want for Academic.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

cotohappy-0.4.2-py3-none-any.whl (17.7 kB view details)

Uploaded Python 3

File details

Details for the file cotohappy-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: cotohappy-0.4.2-py3-none-any.whl
  • Upload date:
  • Size: 17.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.8.0

File hashes

Hashes for cotohappy-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f47ea3fb9e7d590cdd96b8dc5bae5e1438becf4ea6060c0ee6d19c2ec5fa0332
MD5 43bf3a17c6ab77784961f31aa81d5b4d
BLAKE2b-256 b4001ecdb4d077d21f3a1c8a28af23c04cbea436e394db6b39e7977d2a5a4918

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page