Converts MeCab parsing results to Python objects.

YouCab: Converts MeCab Parsing Results to Python Objects

Installation

Install MeCab

MeCab is required for YouCab to work. If it is not already installed, install MeCab first.

Install YouCab

$ pip install youcab

Tokenize a Japanese sentence

This example generates a tokenizer that uses MeCab's default dictionary and runs it on a short sentence. The tokenizer converts text into a list of Word objects.

from youcab import youcab

tokenize = youcab.generate_tokenizer()
words = tokenize("本を読んだ")
for word in words:
    print("surface: " + word.surface)
    print("pos    : " + str(word.pos))
    print("base   : " + word.base)
    print("c_type : " + word.c_type)
    print("c_form : " + word.c_form)
    print("")

Output:

surface: 本
pos    : ['名詞', '一般']
base   : 本
c_type : 
c_form : 

surface: を
pos    : ['助詞', '格助詞', '一般']
base   : を
c_type : 
c_form : 

surface: 読ん
pos    : ['動詞', '自立']
base   : 読む
c_type : 五段・マ行
c_form : 連用タ接続

surface: だ
pos    : ['助動詞']
base   : だ
c_type : 特殊・タ
c_form : 基本形
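
Because each Word carries its part of speech and base form, the token list is easy to post-process. The following is a minimal sketch, not part of YouCab: the `Word` dataclass is a stand-in that mirrors only the attributes printed above so the snippet runs without MeCab, and `content_lemmas` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Stand-in for YouCab's Word (assumption: only the attributes shown above)."""
    surface: str
    pos: List[str]
    base: str
    c_type: str = ""
    c_form: str = ""


def content_lemmas(words: List[Word]) -> List[str]:
    """Return the base forms of nouns (名詞) and verbs (動詞)."""
    keep = {"名詞", "動詞"}
    # The first element of pos is the top-level part of speech.
    return [w.base for w in words if w.pos and w.pos[0] in keep]


# Same tokens as the output above, built by hand for illustration.
sample = [
    Word("本", ["名詞", "一般"], "本"),
    Word("を", ["助詞", "格助詞", "一般"], "を"),
    Word("読ん", ["動詞", "自立"], "読む", "五段・マ行", "連用タ接続"),
    Word("だ", ["助動詞"], "だ", "特殊・タ", "基本形"),
]

print(content_lemmas(sample))  # ['本', '読む']
```

With a real tokenizer, the list returned by `tokenize("本を読んだ")` would presumably be passed in place of `sample`.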

Works with any MeCab dictionary

YouCab can be used with MeCab dictionaries such as IPAdic, UniDic, and NEologd. Pass the dictionary directory as dicdir when generating the tokenizer.

from youcab import youcab

tokenize = youcab.generate_tokenizer(dicdir="/path/to/mecab/dic/dir/")
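
If the dictionary location is not known in advance, MeCab's own `mecab-config --dicdir` command reports the directory under which dictionaries are installed. The sketch below wraps that command; `find_mecab_dicdir` is a hypothetical helper, not part of YouCab, and it returns `None` when `mecab-config` is not available. Whether a specific dictionary sits in a subdirectory of the reported path (for example one named after the dictionary) depends on the installation, so treat the result as a starting point.

```python
import shutil
import subprocess
from typing import Optional


def find_mecab_dicdir() -> Optional[str]:
    """Return the directory reported by `mecab-config --dicdir`, or None."""
    # mecab-config ships with MeCab; bail out if it is not on PATH.
    if shutil.which("mecab-config") is None:
        return None
    result = subprocess.run(
        ["mecab-config", "--dicdir"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    path = result.stdout.strip()
    return path or None


dic_root = find_mecab_dicdir()
if dic_root is None:
    print("mecab-config not found; set dicdir manually")
else:
    print("MeCab dictionaries are installed under:", dic_root)
```

The chosen dictionary directory can then be passed as the dicdir argument shown above.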
