A Cython wrapper for MeCab
fugashi
Fugashi is a Cython wrapper for MeCab. It doesn't attempt to cover all of the potential use cases of MeCab, instead dealing with only the most common ones.
- Only UniDic is supported; IPADic cannot be used. UniDic Neologd is fine.
- Only UTF-8 is supported.
- Only Python 3 is supported.
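Because only UTF-8 is supported, text that arrives in a legacy Japanese encoding has to be decoded first. The following is a minimal sketch using only the standard library; the Shift_JIS input and the variable names are assumptions made for illustration and are not part of Fugashi.

```python
# Hypothetical input: bytes in a legacy encoding (Shift_JIS is assumed here).
raw = "日本の菓子".encode("shift_jis")

# Decode to a Python 3 str using the encoding the bytes were written in.
text = raw.decode("shift_jis")

# A str can always be re-encoded as UTF-8, the only encoding supported.
utf8_bytes = text.encode("utf-8")

print(text)
print(len(raw), len(utf8_bytes))
```

After decoding, `text` is an ordinary `str` and can be handed to the tagger like any other string.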
See the blog post for background on why Fugashi exists and some of the design decisions.
Usage
from fugashi import Tagger
tagger = Tagger('-Owakati')
text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger.parseToNodeList(text):
    print(word, word.feature.lemma, word.pos, sep='\t')
    # "feature" is the UniDic feature data as a named tuple
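The `-Owakati` result above is a single string with tokens separated by spaces, so it can be post-processed with ordinary string methods. Here is a minimal sketch that reuses the output string shown above instead of calling MeCab, so it runs without Fugashi installed; the variable names are illustrative only.

```python
from collections import Counter

# Output string copied from the usage example above (MeCab is not called here).
wakati = "麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。"

# Tokens are whitespace-separated, so str.split() recovers them as a list.
tokens = wakati.split()

# Count how often each token occurs.
counts = Counter(tokens)

print(len(tokens))
print(counts["菓子"])
```

With a working installation, the same steps should apply directly to the return value of `tagger.parse(text)`.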
Alternatives
If you have a problem with Fugashi, feel free to open an issue. However, in some cases a different library may be a better fit.
- If you want to use MeCab but don't have a C compiler, use natto-py.
- If you don't want to deal with installing MeCab at all, try SudachiPy.
Note that these are both slower than Fugashi according to a benchmark I wrote.