A Cython wrapper for MeCab
fugashi
Fugashi is a Cython wrapper for MeCab.
See the blog post for background on why Fugashi exists and some of the design decisions.
Any reasonable version of MeCab should work, but installing MeCab from source is recommended.
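Before installing, it can help to check whether MeCab is already visible on your system. The snippet below is a minimal sketch using only the standard library. It assumes that a MeCab install places the `mecab` and `mecab-config` executables on your PATH; the helper name `find_mecab_tools` is made up for this example and is not part of Fugashi.

```python
import shutil


def find_mecab_tools(names=("mecab", "mecab-config")):
    """Map each executable name to its location on PATH, or None if absent.

    Hypothetical helper for illustration only; not part of Fugashi.
    """
    return {name: shutil.which(name) for name in names}


tools = find_mecab_tools()
for name, path in tools.items():
    # shutil.which returns None when the executable cannot be found
    print(f"{name}: {path or 'not found on PATH'}")
```

If either tool is reported as not found, MeCab is probably not installed, or it was installed somewhere that is not on your PATH.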
Usage
from fugashi import Tagger
tagger = Tagger('-Owakati')
text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger.parseToNodeList(text):
    print(word, word.feature.lemma, word.pos, sep='\t')
# "feature" is the Unidic feature data as a named tuple
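With `-Owakati`, `parse` returns a single space-separated string, so a plain `split()` is enough to get a list of tokens. The sketch below uses the example output shown above as a stand-in for a real `tagger.parse(text)` call, so it runs without MeCab installed. The name `wakati_to_tokens` is made up for this example.

```python
from typing import List

# Stand-in for the string returned by tagger.parse(text) with '-Owakati',
# copied from the example output above so MeCab is not needed here.
wakati_output = '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'


def wakati_to_tokens(wakati: str) -> List[str]:
    """Split space-separated wakati output into a list of surface forms."""
    return wakati.split()


tokens = wakati_to_tokens(wakati_output)
print(len(tokens))   # 20
print(tokens[:2])    # ['麩', '菓子']
```

If you need more than surface forms (lemma, part of speech), iterate over `parseToNodeList` as shown above instead.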
Dictionary Use
Fugashi is written with the assumption you'll use Unidic to process Japanese, but it supports arbitrary dictionaries.
If you're using a dictionary besides Unidic you can use the GenericTagger like this:
from fugashi import GenericTagger
tagger = GenericTagger()
# parse can be used as normal
tagger.parse('something')
# features from the dictionary can be accessed by field numbers
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature[0])
You can also create a dictionary wrapper to get feature information as a named tuple.
from fugashi import GenericTagger, create_feature_wrapper
CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
tagger = GenericTagger(wrapper=CustomFeatures)
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature.alpha)
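To show what a named-tuple wrapper buys you, here is a rough standard-library sketch of the idea. It is not Fugashi's actual implementation: `make_feature_wrapper` is a hypothetical stand-in for `create_feature_wrapper`, and the feature string is invented. The only assumption carried over from MeCab is that a dictionary's features arrive as one comma-separated string.

```python
import csv
from collections import namedtuple


def make_feature_wrapper(name, fields):
    """Hypothetical stand-in: build a named tuple type from field names."""
    return namedtuple(name, fields)


CustomFeatures = make_feature_wrapper('CustomFeatures', 'alpha beta gamma')

# Invented feature string in the comma-separated style MeCab dictionaries use.
raw_feature = 'noun,common,general'

# csv.reader handles quoted fields that contain commas; str.split(',') would not.
fields = next(csv.reader([raw_feature]))
feature = CustomFeatures(*fields)

print(feature.alpha)  # access by name
print(feature[0])     # access by field number still works
```

Access by name and by field number return the same value, which is why the wrapper can be added without breaking code that already indexes features by position.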
Alternatives
If you have a problem with Fugashi, feel free to open an issue. However, in some cases a different library may be a better fit.
- If you want to use MeCab but don't have a C compiler, use natto-py.
- If you don't want to deal with installing MeCab at all, try SudachiPy.
Note that these are both slower than Fugashi according to a benchmark I wrote.