A Cython wrapper for MeCab

fugashi

Fugashi is a Cython wrapper for MeCab.

See the blog post for background on why Fugashi exists and some of the design decisions.

Any reasonably recent version of MeCab should work, but installing MeCab from source is recommended.

Usage

from fugashi import Tagger

tagger = Tagger('-Owakati')
text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger.parseToNodeList(text):
    print(word, word.feature.lemma, word.pos, sep='\t')
    # "feature" is the UniDic feature data as a named tuple
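
Because the -Owakati output is a single space-delimited string, it can be split into a token list with plain Python. The sketch below reuses the output string shown above instead of calling MeCab, so it runs without fugashi installed. The variable names are illustrative only.

```python
# Output of tagger.parse(text) in wakati mode, copied from the example above.
wakati = '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'

# Tokens are separated by single spaces, so str.split() recovers them.
tokens = wakati.split()

# The input contained no spaces, so joining the tokens with an empty
# separator gives back the original sentence.
original = ''.join(tokens)

print(len(tokens), original)
```

Note that the round trip through join only works when the input text itself contains no whitespace.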

Dictionary Use

Fugashi is written with the assumption you'll use UniDic to process Japanese, but it supports arbitrary dictionaries.

If you're using a dictionary besides UniDic, you can use the GenericTagger like this:

from fugashi import GenericTagger
tagger = GenericTagger()

# any string works; this one is the sentence from the Usage section
text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"

# parse can be used as normal
tagger.parse('something')
# features from the dictionary can be accessed by field numbers
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature[0])
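
The field numbers correspond to positions in the comma-separated feature string that MeCab emits for each token. As a rough illustration that does not use fugashi itself, the sketch below splits one line in MeCab's default output format (surface, a tab, then the features). The sample line assumes an IPAdic-style dictionary and is included only as an example; other dictionaries have different fields.

```python
import csv

# One line of MeCab's default output: surface, a tab, then CSV features.
# The sample assumes an IPAdic-style dictionary (illustrative only).
line = "日本\t名詞,固有名詞,地域,国,*,*,日本,ニッポン,ニッポン"

surface, feature_str = line.split("\t", 1)

# Use the csv module so that quoted fields containing commas stay intact.
features = next(csv.reader([feature_str]))

# Field 0 is the top-level part of speech in this sample.
print(surface, features[0])
```

Which position holds which piece of information depends entirely on the dictionary, which is why a generic tagger can only offer numeric access by default.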

You can also create a dictionary wrapper to get feature information as a named tuple.

from fugashi import GenericTagger, create_feature_wrapper
CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
tagger = GenericTagger(wrapper=CustomFeatures)
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature.alpha)
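
Conceptually, a feature wrapper maps positional fields to names. The sketch below shows that idea with the standard library's collections.namedtuple. It is not fugashi's implementation, and both the SketchFeatures name and the three field values are made up to match the 'alpha beta gamma' names above.

```python
from collections import namedtuple

# Hypothetical stand-in for a wrapper built from 'alpha beta gamma'.
SketchFeatures = namedtuple('SketchFeatures', 'alpha beta gamma')

# Made-up feature fields for one token, in dictionary order.
fields = ['noun', 'common', 'general']

feature = SketchFeatures(*fields)

# The same data is now reachable by name or by position.
print(feature.alpha, feature[0])
```

In this sketch the number of fields has to match the number of names exactly, otherwise constructing the tuple raises a TypeError, so the field list should mirror the dictionary's feature layout.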

Alternatives

If you have a problem with Fugashi, feel free to open an issue. However, in some cases a different library may be a better fit.

  • If you want to use MeCab but don't have a C compiler, use natto-py.
  • If you don't want to deal with installing MeCab at all, try SudachiPy.

Note that these are both slower than Fugashi according to a benchmark I wrote.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for fugashi, version 0.1.11

Filename                                          Size      File type  Python version
fugashi-0.1.11-cp35-cp35m-manylinux1_x86_64.whl   465.9 kB  Wheel      cp35
fugashi-0.1.11-cp35-cp35m-win_amd64.whl           495.3 kB  Wheel      cp35
fugashi-0.1.11-cp36-cp36m-manylinux1_x86_64.whl   470.4 kB  Wheel      cp36
fugashi-0.1.11-cp36-cp36m-win_amd64.whl           496.3 kB  Wheel      cp36
fugashi-0.1.11-cp37-cp37m-manylinux1_x86_64.whl   469.1 kB  Wheel      cp37
fugashi-0.1.11-cp37-cp37m-win_amd64.whl           496.3 kB  Wheel      cp37
fugashi-0.1.11-cp38-cp38-manylinux1_x86_64.whl    472.7 kB  Wheel      cp38
fugashi-0.1.11-cp38-cp38-win_amd64.whl            497.3 kB  Wheel      cp38
fugashi-0.1.11.tar.gz                             331.4 kB  Source     None