Python wrapper for the MeCab morphological analyzer for Japanese

This is a Python wrapper for the MeCab morphological analyzer for Japanese text. It works with Python 3.5 and later, as well as Python 2.7. (Note: Python 3.5 is not supported on OS X.)

Note that Windows wheels require a Microsoft Visual C++ Redistributable, so be sure to install that.

Basic usage

>>> import MeCab
>>> wakati = MeCab.Tagger("-Owakati")
>>> wakati.parse("pythonが大好きです").split()
['python', 'が', '大好き', 'です']

>>> chasen = MeCab.Tagger("-Ochasen")
>>> print(chasen.parse("pythonが大好きです"))
python  python    python  名詞-固有名詞-組織
が      ガ        が      助詞-格助詞-一般
大好き  ダイスキ  大好き  名詞-形容動詞語幹
です    デス      です    助動詞  特殊・デス  基本形
EOS

The API for mecab-python3 closely follows the API for MeCab itself, even when this makes it not very “Pythonic.” Please consult the MeCab documentation for more information.
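Because the wrapper returns MeCab's output as a plain string, any further structure has to be recovered by the caller. The following is a minimal sketch, not part of mecab-python3, that splits -Ochasen output into one dictionary per token. The helper name parse_chasen and the column names are assumptions chosen for illustration, based on the tab-separated layout of MeCab's default chasen format.

```python
from typing import Dict, List

# Column names are an assumption for illustration; MeCab itself does not
# name the columns of its "chasen" output format.
CHASEN_COLUMNS = ["surface", "reading", "base", "pos", "conj_type", "conj_form"]


def parse_chasen(output: str) -> List[Dict[str, str]]:
    """Split -Ochasen output into one dict per token, stopping at EOS."""
    tokens = []
    for line in output.splitlines():
        if line == "EOS":
            break
        if not line:
            continue
        fields = line.split("\t")
        # Pad short rows so that every token has the same set of keys.
        fields += [""] * (len(CHASEN_COLUMNS) - len(fields))
        tokens.append(dict(zip(CHASEN_COLUMNS, fields)))
    return tokens
```

With a tagger created as in the example above, parse_chasen(chasen.parse("pythonが大好きです")) would yield one dictionary per token, which is easier to filter by part of speech than the raw string.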

Installation

Binary wheels are available for MacOS X, Linux, and Windows, and are installed by default when you use pip:

pip install mecab-python3

These wheels include an internal (statically linked) copy of the MeCab library, and a copy of the mecab-ipadic dictionary (using UTF-8 text encoding), which is automatically used by default. If you wish to use a different dictionary, you will need to install it yourself, write a mecabrc file directing MeCab to use it, and set the environment variable MECABRC to point to this file.
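The dictionary override described above can be scripted from Python. This is a sketch under stated assumptions: the helper name use_dictionary is hypothetical, the dictionary directory passed to it is a placeholder for wherever you installed your dictionary, and the generated file contains only a dicdir entry, which is the mecabrc setting that names the dictionary directory.

```python
import os
import tempfile


def use_dictionary(dicdir, rc_path=None):
    """Write a minimal mecabrc pointing at dicdir and export MECABRC.

    Hypothetical helper for illustration; not part of mecab-python3.
    """
    if rc_path is None:
        # No path given: create a fresh temporary file for the config.
        fd, rc_path = tempfile.mkstemp(prefix="mecabrc-")
        os.close(fd)
    with open(rc_path, "w", encoding="utf-8") as rc:
        rc.write("dicdir = {}\n".format(dicdir))
    # Point MeCab at the generated configuration file.
    os.environ["MECABRC"] = rc_path
    return rc_path
```

Call the helper before constructing a MeCab.Tagger, since the environment variable presumably needs to be in place when the tagger reads its configuration.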

To build from source using pip,

pip install --no-binary :all: mecab-python3

Alternatively, you can use pip to download the source, then build it by hand:

pip download --no-binary :all: mecab-python3
tar zxf mecab-python3-{version}.tar.gz
cd mecab-python3-{version}
python3 setup.py build
# install as you like

When the module is built from source, it requires the system to provide the MeCab library and at least one dictionary. You must have SWIG, the MeCab library and headers, and a dictionary installed before running pip install or setup.py build. For instance, on Debian-based Linux,

sudo apt-get install swig libmecab-dev mecab-ipadic-utf8
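Before starting a source build, it can save time to confirm that the prerequisites are on the PATH. The sketch below is an illustration only (Python 3, since it relies on shutil.which). The helper name missing_build_tools is hypothetical, and checking for mecab-config is an assumption: it is the helper program that normally ships with the MeCab development package, so its presence is used here as a stand-in for "library and headers installed". The check does not verify that a dictionary is available.

```python
import shutil

# Executables assumed to indicate a usable build environment. mecab-config
# normally ships with the MeCab development package (an assumption here).
REQUIRED_TOOLS = ("swig", "mecab-config")


def missing_build_tools(tools=REQUIRED_TOOLS):
    """Return the names of the required executables not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_build_tools() suggests that pip install --no-binary or setup.py build has what it needs to find SWIG and MeCab.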

Building wheels with a bundled library and dictionary is only supported in a sanitized CI environment. Consult the scripts in the scripts subdirectory of the source tree to see how it’s done.

Licensing

Like MeCab itself, mecab-python3 is copyrighted free software by Taku Kudo <taku@chasen.org> and Nippon Telegraph and Telephone Corporation, and is distributed under a 3-clause BSD license (see the file BSD). Alternatively, it may be redistributed under the terms of the GNU General Public License, version 2 (see the file GPL) or the GNU Lesser General Public License, version 2.1 (see the file LGPL).
