Python wrapper for the MeCab morphological analyzer for Japanese
This is a Python wrapper for the MeCab morphological analyzer for Japanese text. It works with Python 3.6 and greater, as well as Python 2.7.
You do not need to write issues in English.
Note that Windows wheels require a Microsoft Visual C++ Redistributable, so be sure to install that.
Basic usage
>>> import MeCab
>>> wakati = MeCab.Tagger("-Owakati")
>>> wakati.parse("pythonが大好きです").split()
['python', 'が', '大好き', 'です']
>>> tagger = MeCab.Tagger()
>>> print(tagger.parse("pythonが大好きです"))
python python python python 名詞-普通名詞-一般
が ガ ガ が 助詞-格助詞
大好き ダイスキ ダイスキ 大好き 形状詞-一般
です デス デス です 助動詞 助動詞-デス 終止形-一般
EOS
The API for mecab-python3 closely follows the API for MeCab itself, even when
this makes it not very “Pythonic.” Please consult the official MeCab
documentation for more information.
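Because parse returns a single string, a small helper can turn it into Python data. The following is a minimal sketch, not part of mecab-python3: parse_mecab_output is a hypothetical name, and it assumes the columns are tab-separated and the sentence ends with an EOS line, as in the sample output above. Other dictionaries or output formats may need different splitting.

```python
# -*- coding: utf-8 -*-
def parse_mecab_output(text):
    """Split Tagger.parse() output into (surface, fields) tuples.

    Assumption: one token per line, columns separated by tabs,
    and a bare "EOS" line marking the end of the sentence.
    """
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        parts = line.split("\t")
        tokens.append((parts[0], parts[1:]))
    return tokens


# A hand-written sample standing in for tagger.parse(...) output.
sample = (
    "python\tpython\tpython\tpython\t名詞-普通名詞-一般\n"
    "が\tガ\tガ\tが\t助詞-格助詞\n"
    "EOS\n"
)
tokens = parse_mecab_output(sample)
```

With a dictionary installed, the same helper could be applied to the string returned by tagger.parse.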
Installation
Binary wheels are available for macOS, Linux, and Windows (64-bit), and are
installed by default when you use pip:
pip install mecab-python3
These wheels include an internal (statically linked) copy of the MeCab library,
but not a dictionary. In order to use MeCab you'll need to install a dictionary.
unidic-lite is a good one to start with:
pip install unidic-lite
To build from source using pip:
pip install --no-binary :all: mecab-python3
Common Issues
If you get a RuntimeError when you try to run MeCab, here are some things to check:
Windows Redistributable
You have to install the Microsoft Visual C++ Redistributable to use this package on Windows.
Installing a Dictionary
Run pip install unidic-lite and confirm that MeCab then works. If that fixes
your problem, you either don't have a dictionary installed, or you need to
specify your dictionary path like this:
tagger = MeCab.Tagger('-r /dev/null -d /usr/local/lib/mecab/dic/mydic')
Note: on Windows, use nul instead of /dev/null. Alternatively, if you have a
mecabrc, you can pass its path after -r.
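The platform difference can be handled in code. The sketch below is one possible approach, not something mecab-python3 provides: tagger_args is a hypothetical helper, and it relies on the standard library's os.devnull, which is nul on Windows and /dev/null on POSIX systems. The dictionary path is the placeholder from the example above.

```python
import os


def tagger_args(dicdir, mecabrc=None):
    """Build an argument string for MeCab.Tagger (hypothetical helper).

    If no mecabrc path is given, fall back to the platform's null
    device so that MeCab does not look for a default mecabrc.
    """
    rc = mecabrc if mecabrc is not None else os.devnull
    return "-r {} -d {}".format(rc, dicdir)


args = tagger_args("/usr/local/lib/mecab/dic/mydic")
```

The resulting string would then be passed as MeCab.Tagger(args).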
Specifying a mecabrc
If you get this error:
error message: [ifs] no such file or directory: /usr/local/etc/mecabrc
You need to specify a mecabrc file. It's OK to specify an empty file; it just
has to exist. You can specify a mecabrc with -r. This may be necessary on
Debian or Ubuntu, where the mecabrc is in /etc/mecabrc.
You can specify an empty mecabrc like this:
tagger = MeCab.Tagger('-r/dev/null -d/home/hoge/mydic')
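If you don't know in advance where the mecabrc lives, you can check the locations mentioned above before building the -r option. This is a minimal sketch under stated assumptions: find_mecabrc is a hypothetical helper, the candidate list contains only the two paths named in this section, and the fallback to the null device mirrors the empty-file workaround.

```python
import os
import tempfile


def find_mecabrc(candidates=("/usr/local/etc/mecabrc", "/etc/mecabrc")):
    """Return the first candidate path that exists, else the null device."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return os.devnull  # nul on Windows, /dev/null on POSIX


# Demonstration with a temporary empty file standing in for a mecabrc.
tmpdir = tempfile.mkdtemp()
empty_rc = os.path.join(tmpdir, "mecabrc")
open(empty_rc, "w").close()  # an empty file is enough; it only has to exist

missing = os.path.join(tmpdir, "missing")
found = find_mecabrc([missing, empty_rc])
fallback = find_mecabrc([missing])
```

The returned path would be placed after -r in the Tagger arguments, together with -d for your dictionary.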
Using Unsupported Output Modes like -Ochasen
Chasen output is not a built-in feature of MeCab; you must define it in your
dicrc or mecabrc. Notably, Unidic does not include the Chasen output format.
Please see the MeCab documentation.
Alternatives
- fugashi is a Cython wrapper for MeCab with a Pythonic interface, by the current maintainer of this library
- SudachiPy is a modern tokenizer with a maintained dictionary, though it's slower than MeCab
- KoNLPy is a library for Korean NLP that includes a MeCab wrapper
Licensing
Like MeCab itself, mecab-python3 is copyrighted free software by
Taku Kudo <taku@chasen.org> and Nippon Telegraph and Telephone Corporation,
and is distributed under a 3-clause BSD license (see the file BSD).
Alternatively, it may be redistributed under the terms of the
GNU General Public License, version 2 (see the file GPL) or the
GNU Lesser General Public License, version 2.1 (see the file LGPL).