A Cython MeCab wrapper for fast, Pythonic Japanese tokenization.

fugashi

Fugashi is a Cython wrapper for MeCab, a Japanese tokenizer and morphological analysis tool. Wheels are provided for Linux, OSX, and Win64, and UniDic is easy to install.

See the blog post for background on why Fugashi exists and some of the design decisions.

If you are on an unsupported platform (like PowerPC), you'll need to install MeCab first. It's recommended you install from source.

Usage

from fugashi import Tagger

tagger = Tagger('-Owakati')
text = "麩菓子は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger(text):
    print(word, word.feature.lemma, word.pos, sep='\t')
    # "feature" is the UniDic feature data as a named tuple
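
With -Owakati, parse returns the tokens as one space-separated string. The sketch below splits such a string back into a token list. It uses the sample output shown above as a literal so that it runs without fugashi installed; the helper name wakati_to_tokens is made up for this illustration and is not part of fugashi.

```python
# Sample wakati output copied from the example above. In real use this
# would be the string returned by tagger.parse(text).
wakati = '麩 菓子 は 、 麩 を 主材 料 と し た 日本 の 菓子 。'


def wakati_to_tokens(line):
    """Split a wakati-formatted line into a list of surface tokens.

    Hypothetical helper for illustration only, not a fugashi function.
    """
    return line.split()


tokens = wakati_to_tokens(wakati)
print(len(tokens))
print(tokens[:3])
```

Joining the tokens without spaces gives back the original sentence, which is a cheap sanity check when post-processing tokenizer output.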

Installing a Dictionary

Fugashi requires a dictionary. UniDic is recommended, and two easy-to-install versions are provided.

  • unidic-lite, a 2013 version of UniDic that's relatively small
  • unidic, the latest UniDic 2.3.0, which is 1GB on disk and requires a separate download step

If you just want to make sure things work you can start with unidic-lite, but for more serious processing unidic is recommended. For production use you'll generally want to generate your own dictionary too; for details see the MeCab documentation.

To get either of these dictionaries, you can install the dictionary package directly with pip, or install it as a fugashi extra as shown below:

pip install fugashi[unidic-lite]

# The full version of UniDic requires a separate download step
pip install fugashi[unidic]
python -m unidic download
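
After installing, it can be handy to confirm which dictionary package Python can actually see. The following is a minimal sketch using only the standard library. It assumes the import names are unidic and unidic_lite, and the function name installed_dictionaries is invented for this example. It only checks that the package is importable; for the full unidic package it does not verify that the separate download step has completed.

```python
import importlib.util


def installed_dictionaries(candidates=("unidic", "unidic_lite")):
    """Return the candidate module names that are importable.

    Hypothetical helper for illustration. The default names are assumed
    to be the import names of the unidic and unidic-lite packages.
    """
    return [
        name for name in candidates
        if importlib.util.find_spec(name) is not None
    ]


print(installed_dictionaries())
```

An empty list means neither dictionary package is installed in the current environment.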

Dictionary Use

Fugashi is written with the assumption you'll use UniDic to process Japanese, but it supports arbitrary dictionaries.

If you're using a dictionary other than UniDic, you can use the GenericTagger like this:

from fugashi import GenericTagger
tagger = GenericTagger()

text = "麩菓子は、麩を主材料とした日本の菓子。"
# parse can be used as normal
tagger.parse(text)
# features from the dictionary can be accessed by field numbers
for word in tagger(text):
    print(word.surface, word.feature[0])
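
To see why features are addressed by number here, it helps to recall that MeCab's default output puts the surface form and a comma-separated feature string on each line, separated by a tab. The sketch below splits one such line with the standard library. The sample line is made up for illustration and does not come from any particular dictionary, and split_node_line is a hypothetical helper, not how fugashi parses nodes internally.

```python
import csv


def split_node_line(line):
    """Split a 'surface<TAB>f0,f1,...' line into (surface, feature list).

    Hypothetical helper for illustration. csv is used instead of
    str.split so that a quoted field containing a comma stays intact.
    """
    surface, _, raw = line.partition('\t')
    features = next(csv.reader([raw])) if raw else []
    return surface, features


# Made-up sample line, not real dictionary output.
surface, features = split_node_line('日本\t名詞,固有名詞,"a,b"')
print(surface, features[0], len(features))
```

The position of each field depends entirely on the dictionary, which is why a generic tagger can only offer numeric indexes unless you supply field names yourself.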

You can also create a dictionary wrapper to get feature information as a named tuple.

from fugashi import GenericTagger, create_feature_wrapper
CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
tagger = GenericTagger(wrapper=CustomFeatures)
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature.alpha)
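
As a rough mental model, a feature wrapper behaves like a named tuple whose field names you choose. The sketch below builds something similar with collections.namedtuple so it can run without fugashi. It is an assumption-laden stand-in, not fugashi's actual implementation: make_feature_wrapper and the default-to-None behaviour for missing fields are choices made for this example.

```python
from collections import namedtuple


def make_feature_wrapper(name, fields, default=None):
    """Build a named tuple type whose fields all default to `default`.

    Hypothetical stand-in for illustration, not fugashi code.
    `fields` may be a space-separated string or a sequence of names.
    """
    field_names = fields.split() if isinstance(fields, str) else list(fields)
    return namedtuple(name, field_names,
                      defaults=(default,) * len(field_names))


CustomFeatures = make_feature_wrapper('CustomFeatures', 'alpha beta gamma')

# A feature string with fewer fields than the wrapper defines.
feat = CustomFeatures(*'noun,common'.split(','))
print(feat.alpha, feat.beta, feat.gamma)
```

Defaulting missing fields is useful because some dictionaries emit fewer fields for unknown words than for known ones.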

Alternatives

If you have a problem with Fugashi feel free to open an issue. However, there are some cases where it might be better to use a different library.

  • If you need MeCab on a platform we don't provide wheels for and you don't have a C compiler, use natto-py.
  • If you don't want to deal with installing MeCab at all, try SudachiPy.
  • If you need to work with Korean, try KoNLPy.

License and Copyright Notice

Fugashi is released under the terms of the MIT license. Please copy it far and wide.

Fugashi is a wrapper for MeCab, and Fugashi wheels include MeCab binaries. MeCab is copyrighted free software by Taku Kudo <taku@chasen.org> and Nippon Telegraph and Telephone Corporation, and is redistributed under the BSD License.
