A Cython MeCab wrapper for fast, pythonic Japanese tokenization.

fugashi

Fugashi by Irasutoya

Fugashi is a Cython wrapper for MeCab, a Japanese tokenizer and morphological analysis tool. Wheels are provided for Linux, OSX, and Win64, and UniDic is easy to install.

You do not need to write issues in English.

See the blog post for background on why Fugashi exists and some of the design decisions.

If you are on an unsupported platform (like PowerPC), you'll need to install MeCab first. It's recommended you install from source.

Usage

from fugashi import Tagger

tagger = Tagger('-Owakati')
text = "麩菓子は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger(text):
    print(word, word.feature.lemma, word.pos, sep='\t')
    # "feature" is the Unidic feature data as a named tuple
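
The loop above prints each token's surface form, lemma, and part of speech. As a minimal sketch, the same fields can be collected into plain tuples for later processing. `token_rows` is a hypothetical helper, not part of fugashi, and it assumes only the `surface`, `feature.lemma`, and `pos` attributes used in the examples in this README. The stand-in token objects and their values are illustrative, so the sketch runs without MeCab or a dictionary installed.

```python
from collections import namedtuple


def token_rows(words):
    """Return (surface, lemma, pos) tuples for an iterable of token objects.

    Hypothetical helper: it only assumes each token exposes the `surface`,
    `feature.lemma`, and `pos` attributes used in the README examples.
    """
    return [(w.surface, w.feature.lemma, w.pos) for w in words]


# Stand-in token objects (illustrative values, not real tagger output).
FakeFeature = namedtuple("FakeFeature", "lemma")
FakeWord = namedtuple("FakeWord", "surface feature pos")

sample = [
    FakeWord("菓子", FakeFeature("菓子"), "名詞,普通名詞,一般,*"),
    FakeWord("し", FakeFeature("為る"), "動詞,非自立可能,*,*"),
]

rows = token_rows(sample)
for surface, lemma, pos in rows:
    print(surface, lemma, pos, sep="\t")
```

With fugashi and a UniDic dictionary installed, the same helper should accept the result of `tagger(text)` in place of `sample`.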

Installing a Dictionary

Fugashi requires a dictionary. UniDic is recommended, and two easy-to-install versions are provided.

  • unidic-lite, a 2013 version of UniDic that's relatively small
  • unidic, the latest UniDic 2.3.0, which is 1GB on disk and requires a separate download step

If you just want to make sure things work you can start with unidic-lite, but for more serious processing unidic is recommended. For production use you'll generally want to generate your own dictionary too; for details see the MeCab documentation.

To get either of these dictionaries, install the package directly with pip, or install it as a fugashi extra as shown below:

# Quoting the requirement keeps shells such as zsh from expanding the brackets
pip install "fugashi[unidic-lite]"

# The full version of UniDic requires a separate download step
pip install "fugashi[unidic]"
python -m unidic download
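
After installing, it can be useful to check which dictionary package Python can actually see. The sketch below assumes the pip packages `unidic-lite` and `unidic` are imported as `unidic_lite` and `unidic`. `installed_unidic_packages` is a hypothetical helper name, not part of fugashi.

```python
import importlib.util


def installed_unidic_packages():
    """Return the import names of the UniDic packages that can be imported."""
    # Assumption: pip package `unidic-lite` is imported as `unidic_lite`,
    # and pip package `unidic` is imported as `unidic`.
    candidates = ["unidic_lite", "unidic"]
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


found = installed_unidic_packages()
if found:
    print("UniDic packages available:", ", ".join(found))
else:
    print("No UniDic package found; try: pip install fugashi[unidic-lite]")
```

Note that this only confirms the Python package is importable. For the full `unidic` package, the dictionary data still has to be fetched with the separate download step above.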

Dictionary Use

Fugashi is written with the assumption you'll use UniDic to process Japanese, but it supports arbitrary dictionaries.

If you're using a dictionary other than UniDic, you can use the GenericTagger like this:

from fugashi import GenericTagger
tagger = GenericTagger()

text = "麩菓子は、麩を主材料とした日本の菓子。"
# parse can be used as normal
tagger.parse(text)
# features from the dictionary can be accessed by field numbers
for word in tagger(text):
    print(word.surface, word.feature[0])

You can also create a dictionary wrapper to get feature information as a named tuple.

from fugashi import GenericTagger, create_feature_wrapper
CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
tagger = GenericTagger(wrapper=CustomFeatures)
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature.alpha)
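
To make the idea of a feature wrapper concrete, the sketch below maps a comma-separated feature string onto named fields using only the standard library. It is not fugashi's implementation. `make_feature_type` and `parse_feature` are hypothetical names, and the sample feature string is made up for illustration.

```python
import csv
from collections import namedtuple


def make_feature_type(name, field_names):
    """Build a named tuple type whose missing trailing fields default to None."""
    fields = field_names.split()
    return namedtuple(name, fields, defaults=(None,) * len(fields))


def parse_feature(feature_type, raw):
    """Split one comma-separated feature string into `feature_type`."""
    # The csv module handles quoted fields that themselves contain commas.
    values = next(csv.reader([raw]))
    # Extra fields beyond the declared names are dropped.
    return feature_type(*values[: len(feature_type._fields)])


CustomFeatures = make_feature_type("CustomFeatures", "alpha beta gamma")
feature = parse_feature(CustomFeatures, 'noun,"a,b"')
print(feature.alpha, feature.beta, feature.gamma)
```

Giving every field a default of `None` means a dictionary entry with fewer fields than expected still produces a usable tuple instead of raising an error.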

Alternatives

If you have a problem with Fugashi feel free to open an issue. However, there are some cases where it might be better to use a different library.

  • If you don't want to deal with installing MeCab at all, try SudachiPy.
  • If you need to work with Korean, try KoNLPy.

License and Copyright Notice

Fugashi is released under the terms of the MIT license. Please copy it far and wide.

Fugashi is a wrapper for MeCab, and Fugashi wheels include MeCab binaries. MeCab is copyrighted free software by Taku Kudo <taku@chasen.org> and Nippon Telegraph and Telephone Corporation, and is redistributed under the BSD License.

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • fugashi-1.0.2a7-cp38-cp38-win_amd64.whl (500.1 kB): CPython 3.8, Windows x86-64
  • fugashi-1.0.2a7-cp38-cp38-manylinux1_x86_64.whl (487.0 kB): CPython 3.8
  • fugashi-1.0.2a7-cp38-cp38-macosx_10_14_x86_64.whl (283.0 kB): CPython 3.8, macOS 10.14+ x86-64
  • fugashi-1.0.2a7-cp37-cp37m-win_amd64.whl (499.2 kB): CPython 3.7m, Windows x86-64
  • fugashi-1.0.2a7-cp37-cp37m-manylinux1_x86_64.whl (477.1 kB): CPython 3.7m
  • fugashi-1.0.2a7-cp37-cp37m-macosx_10_14_x86_64.whl (282.3 kB): CPython 3.7m, macOS 10.14+ x86-64
  • fugashi-1.0.2a7-cp36-cp36m-win_amd64.whl (499.0 kB): CPython 3.6m, Windows x86-64
  • fugashi-1.0.2a7-cp36-cp36m-manylinux1_x86_64.whl (476.6 kB): CPython 3.6m
  • fugashi-1.0.2a7-cp36-cp36m-macosx_10_14_x86_64.whl (283.2 kB): CPython 3.6m, macOS 10.14+ x86-64
  • fugashi-1.0.2a7-cp35-cp35m-win_amd64.whl (497.6 kB): CPython 3.5m, Windows x86-64
  • fugashi-1.0.2a7-cp35-cp35m-manylinux1_x86_64.whl (473.1 kB): CPython 3.5m
  • fugashi-1.0.2a7-cp35-cp35m-macosx_10_14_x86_64.whl (281.0 kB): CPython 3.5m, macOS 10.14+ x86-64
