A Cython MeCab wrapper for fast, pythonic Japanese tokenization.

fugashi

fugashi by Irasutoya

fugashi is a Cython wrapper for MeCab, a Japanese tokenizer and morphological analysis tool. Wheels are provided for Linux, OSX, and Win64, and UniDic is easy to install.

Issues do not need to be written in English.

Check out the interactive demo, see the blog post for background on why fugashi exists and some of the design decisions, or see this guide for a basic introduction to Japanese tokenization.

If you are on an unsupported platform (like PowerPC), you'll need to install MeCab first. It's recommended you install from source.

Usage

from fugashi import Tagger

tagger = Tagger('-Owakati')
text = "麩菓子は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger(text):
    print(word, word.feature.lemma, word.pos, sep='\t')
    # "feature" is the UniDic feature data as a named tuple
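
The wakati output is a single space-delimited string, so it can be handed straight to ordinary Python tooling. A minimal sketch, assuming the output string quoted above (copied here as a literal rather than produced by calling fugashi) and a hypothetical helper named `wakati_tokens`:

```python
from collections import Counter

def wakati_tokens(wakati: str) -> list:
    """Split a wakati-formatted string into a list of surface tokens."""
    return wakati.split()

# Literal copy of the tagger.parse() output shown above (an assumption:
# in real use this string would come from Tagger('-Owakati').parse(text)).
wakati = "麩 菓子 は 、 麩 を 主材 料 と し た 日本 の 菓子 。"
tokens = wakati_tokens(wakati)

# Count how often each surface form appears.
counts = Counter(tokens)
print(len(tokens), counts["麩"], counts["菓子"])
```

Splitting on whitespace is enough here because wakati mode only emits surface forms; for lemmas or part-of-speech data, iterate over `tagger(text)` as shown above instead.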

Installing a Dictionary

fugashi requires a dictionary. UniDic is recommended, and two easy-to-install versions are provided.

  • unidic-lite, a 2013 version of UniDic that's relatively small
  • unidic, the latest UniDic 2.3.0, which is 1GB on disk and requires a separate download step

If you just want to make sure things work you can start with unidic-lite, but for more serious processing unidic is recommended. For production use you'll generally want to generate your own dictionary too; for details see the MeCab documentation.

To get either of these dictionaries, install the package directly with pip or use the corresponding fugashi extra:

pip install fugashi[unidic-lite]

# The full version of UniDic requires a separate download step
pip install fugashi[unidic]
python -m unidic download

For more information on the different MeCab dictionaries available, see this article.
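
If it is unclear which dictionary package ended up in an environment, a quick check is possible with the standard library alone. This is a sketch under the assumption that the pip packages `unidic-lite` and `unidic` are importable as `unidic_lite` and `unidic`; the helper name `installed_dictionaries` is hypothetical. It only tests importability, so for `unidic` it does not confirm that the separate download step was run.

```python
from importlib.util import find_spec

# pip package name -> importable module name (note the underscore).
DICTIONARY_MODULES = {"unidic-lite": "unidic_lite", "unidic": "unidic"}

def installed_dictionaries() -> list:
    """Return the pip names of the UniDic packages that can be imported."""
    return [
        pip_name
        for pip_name, module in DICTIONARY_MODULES.items()
        if find_spec(module) is not None
    ]

found = installed_dictionaries()
print(found or "no UniDic package found; try: pip install fugashi[unidic-lite]")
```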

Dictionary Use

fugashi is written with the assumption you'll use UniDic to process Japanese, but it supports arbitrary dictionaries.

If you're using a dictionary besides UniDic you can use the GenericTagger like this:

from fugashi import GenericTagger
tagger = GenericTagger()
# sample sentence reused from the Usage section
text = "麩菓子は、麩を主材料とした日本の菓子。"

# parse can be used as normal
tagger.parse('something')
# features from the dictionary can be accessed by field numbers
for word in tagger(text):
    print(word.surface, word.feature[0])
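
Field numbers refer to positions in the comma-separated feature string that MeCab keeps for each dictionary entry. The sketch below uses only the standard library to show the idea. The sample string is an illustrative IPAdic-style entry written for this example, and `split_features` is a hypothetical helper, not part of fugashi.

```python
import csv

def split_features(feature: str) -> tuple:
    """Split a MeCab-style feature string into fields, honoring quoted commas."""
    return tuple(next(csv.reader([feature])))

# Illustrative feature string (an assumption, not output captured from MeCab).
sample = "名詞,固有名詞,地域,国,*,*,日本,ニッポン,ニッポン"
fields = split_features(sample)

print(fields[0])  # first field: the part of speech in this layout
print(fields[6])  # seventh field: the base form in this layout
```

Which field holds which piece of information depends entirely on the dictionary, so check its documentation before relying on a particular index.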

You can also create a dictionary wrapper to get feature information as a named tuple.

from fugashi import GenericTagger, create_feature_wrapper
CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
tagger = GenericTagger(wrapper=CustomFeatures)
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature.alpha)
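
Conceptually, a feature wrapper gives names to those numbered fields. A rough standard-library analogue, assuming a three-field dictionary with the same placeholder field names as above; it illustrates the idea and is not fugashi's implementation:

```python
from collections import namedtuple

# Same placeholder field names as in the wrapper example above.
CustomFeatures = namedtuple("CustomFeatures", "alpha beta gamma")

# Made-up field values for illustration only.
raw_fields = ("noun", "common", "*")
feature = CustomFeatures(*raw_fields)

# Named access and positional access return the same value.
print(feature.alpha, feature[0])
```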

Citation

If you use fugashi in research, it would be appreciated if you cite this paper. You can read it at the ACL Anthology or on arXiv.

@inproceedings{mccann-2020-fugashi,
    title = "fugashi, a Tool for Tokenizing {J}apanese in Python",
    author = "McCann, Paul",
    booktitle = "Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlposs-1.7",
    pages = "44--51",
    abstract = "Recent years have seen an increase in the number of large-scale multilingual NLP projects. However, even in such projects, languages with special processing requirements are often excluded. One such language is Japanese. Japanese is written without spaces, tokenization is non-trivial, and while high quality open source tokenizers exist they can be hard to use and lack English documentation. This paper introduces fugashi, a MeCab wrapper for Python, and gives an introduction to tokenizing Japanese.",
}

Alternatives

If you have a problem with fugashi feel free to open an issue. However, there are some cases where it might be better to use a different library.

  • If you don't want to deal with installing MeCab at all, try SudachiPy.
  • If you need to work with Korean, try KoNLPy.

License and Copyright Notice

fugashi is released under the terms of the MIT license. Please copy it far and wide.

fugashi is a wrapper for MeCab, and fugashi wheels include MeCab binaries. MeCab is copyrighted free software by Taku Kudo <taku@chasen.org> and Nippon Telegraph and Telephone Corporation, and is redistributed under the BSD License.
