A Cython MeCab wrapper for fast, pythonic Japanese tokenization.

fugashi

fugashi by Irasutoya

fugashi is a Cython wrapper for MeCab, a Japanese tokenizer and morphological analysis tool. Wheels are provided for Linux, macOS, and 64-bit Windows, and UniDic is easy to install.

Issues do not need to be written in English (issueを英語で書く必要はありません).

Check out the interactive demo, see the blog post for background on why fugashi exists and some of the design decisions, or see this guide for a basic introduction to Japanese tokenization.

If you are on an unsupported platform (like PowerPC), you'll need to install MeCab first. It's recommended you install from source.

Usage

from fugashi import Tagger

tagger = Tagger('-Owakati')
text = "麩菓子は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger(text):
    # "feature" is the UniDic feature data as a named tuple
    print(word, word.feature.lemma, word.pos, sep='\t')
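
Because `-Owakati` output is a single space-separated string, it can be split back into a token list with plain Python. The sketch below reuses the output string shown above instead of calling fugashi, so it runs without a dictionary installed. The helper name `wakati_to_tokens` is made up for illustration.

```python
from collections import Counter


def wakati_to_tokens(wakati: str) -> list:
    """Split MeCab wakati output (space-separated tokens) into a list."""
    return wakati.split()


# Output string copied from the example above; with fugashi and a dictionary
# installed it would come from Tagger('-Owakati').parse(text).
wakati = '麩 菓子 は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
tokens = wakati_to_tokens(wakati)

# Count how often each surface form appears.
counts = Counter(tokens)

print(len(tokens))
print(counts.most_common(2))
```

Splitting the wakati string only gives surface forms. For lemmas or part-of-speech tags, iterate over `tagger(text)` as in the loop above.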

Installing a Dictionary

fugashi requires a dictionary. UniDic is recommended, and two easy-to-install versions are provided.

  • unidic-lite, a 2013 version of UniDic that's relatively small
  • unidic, the latest UniDic 2.3.0, which is 1GB on disk and requires a separate download step

If you just want to make sure things work you can start with unidic-lite, but for more serious processing unidic is recommended. For production use you'll generally want to generate your own dictionary too; for details see the MeCab documentation.

Both dictionaries can be installed directly with pip, or as fugashi extras as shown below:

pip install fugashi[unidic-lite]

# The full version of UniDic requires a separate download step
pip install fugashi[unidic]
python -m unidic download

For more information on the different MeCab dictionaries available, see this article.
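
To check which of the two dictionary packages is present in the current environment, something like the following can help. It is a sketch that assumes the pip packages `unidic-lite` and `unidic` are importable as `unidic_lite` and `unidic`; the function name `installed_dictionaries` is made up for this example.

```python
import importlib.util

# Import names assumed for the pip packages unidic and unidic-lite.
CANDIDATES = ("unidic", "unidic_lite")


def installed_dictionaries(candidates=CANDIDATES):
    """Return the candidate dictionary modules that can be located, without importing them."""
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


found = installed_dictionaries()
if found:
    print("Dictionary packages found:", ", ".join(found))
else:
    print("No dictionary package found; try: pip install fugashi[unidic-lite]")
```

Finding `unidic` only shows that the package is installed; the dictionary data itself still needs the separate download step.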

Dictionary Use

fugashi is written with the assumption you'll use UniDic to process Japanese, but it supports arbitrary dictionaries.

If you're using a dictionary other than UniDic, you can use the GenericTagger like this:

from fugashi import GenericTagger
tagger = GenericTagger()

# parse can be used as normal
tagger.parse('something')
# features from the dictionary can be accessed by field numbers
text = "麩菓子は、麩を主材料とした日本の菓子。"
for word in tagger(text):
    print(word.surface, word.feature[0])
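
Field numbers refer to positions in the dictionary's comma-separated feature string. As a rough illustration that does not call fugashi, the sketch below splits a feature string written in the style of IPAdic. The sample string and the helper `split_features` are assumptions for this example, and real field layouts depend on the dictionary.

```python
import csv


def split_features(raw: str) -> list:
    """Split a comma-separated feature string, respecting quoted fields."""
    return next(csv.reader([raw]))


# Illustrative IPAdic-style feature string (an assumption, not fugashi output).
raw = "名詞,固有名詞,地域,国,*,*,日本,ニッポン,ニッポン"
fields = split_features(raw)

print(fields[0])    # field 0 is the top-level part of speech in this layout
print(len(fields))  # number of fields in this sample
```

Using the `csv` module rather than `str.split(',')` keeps quoted fields that contain commas in one piece.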

You can also create a dictionary wrapper to get feature information as a named tuple.

from fugashi import GenericTagger, create_feature_wrapper

# field names are given as one space-separated string
CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
tagger = GenericTagger(wrapper=CustomFeatures)
text = "麩菓子は、麩を主材料とした日本の菓子。"
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature.alpha)
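
The named-tuple idea can be tried out with the standard library alone. The sketch below uses `collections.namedtuple` as a stand-in for the wrapper. It is not fugashi's implementation, and the field values are placeholders.

```python
from collections import namedtuple

# Stand-in for create_feature_wrapper('CustomFeatures', 'alpha beta gamma').
# Defaults of None let shorter feature lists still build a tuple.
CustomFeatures = namedtuple(
    "CustomFeatures", "alpha beta gamma", defaults=(None, None, None)
)

# Placeholder field values standing in for one word's dictionary features.
full = CustomFeatures(*["a-value", "b-value", "c-value"])
short = CustomFeatures(*["a-value"])

print(full.alpha, full[0])  # name and index reach the same field
print(short.gamma)          # missing trailing fields fall back to None
```

Named access makes code that reads features easier to follow than bare field numbers, while index access keeps working on the same object.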

Citation

If you use fugashi in research, it would be appreciated if you cite this paper. You can read it at the ACL Anthology or on arXiv.

@inproceedings{mccann-2020-fugashi,
    title = "fugashi, a Tool for Tokenizing {J}apanese in Python",
    author = "McCann, Paul",
    booktitle = "Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlposs-1.7",
    pages = "44--51",
    abstract = "Recent years have seen an increase in the number of large-scale multilingual NLP projects. However, even in such projects, languages with special processing requirements are often excluded. One such language is Japanese. Japanese is written without spaces, tokenization is non-trivial, and while high quality open source tokenizers exist they can be hard to use and lack English documentation. This paper introduces fugashi, a MeCab wrapper for Python, and gives an introduction to tokenizing Japanese.",
}

Alternatives

If you have a problem with fugashi feel free to open an issue. However, there are some cases where it might be better to use a different library.

  • If you don't want to deal with installing MeCab at all, try SudachiPy.
  • If you need to work with Korean, try KoNLPy.

License and Copyright Notice

fugashi is released under the terms of the MIT license. Please copy it far and wide.

fugashi is a wrapper for MeCab, and fugashi wheels include MeCab binaries. MeCab is copyrighted free software by Taku Kudo <taku@chasen.org> and Nippon Telegraph and Telephone Corporation, and is redistributed under the BSD License.
