A Cython wrapper for MeCab

fugashi

Fugashi is a Cython wrapper for MeCab, a Japanese tokenizer and morphological analysis tool. Wheels are provided for Linux, OSX, and Win64, and UniDic is easy to install (see docs below).

See the blog post for background on why Fugashi exists and some of the design decisions.

If you are on an unsupported platform (like PowerPC), you'll need to install MeCab first. It's recommended you install from source.

Usage

from fugashi import Tagger

tagger = Tagger('-Owakati')
text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger(text):
    print(word, word.feature.lemma, word.pos, sep='\t')
    # "feature" is the UniDic feature data as a named tuple
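
With `-Owakati`, `parse` returns one string with tokens separated by spaces, so ordinary string handling is enough for simple post-processing. The sketch below does not call MeCab at all: it reuses the output string from the example above as a literal and counts surface forms. The variable names are only illustrative.

```python
from collections import Counter

# Wakati output copied from the example above; no MeCab call is made here.
wakati = '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'

# Tokens are separated by single spaces, so split() recovers the token list.
tokens = wakati.split()

# Count how often each surface form appears in the sentence.
counts = Counter(tokens)

print(len(tokens))
print(counts['菓子'])
```

If you need part-of-speech or lemma information, iterate over `tagger(text)` as shown above instead, since the wakati string only carries surface forms.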

Installing a Dictionary

Fugashi requires a dictionary. UniDic is recommended, and two easy-to-install versions are provided.

  • unidic-lite, a 2013 version of UniDic that's relatively small
  • unidic, the latest UniDic 2.3.0, which is 1 GB on disk and requires a separate download step

If you just want to make sure things work you can start with unidic-lite, but for more serious processing unidic is recommended. For production use you'll generally want to generate your own dictionary too; for details see the MeCab documentation.

To get either of these dictionaries, install the package directly with pip, or pull it in as a fugashi extra as shown below:

pip install fugashi[unidic-lite]

# The full version of UniDic requires a separate download step
pip install fugashi[unidic]
python -m unidic download
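
After installing, it can be handy to confirm which dictionary package Python can actually see. The following is a minimal sketch using only the standard library. It assumes the import names are `unidic_lite` and `unidic` (the pip names with hyphens turned into underscores); the helper name `installed_dictionaries` is made up for this example.

```python
from importlib.util import find_spec

# Assumed import names for the two pip packages (hyphen becomes underscore).
CANDIDATES = {"unidic-lite": "unidic_lite", "unidic": "unidic"}


def installed_dictionaries():
    """Return the pip names of the UniDic packages that can be imported."""
    return [
        pip_name
        for pip_name, module_name in CANDIDATES.items()
        if find_spec(module_name) is not None
    ]


found = installed_dictionaries()
if found:
    print("Found dictionary packages:", ", ".join(found))
else:
    print("No UniDic package found; try: pip install fugashi[unidic-lite]")
```

Note that this only checks that the package is importable. For the full `unidic` package, the dictionary data still has to be fetched with the separate download step.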

Dictionary Use

Fugashi is written with the assumption you'll use UniDic to process Japanese, but it supports arbitrary dictionaries.

If you're using a dictionary other than UniDic, you can use the GenericTagger like this:

from fugashi import GenericTagger
tagger = GenericTagger()

# parse can be used as normal
tagger.parse('something')
# features from the dictionary can be accessed by field numbers
text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"
for word in tagger(text):
    print(word.surface, word.feature[0])
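
To make "field numbers" concrete: MeCab dictionaries describe each token with a comma-separated feature string, and the index is the position of a field in that string. The sketch below splits such a string with the standard `csv` module, without using fugashi. The sample line is an illustrative IPAdic-style entry written by hand, not output captured from a run, and `split_features` is a hypothetical helper.

```python
import csv


def split_features(feature_string):
    """Split a MeCab-style feature string into a list of fields.

    csv.reader is used instead of str.split so that a quoted field
    containing a comma stays in one piece.
    """
    return next(csv.reader([feature_string]))


# Illustrative IPAdic-style feature string (an assumption, not real output).
sample = "名詞,固有名詞,地域,国,*,*,日本,ニッポン,ニッポン"
fields = split_features(sample)

print(fields[0])    # first field, here the coarse part of speech
print(len(fields))  # number of fields this dictionary entry provides
```

Which position holds which attribute depends entirely on the dictionary, so check its documentation before relying on a particular index.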

You can also create a dictionary wrapper to get feature information as a named tuple.

from fugashi import GenericTagger, create_feature_wrapper

# 'alpha beta gamma' are placeholder field names; use your dictionary's fields
CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
tagger = GenericTagger(wrapper=CustomFeatures)

text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature.alpha)
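
The idea behind a feature wrapper can be illustrated with a plain `collections.namedtuple`. This is a stand-in written for illustration, not fugashi's implementation of `create_feature_wrapper`. The `wrap` helper and the choice to pad missing fields with `None` are assumptions of this sketch.

```python
from collections import namedtuple

# Placeholder field names, matching the example above.
FIELDS = 'alpha beta gamma'.split()

# defaults fills any trailing fields that are not supplied with None.
CustomFeatures = namedtuple(
    'CustomFeatures', FIELDS, defaults=(None,) * len(FIELDS)
)


def wrap(fields):
    """Map a list of feature fields onto named attributes by position."""
    # Extra fields beyond the declared names are dropped in this sketch.
    return CustomFeatures(*fields[:len(FIELDS)])


full = wrap(['a', 'b', 'c'])
short = wrap(['a'])  # fewer fields than names

print(full.beta)
print(short.gamma)
```

Named access such as `feature.alpha` is easier to read than `feature[0]` and keeps code working if you later reorder how you consume fields.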

Alternatives

If you have a problem with Fugashi feel free to open an issue. However, there are some cases where it might be better to use a different library.

  • If you want to use MeCab on a platform we don't have wheels for, but don't have a C compiler, use natto-py.
  • If you don't want to deal with installing MeCab at all, try SudachiPy.
  • If you need to work with Korean, try KoNLPy.

Download files

Source Distribution

  • fugashi-0.2.1.tar.gz (333.1 kB): Source

Built Distributions

  • fugashi-0.2.1-cp38-cp38-win_amd64.whl (500.1 kB): CPython 3.8, Windows x86-64
  • fugashi-0.2.1-cp38-cp38-manylinux1_x86_64.whl (474.3 kB): CPython 3.8
  • fugashi-0.2.1-cp37-cp37m-win_amd64.whl (498.9 kB): CPython 3.7m, Windows x86-64
  • fugashi-0.2.1-cp37-cp37m-manylinux1_x86_64.whl (466.4 kB): CPython 3.7m
  • fugashi-0.2.1-cp36-cp36m-win_amd64.whl (498.9 kB): CPython 3.6m, Windows x86-64
  • fugashi-0.2.1-cp36-cp36m-manylinux1_x86_64.whl (467.2 kB): CPython 3.6m
  • fugashi-0.2.1-cp35-cp35m-win_amd64.whl (497.9 kB): CPython 3.5m, Windows x86-64
  • fugashi-0.2.1-cp35-cp35m-manylinux1_x86_64.whl (462.3 kB): CPython 3.5m

File details

Details for the file fugashi-0.2.1.tar.gz.

File metadata

  • Download URL: fugashi-0.2.1.tar.gz
  • Upload date:
  • Size: 333.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2

File hashes

Hashes for fugashi-0.2.1.tar.gz
Algorithm Hash digest
SHA256 de44613699b7c998091625a214c6cc5c2aa526ead3bcdc5a7641a19d3b451b98
MD5 26cef2c8b2d00e8ae738b40d2080d788
BLAKE2b-256 5ebc28f4500c5a4056e0ee14d05baf23d2c7d74738c87917dcbd32a4a0bfcc2b

See more details on using hashes here.

File details

Details for the file fugashi-0.2.1-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: fugashi-0.2.1-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 500.1 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2

File hashes

Hashes for fugashi-0.2.1-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 90f54cc5ddb175e97f2638a120e75f96164e778976b3ecddd483bcb96d80ab07
MD5 7c9d6ecd9e62d26226842e77edc0e1c5
BLAKE2b-256 521d189fe7de521a9d1bf0838b1726bbde677e6ab848c064db55fbe0c622a5e0

File details

Details for the file fugashi-0.2.1-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: fugashi-0.2.1-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 474.3 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2

File hashes

Hashes for fugashi-0.2.1-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 93a16f9dce373e18f897bbcf82061050f71398a11cdeaa6a64b6388862c16a76
MD5 b234ca45249d523d330cead04bbfe13a
BLAKE2b-256 8739392545a7f4558b6f8e5f33c57800264131b5c9d964248c421cd363a3a020

File details

Details for the file fugashi-0.2.1-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: fugashi-0.2.1-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 498.9 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2

File hashes

Hashes for fugashi-0.2.1-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 702a3b063fcde97c06ca3392e403a951bc04ef545d0a68d255224efcdb3689e3
MD5 1aa1e80192b5d04c996181028437aeb7
BLAKE2b-256 0d7f118fd58eefff8fcb7219357dc8d35b6e8c6d0d0f4dd332bd0a18b24273ba

File details

Details for the file fugashi-0.2.1-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: fugashi-0.2.1-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 466.4 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2

File hashes

Hashes for fugashi-0.2.1-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 73f3b8235878165460f4460ce344ad00c372f4e80afe6485e1bd12895b887a9d
MD5 a78ce9b936e3ae780a1a98fbb6279289
BLAKE2b-256 90ef7c0f2bee9bf7313f42afc8b9f783e9f0c0b2d9d1fd44d4ec3486bc7feaf3

File details

Details for the file fugashi-0.2.1-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: fugashi-0.2.1-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 498.9 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2

File hashes

Hashes for fugashi-0.2.1-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 03061254ce816e1dc9c23d1f647f7ae929cba508c863a6eb721d96a8b0bae6a9
MD5 88ff17aedcf09f7d0369bd8a8dc7a248
BLAKE2b-256 e8bd25e42ce2341dfb038235b2c587203d753a333905a49ae6a28aaf5b772f66

File details

Details for the file fugashi-0.2.1-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: fugashi-0.2.1-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 467.2 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2

File hashes

Hashes for fugashi-0.2.1-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 60f217d04c78477d32dfd5c31055e282adb64886a374c8afce05056b05ae6b69
MD5 ef60b0e6da3510bcd07af332199976ea
BLAKE2b-256 0e52451e3cc522a9ca18351fc6f2ff09fb1a894f7e1080aef8d4781d64c052a9

File details

Details for the file fugashi-0.2.1-cp35-cp35m-win_amd64.whl.

File metadata

  • Download URL: fugashi-0.2.1-cp35-cp35m-win_amd64.whl
  • Upload date:
  • Size: 497.9 kB
  • Tags: CPython 3.5m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2

File hashes

Hashes for fugashi-0.2.1-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 65d0210769dc0c29abcf5ddd739bcb5bb76d340969741dd5d152fc5de586a9f7
MD5 8738f9bac750b7ec809a1eb2cadf73f0
BLAKE2b-256 1e43ce8829171a0dd280af7fff7f018df3b7f645812f6e071622b8a01729360c

File details

Details for the file fugashi-0.2.1-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

  • Download URL: fugashi-0.2.1-cp35-cp35m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 462.3 kB
  • Tags: CPython 3.5m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2

File hashes

Hashes for fugashi-0.2.1-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 99c2a341b1b346462b3e56e1dd18de76f45fb3e2cd446df3fce41971ea0dbe08
MD5 c1850a515e2d8785b3ca216166dd45fd
BLAKE2b-256 b14a3495b8268052338f9f08b91fd92b615619b5699222a52ddb417875b14bc0
