A Cython wrapper for MeCab
fugashi
Fugashi is a Cython wrapper for MeCab, a Japanese tokenizer and morphological analysis tool. Wheels are provided for Linux, OSX, and Win64, and UniDic is easy to install (see docs below).
See the blog post for background on why Fugashi exists and some of the design decisions.
If you are on a platform without prebuilt wheels (like PowerPC), you'll need to install MeCab before installing Fugashi. Building MeCab from source is recommended.
Usage
    from fugashi import Tagger

    tagger = Tagger('-Owakati')
    text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"
    tagger.parse(text)
    # => '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
    for word in tagger(text):
        print(word, word.feature.lemma, word.pos, sep='\t')
        # "feature" is the UniDic feature data as a named tuple
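Because `-Owakati` output is a single space-separated string, plain Python is enough to turn it back into a token list. The sketch below works on the sample output shown above and does not call fugashi at all; the variable names are only illustrative.

```python
# Sample wakati output copied from the usage example above.
wakati = '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'

# Splitting on whitespace recovers the individual tokens.
tokens = wakati.split()

print(len(tokens))
print(tokens[:3])
```

If you need more than surface forms (lemma, part of speech), iterate over `tagger(text)` as in the usage example instead of splitting the string.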
Installing a Dictionary
Fugashi requires a dictionary. UniDic is recommended, and two easy-to-install versions are provided.
- unidic-lite, a 2013 version of UniDic that's relatively small
- unidic, the latest UniDic 2.3.0, which is 1 GB on disk and requires a separate download step
If you just want to make sure things work you can start with unidic-lite, but
for more serious processing unidic is recommended. For production use you'll
generally want to generate your own dictionary too; for details see the MeCab
documentation.
To get either of these dictionaries, you can install its package directly with pip,
or install it as a fugashi extra:
    pip install fugashi[unidic-lite]

    # The full version of UniDic requires a separate download step
    pip install fugashi[unidic]
    python -m unidic download
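After installing, it can be handy to confirm which dictionary package Python can actually see. The following is a minimal sketch using only the standard library; it assumes the packages are importable as `unidic_lite` and `unidic`, and the helper name `installed_dictionaries` is made up for this example.

```python
from importlib.util import find_spec

# Assumed mapping from pip package name to importable module name.
CANDIDATES = {'unidic-lite': 'unidic_lite', 'unidic': 'unidic'}

def installed_dictionaries():
    """Return the pip names of dictionary packages that can be imported."""
    return [
        pip_name
        for pip_name, module in CANDIDATES.items()
        if find_spec(module) is not None
    ]

print(installed_dictionaries())
```

Note that this only checks that the package is present. For the full unidic package, the dictionary data still has to be fetched with the download step above.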
Dictionary Use
Fugashi is written with the assumption you'll use UniDic to process Japanese, but it supports arbitrary dictionaries.
If you're using a dictionary besides UniDic, you can use the GenericTagger like this:
    from fugashi import GenericTagger

    tagger = GenericTagger()
    # parse can be used as normal
    tagger.parse('something')
    # features from the dictionary can be accessed by field numbers
    # (text is any input string, such as the sample sentence under Usage)
    for word in tagger(text):
        print(word.surface, word.feature[0])
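To see why field numbers are the natural way to address features in a generic dictionary, it helps to look at MeCab's default output, where each line is a surface form, a tab, and a comma-separated feature string. The sketch below splits such a line with the standard library. The sample line is hand-written in the style of an IPAdic entry, and `split_mecab_line` is a hypothetical helper, not part of fugashi.

```python
import csv

def split_mecab_line(line):
    """Split 'surface<TAB>f0,f1,...' into the surface and a list of fields."""
    surface, _, raw = line.partition('\t')
    # csv handles quoted fields that themselves contain commas.
    features = next(csv.reader([raw]))
    return surface, features

# Illustrative line only; real output depends on the dictionary in use.
surface, features = split_mecab_line('すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ')
print(surface, features[0])
```

The meaning of each position depends entirely on the dictionary, which is why the GenericTagger exposes features by index unless you supply a wrapper.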
You can also create a dictionary wrapper to get feature information as a named tuple.
    from fugashi import GenericTagger, create_feature_wrapper

    CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
    tagger = GenericTagger(wrapper=CustomFeatures)
    # text is any input string, such as the sample sentence under Usage
    for word in tagger.parseToNodeList(text):
        print(word.surface, word.feature.alpha)
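The idea behind a feature wrapper can be approximated with `collections.namedtuple`. This is a rough standard-library illustration of named access to feature fields, not fugashi's actual implementation; `wrap_features` and the sample feature string are invented for the example.

```python
from collections import namedtuple

# Field names mirror the 'alpha beta gamma' example above.
CustomFeatures = namedtuple('CustomFeatures', 'alpha beta gamma')

def wrap_features(raw, wrapper):
    """Split a comma-separated feature string and pack it into the named tuple."""
    fields = raw.split(',')
    # Raises TypeError if the number of fields does not match the wrapper.
    return wrapper(*fields)

feature = wrap_features('noun,common,example', CustomFeatures)
print(feature.alpha)  # access by name
print(feature[0])     # the same field by index
```

Named access keeps downstream code readable and makes it harder to mix up field positions when switching dictionaries.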
Alternatives
If you have a problem with Fugashi, feel free to open an issue. However, there are some cases where it might be better to use a different library.
Download files
File details
Details for the file fugashi-0.2.1.tar.gz.
File metadata
- Download URL: fugashi-0.2.1.tar.gz
- Upload date:
- Size: 333.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | de44613699b7c998091625a214c6cc5c2aa526ead3bcdc5a7641a19d3b451b98 |
| MD5 | 26cef2c8b2d00e8ae738b40d2080d788 |
| BLAKE2b-256 | 5ebc28f4500c5a4056e0ee14d05baf23d2c7d74738c87917dcbd32a4a0bfcc2b |
File details
Details for the file fugashi-0.2.1-cp38-cp38-win_amd64.whl.
File metadata
- Download URL: fugashi-0.2.1-cp38-cp38-win_amd64.whl
- Upload date:
- Size: 500.1 kB
- Tags: CPython 3.8, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90f54cc5ddb175e97f2638a120e75f96164e778976b3ecddd483bcb96d80ab07 |
| MD5 | 7c9d6ecd9e62d26226842e77edc0e1c5 |
| BLAKE2b-256 | 521d189fe7de521a9d1bf0838b1726bbde677e6ab848c064db55fbe0c622a5e0 |
File details
Details for the file fugashi-0.2.1-cp38-cp38-manylinux1_x86_64.whl.
File metadata
- Download URL: fugashi-0.2.1-cp38-cp38-manylinux1_x86_64.whl
- Upload date:
- Size: 474.3 kB
- Tags: CPython 3.8
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93a16f9dce373e18f897bbcf82061050f71398a11cdeaa6a64b6388862c16a76 |
| MD5 | b234ca45249d523d330cead04bbfe13a |
| BLAKE2b-256 | 8739392545a7f4558b6f8e5f33c57800264131b5c9d964248c421cd363a3a020 |
File details
Details for the file fugashi-0.2.1-cp37-cp37m-win_amd64.whl.
File metadata
- Download URL: fugashi-0.2.1-cp37-cp37m-win_amd64.whl
- Upload date:
- Size: 498.9 kB
- Tags: CPython 3.7m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 702a3b063fcde97c06ca3392e403a951bc04ef545d0a68d255224efcdb3689e3 |
| MD5 | 1aa1e80192b5d04c996181028437aeb7 |
| BLAKE2b-256 | 0d7f118fd58eefff8fcb7219357dc8d35b6e8c6d0d0f4dd332bd0a18b24273ba |
File details
Details for the file fugashi-0.2.1-cp37-cp37m-manylinux1_x86_64.whl.
File metadata
- Download URL: fugashi-0.2.1-cp37-cp37m-manylinux1_x86_64.whl
- Upload date:
- Size: 466.4 kB
- Tags: CPython 3.7m
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73f3b8235878165460f4460ce344ad00c372f4e80afe6485e1bd12895b887a9d |
| MD5 | a78ce9b936e3ae780a1a98fbb6279289 |
| BLAKE2b-256 | 90ef7c0f2bee9bf7313f42afc8b9f783e9f0c0b2d9d1fd44d4ec3486bc7feaf3 |
File details
Details for the file fugashi-0.2.1-cp36-cp36m-win_amd64.whl.
File metadata
- Download URL: fugashi-0.2.1-cp36-cp36m-win_amd64.whl
- Upload date:
- Size: 498.9 kB
- Tags: CPython 3.6m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03061254ce816e1dc9c23d1f647f7ae929cba508c863a6eb721d96a8b0bae6a9 |
| MD5 | 88ff17aedcf09f7d0369bd8a8dc7a248 |
| BLAKE2b-256 | e8bd25e42ce2341dfb038235b2c587203d753a333905a49ae6a28aaf5b772f66 |
File details
Details for the file fugashi-0.2.1-cp36-cp36m-manylinux1_x86_64.whl.
File metadata
- Download URL: fugashi-0.2.1-cp36-cp36m-manylinux1_x86_64.whl
- Upload date:
- Size: 467.2 kB
- Tags: CPython 3.6m
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60f217d04c78477d32dfd5c31055e282adb64886a374c8afce05056b05ae6b69 |
| MD5 | ef60b0e6da3510bcd07af332199976ea |
| BLAKE2b-256 | 0e52451e3cc522a9ca18351fc6f2ff09fb1a894f7e1080aef8d4781d64c052a9 |
File details
Details for the file fugashi-0.2.1-cp35-cp35m-win_amd64.whl.
File metadata
- Download URL: fugashi-0.2.1-cp35-cp35m-win_amd64.whl
- Upload date:
- Size: 497.9 kB
- Tags: CPython 3.5m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 65d0210769dc0c29abcf5ddd739bcb5bb76d340969741dd5d152fc5de586a9f7 |
| MD5 | 8738f9bac750b7ec809a1eb2cadf73f0 |
| BLAKE2b-256 | 1e43ce8829171a0dd280af7fff7f018df3b7f645812f6e071622b8a01729360c |
File details
Details for the file fugashi-0.2.1-cp35-cp35m-manylinux1_x86_64.whl.
File metadata
- Download URL: fugashi-0.2.1-cp35-cp35m-manylinux1_x86_64.whl
- Upload date:
- Size: 462.3 kB
- Tags: CPython 3.5m
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.8.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 99c2a341b1b346462b3e56e1dd18de76f45fb3e2cd446df3fce41971ea0dbe08 |
| MD5 | c1850a515e2d8785b3ca216166dd45fd |
| BLAKE2b-256 | b14a3495b8268052338f9f08b91fd92b615619b5699222a52ddb417875b14bc0 |