
A MeCab wrapper that provides a janome-like interface.


wakame


Usage

import MeCab
from wakame.tokenizer import Tokenizer
from wakame.analyzer import Analyzer
from wakame.charfilter import *
from wakame.tokenfilter import *

text = '和布ちゃんこんにちは'  # "Hello, Wakame-chan" (和布 is read "wakame")

# Basic usage
tokenizer = Tokenizer()
tokens = tokenizer.tokenize(text)
for token in tokens:
    print(token)

# Word segmentation only (wakati-gaki)
tokens = tokenizer.tokenize(text, wakati=True)
print(tokens)

# Use the NEologd dictionary
tokenizer = Tokenizer(use_neologd=True)
tokens = tokenizer.tokenize(text)
for token in tokens:
    print(token)

# Use char filters and token filters
char_filters = [RegexReplaceCharFilter('和布', 'wakame')]
token_filters = [POSKeepFilter('名詞'), POSStopFilter(['名詞,接尾'])]
analyzer = Analyzer(tokenizer, char_filters=char_filters, token_filters=token_filters)
tokens = analyzer.analyze(text)
for token in tokens:
    print(token)

# Get token information as a DataFrame
tokenizer = Tokenizer()
analyzer = Analyzer(tokenizer)
df = analyzer.analyze_with_dataframe(text)
print(df)
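The filter example above follows janome's model: char filters rewrite the raw text, the tokenizer splits it, and token filters prune the token stream in order. wakame's own Analyzer internals are not documented here, so the sketch below re-implements that pipeline with the standard library only. The classes reuse the README's filter names purely for illustration (they are not wakame's code), POS prefix matching is an assumption borrowed from janome, and `fake_tokenize` is a hypothetical stand-in for MeCab with made-up POS tags.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    surface: str
    part_of_speech: str  # comma-joined POS, e.g. "名詞,接尾,人名,*"


class RegexReplaceCharFilter:
    """Char filter (hypothetical re-implementation): rewrite text before tokenizing."""

    def __init__(self, pattern, replacement):
        self.pattern = re.compile(pattern)
        self.replacement = replacement

    def apply(self, text):
        return self.pattern.sub(self.replacement, text)


def _as_prefixes(pos):
    # Accept a single POS string ('名詞') or a list of them (['名詞,接尾']).
    return (pos,) if isinstance(pos, str) else tuple(pos)


class POSKeepFilter:
    """Token filter (hypothetical): keep tokens whose POS starts with a prefix."""

    def __init__(self, pos):
        self.prefixes = _as_prefixes(pos)

    def apply(self, tokens):
        return [t for t in tokens if t.part_of_speech.startswith(self.prefixes)]


class POSStopFilter:
    """Token filter (hypothetical): drop tokens whose POS starts with a prefix."""

    def __init__(self, pos):
        self.prefixes = _as_prefixes(pos)

    def apply(self, tokens):
        return [t for t in tokens if not t.part_of_speech.startswith(self.prefixes)]


def fake_tokenize(text):
    """Stand-in for MeCab: first-match lookup in a tiny hypothetical lexicon."""
    lexicon = {
        "wakame": "名詞,固有名詞,一般,*",
        "ちゃん": "名詞,接尾,人名,*",
        "こんにちは": "感動詞,*,*,*",
    }
    tokens, i = [], 0
    while i < len(text):
        for word, pos in lexicon.items():
            if text.startswith(word, i):
                tokens.append(Token(word, pos))
                i += len(word)
                break
        else:  # no lexicon entry matched at position i
            raise ValueError(f"no lexicon entry at position {i}: {text[i:]!r}")
    return tokens


def analyze(text, tokenize, char_filters=(), token_filters=()):
    """Char filters, then the tokenizer, then token filters, each in list order."""
    for char_filter in char_filters:
        text = char_filter.apply(text)
    tokens = tokenize(text)
    for token_filter in token_filters:
        tokens = token_filter.apply(tokens)
    return tokens


result = analyze(
    "和布ちゃんこんにちは",
    fake_tokenize,
    char_filters=[RegexReplaceCharFilter("和布", "wakame")],
    token_filters=[POSKeepFilter("名詞"), POSStopFilter(["名詞,接尾"])],
)
print([t.surface for t in result])
```

Because filters run in list order, `POSStopFilter(['名詞,接尾'])` only sees the nouns kept by `POSKeepFilter('名詞')`; with these hypothetical tags the suffix ちゃん is dropped and only `wakame` remains.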

Installation

Install MeCab (required)

brew install mecab
brew install mecab-ipadic
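With MeCab and IPADIC installed, `mecab` prints one line per token: the surface form, a tab, then nine comma-separated features (POS, three POS subcategories, conjugation type, conjugation form, base form, reading, pronunciation), with `EOS` closing each sentence. A wrapper such as wakame presumably works from the same information. The sketch below parses that format with the standard library only; the `Token` class and its field names are hypothetical (not wakame's attributes), and the sample lines are illustrative rather than captured from a real run.

```python
from __future__ import annotations

from dataclasses import dataclass

# Names for the nine IPADIC feature fields, in MeCab's output order (hypothetical labels).
FEATURE_FIELDS = ("pos", "pos_detail1", "pos_detail2", "pos_detail3",
                  "infl_type", "infl_form", "base_form", "reading", "phonetic")


@dataclass
class Token:
    surface: str
    features: dict


def parse_mecab_output(output: str) -> list[Token]:
    """Parse MeCab's default IPADIC output into Token objects."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # EOS marks the end of a sentence
        surface, _, feature_str = line.partition("\t")
        values = feature_str.split(",")
        # Unknown words can carry fewer than nine fields; pad the rest with "*".
        values += ["*"] * (len(FEATURE_FIELDS) - len(values))
        tokens.append(Token(surface, dict(zip(FEATURE_FIELDS, values))))
    return tokens


# Illustrative input in IPADIC format, not a real analysis of the README's text.
SAMPLE = (
    "ちゃん\t名詞,接尾,人名,*,*,*,ちゃん,チャン,チャン\n"
    "こんにちは\t感動詞,*,*,*,*,*,こんにちは,コンニチハ,コンニチワ\n"
    "EOS\n"
)

for tok in parse_mecab_output(SAMPLE):
    print(tok.surface, tok.features["pos"], tok.features["reading"])
```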

Install mecab-ipadic-NEologd (optional)

brew install git curl xz
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
cd mecab-ipadic-neologd
./bin/install-mecab-ipadic-neologd -n

For details, see the mecab-ipadic-NEologd documentation.
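NEologd's installer places the dictionary in a `mecab-ipadic-neologd` subdirectory of the directory printed by `mecab-config --dicdir`. How wakame's `use_neologd=True` locates it is not documented here, so the hypothetical helper below only checks that the install step succeeded; the function name is an assumption, not part of wakame.

```python
import os
import shutil
import subprocess
from typing import Optional


def find_neologd_dir() -> Optional[str]:
    """Return the mecab-ipadic-NEologd directory, or None if it cannot be found."""
    mecab_config = shutil.which("mecab-config")
    if mecab_config is None:
        return None  # MeCab (or its mecab-config tool) is not on PATH
    proc = subprocess.run([mecab_config, "--dicdir"], capture_output=True, text=True)
    if proc.returncode != 0:
        return None
    candidate = os.path.join(proc.stdout.strip(), "mecab-ipadic-neologd")
    return candidate if os.path.isdir(candidate) else None


neologd_dir = find_neologd_dir()
if neologd_dir:
    # MeCab's -d option selects a dictionary directory, e.g. `mecab -d <dir>`.
    print("NEologd found at", neologd_dir)
else:
    print("NEologd not found")
```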

Install mecab-python3 (required)

brew install swig
pip install mecab-python3

Install wakame (required)

pip install wakame



Download files


Files for wakame, version 0.2.1

Filename | Size | File type | Python version
wakame-0.2.1-py3-none-any.whl | 7.0 kB | Wheel | py3
wakame-0.2.1.tar.gz | 4.8 kB | Source | None
