ho-cho is a Japanese language processing package.
Ho-Cho
Install
Mac
# Install MeCab
brew install mecab mecab-ipadic
# Install mecab-ipadic-neologd
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git \
&& cd mecab-ipadic-neologd \
&& bin/install-mecab-ipadic-neologd -n -a -y \
&& cd ..
pip install ho-cho
Ubuntu
# Install MeCab and mecab-ipadic
apt-get update && apt-get install -y mecab libmecab-dev mecab-ipadic mecab-ipadic-utf8
# Install mecab-ipadic-neologd
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git \
&& cd mecab-ipadic-neologd \
&& bin/install-mecab-ipadic-neologd -n -a -y \
&& cd ..
pip install ho-cho
Windows
Coming soon.
Usage
1. cleaning
Text cleaning
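The cleaning step is not documented in detail here, so the following is only a minimal sketch of what text cleaning typically involves, using the Python standard library. The function name `clean_text` and the patterns are assumptions for illustration, not ho-cho's actual API.

```python
import html
import re

# Illustrative patterns (assumptions, not ho-cho's actual rules).
TAG_RE = re.compile(r"<[^>]+>")
# re.ASCII keeps \w from matching Japanese characters, so a URL followed
# directly by Japanese text is cut at the right place.
URL_RE = re.compile(r"https?://[\w/:%#$&?()~.=+\-]+", re.ASCII)
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove HTML tags and URLs, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)   # drop HTML tags
    text = html.unescape(text)     # &amp; -> &
    text = URL_RE.sub(" ", text)   # drop URLs
    # \s also matches the full-width space (U+3000) for str patterns.
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_text("<p>詳細は https://example.com/a?b=1 を参照</p>"))
```

Tags are stripped before entities are decoded so that an escaped `&lt;` in the text is not mistaken for markup.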
2. tokenizing
Splitting text into words
hocho/tokenizer/impl/mecab_tokenizer.py
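The interface of `mecab_tokenizer.py` is not shown in this README, so the sketch below calls MeCab directly through the `mecab-python3` package instead (assumed to be installed separately with `pip install mecab-python3`). The names `make_tagger` and `tokenize` are hypothetical. The `dicdir` argument is where a mecab-ipadic-neologd path would go; the actual path depends on your system (`mecab-config --dicdir` prints the base dictionary directory).

```python
from typing import List, Protocol


class Parser(Protocol):
    """Anything with a MeCab-style parse() method."""

    def parse(self, text: str) -> str: ...


def make_tagger(dicdir: str = "") -> Parser:
    """Build a MeCab tagger in wakati (space-separated) output mode."""
    # Imported lazily so the rest of the module works without MeCab.
    import MeCab  # provided by mecab-python3 (assumed installed)

    args = "-Owakati"
    if dicdir:
        # e.g. the directory where mecab-ipadic-neologd was installed
        args += f" -d {dicdir}"
    return MeCab.Tagger(args)


def tokenize(text: str, tagger: Parser) -> List[str]:
    """Split text into surface-form tokens."""
    # Wakati output is one line of tokens separated by spaces.
    return tagger.parse(text).split()
```

Usage would look like `tokenize("すもももももももものうち", make_tagger())`. Passing the tagger in as an argument keeps the dictionary choice in one place and makes the function easy to test with a stand-in object.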
3. normalization
Word normalization
- Unifying character types
- Replacing numbers
- Unifying words using a dictionary
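The three normalization steps above can be sketched with the standard library as follows. This is an illustration under assumptions: the function names, the choice of NFKC plus lowercasing, the `"0"` placeholder for numbers, and the sample variant dictionary are all hypothetical and may differ from what ho-cho does.

```python
import re
import unicodedata
from typing import Dict, List

NUMBER_RE = re.compile(r"\d+")


def unify_characters(text: str) -> str:
    """Fold full-width alphanumerics and half-width katakana into one form."""
    return unicodedata.normalize("NFKC", text).lower()


def replace_numbers(text: str, placeholder: str = "0") -> str:
    """Replace every run of digits with a single placeholder."""
    return NUMBER_RE.sub(placeholder, text)


def unify_words(tokens: List[str], variants: Dict[str, str]) -> List[str]:
    """Map spelling variants to a canonical form using a dictionary."""
    return [variants.get(token, token) for token in tokens]


if __name__ == "__main__":
    # Hypothetical variant dictionary for illustration only.
    variants = {"パソコン": "PC"}
    print(replace_numbers(unify_characters("２０２３年５月")))
    print(unify_words(["パソコン", "を", "買う"], variants))
```

Character unification is applied to raw text, while dictionary-based unification works on tokens, so it would normally run after tokenizing.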
4. stopwords
Stopword removal
- Dictionary-based method
- Frequency-based method
- Removal using a well-known stopword list
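As a rough sketch of the dictionary-based and frequency-based methods listed above, the following uses only the standard library. The function names and the `top_n` parameter are assumptions, not ho-cho's API. A well-known published stopword list would be used the same way as the dictionary method, by loading it into a set first.

```python
from collections import Counter
from typing import Iterable, List, Set


def remove_by_dictionary(tokens: List[str], stopwords: Set[str]) -> List[str]:
    """Drop every token that appears in the stopword set."""
    return [token for token in tokens if token not in stopwords]


def frequent_words(docs: Iterable[List[str]], top_n: int) -> Set[str]:
    """Treat the top_n most frequent tokens across all documents as stopwords."""
    counts = Counter(token for doc in docs for token in doc)
    return {word for word, _ in counts.most_common(top_n)}


if __name__ == "__main__":
    docs = [
        ["猫", "は", "かわいい"],
        ["犬", "は", "かわいい"],
        ["鳥", "は", "飛ぶ"],
    ]
    stopwords = frequent_words(docs, top_n=1)
    print([remove_by_dictionary(doc, stopwords) for doc in docs])
```

The frequency-based method needs no hand-made list, but the cutoff has to be tuned per corpus, since frequent words in a small or specialized corpus may still carry meaning.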
Development
How to develop
git pull origin main
git checkout -b feature/xxxx
git add .
git commit -m "xxx"
git push origin feature/xxxx
Run test
pytest -v tests
Set up
pip install -e .
Publish to TestPyPI
# Install dependencies
pip install setuptools wheel twine
# Build
python setup.py sdist bdist_wheel
# Publish to TestPyPI
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
Publish to PyPI
# Install dependencies
pip install setuptools wheel twine
# Build
python setup.py sdist bdist_wheel
# Publish to PyPI
twine upload dist/*