
ho-cho is a Japanese language processing package.

Project description

Ho-Cho

Install

Mac
# Install MeCab
brew install mecab mecab-ipadic

# Install mecab-ipadic-neologd
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git \
    && cd mecab-ipadic-neologd \
    && bin/install-mecab-ipadic-neologd -n -a -y \
    && cd ..

pip install ho-cho
Ubuntu
# Install MeCab and mecab-ipadic
apt-get update && apt-get install -y mecab libmecab-dev mecab-ipadic mecab-ipadic-utf8

# Install mecab-ipadic-neologd
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git \
    && cd mecab-ipadic-neologd \
    && bin/install-mecab-ipadic-neologd -n -a -y \
    && cd ..

pip install ho-cho
Windows

Coming soon.

Usage

1. cleaning

Text cleaning

hocho/cleaning.py
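
The README does not document the functions in hocho/cleaning.py, so the following is only a minimal sketch of what a text cleaning step typically does, written with the standard library. The name `clean_text` and the choice of rules (strip HTML tags, unescape entities, drop URLs, collapse whitespace) are assumptions for illustration, not the package's API.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")        # HTML/XML tags
_URL_RE = re.compile(r"https?://[!-~]+")  # URL made of printable ASCII only
_SPACE_RE = re.compile(r"\s+")          # any whitespace run, incl. full-width space


def clean_text(text: str) -> str:
    """Hypothetical cleaner: remove markup and URLs, tidy whitespace."""
    text = _TAG_RE.sub(" ", text)   # drop tags first so "<" and ">" never reach the URL rule
    text = html.unescape(text)      # "&amp;" -> "&"
    text = _URL_RE.sub(" ", text)   # stops at the first non-ASCII character
    return _SPACE_RE.sub(" ", text).strip()


print(clean_text("<p>今日は&amp;明日 https://example.com を参照</p>"))
# 今日は&明日 を参照
```

Restricting the URL pattern to ASCII keeps it from swallowing Japanese text that follows a URL without a space.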

2. tokenizing

Word segmentation

hocho/tokenizer/impl/mecab_tokenizer.py
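
The interface of mecab_tokenizer.py is not shown here, so the sketch below does not use it. Instead it illustrates the underlying idea: run the `mecab` command installed above and parse its default IPADIC-style output (`surface<TAB>comma-separated features`, terminated by `EOS`). The names `Token`, `parse_mecab_output`, and `tokenize` are hypothetical, and the dictionary directory passed to `-d` depends on where mecab-ipadic-neologd was installed on your machine.

```python
import subprocess
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    surface: str  # the word as it appears in the text
    pos: str      # top-level part of speech (first feature field)
    base: str     # base (dictionary) form, falling back to the surface


def parse_mecab_output(output: str) -> List[Token]:
    """Parse MeCab's default output format into Token tuples."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, feature = line.partition("\t")
        fields = feature.split(",")
        # In IPADIC-style dictionaries the 7th field is the base form; "*" means none.
        base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
        tokens.append(Token(surface, fields[0], base))
    return tokens


def tokenize(text: str, dicdir: Optional[str] = None) -> List[Token]:
    """Run the mecab CLI on text; dicdir is an optional dictionary directory."""
    cmd = ["mecab"]
    if dicdir:
        cmd += ["-d", dicdir]
    result = subprocess.run(
        cmd, input=text + "\n", capture_output=True, encoding="utf-8", check=True
    )
    return parse_mecab_output(result.stdout)


# Parsing a captured sample, so this runs even without MeCab installed.
sample = "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\nも\t助詞,係助詞,*,*,*,*,も,モ,モ\nEOS\n"
print([t.surface for t in parse_mecab_output(sample)])
```

Keeping the parser separate from the subprocess call makes the parsing logic testable without a MeCab installation.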

3. normalization

Word normalization

  • Unifying character types
  • Replacing numbers
  • Unifying words using a dictionary

hocho/normalization.py
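
The three normalization steps listed above can be sketched with the standard library as follows. This is an illustration under assumptions, not the code in hocho/normalization.py: the function name `normalize_word`, the use of NFKC plus lowercasing for character unification, the choice of `0` as the digit placeholder, and the `synonyms` mapping are all hypothetical.

```python
import re
import unicodedata
from typing import Dict, Optional


def normalize_word(word: str, synonyms: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical normalizer applying the three steps in order."""
    # 1. Unify character types: NFKC maps full-width alphanumerics to ASCII
    #    and half-width katakana to full-width; then lowercase.
    word = unicodedata.normalize("NFKC", word).lower()
    # 2. Replace every run of digits with a single "0" placeholder.
    word = re.sub(r"\d+", "0", word)
    # 3. Unify spelling variants through a user-supplied dictionary.
    if synonyms:
        word = synonyms.get(word, word)
    return word


print(normalize_word("ＰＹＴＨＯＮ３"))          # python0
print(normalize_word("ｶﾀｶﾅ"))                  # カタカナ
print(normalize_word("ＰＣ", {"pc": "パソコン"}))  # パソコン
```

Because the dictionary lookup runs last, its keys should be written in the already-normalized form (half-width, lowercase).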

4. stopwords

Stopword removal

  • Dictionary-based method
  • Frequency-based method
  • Removal using well-known stopword lists

hocho/stopwords.py
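
As a rough illustration of the dictionary-based and frequency-based methods above, the sketch below filters tokens against a set and derives a stopword set from the most frequent tokens in a corpus. The function names and the toy corpus are assumptions for the example and are not taken from hocho/stopwords.py. A published stopword list could be loaded into the same kind of set and passed to `remove_stopwords`.

```python
from collections import Counter
from typing import Iterable, List, Set


def remove_stopwords(tokens: List[str], stopwords: Set[str]) -> List[str]:
    """Dictionary-based: drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def frequent_words(docs: Iterable[List[str]], top_n: int) -> Set[str]:
    """Frequency-based: treat the top_n most common tokens as stopwords."""
    counts = Counter(t for doc in docs for t in doc)
    return {word for word, _ in counts.most_common(top_n)}


# Toy corpus of already-tokenized sentences (hypothetical data).
docs = [
    ["猫", "は", "魚", "を", "食べる"],
    ["犬", "は", "肉", "を", "食べる"],
    ["鳥", "は", "空", "を", "飛ぶ"],
]
stopwords = frequent_words(docs, top_n=2)   # "は" and "を" each occur 3 times
print(remove_stopwords(docs[0], stopwords))  # ['猫', '魚', '食べる']
```

Choosing `top_n` is a judgment call: too large a value starts removing content words that simply happen to be common in the corpus.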

Development

How to develop
git pull origin main

git checkout -b feature/xxxx

git add .
git commit -m "xxx"

git push origin feature/xxxx
Run test
pytest -v tests
Set up
pip install -e .
Publish to TestPyPI
# Install dependencies
pip install setuptools wheel twine

# Build
python setup.py sdist bdist_wheel

# Publish to TestPyPI
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
Publish to PyPI
# Install dependencies
pip install setuptools wheel twine

# Build
python setup.py sdist bdist_wheel

# Publish to PyPI
twine upload dist/*
