ho-cho is a Japanese language processing package.

Ho-Cho

Install

Mac
# Install MeCab
brew install mecab mecab-ipadic

# Install mecab-ipadic-neologd
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git \
    && cd mecab-ipadic-neologd \
    && bin/install-mecab-ipadic-neologd -n -a -y \
    && cd ..

pip install ho-cho
Ubuntu
# Install MeCab and mecab-ipadic
apt-get update && apt-get install -y mecab libmecab-dev mecab-ipadic mecab-ipadic-utf8

# Install mecab-ipadic-neologd
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git \
    && cd mecab-ipadic-neologd \
    && bin/install-mecab-ipadic-neologd -n -a -y \
    && cd ..

pip install ho-cho
Windows

Coming soon.

Usage

1. cleaning

Text cleaning

hocho/cleaning.py
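
As a rough illustration of what a text cleaning step usually involves, here is a minimal sketch using only the standard library. The function name `clean_text` and the choice of rules (strip HTML tags, drop URLs, collapse whitespace) are assumptions made for this example; they are not taken from hocho/cleaning.py, whose actual interface may differ.

```python
import html
import re

# Hypothetical helper for illustration only; not the API of hocho/cleaning.py.
_TAG_RE = re.compile(r"<[^>]+>")
# Match URLs as runs of printable ASCII so that Japanese text directly
# following a URL is not swallowed.
_URL_RE = re.compile(r"https?://[!-~]+")
_SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove HTML tags and URLs, unescape entities, and collapse whitespace."""
    text = _TAG_RE.sub("", text)      # drop markup such as <p> or <br/>
    text = html.unescape(text)        # turn &amp; into &, and so on
    text = _URL_RE.sub("", text)      # drop http/https URLs
    text = _SPACE_RE.sub(" ", text)   # collapse spaces, newlines, full-width spaces
    return text.strip()


print(clean_text("<p>サイトhttps://example.comを見て</p>"))
```

Cleaning is applied to the raw string before tokenizing, so leftover markup does not end up as tokens.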

2. tokenizing

Splitting text into words

hocho/tokenizer/impl/mecab_tokenizer.py
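
To show what MeCab-based tokenizing produces, the sketch below parses MeCab's default text output (one `surface<TAB>features` line per token, terminated by `EOS`) into simple records. The `Token` type, the `parse_mecab_output` function, and the hand-written sample output are assumptions for this example and are not the interface of hocho/tokenizer/impl/mecab_tokenizer.py. The field positions assume the IPADIC feature layout, where the first field is the part of speech and the seventh is the base form.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str    # the word as it appears in the text
    pos: str        # top-level part of speech (first feature field)
    base_form: str  # dictionary form (7th IPADIC field), or the surface if absent


def parse_mecab_output(output: str) -> List[Token]:
    """Parse MeCab's default output format into a list of Token records."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":  # skip blank lines and the end-of-sentence marker
            continue
        surface, _, feature = line.partition("\t")
        fields = feature.split(",")
        pos = fields[0] if fields[0] else "*"
        # Unknown words may have fewer fields or "*" as the base form.
        base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
        tokens.append(Token(surface, pos, base))
    return tokens


# Hand-written sample in IPADIC format for the sentence "すももを食べた".
SAMPLE_OUTPUT = "\n".join([
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ",
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ",
    "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ",
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ",
    "EOS",
])

for token in parse_mecab_output(SAMPLE_OUTPUT):
    print(token.surface, token.pos, token.base_form)
```

In practice the output string would come from MeCab itself rather than a literal. This simple parser splits features on commas and does not handle quoted feature fields that themselves contain a comma.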

3. normalization

Word normalization

  • Unifying character types
  • Replacing numbers
  • Unifying words using a dictionary

hocho/normalization.py
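
The three normalization steps listed above can be sketched with the standard library as follows. The function name `normalize_word`, the use of Unicode NFKC plus lowercasing for character unification, and the choice of `0` as the number placeholder are all assumptions for illustration; hocho/normalization.py may implement these steps differently.

```python
import re
import unicodedata
from typing import Dict, Optional

_NUMBER_RE = re.compile(r"\d+")


def normalize_word(word: str, synonyms: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical normalizer: unify characters, replace numbers, apply a dictionary."""
    # 1. Unify character types: NFKC folds full-width alphanumerics to ASCII
    #    and half-width katakana to full-width katakana.
    word = unicodedata.normalize("NFKC", word).lower()
    # 2. Replace each run of digits with a single placeholder digit.
    word = _NUMBER_RE.sub("0", word)
    # 3. Map spelling variants to one canonical form using a dictionary.
    if synonyms:
        word = synonyms.get(word, word)
    return word


variants = {"コンピュータ": "コンピューター"}
print(normalize_word("ｺﾝﾋﾟｭｰﾀ", variants))
print(normalize_word("２０２３年"))
```

Because the dictionary lookup runs last, its keys need to be written in the already normalized form (full-width katakana, lowercase ASCII).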

4. stopwords

Stop word removal

  • Dictionary-based method
  • Frequency-based method
  • Removal using a well-known stop word list

hocho/stopwords.py
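
The dictionary-based and frequency-based approaches above can be combined in a few lines. The helper names `frequent_words` and `remove_stopwords` and the tiny corpus are assumptions for this sketch, not the interface of hocho/stopwords.py. A published stop word list would simply be loaded into the same kind of set as the hand-written dictionary here.

```python
from collections import Counter
from typing import Iterable, List, Set


def frequent_words(docs: Iterable[List[str]], top_n: int) -> Set[str]:
    """Frequency-based method: treat the top_n most common words as stop words."""
    counts = Counter(word for doc in docs for word in doc)
    return {word for word, _ in counts.most_common(top_n)}


def remove_stopwords(tokens: List[str], stopwords: Set[str]) -> List[str]:
    """Dictionary-based method: drop every token found in the stop word set."""
    return [token for token in tokens if token not in stopwords]


# Tiny tokenized corpus, for illustration only.
docs = [
    ["猫", "は", "魚", "を", "食べる"],
    ["犬", "は", "肉", "を", "食べる"],
    ["鳥", "は", "空", "を", "飛ぶ"],
]

stopwords = {"の", "が"} | frequent_words(docs, top_n=2)
print(remove_stopwords(docs[0], stopwords))
```

The frequency threshold `top_n` depends heavily on corpus size, so it is worth inspecting the most common words before removing them.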

Development

How to develop
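# Update your local copy
git pull origin main

# Create a feature branch
git checkout -b feature/xxxx

# Commit your changes
git add .
git commit -m "xxx"

# Push the feature branch
git push origin feature/xxxx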
Run test
pytest -v tests
Set up
pip install -e .
Publish to TestPyPI
# Install dependencies
pip install setuptools wheel twine

# Build
python setup.py sdist bdist_wheel

# Publish to TestPyPI
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
Publish to PyPI
# Install dependencies
pip install setuptools wheel twine

# Build
python setup.py sdist bdist_wheel

# Publish to PyPI
twine upload dist/*
