ho-cho is a Japanese language processing package.
Ho-Cho
Install
Mac
# Install MeCab
brew install mecab mecab-ipadic
# Install mecab-ipadic-neologd
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git \
&& cd mecab-ipadic-neologd \
&& bin/install-mecab-ipadic-neologd -n -a -y \
&& cd ..
pip install ho-cho
Ubuntu
# Install MeCab and mecab-ipadic
apt-get update && apt-get install -y mecab libmecab-dev mecab-ipadic mecab-ipadic-utf8
# Install mecab-ipadic-neologd
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git \
&& cd mecab-ipadic-neologd \
&& bin/install-mecab-ipadic-neologd -n -a -y \
&& cd ..
pip install ho-cho
Windows
Coming soon.
Usage
1. cleaning
Text cleaning
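The README does not show ho-cho's cleaning API, so the following is only a minimal sketch of what a cleaning step might look like, using the Python standard library. The function name `clean_text` and the choice of what to strip (HTML tags, entities, URLs, extra whitespace) are assumptions for illustration, not the package's actual behavior.

```python
import html
import re

# ASCII-only URL pattern, so Japanese text that directly follows a URL is kept.
URL_RE = re.compile(r"https?://[A-Za-z0-9/:%#$&?()~.=+_\-]+")
TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")  # also matches the full-width space U+3000


def clean_text(text: str) -> str:
    """Hypothetical cleaner: strip HTML tags, entities, URLs, extra spaces."""
    text = TAG_RE.sub(" ", text)   # drop HTML tags
    text = html.unescape(text)     # "&amp;" -> "&"
    text = URL_RE.sub(" ", text)   # drop URLs
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_text("<p>今日は&amp;明日 https://example.com/a?b=1を見て</p>"))
```

The URL pattern is deliberately restricted to ASCII characters because Japanese text is often written right after a URL without a separating space.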
2. tokenizing
Word segmentation
hocho/tokenizer/impl/mecab_tokenizer.py
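The contents of `mecab_tokenizer.py` are not shown here, so the sketch below does not reproduce it. It illustrates one way to tokenize with MeCab, assuming the `mecab-python3` binding (`import MeCab`) is installed and that the dictionary uses the IPA feature layout, where the seventh feature field is the base form. The names `Token`, `parse_mecab_output`, and `tokenize` are hypothetical.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str    # the word as it appears in the text
    pos: str        # top-level part of speech
    base_form: str  # dictionary form, or the surface if unknown


def parse_mecab_output(raw: str) -> List[Token]:
    """Parse MeCab's default output: 'surface<TAB>feature,feature,...' per line."""
    tokens = []
    for line in raw.split("\n"):
        if not line or line == "EOS":  # "EOS" marks the end of a sentence
            continue
        surface, _, feature = line.partition("\t")
        fields = feature.split(",")
        pos = fields[0] if fields[0] else "*"
        # Assumption: IPA dictionary layout, base form in field 7 ("*" if unknown).
        base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
        tokens.append(Token(surface, pos, base))
    return tokens


def tokenize(text: str, dicdir: str = "") -> List[Token]:
    """Run MeCab on text; dicdir may point at the mecab-ipadic-neologd directory."""
    import MeCab  # assumption: provided by the mecab-python3 package

    tagger = MeCab.Tagger(f"-d {dicdir}" if dicdir else "")
    return parse_mecab_output(tagger.parse(text))
```

Keeping the parsing separate from the MeCab call means the parser can be checked against a saved output string on a machine without MeCab installed.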
3. normalization
Word normalization
- Unifying character types
- Replacing numbers
- Unifying words using a dictionary
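The three normalization steps listed above could be sketched as follows. This is an illustration built on the standard library, not ho-cho's own implementation; the function names, the `"0"` placeholder, and the synonym dictionary are assumptions.

```python
import re
import unicodedata
from typing import Dict, List

NUMBER_RE = re.compile(r"\d+")


def normalize_chars(text: str) -> str:
    """Unify character types: NFKC folds full-width alphanumerics to ASCII
    and half-width katakana to full-width; then lowercase Latin letters."""
    return unicodedata.normalize("NFKC", text).lower()


def replace_numbers(text: str, placeholder: str = "0") -> str:
    """Replace each run of digits with a single placeholder."""
    return NUMBER_RE.sub(placeholder, text)


def unify_words(tokens: List[str], synonyms: Dict[str, str]) -> List[str]:
    """Map spelling variants to one canonical form using a dictionary."""
    return [synonyms.get(token, token) for token in tokens]


if __name__ == "__main__":
    print(normalize_chars("ＰＹＴＨＯＮ３　ﾊﾝｶｸ"))
    print(replace_numbers("2023年12月"))
    print(unify_words(["パソコン", "PC"], {"パソコン": "PC"}))
```

Note that this digit replacement does not touch kanji numerals such as 三; handling those would need a separate rule.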
4. stopwords
Stopword removal
- Dictionary-based method
- Frequency-based method
- Removal using well-known stopword lists
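As a rough illustration of these approaches, the sketch below filters tokens against a stopword set, derives a stopword set from corpus frequency, and loads a published list from lines of text. All function names are hypothetical, and the frequency cutoff `top_n` is an arbitrary assumption that would need tuning per corpus.

```python
from collections import Counter
from typing import Iterable, List, Set


def remove_stopwords(tokens: List[str], stopwords: Set[str]) -> List[str]:
    """Dictionary-based removal: drop tokens found in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def frequent_words(docs: List[List[str]], top_n: int) -> Set[str]:
    """Frequency-based method: treat the top_n most common tokens as stopwords."""
    counts = Counter(token for doc in docs for token in doc)
    return {word for word, _ in counts.most_common(top_n)}


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a set from a published list with one word per line (blanks skipped)."""
    return {line.strip() for line in lines if line.strip()}


if __name__ == "__main__":
    docs = [["の", "猫", "は", "の"], ["の", "犬", "は"]]
    stopwords = frequent_words(docs, top_n=2)
    print([remove_stopwords(doc, stopwords) for doc in docs])
```

`load_stopwords` accepts any iterable of lines, so it works with an open file handle as well as an in-memory list.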
Development
How to develop
git pull origin main
git checkout -b feature/xxxx
git add .
git commit -m "xxx"
git push origin feature/xxxx
Run tests
pytest -v tests
Set up
pip install -e .
Publish to TestPyPI
# Install dependencies
pip install setuptools wheel twine
# Build
python setup.py sdist bdist_wheel
# Publish to TestPyPI
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
Publish to PyPI
# Install dependencies
pip install setuptools wheel twine
# Build
python setup.py sdist bdist_wheel
# Publish to PyPI
twine upload dist/*