A package for Markov-chain text generation and text processing, used for @uec_tl

UECTL

A package providing the Markov chain model and text processing used for @uec_tl.

Installation

uectl only:

$ pip install uectl

uectl plus preprocessing support (requires MeCab):

$ pip install uectl[preprocessing]

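Environment setup with Docker

Training a Markov chain on text requires splitting each sentence into words. This segmentation is done with a tool called MeCab. Because installing MeCab on each OS is somewhat tedious, a Docker-based environment is provided. Note that MeCab is only needed for the Preprocessing section. With uectl alone, you can still train a model and generate text using the preprocessed sample file sample_output.txt.

Setup (first run) and starting the container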

$ docker-compose up -d

This creates an image named uec_tl_markov and a container named uec_tl_markov.

Entering the container

$ docker-compose exec app /bin/sh -c "[ -e /bin/bash ] && /bin/bash || /bin/sh"
root@<container ID>:/home/uec_tl_markov#

Preprocessing

Move to the workspace directory, then preprocess the text with preprocessing.py.

$ cd workspace

A sample file named sample_input.txt is provided, so try it with that.

$ cat sample_input.txt
私は電通大が好きです
調布が好きでした
好きな店は食神です

$ python preprocessing.py -i sample_input.txt -o sample_output.txt

The preprocessing result is saved to sample_output.txt.

$ cat sample_output.txt
私 は 電通大 が 好き です 
調布 が 好き でし た 
好き な 店 は 食 神 です

Apart from 食神 (a shop name, which was split into 食 and 神), everything is segmented as expected.
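The preprocessed file holds one sentence per line, with tokens separated by spaces. As a minimal sketch of reading that format back into token lists (the helper name `load_tokenized` is hypothetical and not part of uectl):

```python
import io

def load_tokenized(stream):
    """Read space-separated tokens, one sentence per line (hypothetical helper)."""
    sentences = []
    for line in stream:
        tokens = line.split()  # split() also drops the trailing space and newline
        if tokens:             # skip blank lines
            sentences.append(tokens)
    return sentences

# Contents of sample_output.txt from the example above, inlined for illustration.
sample = io.StringIO(
    "私 は 電通大 が 好き です \n"
    "調布 が 好き でし た \n"
    "好き な 店 は 食 神 です\n"
)
sentences = load_tokenized(sample)
print(sentences[1])  # ['調布', 'が', '好き', 'でし', 'た']
```

With a real file, the same function can be given the object returned by `open("sample_output.txt", encoding="utf-8")`.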

Training the model

Next, build an order-N Markov chain model (N=2), using each line of sample_output.txt as training data.

$ python training_model.py -i sample_output.txt -o sample_model.json -s 2

The trained model is saved as sample_model.json.
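To illustrate what an order-2 model captures, here is a minimal sketch that maps every pair of consecutive tokens to the tokens observed after it. This is not uectl's implementation, and it says nothing about the layout of sample_model.json; the `train` function and the `__BEGIN__`/`__END__` sentinels are assumptions made for the example.

```python
from collections import defaultdict

BEGIN, END = "__BEGIN__", "__END__"  # hypothetical sentinel tokens

def train(sentences, state_size=2):
    """Map each state (a tuple of state_size tokens) to the tokens seen after it."""
    model = defaultdict(list)
    for tokens in sentences:
        # Pad so the first real token follows an all-BEGIN state
        # and the last state leads to END.
        padded = [BEGIN] * state_size + tokens + [END]
        for i in range(len(padded) - state_size):
            state = tuple(padded[i:i + state_size])
            model[state].append(padded[i + state_size])
    return dict(model)

sentences = [
    "私 は 電通大 が 好き です".split(),
    "調布 が 好き でし た".split(),
    "好き な 店 は 食 神 です".split(),
]
model = train(sentences, state_size=2)
print(model[("が", "好き")])  # ['です', 'でし']
```

The state ("が", "好き") occurs in two training sentences with different continuations, which is the only branching point in this tiny corpus apart from the choice of the first word.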

Text generation

Finally, use the trained model sample_model.json to see what kind of sentences it generates.

$ python testing_model.py -i sample_model.json -c 5
調布が好きでした
好きな店は食神です
調布が好きでした
好きな店は食神です
私は電通大が好きでした

The output includes 私は電通大が好きでした, a sentence that does not appear in the training data!
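Why a new sentence can appear is easier to see in code. The sketch below assumes a dictionary-of-lists model with hypothetical `__BEGIN__`/`__END__` sentinels (not uectl's actual model format) and generates text by a random walk over states. Because the state ("が", "好き") can be followed by either です or でし, the walk can start in one training sentence and finish in another.

```python
import random
from collections import defaultdict

BEGIN, END = "__BEGIN__", "__END__"  # hypothetical sentinel tokens

def train(sentences, n=2):
    """Map each n-token state to the list of tokens observed after it."""
    model = defaultdict(list)
    for tokens in sentences:
        padded = [BEGIN] * n + tokens + [END]
        for i in range(len(padded) - n):
            model[tuple(padded[i:i + n])].append(padded[i + n])
    return dict(model)

def generate(model, n=2, rng=random):
    """Random-walk from the all-BEGIN state until END is drawn."""
    state = (BEGIN,) * n
    out = []
    while True:
        token = rng.choice(model[state])  # duplicates in the list act as weights
        if token == END:
            return "".join(out)  # Japanese text is joined without spaces
        out.append(token)
        state = state[1:] + (token,)  # slide the window by one token

corpus = [
    "私 は 電通大 が 好き です".split(),
    "調布 が 好き でし た".split(),
    "好き な 店 は 食 神 です".split(),
]
model = train(corpus)
rng = random.Random(0)  # fixed seed so repeated runs give the same samples
for _ in range(5):
    print(generate(model, rng=rng))
```

With this corpus the walk can produce only five distinct sentences: the three training sentences, 私は電通大が好きでした, and 調布が好きです.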

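You can also specify the words a generated sentence starts with. The number of words must be between 1 and N (N=2 in this example), and the words must be separated by whitespace.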

$ python testing_model.py -i sample_model.json -c 5 -b "電通大 が"
電通大が好きでした
電通大が好きでした
電通大が好きです
電通大が好きでした
電通大が好きです
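One way to support starting words in a dictionary-based model is to pick a state whose leading tokens match the given words and continue the walk from there. The sketch below works under that assumption; how uectl resolves the `-b` option internally is not documented here, and `generate_from` and the sentinels are hypothetical names.

```python
import random
from collections import defaultdict

BEGIN, END = "__BEGIN__", "__END__"  # hypothetical sentinel tokens

def train(sentences, n=2):
    """Map each n-token state to the list of tokens observed after it."""
    model = defaultdict(list)
    for tokens in sentences:
        padded = [BEGIN] * n + tokens + [END]
        for i in range(len(padded) - n):
            model[tuple(padded[i:i + n])].append(padded[i + n])
    return dict(model)

def generate_from(model, begin, n=2, rng=random):
    """Generate a sentence that starts with 1..n whitespace-separated words."""
    words = tuple(begin.split())
    if not 1 <= len(words) <= n:
        raise ValueError(f"expected between 1 and {n} words, got {len(words)}")
    # Candidate states are those whose leading tokens equal the given words.
    # Simplification: each distinct state is equally likely, regardless of
    # how often it occurred in the training data.
    candidates = [s for s in model if s[:len(words)] == words]
    if not candidates:
        raise ValueError(f"no state starts with {begin!r}")
    state = rng.choice(candidates)
    out = list(state)
    while True:
        token = rng.choice(model[state])
        if token == END:
            return "".join(out)
        out.append(token)
        state = state[1:] + (token,)

corpus = [
    "私 は 電通大 が 好き です".split(),
    "調布 が 好き でし た".split(),
    "好き な 店 は 食 神 です".split(),
]
model = train(corpus)
for _ in range(5):
    print(generate_from(model, "電通大 が"))
```

With this corpus, "電通大 が" can only continue as 電通大が好きです or 電通大が好きでした, which matches the two sentences seen in the output above.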

UEC18LT会 presentation slides

電通大生の呟きを基に電通大生を錬成してみた ("I tried forging a UEC student from UEC students' tweets")

Release history

0.2.1 (this version): Oct 16, 2020
0.2.0: Sep 23, 2020
0.1.2: Sep 15, 2020
0.1.1: Sep 12, 2020

Download files

Source Distribution

uectl-0.2.1.tar.gz (7.2 kB)

Uploaded Oct 16, 2020 Source

Built Distribution

uectl-0.2.1-py3-none-any.whl (8.4 kB)

Uploaded Oct 16, 2020 Python 3

File details

Details for the file uectl-0.2.1.tar.gz.

File metadata

Download URL: uectl-0.2.1.tar.gz
Upload date: Oct 16, 2020
Size: 7.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.1 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.6.12

File hashes

Hashes for uectl-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`ac6787b4eb0424f214ce6d132f9b8dd81748c706076e704658949c6f1104ab52`
MD5	`1a2cf45078db57a47d0e790c91921658`
BLAKE2b-256	`4e3860855785a874a89e15201d197446f837ece79e20688ba4444af9c77b72ff`

File details

Details for the file uectl-0.2.1-py3-none-any.whl.

File metadata

Download URL: uectl-0.2.1-py3-none-any.whl
Upload date: Oct 16, 2020
Size: 8.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.1 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.6.12

File hashes

Hashes for uectl-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a97f6b674b30934de7536058a39e6a72bb35c35a2216f53a1f537a6422bb2659`
MD5	`b7470982ce1b1fe5068c70b84e8b526d`
BLAKE2b-256	`06336183474bc935c1ddf1132249cea02956f9153e692e69fc5c75dabdbcc4cb`
