Divide documents by character type
Project description
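Overview
A tool that splits a string into runs of hiragana, katakana, kanji, digits, and alphabetic characters. It works on both English and Japanese text, but some terms that contain periods may not be split correctly. See the sample output for details.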
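To make the idea of splitting by character type concrete, here is a minimal sketch that labels each character by Unicode range and groups consecutive characters that share a label. This is not the package's implementation: `char_type` and `split_by_char_type` are hypothetical names, the ranges are a simplifying assumption, and the sketch does not try to keep tokens such as `1.0` or `u.s.` together, nor does it attach the long-vowel mark `ー` to a kana run, both of which the package's sample output does.

```python
from itertools import groupby
from typing import List


def char_type(ch: str) -> str:
    """Return a coarse character-type label for one character (hypothetical helper)."""
    if "\u3041" <= ch <= "\u3096":
        return "hiragana"
    if "\u30a1" <= ch <= "\u30fa":
        return "katakana"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    # Any other character only groups with repeats of itself,
    # so "&&&" stays together while ", " is split in two.
    return "other:" + ch


def split_by_char_type(text: str) -> List[str]:
    """Split text into runs of consecutive characters with the same label."""
    return ["".join(group) for _, group in groupby(text, key=char_type)]


print(split_by_char_type("今日の天気は晴れです。"))
# ['今日', 'の', '天気', 'は', '晴', 'れです', '。']
```

Because the sketch has no special handling for periods, an input such as `1.0` comes out as three pieces, which illustrates why period-containing terms need extra care.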
Setup
pip install divide-char-type
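Usage
from divide_char_type import divide_char_type
# Split a Japanese sentence into runs of the same character type.
data = divide_char_type("今日の天気は晴れです。")
# Print the first element of the result.
print(data[0])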
Sample output
['1.0', ' ', 'is', ' ', 'number', '.']
['1,000', ' ', 'is', ' ', 'number', '.']
['u.s.a.', ' ', 'is', ' ', 'state', '.']
['u.k', '.', ' ', 'is', ' ', 'state', '.']
['e.g.', ',', ' ', 'th', ',', ' ', 'ch', ',', ' ', 'sh', ',', ' ', 'ph', ',', ' ', 'gh', ',', ' ', 'ng', ',', ' ', 'qu']
['state', ' ', 'include', ' ', 'u.s.', ' ', 'u.s.', ' ', 'is', ' ', 'state', '.']
['state', ' ', 'include', ' ', 'u.k', '.', ' ', 'u.k', '.', ' ', 'is', ' ', 'state', '.']
['u.s.', 'は', '国', 'です', '。']
['u.s', '.', 'は', '国', 'です', '。']
['あいうえおーかきくけこ']
['アイウエオーカキクケコ']
['今日', 'の', '天気', 'は', '晴', 'れです', '。', '\n', '明日', 'の', '天気', 'は', '曇', 'りです', '。', '\n']
['&&&', '1.0', '&&&']
File details
Details for the file divide-char-type-0.2.0.tar.gz.
File metadata
- Download URL: divide-char-type-0.2.0.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | a1844274d9cd4c6c5e0ac9abc1eaa35d54b3748ccb9c4712df622f5cab20ffb2
MD5 | 2a05a70eff1af23cea0fadd85766eb01
BLAKE2b-256 | d010304d6369ebc0f1a5930d6326f1cb0334a33dd7edd8b6bd8219c50507fd74
File details
Details for the file divide_char_type-0.2.0-py2-none-any.whl.
File metadata
- Download URL: divide_char_type-0.2.0-py2-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 2
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | a899c20a821101c17ff6e102b6c21c5d45d0f2cbea45990c7e58bab6d25bef00
MD5 | e3477a0c6f08192a630fe3a00e90443e
BLAKE2b-256 | 7c5f1118efe6baee87281c4780f915ab87563482e3bde643ec572a0d17002c37