Divide documents by character type

Project description

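Overview

A tool that splits a string into runs of hiragana, katakana, kanji, digits, and alphabetic characters. It works on both English and Japanese text, although some terms containing periods may not be split correctly. See the sample output for details.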

Setup

pip install divide-char-type

Usage

from divide_char_type import divide_char_type

data = divide_char_type("今日の天気は晴れです。")

print(data[0])

Sample output

['1.0', ' ', 'is', ' ', 'number', '.']
['1,000', ' ', 'is', ' ', 'number', '.']
['u.s.a.', ' ', 'is', ' ', 'state', '.']
['u.k', '.', ' ', 'is', ' ', 'state', '.']
['e.g.', ',', ' ', 'th', ',', ' ', 'ch', ',', ' ', 'sh', ',', ' ', 'ph', ',', ' ', 'gh', ',', ' ', 'ng', ',', ' ', 'qu']
['state', ' ', 'include', ' ', 'u.s.', ' ', 'u.s.', ' ', 'is', ' ', 'state', '.']
['state', ' ', 'include', ' ', 'u.k', '.', ' ', 'u.k', '.', ' ', 'is', ' ', 'state', '.']
['u.s.', 'は', '国', 'です', '。']
['u.s', '.', 'は', '国', 'です', '。']
['あいうえおーかきくけこ']
['アイウエオーカキクケコ']
['今日', 'の', '天気', 'は', '晴', 'れです', '。', '\n', '明日', 'の', '天気', 'は', '曇', 'りです', '。', '\n']
['&&&', '1.0', '&&&']
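
The general idea of splitting by character type can be sketched as below. This is a minimal, standalone illustration and not the package's implementation: the names `char_class` and `split_by_char_class` are hypothetical, the Unicode ranges are an assumed approximation of each script, and the sketch makes no attempt to keep period-containing terms such as '1.0' or 'u.s.a.' together the way the sample output does.

```python
def char_class(ch):
    """Return a coarse class label for one character (hypothetical helper)."""
    if ch == "\u30fc":  # prolonged sound mark, used with both kana scripts
        return "prolonged"
    if "\u3041" <= ch <= "\u3096":
        return "hiragana"
    if "\u30a1" <= ch <= "\u30fa":
        return "katakana"
    if "\u4e00" <= ch <= "\u9fff":  # assumed range: CJK Unified Ideographs
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    # Punctuation, symbols, and whitespace: each distinct character is its
    # own class, so '&&&' stays together while ',' and ' ' are separated.
    return "symbol:" + ch


def split_by_char_class(text):
    """Split text into runs of consecutive characters of the same class."""
    tokens = []
    prev = None
    for ch in text:
        cls = char_class(ch)
        # Keep the prolonged sound mark inside the kana run it follows.
        if cls == "prolonged" and prev in ("hiragana", "katakana"):
            cls = prev
        if tokens and cls == prev:
            tokens[-1] += ch
        else:
            tokens.append(ch)
        prev = cls
    return tokens


print(split_by_char_class("今日の天気は晴れです。"))
print(split_by_char_class("&&&1.0&&&"))
```

Unlike the package, this sketch splits '1.0' into '1', '.', '0'; handling such terms would need extra rules on top of the per-character classification.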

Download files

Source Distribution

divide-char-type-0.2.0.tar.gz (3.7 kB view details)

Uploaded Source

Built Distribution

divide_char_type-0.2.0-py2-none-any.whl (4.1 kB view details)

Uploaded Python 2

File details

Details for the file divide-char-type-0.2.0.tar.gz.

File metadata

  • Download URL: divide-char-type-0.2.0.tar.gz
  • Upload date:
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.9

File hashes

Hashes for divide-char-type-0.2.0.tar.gz
Algorithm Hash digest
SHA256 a1844274d9cd4c6c5e0ac9abc1eaa35d54b3748ccb9c4712df622f5cab20ffb2
MD5 2a05a70eff1af23cea0fadd85766eb01
BLAKE2b-256 d010304d6369ebc0f1a5930d6326f1cb0334a33dd7edd8b6bd8219c50507fd74

File details

Details for the file divide_char_type-0.2.0-py2-none-any.whl.

File hashes

Hashes for divide_char_type-0.2.0-py2-none-any.whl
Algorithm Hash digest
SHA256 a899c20a821101c17ff6e102b6c21c5d45d0f2cbea45990c7e58bab6d25bef00
MD5 e3477a0c6f08192a630fe3a00e90443e
BLAKE2b-256 7c5f1118efe6baee87281c4780f915ab87563482e3bde643ec572a0d17002c37
