Yet another sentence-level tokenizer for Japanese text
sengiri
sengiri is yet another sentence-level tokenizer for Japanese text.
DEPENDENCY
MeCab
INSTALLATION
$ pip install sengiri
USAGE
import sengiri
print(sengiri.tokenize('これは!(すばらしい!)感動……。'))
#=>['これは!', '(すばらしい!)', '感動……。']
print(sengiri.tokenize('うーん🤔🤔🤔どうしよう'))
#=>['うーん🤔🤔🤔', 'どうしよう']
print(sengiri.tokenize('モー娘。のコンサートに行った。'))
#=>['モー娘。のコンサートに行った。']
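The last two examples are harder than they look. Below is a hypothetical naive splitter, written only for contrast and not taken from sengiri, that cuts after runs of sentence-final punctuation (。!?!?…) and keeps any closing bracket that immediately follows. It reproduces the first example but leaves the emoji-separated text as one piece and wrongly splits after the 。 inside the group name モー娘。. The function name naive_split and the punctuation set are assumptions for illustration.

```python
import re
from typing import List

# Hypothetical naive splitter for illustration only; this is NOT sengiri's implementation.
# A boundary is a run of sentence-final punctuation, plus any closing brackets right after it.
_BOUNDARY = re.compile(r"[。!?!?…]+[)」』)]*")


def naive_split(text: str) -> List[str]:
    """Split text after each punctuation boundary; keep any trailing remainder."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        sentences.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        # Text after the last boundary (or all of it, if there was none).
        sentences.append(text[start:])
    return sentences


print(naive_split("これは!(すばらしい!)感動……。"))
print(naive_split("うーん🤔🤔🤔どうしよう"))
print(naive_split("モー娘。のコンサートに行った。"))
```

Punctuation matching alone yields ['これは!', '(すばらしい!)', '感動……。'], then ['うーん🤔🤔🤔どうしよう'], then ['モー娘。', 'のコンサートに行った。']. Only the first matches sengiri's output above, so handling the other two cases needs more than a character-level rule.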
CHANGES
0.1.1 (2019-10-05)
First release