Super compact Japanese tokenizer
TinySegmenter
----------
TinySegmenter, a super compact Japanese tokenizer, was originally written in JavaScript by
Taku Kudo (c) 2008 and released under the terms of the new BSD licence.
For details, see the [licence](http://lilyx.net/pages/tinysegmenter_licence.txt).
The Python 2.x port of TinySegmenter was written by Masato Hagiwara.
For more information about that port, see [here](http://lilyx.net/pages/tinysegmenterp.html).
This version was modified by Tatsuro Yasukawa to support both Python 3.x and Python 2.x and was packaged for distribution.
Additionally, it has been modified to run faster, thanks to
@chezou, @cocoatomo and @methane.
For more information, see the [tinysegmenter repository](https://github.com/SamuraiT/tinysegmenter).
Installation
------------
```
pip install tinysegmenter3
```
Usage
----------
```py
import tinysegmenter
statement = '私はpython大好きStanding Engineerです.'
tokenized_statement = tinysegmenter.tokenize(statement)
print(tokenized_statement)
# ['私', 'は', 'python', '大好き', 'Standing', ' Engineer', 'です', '.']
```
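Because `tokenize` returns a plain list of strings, the result can be passed directly to standard-library tools. The following is a minimal sketch of counting token frequencies. `token_frequencies` is a hypothetical helper written for this illustration and is not part of tinysegmenter; the token list is copied from the example output above, so the snippet runs without the package installed.

```python
from collections import Counter


def token_frequencies(tokens):
    """Count tokens after stripping surrounding whitespace.

    Hypothetical helper for illustration; not part of tinysegmenter.
    """
    cleaned = (token.strip() for token in tokens)
    # Drop tokens that are empty once whitespace is removed.
    return Counter(token for token in cleaned if token)


# Token list copied from the usage example above. In practice this
# would be the return value of tinysegmenter.tokenize(statement).
tokens = ['私', 'は', 'python', '大好き', 'Standing', ' Engineer', 'です', '.']

frequencies = token_frequencies(tokens)
print(frequencies['python'])
```

Stripping whitespace is a design choice for this sketch: some tokens in the example output, such as `' Engineer'`, carry a leading space that would otherwise be counted as a separate key from `'Engineer'`.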
Test Text
----------
The [test text](http://www.genpaku.org/timemachine/timemachineu8j.txt) (in the `tests` directory) is [The Time Machine](https://en.wikipedia.org/wiki/The_Time_Machine) by H.G. Wells, translated into Japanese by Hiroo Yamagata and distributed under the CC BY-SA 2.0 License.
How to run the tests
-----------
Install the requirements listed in `requirements.txt`:
```
pip install -r requirements.txt
```
Then run the test script:
```
./runtests.sh
```