Python wrapper for the Vaporetto tokenizer
🐍 python-vaporetto 🛥
Vaporetto is a fast and lightweight tokenizer based on pointwise prediction. This package is a Python wrapper for Vaporetto.
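As a rough illustration of "pointwise prediction": Vaporetto, like KyTea, decides separately for each gap between two adjacent characters whether a word boundary falls there. The sketch below covers only the last step, turning such per-gap decisions into tokens. The decisions are written by hand rather than predicted by a model, and `split_at_boundaries` is a hypothetical helper, not part of the vaporetto package.

```python
from typing import List, Sequence


def split_at_boundaries(text: str, boundaries: Sequence[int]) -> List[str]:
    """Split text into tokens from per-gap boundary decisions (hypothetical helper).

    boundaries[i] is 1 if a word boundary lies between text[i] and text[i + 1],
    else 0, so there is exactly one decision per gap (len(text) - 1 in total).
    """
    if len(boundaries) != max(len(text) - 1, 0):
        raise ValueError("expected one boundary decision per character gap")
    tokens: List[str] = []
    start = 0
    for i, is_boundary in enumerate(boundaries):
        if is_boundary:
            # Close the current token right after character i.
            tokens.append(text[start:i + 1])
            start = i + 1
    if text:
        # Whatever follows the last boundary is the final token.
        tokens.append(text[start:])
    return tokens


# Hand-written decisions for the 8 gaps in 'まぁ社長は火星猫だ'
# (in Vaporetto these would come from the trained model).
print(split_at_boundaries('まぁ社長は火星猫だ', [0, 1, 0, 1, 1, 0, 1, 1]))
# → ['まぁ', '社長', 'は', '火星', '猫', 'だ']
```

The point of the pointwise design is that each gap is classified independently from its surrounding characters, so the decoding step stays this simple.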
Installation
To install the wrapper from PyPI, run the following command:
$ pip install vaporetto
Alternatively, you can build it from source:
$ python -m venv .env
$ source .env/bin/activate
$ pip install maturin
$ maturin develop -r
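To confirm which version ended up in the active environment after either install route, a small check with the standard library's `importlib.metadata` (Python 3.8 or newer) can help. This is a sketch that assumes the distribution is registered under the name `vaporetto`, as the pip command suggests; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "vaporetto") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"vaporetto {v}" if v else "vaporetto is not installed in this environment")
```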
Example Usage
python-vaporetto does not include model files. Before tokenizing, download one of the distributed models or train your own by following the Vaporetto documentation.
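Because the usage example passes the raw contents of a `.zst` file to the constructor, a cheap sanity check when loading a downloaded model is to confirm that it really is Zstandard-compressed. This sketch assumes the model was saved as a standard zstd frame, which begins with the magic number 0xFD2FB528 stored little-endian, and that the constructor expects the still-compressed bytes; `read_zstd_model` is a hypothetical helper, not part of python-vaporetto.

```python
from pathlib import Path

# A standard Zstandard frame starts with the magic number 0xFD2FB528,
# stored little-endian, so the file's first four bytes are these.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def read_zstd_model(path: str) -> bytes:
    """Read a model file and fail early if it does not look zstd-compressed.

    Skippable frames use a different magic number and are not handled here.
    """
    data = Path(path).read_bytes()
    if not data.startswith(ZSTD_MAGIC):
        raise ValueError(f"{path} does not start with a zstd frame header")
    return data
```

With this helper, `model = read_zstd_model('path/to/model.zst')` can stand in for the manual `open(...).read()` step before calling `vaporetto.Vaporetto(model, predict_tags=True)`.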
# Import vaporetto module
import vaporetto
# Load the model file
with open('path/to/model.zst', 'rb') as fp:
    model = fp.read()
# Create a Vaporetto instance; predict_tags=True adds tags to each token
tokenizer = vaporetto.Vaporetto(model, predict_tags=True)
# Tokenize
tokenizer.tokenize_to_string('まぁ社長は火星猫だ')
#=> 'まぁ/名詞/マー 社長/名詞/シャチョー は/助詞/ワ 火星/名詞/カセー 猫/名詞/ネコ だ/助動詞/ダ'
tokens = tokenizer.tokenize('まぁ社長は火星猫だ')
len(tokens)
#=> 6
tokens[0].surface()
#=> 'まぁ'
tokens[0].tag(0)
#=> '名詞'
tokens[0].tag(1)
#=> 'マー'
[token.surface() for token in tokens]
#=> ['まぁ', '社長', 'は', '火星', '猫', 'だ']
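If only the string form is available, the `tokenize_to_string` result can be split back into per-token fields. This sketch assumes the layout shown in the example output (tokens separated by spaces, surface and tags separated by `/`) and that no surface or tag contains either character; `parse_tagged_string` is a hypothetical helper. The `tokenize` method with `surface()` and `tag()` avoids that ambiguity and is the safer choice beyond quick scripts.

```python
from collections import Counter
from typing import List, Tuple


def parse_tagged_string(tagged: str) -> List[Tuple[str, ...]]:
    """Turn 'surface/tag/tag surface/tag/tag ...' into one tuple per token."""
    # Assumes no surface or tag contains whitespace or '/'.
    return [tuple(token.split("/")) for token in tagged.split()]


tagged = 'まぁ/名詞/マー 社長/名詞/シャチョー は/助詞/ワ 火星/名詞/カセー 猫/名詞/ネコ だ/助動詞/ダ'
rows = parse_tagged_string(tagged)
print(rows[1])  # → ('社長', '名詞', 'シャチョー')

# Count the first tag (the part of speech in this example) across tokens;
# tokens that carry no tags are skipped.
pos_counts = Counter(row[1] for row in rows if len(row) > 1)
print(pos_counts['名詞'])  # → 4
```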
You can also use KyTea's models as follows:
with open('path/to/jp-0.4.7-5.mod', 'rb') as fp:
    model = fp.read()
tokenizer = vaporetto.Vaporetto.create_from_kytea_model(model)
Note: Vaporetto does not support tag prediction with KyTea's models.
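When a script must accept either kind of model, the choice of constructor can be wrapped in one function. This sketch picks the constructor by file extension, an assumed convention (`.zst` for Vaporetto models, anything else treated as a KyTea model), and enforces the note above by refusing `predict_tags=True` for KyTea models. `load_tokenizer` and its `module` parameter, which exists only so a stand-in can be injected for testing, are hypothetical.

```python
from pathlib import Path
from typing import Any, Optional


def load_tokenizer(path: str, predict_tags: bool = False, module: Optional[Any] = None):
    """Build a tokenizer from a Vaporetto (.zst) or KyTea model file (hypothetical helper)."""
    if module is None:
        # Import lazily so that tests can pass a stand-in module instead.
        import vaporetto as module
    model = Path(path).read_bytes()
    if path.endswith(".zst"):
        # Assumed convention: .zst files are Vaporetto models.
        return module.Vaporetto(model, predict_tags=predict_tags)
    if predict_tags:
        # Vaporetto does not support tag prediction with KyTea's models.
        raise ValueError("predict_tags=True is not supported for KyTea models")
    return module.Vaporetto.create_from_kytea_model(model)
```

Raising early keeps a caller from silently getting untagged output when it asked for tags.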
Documentation
Use Python's built-in help function to display the API reference:
import vaporetto
help(vaporetto)
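To save the reference to a file or search it, the same documentation can be captured as a string with the standard `pydoc` module. A minimal sketch: `api_reference_text` is a hypothetical helper, and a stdlib function stands in for `vaporetto` so the snippet runs without the package installed.

```python
import pydoc


def api_reference_text(obj) -> str:
    """Return the documentation help(obj) shows, as a plain-text string."""
    # pydoc.plaintext renders without the backspace-based bold markup
    # that the default text renderer inserts.
    return pydoc.render_doc(obj, renderer=pydoc.plaintext)


if __name__ == "__main__":
    import json  # stand-in; pass the vaporetto module once it is installed

    print(api_reference_text(json.dumps)[:400])
```

For example, `open('vaporetto_api.txt', 'w', encoding='utf-8').write(api_reference_text(vaporetto))` would write the full module reference to a file.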
Speed Comparison
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Disclaimer
This software is developed by LegalForce, Inc., but is not an officially supported LegalForce product.
Download files
Built Distributions
Hashes for vaporetto-0.1.0-pp37-pypy37_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 838c1e59f548a5de86e5388ab9452bc881fff53e4ce888de45c20994625a0e8f
MD5 | 8bb124480ec95d0b2f0b9ec9dcca8371
BLAKE2b-256 | 7957b01d3426bf1ed07951e35f683e988c1cd48cd7bf5253a294ca6396395851
Hashes for vaporetto-0.1.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 5564538abc7c729a0a9cc3b6e8afc3c3575a1c62871216593008a0d75251cd3e
MD5 | 4d90f8c3a53c939c3cba8a9646bf4b76
BLAKE2b-256 | 08530f90607a89b02824eb011f64b26291a9aa57ce6954bc31a7fb18ed2abcfa
Hashes for vaporetto-0.1.0-cp310-none-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | dfbb920f54445a6851f4b314981b0aac797a7f355b3cf2a48bd27b5d707b607c
MD5 | fecdef942893353ee15ce6c5df0b5228
BLAKE2b-256 | e112ad5b3a7d60d481f692de9dad55818090bb3788b9f7184ac519b72353ff8d
Hashes for vaporetto-0.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b9cd537ac0a6b561e74c11925f3013c12759baa0131a3c931054b623899ef706
MD5 | 7928f2f8ad6755620d8ac1da30d6f131
BLAKE2b-256 | b1bb7e50e19b4d529a2da8abbca46f0debe1e41fe926ae77f5c1e7f4bb2e4a24
Hashes for vaporetto-0.1.0-cp310-cp310-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl
Algorithm | Hash digest
---|---
SHA256 | 41c33924ff8a5c43a7a64bcf5b5b138411cc0cc40a7276836334dab13fe2759b
MD5 | 9e31c1471f88f79cb8974157754dd148
BLAKE2b-256 | e0cde8bd75f917f53f08f74b0986c10f3cb4cba21efa8543f83f2e4751e733f1
Hashes for vaporetto-0.1.0-cp39-none-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | a5d3618f86a4a7ead9d4736ca5d2f280b8d95aa71a2844afe4895f4f47141cb9
MD5 | 78b62829e1d542ea12d3bb2548614a05
BLAKE2b-256 | 9c150e40aa44cc00f183d21902fc57728c27f6b2eb059be9475e3085ee5a68ad
Hashes for vaporetto-0.1.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 22b4d4f0b1df90557c5c1cf2b94a09d41a544d7bf32e9c1495634a4760d8b1d9
MD5 | f1a6a46e000ff00a151c30288f03cea8
BLAKE2b-256 | 7717622eb4785aceb624ad14f2981f2533170c1d080f36b4e960496e3eddafa8
Hashes for vaporetto-0.1.0-cp39-cp39-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl
Algorithm | Hash digest
---|---
SHA256 | cb28ac6d289b0b65f7fc1d2cf5d83d867482f17bc2777f14920ee835d101a91a
MD5 | a3069201909279cbf19ef9f19b1c7eef
BLAKE2b-256 | 9cc75911bbfaab3f6801931ff9986934710fe3368436dc071da7039b91f3b501
Hashes for vaporetto-0.1.0-cp38-none-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 7e9a36213e450c999bda10a0c56b54ceecc6a74bfddfd06e10d15d5dbbace382
MD5 | c05528d1b8cfab43eb8aa0f08e9b8322
BLAKE2b-256 | 8348893f9283a1fb1c808ccb2668f7a00ab39ccbd0b395df694c7ff2ccb31649
Hashes for vaporetto-0.1.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | eefb2a4d40a70e405526b3e2cfcdcfd6a988b82b60425db35f1ecc36ad24aa57
MD5 | 4db222058ddd02fdf042d45a612d9de8
BLAKE2b-256 | cef15703bf75640057f69f91b31ab23ba4d21da6bf6f67326954530c26337656
Hashes for vaporetto-0.1.0-cp38-cp38-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl
Algorithm | Hash digest
---|---
SHA256 | 0ba58c754699a15aec6f185ccc06531f41849a8a6bb6c8e0f404d374829be860
MD5 | 8b478c7652c73995fd79558703a4bc5a
BLAKE2b-256 | 83cb9e69ba079fdb9ab411273fcf8b20cf1989eafbbbdaf20be2982873222a0f
Hashes for vaporetto-0.1.0-cp37-none-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | d3651c247f0c225deaf1e95b3e4718af8ab9ec491cd334faa1d40899ce0d99d0
MD5 | 3da4b0062d052651ab8871b42e2eab34
BLAKE2b-256 | 8a7a567f83c725bd59c767c730729009f128c017d5dba4f93047c6e9b554c909
Hashes for vaporetto-0.1.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 82824b797cd7e3b6d8782bc0120a59e5c93803d7ba10dab13f7770c06bcb583e
MD5 | e81e90449405ddcae74bc7e896c3d105
BLAKE2b-256 | 081520c14b0e589247e514dea08ceb76307d2ee275506c10eefc086cd29ffddb
Hashes for vaporetto-0.1.0-cp37-cp37m-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl
Algorithm | Hash digest
---|---
SHA256 | 566458bf1c25c219c83ec8d3952a687eac13b544b137846e1965ae53022a4a04
MD5 | eab5178b9de6a8da28419daf8e5448d9
BLAKE2b-256 | 43189fc842d843ac0addbf8b2c02807ae7c8dd36e94bad5eb14a94ad5ba8cdd3
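To check a downloaded wheel against the SHA256 digests listed above, hash the file locally and compare. A minimal sketch using only `hashlib`; the helper names `sha256_file` and `matches_published_hash` are hypothetical.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 digest equals the published hex digest."""
    return sha256_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash('vaporetto-0.1.0-cp310-none-win_amd64.whl', 'dfbb920f54445a6851f4b314981b0aac797a7f355b3cf2a48bd27b5d707b607c')` should return True for an intact download of that wheel. pip can also enforce digests itself through its `--require-hashes` mode.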