Python wrapper of Vaporetto tokenizer
🐍 python-vaporetto 🛥
Vaporetto is a fast and lightweight tokenizer based on pointwise prediction. This is a Python wrapper for Vaporetto.
Installation
Install pre-built package from PyPI
Run the following command:
$ pip install vaporetto
Build from source
You need to install the Rust compiler beforehand, following its documentation.
python-vaporetto uses pyproject.toml, so you also need to upgrade pip to version 19 or later.
$ pip install --upgrade pip
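If you want to check the pip requirement from Python before building, a minimal sketch follows. The helper names pip_major_version and pip_is_new_enough are hypothetical and not part of python-vaporetto; the threshold of 19 comes from the requirement stated above.

```python
import re


def pip_major_version(version: str) -> int:
    """Return the leading major number of a version string such as '23.1.2'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1))


def pip_is_new_enough(version: str, minimum_major: int = 19) -> bool:
    """True if the given pip version meets the minimum major version."""
    return pip_major_version(version) >= minimum_major


if __name__ == "__main__":
    # importlib.metadata is in the standard library from Python 3.8.
    from importlib import metadata

    try:
        installed = metadata.version("pip")
    except metadata.PackageNotFoundError:
        print("pip is not installed in this environment")
    else:
        status = "OK" if pip_is_new_enough(installed) else "too old, upgrade pip"
        print(f"pip {installed}: {status}")
```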
After setting up the environment, you can install python-vaporetto as follows:
$ pip install git+https://github.com/daac-tools/python-vaporetto
Example Usage
python-vaporetto does not contain model files. To perform tokenization, first download a distribution model or train your own model, following the Vaporetto documentation.
# Import vaporetto module
import vaporetto
# Load the model file
with open('path/to/model.zst', 'rb') as fp:
    model = fp.read()
# Create a Vaporetto instance
tokenizer = vaporetto.Vaporetto(model, predict_tags=True)
# Tokenize
tokenizer.tokenize_to_string('まぁ社長は火星猫だ')
#=> 'まぁ/名詞/マー 社長/名詞/シャチョー は/助詞/ワ 火星/名詞/カセー 猫/名詞/ネコ だ/助動詞/ダ'
tokens = tokenizer.tokenize('まぁ社長は火星猫だ')
len(tokens)
#=> 6
tokens[0].surface()
#=> 'まぁ'
tokens[0].tag(0)
#=> '名詞'
tokens[0].tag(1)
#=> 'マー'
[token.surface() for token in tokens]
#=> ['まぁ', '社長', 'は', '火星', '猫', 'だ']
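The string returned by tokenize_to_string can also be post-processed without touching the token objects. The following is a minimal sketch, not part of python-vaporetto: parse_tagged_line is a hypothetical helper, and it assumes that tokens are separated by single spaces, that the surface and tags are separated by '/', and that neither character appears inside a surface or tag (no escaping is handled).

```python
from typing import List, Tuple


def parse_tagged_line(line: str) -> List[Tuple[str, List[str]]]:
    """Split 'surface/tag/tag surface/tag ...' into (surface, tags) pairs."""
    pairs = []
    for chunk in line.split(" "):
        if not chunk:
            continue  # skip empty pieces caused by repeated spaces
        surface, *tags = chunk.split("/")
        pairs.append((surface, tags))
    return pairs


# The output string shown in the example above.
line = 'まぁ/名詞/マー 社長/名詞/シャチョー は/助詞/ワ 火星/名詞/カセー 猫/名詞/ネコ だ/助動詞/ダ'
pairs = parse_tagged_line(line)

# Surfaces only.
surfaces = [surface for surface, _ in pairs]
# Keep only tokens whose first tag is 名詞 (noun).
nouns = [surface for surface, tags in pairs if tags and tags[0] == '名詞']

print(surfaces)
print(nouns)
```

When the token objects are available, reading surface() and tag() directly avoids the delimiter assumptions made here.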
You can also use KyTea's models as follows:
with open('path/to/jp-0.4.7-5.mod', 'rb') as fp:
    model = fp.read()
tokenizer = vaporetto.Vaporetto.create_from_kytea_model(model)
Note: Vaporetto does not support tag prediction with KyTea's models.
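The two loading paths can be wrapped in one function. This is only a sketch: read_model and build_tokenizer are hypothetical names, the kytea flag is an assumption of this example, and the only library calls used are the two constructors shown above.

```python
from pathlib import Path


def read_model(path):
    """Read a model file fully into memory as bytes."""
    return Path(path).read_bytes()


def build_tokenizer(path, kytea=False, predict_tags=True):
    """Create a tokenizer from a Vaporetto model or, if kytea is True, a KyTea model."""
    # Imported lazily so that read_model works even without the package installed.
    import vaporetto

    model = read_model(path)
    if kytea:
        # Tag prediction is not supported with KyTea's models (see the note above).
        return vaporetto.Vaporetto.create_from_kytea_model(model)
    return vaporetto.Vaporetto(model, predict_tags=predict_tags)
```

For example, build_tokenizer('path/to/jp-0.4.7-5.mod', kytea=True) would follow the KyTea path.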
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
See the guidelines.