
Python wrapper for the Vaporetto tokenizer


🐍 python-vaporetto 🛥

Vaporetto is a fast and lightweight tokenizer based on pointwise prediction. This package is a Python wrapper for Vaporetto.

Installation

To install python-vaporetto, run the following command:

$ pip install vaporetto

Alternatively, you can build it from source:

$ python -m venv .env
$ source .env/bin/activate
$ pip install maturin
$ maturin develop -r

Example Usage

python-vaporetto does not include model files. Before tokenizing, follow the Vaporetto documentation to download a distributed model or train your own.

# Import vaporetto module
import vaporetto

# Load the model file
with open('path/to/model.zst', 'rb') as fp:
    model = fp.read()

# Create a Vaporetto instance
tokenizer = vaporetto.Vaporetto(model, predict_tags=True)

# Tokenize
tokenizer.tokenize_to_string('まぁ社長は火星猫だ')
#=> 'まぁ/名詞/マー 社長/名詞/シャチョー は/助詞/ワ 火星/名詞/カセー 猫/名詞/ネコ だ/助動詞/ダ'

tokens = tokenizer.tokenize('まぁ社長は火星猫だ')
len(tokens)
#=> 6
tokens[0].surface()
#=> 'まぁ'
tokens[0].tag(0)
#=> '名詞'
tokens[0].tag(1)
#=> 'マー'
[token.surface() for token in tokens]
#=> ['まぁ', '社長', 'は', '火星', '猫', 'だ']
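
The token accessors shown above can be combined into a small helper that collects each surface together with its tags. The following is a minimal sketch, not part of python-vaporetto: `token_rows` and `FakeToken` are hypothetical names, and the helper relies only on the `surface()` and `tag(i)` methods used in the example above. The number of tags is passed in explicitly because it depends on the model.

```python
from typing import Iterable, List, Tuple


def token_rows(tokens: Iterable, n_tags: int) -> List[Tuple[str, List[str]]]:
    """Collect (surface, [tag, ...]) pairs from objects exposing surface() and tag(i)."""
    return [
        (token.surface(), [token.tag(i) for i in range(n_tags)])
        for token in tokens
    ]


class FakeToken:
    """Hypothetical stand-in for a token, so the sketch runs without a model file."""

    def __init__(self, surface, tags):
        self._surface = surface
        self._tags = tags

    def surface(self):
        return self._surface

    def tag(self, index):
        return self._tags[index]


# Stand-in data; with a real model, pass tokenizer.tokenize(text) instead.
sample = [
    FakeToken('社長', ['名詞', 'シャチョー']),
    FakeToken('は', ['助詞', 'ワ']),
]
rows = token_rows(sample, n_tags=2)
print(rows)
```

With a loaded model, `token_rows(tokenizer.tokenize('まぁ社長は火星猫だ'), n_tags=2)` would be the equivalent call, assuming the model predicts two tags per token as in the output above.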

You can also use KyTea's models as follows:

with open('path/to/jp-0.4.7-5.mod', 'rb') as fp:
    model = fp.read()

tokenizer = vaporetto.Vaporetto.create_from_kytea_model(model)

Note: Vaporetto does not support tag prediction with KyTea's models.
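
If a project mixes both model formats, the two loading paths above can be wrapped in a single function. This is a sketch under stated assumptions: `model_kind` and `load_tokenizer` are hypothetical helpers, and telling KyTea models apart by the `.mod` suffix is a convention assumed here, not something python-vaporetto checks. Only the two constructors shown above are called.

```python
from pathlib import Path


def model_kind(path):
    """Guess the model format from the file suffix (an assumption of this sketch)."""
    return 'kytea' if Path(path).suffix == '.mod' else 'vaporetto'


def load_tokenizer(path, predict_tags=True):
    """Read a model file and build a tokenizer with the matching constructor."""
    # Imported lazily so that model_kind() works without the package installed.
    import vaporetto

    model = Path(path).read_bytes()
    if model_kind(path) == 'kytea':
        # Tag prediction is not supported with KyTea models, so no flag is passed.
        return vaporetto.Vaporetto.create_from_kytea_model(model)
    return vaporetto.Vaporetto(model, predict_tags=predict_tags)


print(model_kind('path/to/jp-0.4.7-5.mod'))
print(model_kind('path/to/model.zst'))
```

Calling `load_tokenizer('path/to/model.zst')` then returns a tokenizer that can be used exactly as in the examples above.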

Documentation

Use the help function to show the API reference.

import vaporetto
help(vaporetto)

License

Licensed under either of

at your option.

Disclaimer

This software is developed by LegalForce, Inc., but it is not an officially supported LegalForce product.


