A simple sentencepiece encoder and decoder with no dependencies.

Project description

simple-sentencepiece

A simple sentencepiece encoder and decoder.

Note: This is not a new sentencepiece toolkit. It takes a model trained with Google's sentencepiece as input and encodes strings to ids or pieces, or decodes ids back to strings. Its advantage is that it has no dependencies (no protobuf), which makes it easier to integrate into a C++ project.
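
For orientation, the .vocab file that Google's sentencepiece trainer writes next to its .model file is plain text with one piece<TAB>score pair per line, and the line position serves as the piece id. The sketch below parses that layout with the standard library only. The helper load_vocab and the sample entries are hypothetical and are not part of simple-sentencepiece, whose own loader may work differently.

```python
import io
from typing import Dict, List, TextIO, Tuple


def load_vocab(stream: TextIO) -> Tuple[List[str], Dict[str, int]]:
    """Parse sentencepiece-style `piece<TAB>score` lines (hypothetical helper)."""
    id_to_piece: List[str] = []
    piece_to_id: Dict[str, int] = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # The score column is ignored here; only the piece text is kept.
        piece, _, _score = line.partition("\t")
        piece_to_id[piece] = len(id_to_piece)  # id = position in the file
        id_to_piece.append(piece)
    return id_to_piece, piece_to_id


# Made-up sample; a real file comes from training a model with sentencepiece.
SAMPLE_VOCAB = "<unk>\t0\n\u2581HELLO\t-1\n\u2581WORLD\t-2\n"
id_to_piece, piece_to_id = load_vocab(io.StringIO(SAMPLE_VOCAB))
print(piece_to_id["\u2581WORLD"], id_to_piece[0])
```

The "\u2581" character is the marker sentencepiece uses in place of a space at the start of a word.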

Installation

pip install simple-sentencepiece

Usage

The usage is very similar to sentencepiece: it also provides encode and decode methods.

from ssentencepiece import Ssentencepiece

# bpe.vocab comes from a trained BPE model; see Google's sentencepiece for details
ssp = Ssentencepiece("/path/to/bpe.vocab")

# output ids
ids = ssp.encode(["HELLO WORLD", "LOVE AND PIECE"])

# output string pieces
pieces = ssp.encode(["HELLO WORLD", "LOVE AND PIECE"], out_type=str)

# decode
res = ssp.decode(ids)
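
To make the relationship between ids, pieces, and the decoded string concrete, here is a toy sketch that needs no vocabulary file. It is an assumption-laden illustration, not the library's implementation: the vocabulary is made up, toy_encode and toy_decode are hypothetical names, and the greedy longest-match segmentation stands in for the real BPE procedure, which may segment the same text differently.

```python
from typing import Dict, List

SPACE = "\u2581"  # sentencepiece's whitespace marker

# Made-up toy vocabulary; the index in the list is the piece id.
PIECES: List[str] = ["<unk>", SPACE + "HE", "LLO", SPACE + "WORLD", SPACE, "L", "O"]
PIECE_TO_ID: Dict[str, int] = {p: i for i, p in enumerate(PIECES)}
MAX_LEN = max(len(p) for p in PIECES)


def toy_encode(text: str) -> List[int]:
    """Greedy longest-match segmentation (illustration only, not BPE merges)."""
    s = SPACE + text.replace(" ", SPACE)
    ids: List[int] = []
    i = 0
    while i < len(s):
        # Try the longest candidate first, then shrink.
        for n in range(min(MAX_LEN, len(s) - i), 0, -1):
            piece = s[i:i + n]
            if piece in PIECE_TO_ID:
                ids.append(PIECE_TO_ID[piece])
                i += n
                break
        else:
            ids.append(PIECE_TO_ID["<unk>"])  # no piece covers this character
            i += 1
    return ids


def toy_decode(ids: List[int]) -> str:
    """Join pieces, then turn the whitespace marker back into spaces."""
    text = "".join(PIECES[i] for i in ids)
    return text.replace(SPACE, " ").lstrip(" ")


ids = toy_encode("HELLO WORLD")
pieces = [PIECES[i] for i in ids]
print(ids, pieces, toy_decode(ids))
```

In this toy setup the ids play the role of the default encode output, the pieces correspond to out_type=str, and toy_decode mirrors the round trip back to text.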

Download files

Download the file for your platform.

Source Distribution

simple-sentencepiece-0.2.tar.gz (351.7 kB), source

Built Distributions

simple_sentencepiece-0.2-cp312-cp312-win_amd64.whl (237.7 kB), CPython 3.12, Windows x86-64
simple_sentencepiece-0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (165.1 kB), CPython 3.12, manylinux: glibc 2.17+ x86-64
simple_sentencepiece-0.2-cp312-cp312-macosx_10_9_x86_64.whl (120.8 kB), CPython 3.12, macOS 10.9+ x86-64
simple_sentencepiece-0.2-cp311-cp311-win_amd64.whl (237.4 kB), CPython 3.11, Windows x86-64
simple_sentencepiece-0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (165.3 kB), CPython 3.11, manylinux: glibc 2.17+ x86-64
simple_sentencepiece-0.2-cp311-cp311-macosx_10_9_x86_64.whl (119.3 kB), CPython 3.11, macOS 10.9+ x86-64
simple_sentencepiece-0.2-cp310-cp310-win_amd64.whl (237.4 kB), CPython 3.10, Windows x86-64
simple_sentencepiece-0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (165.4 kB), CPython 3.10, manylinux: glibc 2.17+ x86-64
simple_sentencepiece-0.2-cp310-cp310-macosx_10_9_x86_64.whl (119.3 kB), CPython 3.10, macOS 10.9+ x86-64
simple_sentencepiece-0.2-cp39-cp39-win_amd64.whl (237.4 kB), CPython 3.9, Windows x86-64
simple_sentencepiece-0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (165.6 kB), CPython 3.9, manylinux: glibc 2.17+ x86-64
simple_sentencepiece-0.2-cp39-cp39-macosx_10_9_x86_64.whl (119.5 kB), CPython 3.9, macOS 10.9+ x86-64
simple_sentencepiece-0.2-cp38-cp38-win_amd64.whl (237.1 kB), CPython 3.8, Windows x86-64
simple_sentencepiece-0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (165.4 kB), CPython 3.8, manylinux: glibc 2.17+ x86-64
simple_sentencepiece-0.2-cp38-cp38-macosx_10_9_x86_64.whl (119.4 kB), CPython 3.8, macOS 10.9+ x86-64
simple_sentencepiece-0.2-cp37-cp37m-win_amd64.whl (237.6 kB), CPython 3.7m, Windows x86-64
simple_sentencepiece-0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (168.3 kB), CPython 3.7m, manylinux: glibc 2.17+ x86-64
simple_sentencepiece-0.2-cp37-cp37m-macosx_10_9_x86_64.whl (119.1 kB), CPython 3.7m, macOS 10.9+ x86-64