A simple sentencepiece encoder and decoder with no dependencies.
Project description
simple-sentencepiece
A simple sentencepiece encoder and decoder.
Note: This is not a new sentencepiece toolkit. It takes a model trained with Google's sentencepiece as input and encodes strings to ids or pieces, or decodes ids back to strings. Its advantage is that it has no dependencies (no protobuf), which makes it easier to integrate into a C++ project.
Installation
pip install simple-sentencepiece
Usage
Usage is very similar to sentencepiece: it also provides encode and decode interfaces.
from ssentencepiece import Ssentencepiece
# you can get bpe.vocab from a trained bpe model, see google's sentencepiece for details
ssp = Ssentencepiece("/path/to/bpe.vocab")
# output ids
ids = ssp.encode(["HELLO WORLD", "LOVE AND PIECE"])
# output string pieces
pieces = ssp.encode(["HELLO WORLD", "LOVE AND PIECE"], out_type=str)
# decode
res = ssp.decode(ids)
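For intuition about what encode and decode do with the vocabulary file, here is a minimal pure-Python sketch. It assumes the plain-text .vocab layout written by Google's sentencepiece trainer (one piece, a tab, and a score per line, with the line number serving as the id) and a BPE-style greedy merge that always joins the adjacent pair whose merged piece has the highest score. The toy vocabulary and the helper names (load_vocab, bpe_encode, decode) are hypothetical, and this is not the implementation used by simple-sentencepiece.

```python
SPACE = "\u2581"  # U+2581, the word-boundary marker used by sentencepiece

# Toy stand-in for a bpe.vocab file: one "piece<TAB>score" per line (assumed layout).
TOY_VOCAB_LINES = [
    "<unk>\t0", "LL\t-0", "HE\t-1", SPACE + "HE\t-2", "LLO\t-3",
    SPACE + "W\t-4", "OR\t-5", "LD\t-6",
] + [f"{ch}\t{-7 - i}" for i, ch in enumerate(SPACE + "HELOWRD")]


def load_vocab(lines):
    """Return (piece -> id, piece -> score); the id is the line index."""
    piece_to_id, scores = {}, {}
    for idx, line in enumerate(lines):
        piece, score = line.rstrip("\n").split("\t")
        piece_to_id[piece] = idx
        scores[piece] = float(score)
    return piece_to_id, scores


def bpe_encode(text, scores):
    """Greedily merge the adjacent pair whose merged piece scores highest."""
    symbols = list(SPACE + text.replace(" ", SPACE))
    while True:
        best = None  # (score, position) of the best mergeable pair so far
        for i in range(len(symbols) - 1):
            merged = symbols[i] + symbols[i + 1]
            if merged in scores and (best is None or scores[merged] > best[0]):
                best = (scores[merged], i)
        if best is None:  # no adjacent pair forms a known piece
            return symbols
        i = best[1]
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]


def decode(ids, id_to_piece):
    """Concatenate pieces and turn boundary markers back into spaces."""
    text = "".join(id_to_piece[i] for i in ids)
    return text.replace(SPACE, " ").lstrip(" ")


piece_to_id, scores = load_vocab(TOY_VOCAB_LINES)
id_to_piece = {i: p for p, i in piece_to_id.items()}

pieces = bpe_encode("HELLO WORLD", scores)
ids = [piece_to_id.get(p, piece_to_id["<unk>"]) for p in pieces]
print(pieces)
print(ids)
print(decode(ids, id_to_piece))
```

With this toy vocabulary, "HELLO WORLD" splits into the pieces ▁HE, LLO, ▁W, OR, LD with ids 3 to 7, and decoding those ids restores the original string. A real vocabulary has thousands of pieces, so the library's output for the same input will differ.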
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
File details
Details for the file simple-sentencepiece-0.2.tar.gz.
File metadata
- Download URL: simple-sentencepiece-0.2.tar.gz
- Upload date:
- Size: 351.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | fc641e23fd404dc8f4587f6f420cc3446d8cc28780f2f506564da36a39ce06fe
MD5 | d5afb57fb4bf06ed2d6607b0de9153f4
BLAKE2b-256 | e244af624aecaa44f080b1e45049d0e497cd93abc1612214078953f960509d3e
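A published digest is only useful if the downloaded file is checked against it. The sketch below shows one way to do that with Python's standard hashlib module. The local file path is hypothetical, and the helper names sha256_of and matches_published_hash are introduced here for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a local file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; the digest is the SHA256 listed for the source archive.
    archive = Path("simple-sentencepiece-0.2.tar.gz")
    expected = "fc641e23fd404dc8f4587f6f420cc3446d8cc28780f2f506564da36a39ce06fe"
    if archive.exists():
        print("OK" if matches_published_hash(archive, expected) else "MISMATCH")
```

The same pattern works for any file in this listing by swapping in its filename and SHA256 value. pip can perform an equivalent check itself when a requirements file pins hashes.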
File details
Details for the file simple_sentencepiece-0.2-cp312-cp312-win_amd64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp312-cp312-win_amd64.whl
- Upload date:
- Size: 237.7 kB
- Tags: CPython 3.12, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.13
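The Tags entry for each built distribution mirrors the wheel's filename, which follows the standard name-version-python tag-abi tag-platform tag layout. When it is unclear which file fits a given interpreter, splitting the filename can help. The following is a minimal sketch with a hypothetical helper, split_wheel_filename. For production use, the packaging library's parse_wheel_filename is the more robust choice.

```python
from typing import List, NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tags: List[str]


def split_wheel_filename(filename):
    """Split a wheel filename into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    distribution, version = parts[0], parts[1]
    python_tag, abi_tag, platform = parts[-3:]
    # A wheel may list several platform tags joined by "."
    return WheelTags(distribution, version, python_tag, abi_tag, platform.split("."))


tags = split_wheel_filename("simple_sentencepiece-0.2-cp312-cp312-win_amd64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tags)
```

Here cp312 identifies CPython 3.12 for both the interpreter and the ABI, and win_amd64 identifies 64-bit Windows. The manylinux wheels in this listing carry two platform tags, manylinux_2_17_x86_64 and manylinux2014_x86_64, which the sketch returns as a two-element list.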
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4a223f50fee07a47e873a7c0c8d8ffd7fe61871fffeca6ad8db3a75bf6af76d6
MD5 | 1bd34f05cc8012c0f8bc2ad203600c19
BLAKE2b-256 | 4c7d72db48ed6b3fe3e677f3e01303392464b3ed4ca0a0e08636699994b39d32
File details
Details for the file simple_sentencepiece-0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 165.1 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 37c98ca83081f90e5a5d9846cffbad1e2c345c8ab161dcc4c2d4b2f9dcf742ed
MD5 | 0dd694dc4c62a32fbeff082820320530
BLAKE2b-256 | 67cfa8c48169d23ff1f8df0b919580b2b1682bc024dd63e04099826f0d7d0b89
File details
Details for the file simple_sentencepiece-0.2-cp312-cp312-macosx_10_9_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp312-cp312-macosx_10_9_x86_64.whl
- Upload date:
- Size: 120.8 kB
- Tags: CPython 3.12, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | c20fc8cb97db80e0c8088cdaec8cb224b9b83c333dc3fd70a0e7a624c312dd23
MD5 | f92aa5413b5ddaae2d6a71b7d55a5730
BLAKE2b-256 | 12d751c4b302396b978f3eb1f505b07b63325e54511ce12e26096c1e21f4eae8
File details
Details for the file simple_sentencepiece-0.2-cp311-cp311-win_amd64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 237.4 kB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | f3633afa559030ddabfe2f056aa3a58381004c5f8101bd3f426c732fa1228333
MD5 | 594971eb2465a18d4c26791622095b30
BLAKE2b-256 | 0df69929d26d8c82867841433b181d78e31c9e1d4c72accee6dab0449c835495
File details
Details for the file simple_sentencepiece-0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 165.3 kB
- Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | c40131778a12734e34ed7790902250296659127b67876c741e56e91056c529d7
MD5 | 65816dfe7376dd29435698cfd77e6d8f
BLAKE2b-256 | e256ab691fa17dca7a487ea2707552c3c60703119fc4a9c03f6aee7b2207928d
File details
Details for the file simple_sentencepiece-0.2-cp311-cp311-macosx_10_9_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp311-cp311-macosx_10_9_x86_64.whl
- Upload date:
- Size: 119.3 kB
- Tags: CPython 3.11, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 46e1ca997d7638a4659abc3ca0d38683b02fb6710e1e1989710e36fc1e3630a2
MD5 | 896d214ca00c836ca101826ef041f67f
BLAKE2b-256 | de62aeb080bf26e1edf503e58ada378d9c229a3248e48a5635cb88d14b948ba7
File details
Details for the file simple_sentencepiece-0.2-cp310-cp310-win_amd64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 237.4 kB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2f88a1d9e19f351338b34fff96a6632a768bd5739ba28d1728cf028e8521eeb1
MD5 | 51279eecd9bf095da5a46911086068e2
BLAKE2b-256 | 763f7c029b952f0926ab99b748d3510da0a26c1d139dc64d80417aec778fc671
File details
Details for the file simple_sentencepiece-0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 165.4 kB
- Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2c627f734fa3870d6c95eb509fbe0e255dcbf25b04d05ee63178cec6b55a8220
MD5 | 9789ad29f592174bd23e5128b749adbc
BLAKE2b-256 | b4f39a03d9878994c34c40bd3278662c4d9bbe00326fcf66b76d2c60a784801f
File details
Details for the file simple_sentencepiece-0.2-cp310-cp310-macosx_10_9_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp310-cp310-macosx_10_9_x86_64.whl
- Upload date:
- Size: 119.3 kB
- Tags: CPython 3.10, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 66948e6e9760606e3615ce83e30bb693ef115a99bc131019aa6f4354eda34378
MD5 | 7355ffb7ce5c0d2a6c4c076a54b679c4
BLAKE2b-256 | 36246d915766d0c7a97b17ac51345a42b3e8b1b51011ca447b875523819f7e94
File details
Details for the file simple_sentencepiece-0.2-cp39-cp39-win_amd64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp39-cp39-win_amd64.whl
- Upload date:
- Size: 237.4 kB
- Tags: CPython 3.9, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | a925ae1b272f3ad3b1b9dc570b7b134e74bf31156f3ddc7d2547a34f77d18d81
MD5 | 066a62e2cd7ff2b6146fe4ffef5f70a4
BLAKE2b-256 | a37e92d58aa6c94463583d5792bc38fccff16f6338916691abe9e321c04658cf
File details
Details for the file simple_sentencepiece-0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 165.6 kB
- Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5720943cc66ab0977f33e28dd9ef28e0b0ed99141aafb1c4ea75bbe04def5cc1
MD5 | 1adcdea51a7a1470b8c0111d89b04d8c
BLAKE2b-256 | 4e92d144e2bbdb58a026edb7f0b9b2c4fde2f61c03d3f507eea5dade6682e205
File details
Details for the file simple_sentencepiece-0.2-cp39-cp39-macosx_10_9_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp39-cp39-macosx_10_9_x86_64.whl
- Upload date:
- Size: 119.5 kB
- Tags: CPython 3.9, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 55f58f93c708546b7f9a225195d710274e209f134988b3b5104fc25167bfcd68
MD5 | ec26ceb6d073ceb6c101ffda69c76428
BLAKE2b-256 | 67e693ffd6a790a7878cd4b1b9d009eddea9c52687f367394c95b8cf46edc144
File details
Details for the file simple_sentencepiece-0.2-cp38-cp38-win_amd64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp38-cp38-win_amd64.whl
- Upload date:
- Size: 237.1 kB
- Tags: CPython 3.8, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | a8a9653eb2ade0df9adb3f6c2f0aac8e99e3b8be6b21e8f81c6fcb53fde6f1a7
MD5 | 2445a5bf43c300c9a4838b0eead68617
BLAKE2b-256 | 929405e41961b7de064ac9182e3cd4a83f21d836f2ecb8757aec2cb6c82374d9
File details
Details for the file simple_sentencepiece-0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 165.4 kB
- Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0b5e48d8ddea4282a397b1e3ba9300d047afdc4d436108414c7581ed64322ab9
MD5 | 82d61652ccc576b7555254d3eebb9da1
BLAKE2b-256 | 656bd2396ca0170e56311fd9041d5ac897b358a6f4f049d2d29705b73751e4ff
File details
Details for the file simple_sentencepiece-0.2-cp38-cp38-macosx_10_9_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp38-cp38-macosx_10_9_x86_64.whl
- Upload date:
- Size: 119.4 kB
- Tags: CPython 3.8, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2c722a5ff90a316454f7ca10d1f8d690f82e6413d9d1318c6313c8ca9ed23682
MD5 | 4e9d7a48bf2b6afde8c2c75e776ff123
BLAKE2b-256 | de6caac38630420edd39ee440b022e03003697f065b0efa1af9becd5ffae8867
File details
Details for the file simple_sentencepiece-0.2-cp37-cp37m-win_amd64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp37-cp37m-win_amd64.whl
- Upload date:
- Size: 237.6 kB
- Tags: CPython 3.7m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6e7a2a240eb51582a10d2379bbc37e570b1d69a3c0d80b62a601cc51d515fd07
MD5 | ce957e6822b99041daadb9f30666de8f
BLAKE2b-256 | ace8f6d903cb1328a7f9a524f3d54dbb1bff1c3a520c7e39d51c9ca52d0fd374
File details
Details for the file simple_sentencepiece-0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 168.3 kB
- Tags: CPython 3.7m, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 03363ed0f2c726877f53b816e4e4b42dc213354772641ccb9d46000cdd150f6f
MD5 | b23fb591f29840ad048e5a607e3b06c6
BLAKE2b-256 | 83e512f659b2de0c3bc775cc34f316cccdc47ee0e0ecfdfe16d0ada8bc744988
File details
Details for the file simple_sentencepiece-0.2-cp37-cp37m-macosx_10_9_x86_64.whl.
File metadata
- Download URL: simple_sentencepiece-0.2-cp37-cp37m-macosx_10_9_x86_64.whl
- Upload date:
- Size: 119.1 kB
- Tags: CPython 3.7m, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 433095e65ab95a9182b074c63c56495e0e11d3851d6ca577e9fada3e1b66f6eb
MD5 | 9904bf55b21d8e265058943265b54e5b
BLAKE2b-256 | 1f059b9e231c6fc965291c480ee03601625b8f1476a5abcea6679521753a90f2