A simple sentencepiece encoder and decoder with no dependencies.
simple-sentencepiece
A simple sentencepiece encoder and decoder.
Note: This is not a new sentencepiece toolkit. It takes a model trained with Google's sentencepiece as input and either encodes strings to ids or pieces, or decodes ids back to strings. Its advantage is that it has no dependencies (no protobuf), which makes it easier to integrate into a C++ project.
Installation
pip install simple-sentencepiece
Usage
Usage is very similar to sentencepiece: it provides the same encode and decode interface.
from ssentencepiece import Ssentencepiece
# you can get bpe.vocab from a trained bpe model, see google's sentencepiece for details
ssp = Ssentencepiece("/path/to/bpe.vocab")
# output ids
ids = ssp.encode(["HELLO WORLD", "LOVE AND PIECE"])
# output string pieces
pieces = ssp.encode(["HELLO WORLD", "LOVE AND PIECE"], out_type=str)
# decode
res = ssp.decode(ids)
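To make the ids/pieces distinction concrete, here is a minimal pure-Python sketch of what encoding against a vocab file and decoding back look like. It is not the ssentencepiece implementation. It assumes the sentencepiece .vocab text format (one piece and score per line, separated by a tab, with the line index used as the id) and swaps in a greedy longest-match segmentation instead of real BPE merging. The toy vocabulary and the helper names load_vocab, encode and decode are hypothetical, and unlike the library call above, the sketch takes a single string rather than a list.

```python
# Illustrative sketch only: NOT how ssentencepiece is implemented.
# Assumption: vocab lines look like "piece<TAB>score" and the line index is the id.

SPACE = "\u2581"  # the word-boundary marker used by sentencepiece

# Hypothetical toy vocabulary standing in for the contents of a real bpe.vocab.
VOCAB = [
    "<unk>\t0",
    f"{SPACE}HELLO\t-1",
    f"{SPACE}WORLD\t-2",
    f"{SPACE}LO\t-3",
    "VE\t-4",
]


def load_vocab(lines):
    """Return (pieces, piece_to_id); each piece's id is its line index."""
    pieces = [line.rstrip("\n").split("\t")[0] for line in lines if line.strip()]
    return pieces, {piece: i for i, piece in enumerate(pieces)}


def encode(text, piece_to_id, out_type=int, unk="<unk>"):
    """Greedy longest-match segmentation (a simplification of real BPE)."""
    s = SPACE + text.replace(" ", SPACE)
    max_len = max(len(p) for p in piece_to_id)
    out, i = [], 0
    while i < len(s):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(s), i + max_len), i, -1):
            if s[i:j] in piece_to_id:
                piece = s[i:j]
                break
        else:
            # No piece matches: consume one character as <unk>.
            piece, j = unk, i + 1
        out.append(piece if out_type is str else piece_to_id[piece])
        i = j
    return out


def decode(ids, pieces):
    """Concatenate pieces and turn boundary markers back into spaces."""
    text = "".join(pieces[i] for i in ids).replace(SPACE, " ")
    return text[1:] if text.startswith(" ") else text


pieces, piece_to_id = load_vocab(VOCAB)
ids = encode("HELLO WORLD", piece_to_id)                  # ids
str_pieces = encode("LOVE", piece_to_id, out_type=str)    # string pieces
restored = decode(ids, pieces)                            # back to text
```

With this toy vocabulary, "HELLO WORLD" maps to two ids and decodes back to the original string, while "LOVE" splits into two sub-word pieces. A real model will segment differently, because BPE applies its learned merges rather than longest match.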