bytepiece-rs Python binding
rs-bytepiece
Install
pip install rs_bytepiece
Usage
from rs_bytepiece import Tokenizer
tokenizer = Tokenizer()
# a custom model
tokenizer = Tokenizer("/path/to/model")
ids = tokenizer.encode("今天天气不错")
text = tokenizer.decode(ids)
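A quick sanity check for any tokenizer is to confirm that decoding the encoded ids gives back the original text. The sketch below relies only on the `encode`/`decode` methods shown above. `roundtrip_ok` and `ByteStandIn` are hypothetical names introduced for illustration, and the stand-in class exists only so the snippet runs without a model file. Whether `rs_bytepiece` round-trips every input losslessly is not stated here, so treat this as a check to run, not a guarantee.

```python
class ByteStandIn:
    """Stand-in with the same encode/decode shape as Tokenizer (hypothetical)."""

    def encode(self, text):
        # One id per UTF-8 byte; a real tokenizer would merge bytes into pieces.
        return list(text.encode("utf-8"))

    def decode(self, ids):
        # Rebuild the byte string from the ids and decode it as UTF-8.
        return bytes(ids).decode("utf-8")


def roundtrip_ok(tokenizer, text):
    """Return True if decode(encode(text)) reproduces text exactly."""
    return tokenizer.decode(tokenizer.encode(text)) == text


if __name__ == "__main__":
    print(roundtrip_ok(ByteStandIn(), "今天天气不错"))
```

To try it against the real binding, pass `Tokenizer()` in place of `ByteStandIn()`.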
Performance
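It is faster than the original implementation: in the table below, aho_rs takes about 110 ms, against roughly 570 ms for aho_cy and 1,400 ms for aho_py. I tested on my M2 (16 GB) with《鲁迅全集》(The Complete Works of Lu Xun), which has 625,890 characters. All times are in milliseconds.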
length | jieba | aho_py | aho_cy | aho_rs |
---|---|---|---|---|
100 | 17062.12 | 1404.37 | 564.31 | 112.05 |
1000 | 17104.38 | 1424.6 | 573.32 | 105.67 |
10000 | 17432.58 | 1429.0 | 574.93 | 109.06 |
100000 | 17228.17 | 1401.01 | 574.5 | 112.14 |
625890 | 17305.95 | 1419.79 | 567.78 | 111.3 |
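The measurement method is not described above, so the following is only a sketch of how such a timing could be reproduced. It assumes that `length` is the size of the chunks the text is split into before encoding, which is a guess based on the totals staying flat across rows. `chunks` and `time_encode_ms` are hypothetical helpers, and the byte-level encoder is a stand-in so the snippet runs without any tokenizer installed.

```python
import time


def chunks(text, length):
    """Split text into consecutive pieces of at most `length` characters."""
    return [text[i:i + length] for i in range(0, len(text), length)]


def time_encode_ms(encode, text, length):
    """Time encoding the whole text in chunks of `length`; return milliseconds."""
    pieces = chunks(text, length)  # split outside the timed region
    start = time.perf_counter()
    for piece in pieces:
        encode(piece)
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    sample = "今天天气不错" * 1000  # placeholder corpus, not the text used above

    def byte_encode(s):
        # Stand-in encoder: one id per UTF-8 byte.
        return list(s.encode("utf-8"))

    for length in (100, 1000, len(sample)):
        print(length, round(time_encode_ms(byte_encode, sample, length), 2))
```

Passing `Tokenizer().encode` as the `encode` argument would time the Rust binding under the same assumption.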