bytepiece-rs Python binding
rs-bytepiece
Install
pip install rs_bytepiece
Usage
from rs_bytepiece import Tokenizer
tokenizer = Tokenizer()
ids = tokenizer.encode("今天天气不错")
text = tokenizer.decode(ids)
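A quick way to sanity-check the calls above is a round trip: encode a string, decode the ids, and compare the result with the input. The sketch below is a minimal illustration and not part of rs_bytepiece. It assumes only the `encode` and `decode` methods shown above; `round_trip` and `ByteTokenizer` are hypothetical names, and `ByteTokenizer` is a stand-in (one id per UTF-8 byte) so the snippet runs without the package installed. Whether the real `Tokenizer` round-trips every input losslessly is something to verify, not something this README states.

```python
class ByteTokenizer:
    """Hypothetical stand-in with the same encode/decode shape as Tokenizer."""

    def encode(self, text):
        # One id per UTF-8 byte; a real tokenizer would merge bytes into pieces.
        return list(text.encode("utf-8"))

    def decode(self, ids):
        # Rebuild the byte string from the ids and decode it back to text.
        return bytes(ids).decode("utf-8")


def round_trip(tokenizer, text):
    """Encode then decode `text`; return (ids, decoded text)."""
    ids = tokenizer.encode(text)
    return ids, tokenizer.decode(ids)


# To try the real binding, replace the next line with:
#   from rs_bytepiece import Tokenizer; tokenizer = Tokenizer()
tokenizer = ByteTokenizer()
ids, text = round_trip(tokenizer, "今天天气不错")
print(len(ids), text == "今天天气不错")
```

Because `round_trip` only relies on `encode` and `decode`, the same helper should work with any tokenizer object that exposes those two methods.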
Performance
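This binding is faster than the original implementation: in the table below, aho_rs takes roughly half the time of aho_cy, about a fifth of the time of aho_py, and under 2% of the time of jieba. The benchmark text is《鲁迅全集》(The Complete Works of Lu Xun), which has 625,890 characters. All times are in milliseconds.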
length | jieba | aho_py | aho_cy | aho_rs |
---|---|---|---|---|
100 | 17062.12 | 1404.37 | 564.31 | 299.09 |
1000 | 17104.38 | 1424.6 | 573.32 | 281.84 |
10000 | 17432.58 | 1429.0 | 574.93 | 293.16 |
100000 | 17228.17 | 1401.01 | 574.5 | 280.81 |
625890 | 17305.95 | 1419.79 | 567.78 | 282.35 |
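The README does not include the benchmark script, so the following is only a sketch of how timings like these could be collected. It assumes that `length` is the number of characters passed to each `encode` call, which the table does not confirm. `split_chunks`, `time_encode`, and the stand-in `encode` function are hypothetical names; to measure the real binding you would pass `Tokenizer().encode` instead of the stand-in.

```python
import time


def split_chunks(text, length):
    """Split text into consecutive chunks of at most `length` characters."""
    return [text[i:i + length] for i in range(0, len(text), length)]


def time_encode(encode, text, length):
    """Return the milliseconds spent encoding `text` in chunks of `length` chars."""
    chunks = split_chunks(text, length)  # chunking is done outside the timed region
    start = time.perf_counter()
    for chunk in chunks:
        encode(chunk)
    return (time.perf_counter() - start) * 1000.0


def encode(text):
    # Hypothetical stand-in so the sketch runs without rs_bytepiece installed.
    return list(text.encode("utf-8"))


sample = "今天天气不错" * 1000  # placeholder corpus, not the text used in the table
for length in (100, 1000, len(sample)):
    print(length, round(time_encode(encode, sample, length), 2))
```

A single run like this is noisy; repeating each measurement several times and reporting the minimum or median would give steadier numbers.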
Download files
Source Distribution
rs_bytepiece-0.1.0.tar.gz (14.6 kB)
Built Distributions
File details
Details for the file rs_bytepiece-0.1.0.tar.gz.
File metadata
- Download URL: rs_bytepiece-0.1.0.tar.gz
- Upload date:
- Size: 14.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.2.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 93e434129cd5bf93bdc56771a5bbdca6e775b780e39a0e992bd59d7b378a9083 |
MD5 | 128737282102f92368900d8e1d5d4213 |
BLAKE2b-256 | b38f0c45bbe2b117502ed15e3b006fb5115da493fcb9e5b0e66a204f5b6b00fa |
File details
Details for the file rs_bytepiece-0.1.0-cp37-abi3-win_amd64.whl.
File metadata
- Download URL: rs_bytepiece-0.1.0-cp37-abi3-win_amd64.whl
- Upload date:
- Size: 3.8 MB
- Tags: CPython 3.7+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.2.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 020a47804007a430627016eeb025fe7a6fad18af5704b63fa40408b1ea706538 |
MD5 | 50f9cfd9ff1f78e9ce16e3374ec7fbdc |
BLAKE2b-256 | 892f50b11b57eea11225e4f19bd493f7a454eed42aa91ed810957a345c3130b9 |
File details
Details for the file rs_bytepiece-0.1.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: rs_bytepiece-0.1.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 3.4 MB
- Tags: CPython 3.7+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.2.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 404a7aa84ff603b9d4554d30ce8a892249be09b0e0ef1a10ff593367d89ac6a6 |
MD5 | 4a9de2f2bbe80e54fd7bc3f92058ebee |
BLAKE2b-256 | 91d9cb576d4bbf36b9df2d2fa74cce06b94fa65815138ad7e5fc4da7fb491ac7 |
File details
Details for the file rs_bytepiece-0.1.0-cp37-abi3-macosx_10_7_x86_64.whl.
File metadata
- Download URL: rs_bytepiece-0.1.0-cp37-abi3-macosx_10_7_x86_64.whl
- Upload date:
- Size: 2.2 MB
- Tags: CPython 3.7+, macOS 10.7+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.2.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 65f88bb0878bae7c5add49dc5077116428e46edc89dbd25b0ce49e098df4981f |
MD5 | 144167070d471e4bad2ab650eb3d89dc |
BLAKE2b-256 | 9ba9989fefc126fc658ab52925ca89ba7d88a2528bd61f6cfd68e1ff10d24f56 |