bytepiece-rs Python binding

Project description

rs-bytepiece

Install

pip install rs_bytepiece

Usage

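from rs_bytepiece import Tokenizer

# Construct a tokenizer without arguments
tokenizer = Tokenizer()
# Or load a custom model from a path
tokenizer = Tokenizer("/path/to/model")

# Encode text to ids, then decode the ids back to text
ids = tokenizer.encode("今天天气不错")
text = tokenizer.decode(ids)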
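
The encode/decode pair above suggests a simple sanity check: encode a string, decode the ids, and compare with the input. The sketch below assumes only that a tokenizer exposes `encode` and `decode` as in the usage snippet. `Utf8ByteTokenizer` is a hypothetical stand-in so the example runs without a model file; it is not part of rs_bytepiece, and whether rs_bytepiece round-trips every input exactly is not stated here.

```python
def roundtrips(tokenizer, text):
    """Return True if decode(encode(text)) reproduces text exactly."""
    ids = tokenizer.encode(text)
    return tokenizer.decode(ids) == text


class Utf8ByteTokenizer:
    """Hypothetical stand-in (NOT rs_bytepiece): one id per UTF-8 byte."""

    def encode(self, text):
        return list(text.encode("utf-8"))

    def decode(self, ids):
        return bytes(ids).decode("utf-8")


stand_in = Utf8ByteTokenizer()
print(roundtrips(stand_in, "今天天气不错"))
```

With the real package installed, passing `Tokenizer()` in place of `stand_in` should exercise the same check.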

Performance

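This binding is faster than the original implementation: in the table below, aho_rs takes roughly a fifth of the time of aho_cy and roughly a thirteenth of the time of aho_py. I tested on my M2 machine (16 GB) with The Complete Works of Lu Xun (《鲁迅全集》), which has 625890 characters. All times are in milliseconds.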

length    jieba       aho_py     aho_cy    aho_rs
100       17062.12    1404.37    564.31    112.05
1000      17104.38    1424.6     573.32    105.67
10000     17432.58    1429.0     574.93    109.06
100000    17228.17    1401.01    574.5     112.14
625890    17305.95    1419.79    567.78    111.3
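
To make the comparison concrete, the following sketch turns one row of the table into ratios relative to aho_rs. The numbers are copied from the 625890-character row above; the function name `speedups` is only illustrative.

```python
# Timings in milliseconds, copied from the 625890-character row of the table.
timings_ms = {
    "jieba": 17305.95,
    "aho_py": 1419.79,
    "aho_cy": 567.78,
    "aho_rs": 111.3,
}


def speedups(timings, baseline="aho_rs"):
    """Return each timing divided by the baseline timing."""
    base = timings[baseline]
    return {name: ms / base for name, ms in timings.items()}


ratios = speedups(timings_ms)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the aho_rs time")
```

The other rows give similar ratios, since the timings barely change with the length column.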

Download files

Download the file for your platform.

Source Distribution

rs_bytepiece-0.2.0.tar.gz (1.2 MB)

Uploaded Source

Built Distributions

rs_bytepiece-0.2.0-cp37-abi3-win_amd64.whl (3.8 MB)

Uploaded CPython 3.7+ Windows x86-64

rs_bytepiece-0.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.5 MB)

Uploaded CPython 3.7+ manylinux: glibc 2.17+ x86-64

rs_bytepiece-0.2.0-cp37-abi3-macosx_10_7_x86_64.whl (2.3 MB)

Uploaded CPython 3.7+ macOS 10.7+ x86-64

File details

Details for the file rs_bytepiece-0.2.0.tar.gz.

File metadata

  • Download URL: rs_bytepiece-0.2.0.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.2.3

File hashes

Hashes for rs_bytepiece-0.2.0.tar.gz
Algorithm Hash digest
SHA256 9ff28407062eed88d57d4f3401e56eee5561e0ec99cb0127004c0528fc0dc710
MD5 8fe377f2ac4459de798de53f80f23e21
BLAKE2b-256 88f3f8ea10c56a0a46b8a52b10cbf0ff49d462aea9dbb908ba39ca022b0941ea
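
A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using Python's standard `hashlib`; the local path `rs_bytepiece-0.2.0.tar.gz` is an assumption about where the file was saved, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's digest with a published hex digest, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 listed above for the source distribution.
EXPECTED = "9ff28407062eed88d57d4f3401e56eee5561e0ec99cb0127004c0528fc0dc710"

# Assumed download location; adjust to wherever the file was saved.
archive = Path("rs_bytepiece-0.2.0.tar.gz")
if archive.exists():
    print("hash OK" if matches_published(archive, EXPECTED) else "hash MISMATCH")
else:
    print("archive not found, nothing to verify")
```

The same helpers work for the wheels by swapping in the file name and its SHA256 digest.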

File details

Details for the file rs_bytepiece-0.2.0-cp37-abi3-win_amd64.whl.

File hashes

Hashes for rs_bytepiece-0.2.0-cp37-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 29e11d660b96b165772c33fcd4c2eb63371ac3090e1629d0d2a3f882bd7fb76c
MD5 f6aa9062e601bdd41b71f1bec299721f
BLAKE2b-256 c1c688f4ecfb017b0b7fa360341b91ad8779b1ff76a859372855e7a3c4470388

File details

Details for the file rs_bytepiece-0.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for rs_bytepiece-0.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ec03c772b5d807b1bf89b6e12367b59c4abda8f9594720df726da2d0262baf3a
MD5 504777a7d6e57f305f404155e7174f19
BLAKE2b-256 bd89b9e8e16ddce412b9569c39daaf5f7d74fb0a3bd327c17c5c7c3f20b20080

File details

Details for the file rs_bytepiece-0.2.0-cp37-abi3-macosx_10_7_x86_64.whl.

File hashes

Hashes for rs_bytepiece-0.2.0-cp37-abi3-macosx_10_7_x86_64.whl
Algorithm Hash digest
SHA256 bb24a0ec9196b00719de994b99f4513fd22f3f289345d13daca538c5e5a9a7c1
MD5 32f28532d24932895afad8ec43a33f9e
BLAKE2b-256 c7ff439036e944efa076ea4a2de2fca88d3b9e1200ab23b77b5ebce7ca744cfe
