bytepiece-rs Python binding

Project description

rs-bytepiece

Install

pip install rs_bytepiece

Usage

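from rs_bytepiece import Tokenizer

# no argument: use the default model
tokenizer = Tokenizer()
# or load a custom model from a local path
tokenizer = Tokenizer("/path/to/model")
# encode text ("The weather is nice today") into token ids
ids = tokenizer.encode("今天天气不错")
# decode the ids back into text
text = tokenizer.decode(ids)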
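To time the tokenizer on your own text, in the spirit of the Performance section, a minimal harness might look like this. It assumes the corpus is split into fixed-length character chunks and each chunk is encoded once. That chunking scheme and the `chunk` / `bench_ms` helpers are hypothetical; they are not the author's published benchmark script.

```python
import time
from typing import Callable, List


def chunk(text: str, length: int) -> List[str]:
    """Split text into consecutive pieces of at most `length` characters."""
    return [text[i:i + length] for i in range(0, len(text), length)]


def bench_ms(encode: Callable[[str], object], text: str, length: int) -> float:
    """Encode every chunk of `text` once and return total wall time in milliseconds.

    Hypothetical helper: the chunking scheme is an assumption, not the
    author's benchmark code.
    """
    pieces = chunk(text, length)
    start = time.perf_counter()
    for piece in pieces:
        encode(piece)
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    corpus = "今天天气不错。" * 1000  # stand-in for a real corpus file
    # Any callable taking a str works here; `list` is only a placeholder encoder.
    for n in (100, 1000, 10000):
        print(n, round(bench_ms(list, corpus, n), 2))
```

With rs_bytepiece installed, you would pass `tokenizer.encode` as the `encode` argument instead of the `list` placeholder.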

Performance

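This implementation is faster than the original: on the full text, the aho_rs column is roughly 5x faster than aho_cy, 13x faster than aho_py and over 150x faster than jieba. I tested it on my M2 (16 GB) with 《鲁迅全集》 (The Complete Works of Lu Xun), which has 625,890 characters. Times are in milliseconds; length is in characters.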

length    jieba      aho_py    aho_cy   aho_rs
100       17062.12   1404.37   564.31   112.94
1000      17104.38   1424.60   573.32   113.18
10000     17432.58   1429.00   574.93   110.03
100000    17228.17   1401.01   574.50   110.44
625890    17305.95   1419.79   567.78   108.54
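The relative speedups can be read off the table directly. A small sketch that divides each column's full-text time by the aho_rs time and converts the aho_rs time into throughput; the numbers are copied from the 625890-character row above, nothing else is assumed.

```python
# Timings (ms) copied from the 625890-character row of the table above.
timings_ms = {
    "jieba": 17305.95,
    "aho_py": 1419.79,
    "aho_cy": 567.78,
    "aho_rs": 108.54,
}
chars = 625890

baseline = timings_ms["aho_rs"]
# Speedup of aho_rs over each other implementation: other_time / aho_rs_time.
speedup = {name: t / baseline for name, t in timings_ms.items() if name != "aho_rs"}

for name, factor in speedup.items():
    print(f"aho_rs vs {name}: {factor:.2f}x faster")

# Throughput of aho_rs in characters per second (ms converted to s).
throughput = chars / (baseline / 1000.0)
print(f"aho_rs throughput: {throughput / 1e6:.2f} M chars/s")
```

This gives about 159.44x over jieba, 13.08x over aho_py and 5.23x over aho_cy, or roughly 5.77 million characters per second for aho_rs.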


Download files

Source Distribution

rs_bytepiece-0.2.2.tar.gz (1.2 MB)

Uploaded Source

Built Distributions

rs_bytepiece-0.2.2-cp37-abi3-win_amd64.whl (3.8 MB)

Uploaded CPython 3.7+ Windows x86-64

rs_bytepiece-0.2.2-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.5 MB)

Uploaded CPython 3.7+ manylinux: glibc 2.17+ x86-64

rs_bytepiece-0.2.2-cp37-abi3-macosx_10_7_x86_64.whl (2.3 MB)

Uploaded CPython 3.7+ macOS 10.7+ x86-64

File details

Details for the file rs_bytepiece-0.2.2.tar.gz.

File metadata

  • Download URL: rs_bytepiece-0.2.2.tar.gz
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.2.3

File hashes

Hashes for rs_bytepiece-0.2.2.tar.gz
Algorithm Hash digest
SHA256 f86206004808f118d7581fe314387c6d0c45566cd34fda346c439c3392e912cd
MD5 5b7c4bd2fb2e597f4088d5b1a33bfb9f
BLAKE2b-256 8a22b6bbac87677550e256e049e97b2cc4a75aa33bd12bb81ac7c4709d407178
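To check a downloaded archive against the SHA256 digest listed above, you can hash it locally. A minimal sketch using Python's standard `hashlib`; the `sha256_of` / `verify` helpers and the local file name are illustrative, not part of rs_bytepiece.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for rs_bytepiece-0.2.2.tar.gz (copied from the table above).
EXPECTED_SHA256 = "f86206004808f118d7581fe314387c6d0c45566cd34fda346c439c3392e912cd"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's digest matches `expected` (compared case-insensitively)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("rs_bytepiece-0.2.2.tar.gz")  # adjust to where you saved the file
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can also enforce digests at install time through its hash-checking mode, using `--hash=sha256:...` entries in a requirements file.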

File details

Details for the file rs_bytepiece-0.2.2-cp37-abi3-win_amd64.whl.

File hashes

Hashes for rs_bytepiece-0.2.2-cp37-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 2e6a9f8b78bc24d4856b00240c82392ebe6008b110ba482e22c00f48f81b4e6c
MD5 a20df1d801d72386db90a6e23dfc9efe
BLAKE2b-256 6230dea969abe55a7cf936c69b32b5ddda7eb09687300be9fa8923535da0d9f6

File details

Details for the file rs_bytepiece-0.2.2-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for rs_bytepiece-0.2.2-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2405a38e0a03985fabeb025eae65fcafcf5aae7c5f9b065c10cc17b72ae5e723
MD5 52a51aebe3216eadb80bafb8150db11a
BLAKE2b-256 14610a22cb90c845829383640d2a9ef17ab0286e9cef9189b7346194b5df2b72

File details

Details for the file rs_bytepiece-0.2.2-cp37-abi3-macosx_10_7_x86_64.whl.

File hashes

Hashes for rs_bytepiece-0.2.2-cp37-abi3-macosx_10_7_x86_64.whl
Algorithm Hash digest
SHA256 14338a8c8573df2ac4dc83567a58722d18984f678ce3426237d58e689288df1d
MD5 485620da2afd64f4758720e3ab7704aa
BLAKE2b-256 4863809ac38242cf82098144777c8f82a94290d15aa04f9c45dbb57da7a78be8
