Project description

rtok

A Python tokenizer for LLMs using GitHub's linear BPE implementation.

input          [rtok]                    [tiktoken]
ebook (90k)    0.12907366300350986       0.23613008800020907
a*100          7.582298712804914e-05     0.00018489900685381144
a*1000         0.00013318200944922864    0.003490732007776387
a*10000        0.0013674889924004674     0.3407805879978696
a*100000       0.01401260400598403       33.41563105300884
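
The figures above appear to be timings for inputs of growing size; the source states neither the units nor how they were measured. Since the description credits a linear BPE implementation, one way to read the numbers is to check how much each result grows when the input grows tenfold. A minimal sketch, with the values copied from the listing above and the helper name `growth_ratios` chosen here for illustration:

```python
# Values copied from the benchmark listing, keyed by the number of repeated "a" characters.
RTOK = {
    100: 7.582298712804914e-05,
    1000: 0.00013318200944922864,
    10000: 0.0013674889924004674,
    100000: 0.01401260400598403,
}
TIKTOKEN = {
    100: 0.00018489900685381144,
    1000: 0.003490732007776387,
    10000: 0.3407805879978696,
    100000: 33.41563105300884,
}


def growth_ratios(results):
    """Ratio between consecutive results as the input size grows 10x (hypothetical helper)."""
    sizes = sorted(results)
    return [results[bigger] / results[smaller] for smaller, bigger in zip(sizes, sizes[1:])]


# A ratio near 10 per 10x step suggests linear growth; near 100 suggests quadratic growth.
print("rtok:    ", [round(r, 1) for r in growth_ratios(RTOK)])
print("tiktoken:", [round(r, 1) for r in growth_ratios(TIKTOKEN)])
```

For the two largest steps the rtok ratios come out near 10 and the tiktoken ratios near 100, which is what linear and roughly quadratic growth would look like on this repeated-character input. These are single reported values, so the ratios are indicative only.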

API

rtok.openai.get_o200k_base() -> Encoder
rtok.openai.get_cl100k_base() -> Encoder
Encoder.count(str)
Encoder.count_till_limit(str, limit: int) -> Optional[int]
Encoder.encode(str) -> [int]
Encoder.decode([int]) -> str

Encoder.encode and Encoder.decode are compatible with tiktoken. See test.py. Encoder.count_till_limit() returns None if the count exceeds the limit.
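
The None return of count_till_limit lends itself to a budget check that does not need the full token count. The sketch below is an illustration under assumptions, not code from the package: `fits_in_budget` and `WhitespaceEncoder` are hypothetical names, and the stand-in encoder only mimics the documented return behaviour so the example runs without rtok installed.

```python
from typing import Optional, Protocol


class SupportsCountTillLimit(Protocol):
    """Structural type for any object exposing the documented method."""

    def count_till_limit(self, text: str, limit: int) -> Optional[int]: ...


def fits_in_budget(encoder: SupportsCountTillLimit, text: str, limit: int) -> bool:
    """Return True when `text` tokenizes to at most `limit` tokens."""
    # Per the description above, None signals that the count exceeded the limit.
    return encoder.count_till_limit(text, limit) is not None


class WhitespaceEncoder:
    """Hypothetical stand-in, NOT rtok: one token per whitespace-separated word."""

    def count_till_limit(self, text: str, limit: int) -> Optional[int]:
        count = 0
        for _ in text.split():
            count += 1
            if count > limit:
                return None  # give up as soon as the limit is exceeded
        return count


enc = WhitespaceEncoder()
print(fits_in_budget(enc, "a short prompt", 8))
print(fits_in_budget(enc, "one two three four", 3))
```

An Encoder obtained from rtok.openai.get_cl100k_base() or rtok.openai.get_o200k_base() could presumably be passed in place of the stand-in, since it exposes the same method name and return convention.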

Download files

Source Distribution

rtok-0.1.0.tar.gz (92.5 kB)

Uploaded: Source

Built Distribution

rtok-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl (25.8 MB)

Uploaded: CPython 3.12, manylinux (glibc 2.34+), x86-64

File details

Details for the file rtok-0.1.0.tar.gz.

File metadata

  • Download URL: rtok-0.1.0.tar.gz
  • Size: 92.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.8.2

File hashes

Hashes for rtok-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e35174a23039851206611d463f1a959a7fc8482a6cea0501576a24331fc078a9
MD5 39bf695938c35032688ac15c61c33f88
BLAKE2b-256 2f83be406291a8bd02e95d206d01aa76b56d1f07094fc10ee6344e241fc321b4

File details

Details for the file rtok-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for rtok-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 6eb442fed093a0751db32b8557ea6e3868df66f6b0e484ef48a6b000ba6b6c82
MD5 090fdd20e77202e7bd469f6e59b3ac1a
BLAKE2b-256 6a35c51dbe1274a264fbe88f1ad4cc3df7a46cd4badd8d2c653299963a88d206
