Project description

🪙 toktokenizer

toktokenizer is a byte-pair encoding (BPE) tokenizer implemented in Rust and exposed to Python through PyO3 bindings.

import toktokenizer as tok

bpe = tok.BPETokenizer.from_pretrained("wikibpe.json")
text = "rust is pretty fun 🦀"
assert bpe.decode(bpe.encode(text)) == text  # round trip should reproduce the input
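
For readers new to BPE, the round trip above can be illustrated with a minimal pure-Python sketch of byte-level byte-pair encoding. This is an illustration only, not toktokenizer's Rust implementation: the function names (train_bpe, merge, encode, decode) and the choice to start from raw UTF-8 bytes are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Merges = Dict[Tuple[int, int], int]  # hypothetical type: (left id, right id) -> merged id


def merge(ids: List[int], pair: Tuple[int, int], new_id: int) -> List[int]:
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> Merges:
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges: Merges = {}
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # fewer than two tokens left, nothing to merge
        pair = max(pairs, key=pairs.get)  # most frequent adjacent pair
        new_id = 256 + step  # ids 0..255 are reserved for raw bytes
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges


def encode(text: str, merges: Merges) -> List[int]:
    """Apply the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids: List[int], merges: Merges) -> str:
    """Expand each id back to its bytes and decode as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")
```

Because every merged id expands back to exactly the bytes it replaced, decode(encode(text, merges), merges) returns the original string for any input, including multi-byte characters such as 🦀.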

Install toktokenizer from PyPI or from source

pip install toktokenizer

Performance

tok: 16.18 MB/s
tokenizers: 4.89 MB/s
tiktoken: 22.98 MB/s
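
The page does not say how these figures were measured. As a rough sketch of how a throughput number in MB/s could be obtained, the hypothetical helper below times any encode callable over a piece of text. The helper name throughput_mb_per_s, the best-of-N timing, and the convention 1 MB = 1,000,000 bytes are all assumptions for this example.

```python
import time
from typing import Callable


def throughput_mb_per_s(encode: Callable[[str], object], text: str, repeats: int = 5) -> float:
    """Return the best-of-`repeats` throughput of `encode` in MB/s (1 MB = 1e6 bytes)."""
    n_bytes = len(text.encode("utf-8"))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(text)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on very small inputs.
    return n_bytes / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    # Stand-in encoder so the sketch runs without any tokenizer installed.
    sample = "rust is pretty fun 🦀 " * 10_000
    print(f"{throughput_mb_per_s(lambda s: list(s.encode('utf-8')), sample):.2f} MB/s")
```

To compare tokenizers, the same text would be passed to each library's encode function in place of the stand-in, for example bpe.encode from the snippet above. Results depend heavily on the corpus, hardware, and whether batching or threading is used.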

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

toktokenizer-0.1.0.tar.gz (634.1 kB)

Uploaded Source

Built Distributions

toktokenizer-0.1.0-cp312-cp312-macosx_10_12_x86_64.whl (307.2 kB)

Uploaded CPython 3.12 macOS 10.12+ x86-64

toktokenizer-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (334.8 kB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64
