runtoken
A blazing-fast BPE tokenizer for LLMs. Drop-in tiktoken replacement, 20-80x faster.
Built from scratch in Rust with Python bindings via PyO3. Produces identical output to tiktoken — same token IDs, same order, every time.
Why?
If you're building an LLM gateway, proxy, or any system that processes tokens at scale, tokenization speed matters. Every API request needs token counting for:
- Cost estimation & billing
- Rate limiting per user
- Context window management
- Smart routing (pick the cheapest model that fits)
tiktoken is good. runtoken is faster.
Benchmarks
Apples-to-apples comparison — both called as Python packages, same machine, same text:
Encode (full token IDs)
| Input | tiktoken | runtoken | Speedup |
|---|---|---|---|
| Short text (29 chars, 9 tokens) | 1.3M tok/s | 24.6M tok/s | 19x |
| Medium text (1050 chars, 511 tokens) | 2.5M tok/s | 68.8M tok/s | 27x |
| Code (1200 chars, 380 tokens) | 1.5M tok/s | 63.5M tok/s | 44x |
| Long English (4500 chars, 1001 tokens) | 2.5M tok/s | 73.6M tok/s | 29x |
| Long code (5600 chars, 2160 tokens) | 1.5M tok/s | 88.2M tok/s | 59x |
| Unicode (500 chars, 420 tokens) | 4.2M tok/s | 89.2M tok/s | 21x |
Count-only (the gateway use case)
| Input | tiktoken | runtoken | Speedup |
|---|---|---|---|
| Medium text | 2.5M tok/s | 940M tok/s | 381x |
| Long English | 2.6M tok/s | 1.4B tok/s | 538x |
| Long code | 1.5M tok/s | 2.6B tok/s | 1750x |
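Benchmarked on a 2-vCPU cloud instance. The count-only figures benefit from multi-level caching (text-level + chunk-level LRU), so they are best read as warm-cache throughput on repeated input rather than cold tokenization speed.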
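For readers who want to reproduce numbers of this kind, here is a minimal sketch of a tokens-per-second measurement. It is not the project's benchmark script; the helper name `tokens_per_second` and the `repeats` parameter are assumptions made for illustration. It accepts any `encode` callable, so the same function can be pointed at either library.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    encode: Callable[[str], Sequence],
    text: str,
    repeats: int = 1000,
) -> float:
    """Average number of tokens produced per second over `repeats` calls."""
    # Warm-up call: yields the token count and fills any caches the encoder has,
    # so the timed loop measures warm throughput.
    n_tokens = len(encode(text))

    start = time.perf_counter()
    for _ in range(repeats):
        encode(text)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very coarse timers.
    return n_tokens * repeats / max(elapsed, 1e-12)
```

Because the loop encodes the same text repeatedly after a warm-up call, an encoder with a text-level cache will be measured mostly on cache hits. Feeding distinct inputs on every iteration would give a cold-path figure instead.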
Correctness
| Test Suite | Tests | Result |
|---|---|---|
| Deep correctness (41 strings × 3 encodings) | 123 | ✅ 100% |
| Stress test (up to 64K tokens) | 27 | ✅ 100% |
| PDF documents (academic papers, 65K tokens) | 54 | ✅ 100% |
| Total | 204 | 0 mismatches |
Every test compares exact token IDs — not just counts, but the same numbers in the same order.
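The comparison described above can be sketched as a small harness that runs two encoders over the same samples and collects every case where the ID sequences differ. This is an illustrative sketch, not the contents of `tests/deep_correctness.py`; the name `find_mismatches` is hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Encoder = Callable[[str], Sequence[int]]


def find_mismatches(
    reference: Encoder,
    candidate: Encoder,
    samples: Iterable[str],
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, expected_ids, actual_ids) for every sample that differs."""
    mismatches = []
    for text in samples:
        expected = list(reference(text))
        actual = list(candidate(text))
        # List equality checks both the values and their order,
        # so equal counts with different IDs still register as a mismatch.
        if expected != actual:
            mismatches.append((text, expected, actual))
    return mismatches
```

With both packages installed, the two callables could be `tiktoken.get_encoding("cl100k_base").encode` and `runtoken.get_encoding("cl100k_base").encode`; an empty result means the outputs matched on every sample.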
Installation
pip install runtoken
From source
git clone https://github.com/Thibault00/runtoken.git
cd runtoken
pip install maturin
maturin develop --release
Usage
Python
import runtoken
# Get a tokenizer by encoding name (same API as tiktoken)
enc = runtoken.get_encoding("cl100k_base")
# Encode text to token IDs
tokens = enc.encode("Hello, world!")
# [9906, 11, 1917, 0]
# Count tokens
count = enc.count("Hello, world!")
# 4
# Decode back to text
text = enc.decode([9906, 11, 1917, 0])
# "Hello, world!"
# Get tokenizer by model name
enc = runtoken.encoding_for_model("gpt-4o") # → o200k_base
enc = runtoken.encoding_for_model("gpt-4") # → cl100k_base
enc = runtoken.encoding_for_model("claude") # → cl100k_base
# Quick one-liner
runtoken.count("Hello!", model="gpt-4o")
# 2
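As an illustration of the gateway use case, the sketch below picks the cheapest model whose context window fits a prompt. Everything in it is an assumption for the sake of the example: the `ModelOption` type, the `cheapest_fitting_model` helper, and any model names or prices passed in are hypothetical. The token counter is injected as a callable, so it could be backed by `runtoken.count(text, model=...)` as shown above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical model identifier
    context_window: int        # maximum number of tokens the model accepts
    price_per_1k_input: float  # illustrative price, not real pricing


def cheapest_fitting_model(
    prompt: str,
    options: Sequence[ModelOption],
    count_tokens: Callable[[str, str], int],
    reserved_output_tokens: int = 0,
) -> Optional[ModelOption]:
    """Return the cheapest option that can hold the prompt, or None if none can."""
    fitting = [
        m for m in options
        # Count per model, since different models may use different encodings.
        if count_tokens(prompt, m.name) + reserved_output_tokens <= m.context_window
    ]
    return min(fitting, key=lambda m: m.price_per_1k_input, default=None)
```

A counter such as `lambda text, model: runtoken.count(text, model=model)` would plug into the `count_tokens` parameter. Counting runs once per candidate model, which is where a fast count path matters.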
Rust
use runtoken::Tokenizer;
let tokenizer = Tokenizer::new("cl100k_base").unwrap();
let tokens = tokenizer.encode("Hello, world!");
let count = tokenizer.count("Hello, world!");
let text = tokenizer.decode(&tokens);
CLI
# Encode text
runtoken-cli encode "Hello, world!" cl100k_base
# Count tokens
runtoken-cli count "Hello, world!" o200k_base
# Read from stdin (for large texts)
cat myfile.txt | runtoken-cli count - cl100k_base
# Benchmark
runtoken-cli bench cl100k_base
Supported Encodings
| Encoding | Models | Vocab Size |
|---|---|---|
| cl100k_base | GPT-4, GPT-3.5-turbo, Claude | 100,256 |
| o200k_base | GPT-4o, o1, o3 | 200,019 |
| p50k_base | text-davinci-003, Codex | 50,281 |
Model → Encoding Mapping
| Model prefix | Encoding |
|---|---|
| gpt-4o, o1, o3 | o200k_base |
| gpt-4, gpt-3.5, claude | cl100k_base |
| text-davinci, code-davinci | p50k_base |
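One subtlety of prefix-based mapping is that `gpt-4` is itself a prefix of `gpt-4o`, so the lookup has to prefer the more specific entry. The sketch below shows one way to do that with a longest-prefix match. It is an assumption about how such a lookup could work, not a description of runtoken's internals, and the names `PREFIX_TO_ENCODING` and `encoding_name_for_model` are hypothetical.

```python
from typing import List, Tuple

# Prefixes and encodings copied from the mapping table above.
PREFIX_TO_ENCODING: List[Tuple[str, str]] = [
    ("gpt-4o", "o200k_base"),
    ("o1", "o200k_base"),
    ("o3", "o200k_base"),
    ("gpt-4", "cl100k_base"),
    ("gpt-3.5", "cl100k_base"),
    ("claude", "cl100k_base"),
    ("text-davinci", "p50k_base"),
    ("code-davinci", "p50k_base"),
]


def encoding_name_for_model(model: str) -> str:
    """Resolve a model name to an encoding name via longest-prefix match."""
    matches = [(prefix, enc) for prefix, enc in PREFIX_TO_ENCODING
               if model.startswith(prefix)]
    if not matches:
        raise KeyError(f"no encoding known for model {model!r}")
    # The longest prefix wins, so "gpt-4o-mini" resolves through "gpt-4o"
    # rather than falling through to "gpt-4".
    return max(matches, key=lambda pair: len(pair[0]))[1]
```

Choosing the longest match keeps the result independent of the order of the table entries.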
Architecture
src/
├── lib.rs # Tokenizer + TokenizerRegistry + multi-level caching
├── bpe.rs # Core BPE merge algorithm (tiktoken-compatible)
├── vocab.rs # Vocabulary loading (.tiktoken format)
├── regex.rs # Regex splitting per encoding
├── python.rs # PyO3 bindings
└── main.rs # CLI tool
~900 lines of Rust — that's the entire tokenizer. Key design decisions:
- Multi-level LRU cache: Text-level (hash → tokens) + chunk-level (bytes → tokens). Repeated text is a hash lookup.
- Precomputed rank tables: Single-byte and two-byte pair ranks as direct arrays — no HashMap overhead for the most common lookups.
- Inline chunk processing: Regex chunks are encoded inline without collecting into intermediate Vecs.
- tiktoken-style BPE merge: Tracks min_rank inline during merges, avoids priority queue overhead for small chunks.
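The two cache levels can be sketched in a few lines of Python. This is a simplified model of the idea, not a port of `lib.rs`: the `LRU` and `CachedEncoder` classes, the default capacities, and the injected `split` and `encode_chunk` callables are all assumptions made for illustration. The text-level cache here is keyed by the string itself and relies on Python's dict hashing.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Iterable, List, Optional, Sequence, Tuple


class LRU:
    """Small least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Tuple[int, ...]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Tuple[int, ...]]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Tuple[int, ...]) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


class CachedEncoder:
    """Text-level cache in front of a chunk-level cache in front of the encoder."""

    def __init__(
        self,
        split: Callable[[str], Iterable[str]],
        encode_chunk: Callable[[bytes], Sequence[int]],
        text_capacity: int = 1024,
        chunk_capacity: int = 65536,
    ) -> None:
        self._split = split
        self._encode_chunk = encode_chunk
        self._text_cache = LRU(text_capacity)
        self._chunk_cache = LRU(chunk_capacity)

    def encode(self, text: str) -> List[int]:
        # Level 1: the whole text was seen before, so this is a single lookup.
        cached = self._text_cache.get(text)
        if cached is not None:
            return list(cached)

        tokens: List[int] = []
        for chunk in self._split(text):
            key = chunk.encode("utf-8")
            # Level 2: reuse results for chunks seen in any earlier text.
            ids = self._chunk_cache.get(key)
            if ids is None:
                ids = tuple(self._encode_chunk(key))
                self._chunk_cache.put(key, ids)
            tokens.extend(ids)

        self._text_cache.put(text, tuple(tokens))
        return tokens

    def count(self, text: str) -> int:
        return len(self.encode(text))
```

In this sketch a repeated text skips splitting and merging entirely, while a new text that shares common words with earlier ones still avoids most of the merge work through the chunk-level cache.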
How it works
BPE (Byte Pair Encoding) tokenization:
- Regex split: Split input text into chunks using encoding-specific regex patterns
- Byte-level merging: For each chunk, start with individual bytes and repeatedly merge the pair with the lowest rank (priority) in the vocabulary
- Token IDs: Map each final merged byte sequence to its vocabulary rank, which serves as its token ID
runtoken uses the exact same regex patterns and vocabulary files as tiktoken, which is why the output is identical.
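The three steps above can be sketched in plain Python on a toy vocabulary. This is a teaching sketch under stated assumptions, not runtoken's implementation: `SPLIT_PATTERN` is a simplified stand-in for the real encoding-specific patterns, `TOY_RANKS` is a made-up vocabulary, and the function names are hypothetical. It rescans all pairs on every merge, which is simple but slower than an optimized merge loop.

```python
import re
from typing import Dict, List

# Simplified stand-in for an encoding-specific split pattern (assumption).
SPLIT_PATTERN = re.compile(r"\w+|[^\w\s]+|\s+")

# Made-up vocabulary: byte sequence -> rank. A lower rank merges earlier.
TOY_RANKS: Dict[bytes, int] = {
    b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"abc": 4, b" ": 5,
}


def bpe_encode_chunk(chunk: bytes, ranks: Dict[bytes, int]) -> List[int]:
    """Merge adjacent parts by lowest rank until no known pair remains."""
    parts = [bytes([b]) for b in chunk]  # start from individual bytes
    while len(parts) > 1:
        best_index, best_rank = None, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            # Strict comparison keeps the leftmost pair when ranks tie.
            if rank is not None and (best_rank is None or rank < best_rank):
                best_index, best_rank = i, rank
        if best_index is None:
            break  # no adjacent pair is in the vocabulary
        merged = parts[best_index] + parts[best_index + 1]
        parts[best_index:best_index + 2] = [merged]
    # Every remaining part must be in the vocabulary, including single bytes.
    return [ranks[part] for part in parts]


def encode(text: str, ranks: Dict[bytes, int]) -> List[int]:
    """Split the text into chunks, then BPE-encode each chunk independently."""
    tokens: List[int] = []
    for piece in SPLIT_PATTERN.findall(text):
        tokens.extend(bpe_encode_chunk(piece.encode("utf-8"), ranks))
    return tokens
```

With the toy vocabulary, `encode("ab abc", TOY_RANKS)` returns `[3, 5, 4]`: the chunks `ab`, a single space, and `abc` each collapse to one token. Because chunks are encoded independently, merges never cross a chunk boundary, which is why the split pattern has to match exactly for two tokenizers to agree.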
Contributing
# Clone and build
git clone https://github.com/Thibault00/runtoken.git
cd runtoken
cargo build --release
# Run Rust tests
cargo test
# Run correctness tests against tiktoken
pip install tiktoken
python tests/deep_correctness.py
python tests/stress_test.py
# Build Python package
pip install maturin
maturin develop --release
python tests/benchmark_python.py
License
MIT — see LICENSE.