A blazing-fast BPE tokenizer for LLMs. Drop-in tiktoken replacement, 20-80x faster.

runtoken

Built from scratch in Rust with Python bindings via PyO3. Produces identical output to tiktoken — same token IDs, same order, every time.

License: MIT · Python 3.8+


Why?

If you're building an LLM gateway, proxy, or any system that processes tokens at scale, tokenization speed matters. Every API request needs token counting for:

  • Cost estimation & billing
  • Rate limiting per user
  • Context window management
  • Smart routing (pick the cheapest model that fits)

tiktoken is good. runtoken is faster.

Benchmarks

Apples-to-apples comparison — both called as Python packages, same machine, same text:

Encode (full token IDs)

| Input | tiktoken | runtoken | Speedup |
|---|---|---|---|
| Short text (29 chars, 9 tokens) | 1.3M tok/s | 24.6M tok/s | 19x |
| Medium text (1050 chars, 511 tokens) | 2.5M tok/s | 68.8M tok/s | 27x |
| Code (1200 chars, 380 tokens) | 1.5M tok/s | 63.5M tok/s | 44x |
| Long English (4500 chars, 1001 tokens) | 2.5M tok/s | 73.6M tok/s | 29x |
| Long code (5600 chars, 2160 tokens) | 1.5M tok/s | 88.2M tok/s | 59x |
| Unicode (500 chars, 420 tokens) | 4.2M tok/s | 89.2M tok/s | 21x |
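
For readers who want to reproduce figures of this kind, here is a minimal sketch of how a tok/s number and a speedup ratio could be measured. It is not the project's benchmark script (that lives in tests/benchmark_python.py); the helper names `tokens_per_second` and `speedup` are hypothetical, and the encoder is passed in as a plain callable so the sketch runs without any tokenizer installed.

```python
import time
from typing import Callable, List


def tokens_per_second(encode: Callable[[str], List[int]], text: str, repeats: int = 1000) -> float:
    """Throughput of `encode` in tokens/second over `repeats` timed calls."""
    n_tokens = len(encode(text))  # warm-up call; also gives tokens produced per call
    start = time.perf_counter()
    for _ in range(repeats):
        encode(text)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against a zero reading
    return n_tokens * repeats / elapsed


def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    return candidate_tps / baseline_tps


if __name__ == "__main__":
    # Stand-in encoder (one "token" per character), used only to exercise the timer.
    toy_encode = lambda s: [ord(c) for c in s]
    print(f"{tokens_per_second(toy_encode, 'Hello, world!'):,.0f} tok/s")
```

To compare real tokenizers, pass the `encode` method of each encoder object and feed both results to `speedup`.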

Count-only (the gateway use case)

| Input | tiktoken | runtoken | Speedup |
|---|---|---|---|
| Medium text | 2.5M tok/s | 940M tok/s | 381x |
| Long English | 2.6M tok/s | 1.4B tok/s | 538x |
| Long code | 1.5M tok/s | 2.6B tok/s | 1750x |

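Benchmarked on a 2-vCPU cloud instance. The count-only figures benefit from multi-level caching (text-level + chunk-level LRU): repeated text is served by a hash lookup instead of being re-tokenized, so these numbers reflect cache-assisted throughput rather than raw encoding speed.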
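
To make the caching idea concrete, here is a rough Python sketch of a two-level LRU for token counting. It is an illustration under assumptions, not the Rust implementation: the class names `LRU` and `CachedCounter`, the capacities, and the choice to cache counts (rather than token lists) are all hypothetical, and the splitter and chunk encoder are stand-ins supplied by the caller.

```python
from collections import OrderedDict
from typing import Callable, Iterable, List


class LRU:
    """Tiny LRU map built on OrderedDict (order of keys = recency of use)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


class CachedCounter:
    """Counts tokens with a text-level cache in front of a chunk-level cache."""

    def __init__(
        self,
        split: Callable[[str], Iterable[str]],
        encode_chunk: Callable[[str], List[int]],
        text_capacity: int = 1024,
        chunk_capacity: int = 65536,
    ) -> None:
        self.split = split
        self.encode_chunk = encode_chunk
        self.text_cache = LRU(text_capacity)
        self.chunk_cache = LRU(chunk_capacity)
        self.chunk_encodes = 0  # number of chunks that actually had to be encoded

    def count(self, text: str) -> int:
        cached = self.text_cache.get(text)  # level 1: whole text (the dict hashes it)
        if cached is not None:
            return cached
        total = 0
        for chunk in self.split(text):
            n = self.chunk_cache.get(chunk)  # level 2: individual chunk
            if n is None:
                n = len(self.encode_chunk(chunk))
                self.chunk_encodes += 1
                self.chunk_cache.put(chunk, n)
            total += n
        self.text_cache.put(text, total)
        return total


if __name__ == "__main__":
    # Stand-ins: whitespace splitting and one "token" per UTF-8 byte.
    counter = CachedCounter(split=str.split, encode_chunk=lambda c: list(c.encode("utf-8")))
    print(counter.count("the cat the dog"), counter.chunk_encodes)
```

In this sketch a repeated request never reaches the encoder, and even a new text only pays for the chunks it has not seen before.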

Correctness

| Test suite | Tests | Result |
|---|---|---|
| Deep correctness (41 strings × 3 encodings) | 123 | ✅ 100% |
| Stress test (up to 64K tokens) | 27 | ✅ 100% |
| PDF documents (academic papers, 65K tokens) | 54 | ✅ 100% |
| Total | 204 | 0 mismatches |

Every test compares exact token IDs — not just counts, but the same numbers in the same order.
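
A comparison of this kind can be sketched as a small harness that runs two encoders over the same samples and reports where their ID sequences first diverge. This is not the project's test suite (see tests/deep_correctness.py for that); `find_mismatches` is a hypothetical helper, and the encoders are passed in as callables so the sketch runs on its own.

```python
from typing import Callable, Iterable, List, Tuple

Encoder = Callable[[str], List[int]]


def find_mismatches(reference: Encoder, candidate: Encoder, samples: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (sample, index of first difference) for every sample whose IDs differ."""
    mismatches = []
    for text in samples:
        ref_ids, cand_ids = reference(text), candidate(text)
        if ref_ids == cand_ids:
            continue  # same IDs in the same order
        # First position where the IDs differ; if one sequence is a prefix
        # of the other, report the length of the shorter one.
        first_diff = next(
            (i for i, (a, b) in enumerate(zip(ref_ids, cand_ids)) if a != b),
            min(len(ref_ids), len(cand_ids)),
        )
        mismatches.append((text, first_diff))
    return mismatches


if __name__ == "__main__":
    # Stand-in encoders; the second one deliberately disagrees on uppercase letters.
    ref = lambda s: [ord(c) for c in s]
    bad = lambda s: [ord(c) for c in s.lower()]
    print(find_mismatches(ref, bad, ["abc", "aBc"]))
```

With both packages installed, the same function could be called with `tiktoken.get_encoding("cl100k_base").encode` as the reference and `runtoken.get_encoding("cl100k_base").encode` as the candidate; an empty list means no mismatches.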

Installation

pip install runtoken

From source

git clone https://github.com/Thibault00/runtoken.git
cd runtoken
pip install maturin
maturin develop --release

Usage

Python

import runtoken

# Get a tokenizer by encoding name (same API as tiktoken)
enc = runtoken.get_encoding("cl100k_base")

# Encode text to token IDs
tokens = enc.encode("Hello, world!")
# [9906, 11, 1917, 0]

# Count tokens
count = enc.count("Hello, world!")
# 4

# Decode back to text
text = enc.decode([9906, 11, 1917, 0])
# "Hello, world!"

# Get tokenizer by model name
enc = runtoken.encoding_for_model("gpt-4o")  # → o200k_base
enc = runtoken.encoding_for_model("gpt-4")   # → cl100k_base
enc = runtoken.encoding_for_model("claude")   # → cl100k_base (approximation: Claude models use Anthropic's own tokenizer)

# Quick one-liner
runtoken.count("Hello!", model="gpt-4o")
# 2
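
As an illustration of the gateway use case, the sketch below picks the cheapest model whose context window fits a prompt. Everything here other than the idea of a `count` function is an assumption: `ModelSpec`, `cheapest_model_that_fits`, the model names, the window sizes, and the prices are made up for the example. The counter is passed in as a callable, so in practice it could be `enc.count` from an encoder obtained as shown in the usage above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ModelSpec:
    name: str
    context_window: int         # maximum tokens (prompt + completion)
    price_per_1k_tokens: float  # made-up price, for illustration only


def cheapest_model_that_fits(
    prompt: str,
    count: Callable[[str], int],
    models: Sequence[ModelSpec],
    reserve_for_output: int = 256,
) -> Optional[ModelSpec]:
    """Return the cheapest model that can hold the prompt plus reserved output tokens."""
    needed = count(prompt) + reserve_for_output
    candidates = [m for m in models if m.context_window >= needed]
    return min(candidates, key=lambda m: m.price_per_1k_tokens, default=None)


if __name__ == "__main__":
    # Hypothetical catalogue and a stand-in counter (whitespace-separated words).
    catalogue = [
        ModelSpec("small-model", context_window=1_000, price_per_1k_tokens=0.1),
        ModelSpec("large-model", context_window=100_000, price_per_1k_tokens=1.0),
    ]
    word_count = lambda s: len(s.split())
    print(cheapest_model_that_fits("Hello, world!", word_count, catalogue))
```

Because the check runs on every request, the cost of `count` sits directly on the request path, which is where a fast counter pays off.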

Rust

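use runtoken::Tokenizer;

fn main() {
    // Load a tokenizer by encoding name; unwrap() panics if loading fails
    let tokenizer = Tokenizer::new("cl100k_base").unwrap();
    let tokens = tokenizer.encode("Hello, world!"); // token IDs
    let count = tokenizer.count("Hello, world!");   // number of tokens
    let text = tokenizer.decode(&tokens);           // back to text
}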

CLI

# Encode text
runtoken-cli encode "Hello, world!" cl100k_base

# Count tokens
runtoken-cli count "Hello, world!" o200k_base

# Read from stdin (for large texts)
cat myfile.txt | runtoken-cli count - cl100k_base

# Benchmark
runtoken-cli bench cl100k_base

Supported Encodings

| Encoding | Models | Vocab size |
|---|---|---|
| cl100k_base | GPT-4, GPT-3.5-turbo, Claude (approximation; Claude uses its own tokenizer) | 100,277 |
| o200k_base | GPT-4o, o1, o3 | 200,019 |
| p50k_base | text-davinci-003, Codex | 50,281 |

Model → Encoding Mapping

| Model prefix | Encoding |
|---|---|
| gpt-4o, o1, o3 | o200k_base |
| gpt-4, gpt-3.5, claude (approximation) | cl100k_base |
| text-davinci, code-davinci | p50k_base |
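
One detail worth noting in a prefix table like this is that "gpt-4" is itself a prefix of "gpt-4o", so the lookup has to prefer the more specific entry. The sketch below resolves a model name by longest matching prefix. It is an assumption about how such a lookup could work, not a description of the library's internals, and `encoding_name_for_model` is a hypothetical helper name.

```python
from typing import Dict

# Prefix table copied from the mapping above.
PREFIX_TO_ENCODING: Dict[str, str] = {
    "gpt-4o": "o200k_base",
    "o1": "o200k_base",
    "o3": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5": "cl100k_base",
    "claude": "cl100k_base",
    "text-davinci": "p50k_base",
    "code-davinci": "p50k_base",
}


def encoding_name_for_model(model: str) -> str:
    """Resolve a model name to an encoding name using the longest matching prefix."""
    matches = [prefix for prefix in PREFIX_TO_ENCODING if model.startswith(prefix)]
    if not matches:
        raise KeyError(f"no encoding known for model {model!r}")
    # The longest prefix wins, so "gpt-4o-mini" resolves via "gpt-4o", not "gpt-4".
    return PREFIX_TO_ENCODING[max(matches, key=len)]


if __name__ == "__main__":
    for name in ("gpt-4o-mini", "gpt-4-turbo", "text-davinci-003"):
        print(name, "->", encoding_name_for_model(name))
```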

Architecture

src/
├── lib.rs       # Tokenizer + TokenizerRegistry + multi-level caching
├── bpe.rs       # Core BPE merge algorithm (tiktoken-compatible)
├── vocab.rs     # Vocabulary loading (.tiktoken format)
├── regex.rs     # Regex splitting per encoding
├── python.rs    # PyO3 bindings
└── main.rs      # CLI tool

~900 lines of Rust — that's the entire tokenizer. Key design decisions:

  • Multi-level LRU cache: Text-level (hash → tokens) + chunk-level (bytes → tokens). Repeated text is a hash lookup.
  • Precomputed rank tables: Single-byte and two-byte pair ranks as direct arrays — no HashMap overhead for the most common lookups.
  • Inline chunk processing: Regex chunks are encoded inline without collecting into intermediate Vecs.
  • tiktoken-style BPE merge: Tracks min_rank inline during merges, avoids priority queue overhead for small chunks.
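
The precomputed rank tables can be pictured with a short Python sketch: a 256-entry array for single bytes and a 65,536-entry array for byte pairs, indexed directly instead of hashed. The layout shown here (index `(a << 8) | b`, a `-1` sentinel) and the names `build_rank_tables` and `pair_rank` are assumptions made for illustration; the Rust code may organize its tables differently.

```python
from typing import Dict, List, Tuple

NO_RANK = -1  # sentinel: this byte or byte pair is not a token in the vocabulary


def build_rank_tables(ranks: Dict[bytes, int]) -> Tuple[List[int], List[int]]:
    """Build direct-index tables for 1-byte and 2-byte tokens from a rank map."""
    single = [NO_RANK] * 256
    pair = [NO_RANK] * (256 * 256)
    for token, rank in ranks.items():
        if len(token) == 1:
            single[token[0]] = rank
        elif len(token) == 2:
            pair[(token[0] << 8) | token[1]] = rank
        # Longer tokens stay in the regular map; only short ones get a table slot.
    return single, pair


def pair_rank(pair_table: List[int], a: int, b: int) -> int:
    """Rank of the two-byte sequence (a, b), or NO_RANK if it is not a token."""
    return pair_table[(a << 8) | b]


if __name__ == "__main__":
    toy_ranks = {b"a": 0, b"b": 1, b"ab": 2, b"abc": 3}  # toy vocabulary
    single, pair = build_rank_tables(toy_ranks)
    print(single[ord("a")], pair_rank(pair, ord("a"), ord("b")))
```

The trade is a fixed block of memory per table in exchange for a lookup that is a single array index, which matters because the first merges in every chunk operate on exactly these short sequences.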

How it works

BPE (Byte Pair Encoding) tokenization:

  1. Regex split: Split input text into chunks using encoding-specific regex patterns
  2. Byte-level merging: For each chunk, start with individual bytes and repeatedly merge the pair with the lowest rank (priority) in the vocabulary
  3. Token IDs: Map the final merged byte sequences to their vocabulary rank

runtoken uses the exact same regex patterns and vocabulary files as tiktoken, which is why the output is identical.
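
The three steps can be sketched in a few lines of Python. This is a toy, not the library's code: the split pattern is a simplified stand-in for the real encoding-specific patterns, `TOY_RANKS` is a made-up vocabulary (all 256 single bytes plus three merges), and the merge loop rescans every pair on each pass rather than using the optimizations described under Architecture.

```python
import re
from typing import Dict, List

# Simplified split pattern (an assumption; real encodings use more elaborate ones).
TOY_PATTERN = re.compile(r" ?\w+| ?[^\w\s]+|\s+")

# Toy vocabulary: every single byte, plus a few merged sequences with higher ranks.
TOY_RANKS: Dict[bytes, int] = {bytes([i]): i for i in range(256)}
TOY_RANKS.update({b"lo": 256, b"low": 257, b" l": 258})


def bpe_merge(chunk: bytes, ranks: Dict[bytes, int]) -> List[bytes]:
    """Step 2: start from single bytes and repeatedly merge the lowest-ranked pair."""
    parts = [bytes([b]) for b in chunk]
    while len(parts) > 1:
        best_i, best_rank = -1, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_i, best_rank = i, rank  # leftmost pair wins ties
        if best_rank is None:
            break  # no adjacent pair is in the vocabulary
        parts[best_i : best_i + 2] = [parts[best_i] + parts[best_i + 1]]
    return parts


def encode(text: str, ranks: Dict[bytes, int]) -> List[int]:
    """Steps 1 and 3: split into chunks, merge each chunk, map pieces to ranks."""
    ids: List[int] = []
    for chunk in TOY_PATTERN.findall(text):
        ids.extend(ranks[piece] for piece in bpe_merge(chunk.encode("utf-8"), ranks))
    return ids


if __name__ == "__main__":
    print(encode("low low", TOY_RANKS))  # [257, 32, 257]
```

In the second chunk, " low", the pair "lo" (rank 256) is merged before " l" (rank 258) because it has the lower rank, so the leading space ends up as its own token.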

Contributing

# Clone and build
git clone https://github.com/Thibault00/runtoken.git
cd runtoken
cargo build --release

# Run Rust tests
cargo test

# Run correctness tests against tiktoken
pip install tiktoken
python tests/deep_correctness.py
python tests/stress_test.py

# Build Python package
pip install maturin
maturin develop --release
python tests/benchmark_python.py

License

MIT — see LICENSE.
