TokLens: Looking Beyond Fertility in Tokenizer Evaluation

Project description

TokLens: A Multilingual Lens on Tokenizer Quality for LLMs

Accepted to ACL 2026 SRW. 🎉

Open-source toolkit for evaluating tokenizer quality across languages using six intrinsic metrics. We evaluate 24 tokenizers from major LLM families across 15 typologically diverse languages and correlate the metrics with downstream performance.

Key Findings

  1. Stark cross-lingual disparities persist. GPT-2 produces 56x more tokens per word in Japanese than in English; Qwen2.5 and Gemma-2 shrink this gap to under 4x.
  2. No metric predicts English benchmark performance once model size is controlled for (Bonferroni-corrected): tokenizer quality does not drive English leaderboard scores.
  3. STRR significantly predicts multilingual performance. On MMLU-ProX, linear mixed-effects models estimate a large positive STRR effect (β = +5.7, z = 18.5, p < 0.001).
  4. Higher STRR correlates with steeper scaling. A controlled experiment on the Qwen2.5 family (fixed tokenizer, varying model size) shows that languages with higher STRR scale more steeply (ρ = 0.91, p < 0.001); a rough illustration of this correlation appears after the figure captions below.

[Figure: Benchmark correlations]

[Figure: Per-language scaling slope vs. tokenizer metrics (Qwen2.5 family)]
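As a rough illustration of the finding-4 analysis (not the paper's code), the Spearman correlation between per-language STRR and per-language scaling slope can be computed as below; all numbers are made-up placeholders, not the paper's data:

from scipy.stats import spearmanr

# Hypothetical per-language values; placeholders, not the paper's data.
strr   = [0.81, 0.45, 0.32, 0.67, 0.90, 0.28, 0.55]  # single-token retention rate
slopes = [4.2, 2.1, 1.5, 3.3, 4.8, 1.1, 2.6]         # benchmark gain per model-size step

rho, p = spearmanr(strr, slopes)
print(f"Spearman rho = {rho:.2f}, p = {p:.3g}")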

Metrics

Metric             Description
Fertility          Tokens per whitespace-delimited word. Lower = better compression.
CPT                Characters per token.
Compression ratio  Bytes per token.
NSL                Normalized sequence length relative to a reference tokenizer.
STRR               Single-token retention rate: the fraction of words encoded as a single token.
Parity             Cross-lingual fairness: a language's token count relative to English on parallel sentences.
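To make the definitions concrete, here is a minimal sketch of fertility and STRR computed with a Hugging Face tokenizer. It illustrates the definitions above and is not the toklens implementation; in particular, encoding words in isolation for STRR ignores context effects such as leading-space merges.

from transformers import AutoTokenizer

# Illustrative sketch only, not the toklens implementation.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Tokenizer quality varies widely across languages."
words = text.split()  # whitespace-delimited words

# Fertility: tokens per whitespace-delimited word.
fertility = len(tokenizer.encode(text, add_special_tokens=False)) / len(words)

# STRR: fraction of words encoded as a single token (words are encoded
# in isolation here, which ignores leading-space merge rules).
single = sum(len(tokenizer.encode(w, add_special_tokens=False)) == 1 for w in words)
strr = single / len(words)

print(f"fertility={fertility:.2f}  STRR={strr:.2f}")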

Models and Languages

22 models with Open LLM Leaderboard v2 scores, plus two additional tokenizers (Qwen3, DeepSeek-V3) included for metric-only analysis.

15 languages across 6 scripts: English, Chinese, Japanese, Arabic, Hindi, German, Turkish, Korean, Thai, Russian, French, Spanish, Portuguese, Vietnamese, Indonesian.

Quickstart

Install from PyPI:

pip install toklens

Python API:

from toklens import Analyzer

analyzer = Analyzer.from_pretrained("meta-llama/Llama-3.1-8B")
report = analyzer.evaluate(langs=["en", "zh", "ja", "ar"])
report.print_table()

Command-line interface:

toklens eval meta-llama/Llama-3.1-8B --langs en zh ja ar
toklens compare meta-llama/Llama-2-7b-hf meta-llama/Llama-3.1-8B

Experiments

The pipeline below reproduces the full evaluation. Run the steps in order:

uv run python -m experiments.pipeline.01_collect_benchmarks
uv run python -m experiments.pipeline.02_compute_metrics
uv run python -m experiments.pipeline.03_correlation
uv run python -m experiments.pipeline.04_figures

Supplementary analyses (LME models, Qtok comparison, BPB, Qwen scaling) are in experiments/analyses/. See experiments/README.md for details.
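For orientation, here is a minimal sketch of the kind of linear mixed-effects fit behind key finding 3, using statsmodels on synthetic data. The column names (score, strr, log_params, model) and the random-intercept-per-model structure are illustrative assumptions, not the repo's actual schema or model specification.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Synthetic long-format table: one row per (model, language) pair.
df = pd.DataFrame({
    "model": np.repeat([f"m{i}" for i in range(8)], 5),
    "strr": rng.uniform(0.2, 0.9, 40),
    "log_params": np.repeat(rng.uniform(0.5, 2.0, 8), 5),
})
df["score"] = 20 + 5 * df["strr"] + 10 * df["log_params"] + rng.normal(0, 1, 40)

# Fixed effects for STRR and model size; random intercept per model.
fit = smf.mixedlm("score ~ strr + log_params", df, groups=df["model"]).fit()
print(fit.summary())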

License

MIT

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

toklens-0.1.1.tar.gz (29.8 MB; see file details below)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

toklens-0.1.1-py3-none-any.whl (12.7 kB; see file details below)

Uploaded Python 3

File details

Details for the file toklens-0.1.1.tar.gz.

File metadata

  • Download URL: toklens-0.1.1.tar.gz
  • Upload date:
  • Size: 29.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.8

File hashes

Hashes for toklens-0.1.1.tar.gz
Algorithm Hash digest
SHA256 e9d8cc964e9049b406715cc133ffd0f71b93626f6dcf671489e8cce0d035b227
MD5 62bd0b36943e9eee99c5047c6a1c43c4
BLAKE2b-256 cb04730afdde45f6e81c2f2f23171be7446002545bdd12ca226493ef143c09bb

See more details on using hashes here.

File details

Details for the file toklens-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: toklens-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.8

File hashes

Hashes for toklens-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7907d2025e2811459fe095d430fd911d0e04b131b6caf4189e6d4363bd5ff5d7
MD5 0aa3ba48de7a52da43aaa9beb341133a
BLAKE2b-256 47eecc414bf94f2451cc50048a70765b4fcadea27615e0bc47e416342d0e3202

See more details on using hashes here.
