
High-performance Python tokenizer backed by IREE


iree-tokenizer

Python bindings for the IREE tokenizer — a high-performance C tokenizer with full HuggingFace tokenizer.json compatibility.

  • Fast. 3–12x faster than tiktoken, 10–20x faster than HF tokenizers (the GPT-2 encode benchmark below measures 2.7x and 11.6x). Pure C hot path with zero allocations per token.
  • Zero Python dependencies beyond numpy.
  • Small. ~317KiB (compared to 1-3MiB for alternatives).
  • Streaming encode/decode. First-class support for incremental tokenization — feed chunks in, get tokens out. Ideal for LLM inference.
  • Drop-in compatible. Loads any HuggingFace tokenizer.json. Supports BPE, WordPiece, and Unigram models.

Based on the IREE high-speed tokenizer library:

  • Optimized for cache utilization. Uses cache efficiently on both large and small CPUs. No dependencies and a small footprint make it a good fit for embedded/client use and for inclusion in other projects.
  • Unique algorithmic optimizations. Pull-based streaming processor with small, bounded, deterministic memory usage, plus various novel optimizations not seen elsewhere.
  • GPU-ready. Designed to support tiled execution on the GPU, not just the host.
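
To make the "pull-based streaming" idea concrete, here is a toy sketch in plain Python. It is not the IREE algorithm and splits on spaces rather than using a real vocabulary; `pull_tokens` is a hypothetical name. It only shows the shape: the consumer pulls tokens one at a time, and the only state carried between input chunks is the unfinished tail of the last word.

```python
from typing import Iterable, Iterator


def pull_tokens(chunks: Iterable[str]) -> Iterator[str]:
    """Toy pull-based tokenizer: yields space-separated words.

    Illustration only (NOT the IREE algorithm). Between chunks it keeps
    just the unfinished tail of the last word, never the whole input.
    """
    tail = ""
    for chunk in chunks:
        parts = (tail + chunk).split(" ")
        # The last piece may continue in the next chunk, so hold it back.
        tail = parts.pop()
        for part in parts:
            if part:
                yield part
    # End of input: whatever is left is a complete word.
    if tail:
        yield tail


# Words split across chunk boundaries are reassembled.
tokens = list(pull_tokens(["Hel", "lo wor", "ld ", "again"]))
print(tokens)  # ['Hello', 'world', 'again']
```

Because `pull_tokens` is a generator, nothing is processed until the caller asks for the next token, which is what keeps memory use tied to the chunk size rather than the input size.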

Performance

GPT-2 tokenizer, single-threaded, p50 latency over 50 iterations.

Encode (22K chars → 5000 tokens)
  iree       469 µs    10.6M tok/s
  tiktoken  1251 µs     4.0M tok/s   2.7x slower
  hf        5420 µs     0.9M tok/s  11.6x slower

Decode (5000 tokens → text)
  iree        72 µs
  tiktoken    78 µs                   1.1x slower
  hf         599 µs                   8.3x slower

Batch Encode (100 × 880 chars)
  iree      1942 µs    10.3M tok/s
  tiktoken  5148 µs     3.8M tok/s   2.7x slower
  hf       22022 µs     0.9M tok/s  11.3x slower

Measured on AMD Threadripper 3970X, 128 GB DDR4, Fedora 43, GCC 15.2, Python 3.14.
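
For readers who want to take comparable measurements, the following is a minimal sketch of a p50 (median) latency harness using only the standard library. The helper names `p50_latency_us` and `slowdown` are hypothetical and this is not the project's benchmark script; the workload is a stand-in that you would replace with something like `lambda: tok.encode(text)`.

```python
import statistics
import time
from typing import Callable


def p50_latency_us(fn: Callable[[], object], iterations: int = 50) -> float:
    """Return the median wall-clock latency of fn() in microseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    return statistics.median(samples)


def slowdown(other_us: float, baseline_us: float) -> float:
    """How many times slower `other_us` is than `baseline_us`."""
    return other_us / baseline_us


# Stand-in workload; swap in e.g. `lambda: tok.encode(text)`.
text = "Hello world " * 1000
latency = p50_latency_us(lambda: text.split())
print(f"p50: {latency:.1f} µs")

# The "x slower" column is the ratio of p50 latencies, e.g. for encode:
print(round(slowdown(1251, 469), 1))  # 2.7  (tiktoken vs iree)
print(round(slowdown(5420, 469), 1))  # 11.6 (hf vs iree)
```

The median is less sensitive than the mean to the occasional slow iteration caused by scheduling or cache warm-up.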

Quick Start

from iree.tokenizer import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")

# Encode / decode
ids = tok.encode("Hello world")          # [15496, 995]
text = tok.decode(ids)                    # "Hello world"

# Batch
tok.encode_batch(["Hello", "world"])      # [[15496], [995]]

# Numpy (zero-copy)
arr = tok.encode_to_array("Hello world")  # int32 ndarray

# Rich encoding with byte offsets
enc = tok.encode_rich("Hello world", track_offsets=True)
# enc.ids, enc.offsets, enc.type_ids

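# Streaming decode (LLM token-at-a-time pattern)
from iree.tokenizer import decode_stream_iter
# token_generator is any iterable of token IDs; iter(ids) stands in
# here for an LLM's sampling loop.
token_generator = iter(ids)
for chunk in decode_stream_iter(tok, token_generator):
    print(chunk, end="", flush=True)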
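
One reason a streaming decoder yields text chunks rather than one string per token: with byte-level vocabularies such as GPT-2's, a token boundary can fall inside a multi-byte UTF-8 character. The sketch below illustrates that general problem with the standard library's incremental decoder. It is not how iree-tokenizer is implemented, and `stream_utf8` is a hypothetical helper.

```python
import codecs
from typing import Iterable, Iterator


def stream_utf8(pieces: Iterable[bytes]) -> Iterator[str]:
    """Decode byte pieces incrementally, buffering partial characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for piece in pieces:
        text = decoder.decode(piece)
        if text:
            yield text
    # Flush; raises UnicodeDecodeError if the stream ends mid-character.
    rest = decoder.decode(b"", final=True)
    if rest:
        yield rest


# "héllo" with the two-byte "é" (0xC3 0xA9) split across two pieces.
pieces = [b"h\xc3", b"\xa9l", b"lo"]
chunks = list(stream_utf8(pieces))
print(chunks)  # ['h', 'él', 'lo']
```

Decoding each piece on its own with `bytes.decode("utf-8")` would fail on the first piece, which is why the partial byte has to be held back until the next piece arrives.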

API

Method / property             Returns          Description
Tokenizer.from_file(path)     Tokenizer        Load from tokenizer.json
Tokenizer.from_str(json)      Tokenizer        Load from JSON string
Tokenizer.from_buffer(bytes)  Tokenizer        Load from bytes
tok.encode(text)              list[int]        Encode text to token IDs
tok.encode_to_array(text)     np.ndarray       Encode to numpy int32 array
tok.encode_rich(text)         Encoding         IDs + byte offsets + type IDs
tok.decode(ids)               str              Decode token IDs to text
tok.encode_batch(texts)       list[list[int]]  Batch encode
tok.decode_batch(id_lists)    list[str]        Batch decode
tok.encode_stream()           EncodeStream     Streaming encoder (context manager)
tok.decode_stream()           DecodeStream     Streaming decoder (context manager)
tok.vocab_size                int              Vocabulary size
tok.model_type                str              "BPE", "WordPiece", or "Unigram"
tok.token_to_id(token)        int | None       Look up token ID
tok.id_to_token(id)           str | None       Look up token text
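
Since `encode_rich` reports byte offsets, they do not line up with Python string indices once the text contains non-ASCII characters. The sketch below shows how to slice by byte offsets. It assumes each offset is a half-open `(start, end)` pair into the UTF-8 encoding of the input, which this README does not spell out. The offsets are written by hand for illustration, and `slice_by_byte_offsets` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def slice_by_byte_offsets(
    text: str, offsets: Sequence[Tuple[int, int]]
) -> List[str]:
    """Recover the text span for each (start, end) byte-offset pair."""
    data = text.encode("utf-8")
    return [data[start:end].decode("utf-8") for start, end in offsets]


text = "na\u00efve caf\u00e9"  # "naïve café": 10 characters, 12 UTF-8 bytes
# Hand-written offsets (assumed shape): two 6-byte spans.
offsets = [(0, 6), (6, 12)]

print(slice_by_byte_offsets(text, offsets))  # ['naïve', ' café']

# Using the same numbers as character indices gives different spans.
print([text[start:end] for start, end in offsets])  # ['naïve ', 'café']
```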

CLI

A streaming iree-tokenize command is included. It reads from stdin, writes JSONL to stdout, and shows live throughput on stderr.

# Encode text to token IDs
echo "Hello world" | iree-tokenize encode -t tokenizer.json
# {"seq":0,"text":"Hello world","ids":[15496,995],"n_tokens":2,...}

# Decode token IDs back to text
echo '[15496, 995]' | iree-tokenize decode -t tokenizer.json
# {"seq":0,"ids":[15496,995],"text":"Hello world","n_tokens":2,...}

# Chain encode → decode (round-trip)
cat corpus.txt | iree-tokenize encode -t tokenizer.json | iree-tokenize decode -t tokenizer.json

# Tokenizer info
iree-tokenize info -t tokenizer.json

Output is chainable: encode output feeds directly into decode and vice versa. Use --compact to omit timing fields, --rich for byte offsets, or --no-progress to suppress the stderr throughput display.

Note that this tool illustrates streaming processing, but JSON processing overhead is significant and skews throughput. Treat it as an example of how to drive the streaming API rather than as a benchmarking tool or a tool expected to achieve maximum throughput.
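
Because the output is JSONL, downstream scripts can consume it one record per line. Here is a minimal sketch that totals the `n_tokens` field. The sample lines are hand-written in the shape shown above, without the elided timing fields (as with `--compact`), and `summarize_jsonl` is a hypothetical helper, not part of the package.

```python
import json
from typing import Dict, Iterable


def summarize_jsonl(lines: Iterable[str]) -> Dict[str, int]:
    """Count records and total tokens in iree-tokenize style JSONL."""
    records = 0
    tokens = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        records += 1
        tokens += record["n_tokens"]
    return {"records": records, "tokens": tokens}


# Hand-written sample records for illustration.
sample = [
    '{"seq":0,"text":"Hello world","ids":[15496,995],"n_tokens":2}',
    '{"seq":1,"text":"Hello","ids":[15496],"n_tokens":1}',
]
summary = summarize_jsonl(sample)
print(summary)  # {'records': 2, 'tokens': 3}
```

In a pipeline, passing `sys.stdin` instead of `sample` processes records as they arrive.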

License

Apache 2.0 with LLVM Exceptions — see LICENSE.

Download files

No source distribution is available for this release.

Built Distributions

  • iree_tokenizer-0.2.0-cp312-abi3-win_amd64.whl (273.9 kB): CPython 3.12+, Windows x86-64
  • iree_tokenizer-0.2.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (329.3 kB): CPython 3.12+, manylinux (glibc 2.27+ / 2.28+) x86-64
  • iree_tokenizer-0.2.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (333.6 kB): CPython 3.11, manylinux (glibc 2.27+ / 2.28+) x86-64
  • iree_tokenizer-0.2.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (333.9 kB): CPython 3.10, manylinux (glibc 2.27+ / 2.28+) x86-64

File hashes

iree_tokenizer-0.2.0-cp312-abi3-win_amd64.whl
  SHA256       6d6741ac857db27df6691dc36c4b20f0c3b0ba274cb4cba0449409423dc51316
  MD5          8b89ce11a3b338157fb7772f75f8c705
  BLAKE2b-256  7e1095169f54606516dedd6f3456bd674ad1a7d0f09a16d128d8af0917458703

iree_tokenizer-0.2.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
  SHA256       3cd3a6bb27cc24237580cad6d50f63cf094f3e6fac3bdefc0119db2f9f80ce3d
  MD5          73c2b00227658bf5f69eb08da28da571
  BLAKE2b-256  12137c82468588cdd5083d255412d2d04a9ce9cb1f2bdb58246e92a43563f117

iree_tokenizer-0.2.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
  SHA256       0d39d83a96e8362480db68af306d6a06ed7b9bf22dd5acaabd66eea7c6f42494
  MD5          976c804dc68bd420500a56f0eccbdc21
  BLAKE2b-256  9ee718c5413cb19a3d7435d9960051dbcebdd7828a162adb6ee6cc40a86a00f9

iree_tokenizer-0.2.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
  SHA256       ded6ddd73e43245ea2f9f8009bdd19dd00eeae915c00729c52bf5fabd4a26676
  MD5          ef706d8c084d0d0fa293a90eb8472c72
  BLAKE2b-256  ec87a0cb9b14a10a1936c88ab1b888547c90e6ad5fffb669ddfb2ae18749d77d
