High-performance Python tokenizer backed by IREE

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Intended Audience
- Developers
Programming Language
- C++
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

iree-tokenizer

Python bindings for the IREE tokenizer — a high-performance C tokenizer with full HuggingFace tokenizer.json compatibility.

- Fast. 3–12x faster than tiktoken, 10–20x faster than HF tokenizers. Pure C hot path with zero allocations per token.
- Zero Python dependencies beyond numpy.
- Small. ~317KiB (compared to 1-3MiB for alternatives).
- Streaming encode/decode. First-class support for incremental tokenization: feed chunks in, get tokens out. Ideal for LLM inference.
- Drop-in compatible. Loads any HuggingFace tokenizer.json. Supports BPE, WordPiece, and Unigram models.

Based on the IREE high-speed tokenizer library:

- Optimized for cache utilization. Makes efficient use of cache on both large and small CPUs. Having no dependencies and a small footprint makes it well suited to embedded/client use and to inclusion in other projects.
- Unique algorithmic optimizations. Pull-based streaming processor with small, bounded, deterministic memory usage. Various novel optimizations not seen elsewhere.
- GPU-ready. Designed to be compatible with tiled execution on the GPU, not just the host.

Performance

GPT-2 tokenizer, single-threaded, p50 latency over 50 iterations.

Encode (22K chars → 5000 tokens)
  iree       469 µs    10.6M tok/s
  tiktoken  1251 µs     4.0M tok/s   2.7x slower
  hf        5420 µs     0.9M tok/s  11.6x slower

Decode (5000 tokens → text)
  iree        72 µs
  tiktoken    78 µs                   1.1x slower
  hf         599 µs                   8.3x slower

Batch Encode (100 × 880 chars)
  iree      1942 µs    10.3M tok/s
  tiktoken  5148 µs     3.8M tok/s   2.7x slower
  hf       22022 µs     0.9M tok/s  11.3x slower

Measured on AMD Threadripper 3970X, 128 GB DDR4, Fedora 43, GCC 15.2, Python 3.14.
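
The methodology above (p50 latency over 50 iterations) can be approximated with a small timing harness. This is a minimal sketch, not the project's benchmark script: `p50_latency_us` and `slowdown` are hypothetical helper names, the warmup count is an assumption, and the function under test can be any zero-argument callable, for example `lambda: tok.encode(text)`.

```python
import statistics
import time


def p50_latency_us(fn, iterations=50, warmup=5):
    """Return the median (p50) wall-clock latency of fn() in microseconds."""
    for _ in range(warmup):  # warm caches before timing (assumed warmup count)
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    return statistics.median(samples)


def slowdown(baseline_us, other_us):
    """How many times slower `other` is than `baseline`, to one decimal place."""
    return round(other_us / baseline_us, 1)


# Stand-in workload; replace with e.g. lambda: tok.encode(text).
latency = p50_latency_us(lambda: sum(range(10_000)))
print(f"p50: {latency:.1f} µs")

# Recompute the Encode ratios from the latencies in the table above.
print(slowdown(469, 1251), slowdown(469, 5420))
```

The same `slowdown` call applied to the Decode and Batch Encode latencies gives the other ratios in the table.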

Quick Start

from iree.tokenizer import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")

# Encode / decode
ids = tok.encode("Hello world")          # [15496, 995]
text = tok.decode(ids)                    # "Hello world"

# Batch
tok.encode_batch(["Hello", "world"])      # [[15496], [995]]

# Numpy (zero-copy)
arr = tok.encode_to_array("Hello world")  # int32 ndarray

# Rich encoding with byte offsets
enc = tok.encode_rich("Hello world", track_offsets=True)
# enc.ids, enc.offsets, enc.type_ids

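# Streaming decode (LLM token-at-a-time pattern)
# token_generator is assumed to be an iterable yielding token IDs,
# e.g. the output of a model's sampling loop.
from iree.tokenizer import decode_stream_iter
for chunk in decode_stream_iter(tok, token_generator):
    print(chunk, end="", flush=True)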
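
Streaming decode matters because byte-level tokens do not always align with character boundaries: a multi-byte UTF-8 character can be split across two tokens, so a decoder has to hold back incomplete bytes until the rest arrives. The sketch below illustrates that idea with Python's standard incremental UTF-8 decoder. It is not the library's implementation; `stream_text` and the byte chunks are hypothetical stand-ins for per-token byte output.

```python
import codecs


def stream_text(byte_chunks):
    """Yield text as soon as complete UTF-8 characters are available."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        text = decoder.decode(chunk)  # incomplete trailing bytes are buffered
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # raises if bytes are left dangling
    if tail:
        yield tail


# "é" is two bytes (0xC3 0xA9); here it is split across two chunks.
chunks = [b"caf", b"\xc3", b"\xa9", b"!"]
pieces = list(stream_text(chunks))
print(pieces)
```

Nothing is emitted for the lone `0xC3` byte; the character appears only once its second byte has arrived.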

API

Method	Returns	Description
`Tokenizer.from_file(path)`	`Tokenizer`	Load from `tokenizer.json`
`Tokenizer.from_str(json)`	`Tokenizer`	Load from JSON string
`Tokenizer.from_buffer(bytes)`	`Tokenizer`	Load from bytes
`tok.encode(text)`	`list[int]`	Encode text to token IDs
`tok.encode_to_array(text)`	`np.ndarray`	Encode to numpy int32 array
`tok.encode_rich(text)`	`Encoding`	IDs + byte offsets + type IDs
`tok.decode(ids)`	`str`	Decode token IDs to text
`tok.encode_batch(texts)`	`list[list[int]]`	Batch encode
`tok.decode_batch(id_lists)`	`list[str]`	Batch decode
`tok.encode_stream()`	`EncodeStream`	Streaming encoder (context manager)
`tok.decode_stream()`	`DecodeStream`	Streaming decoder (context manager)
`tok.vocab_size`	`int`	Vocabulary size
`tok.model_type`	`str`	`"BPE"`, `"WordPiece"`, or `"Unigram"`
`tok.token_to_id(token)`	`int | None`	Look up token ID
`tok.id_to_token(id)`	`str | None`	Look up token text
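
The batch methods in the table compose into a round-trip check, which is a quick sanity test after loading a new tokenizer.json. The sketch below relies only on `encode_batch` and `decode_batch` as listed above. `round_trip_failures` is a hypothetical helper, and `WhitespaceStub` is a toy stand-in (not part of iree.tokenizer) so the example runs without the package installed.

```python
def round_trip_failures(tok, texts):
    """Return the texts that do not survive encode_batch -> decode_batch unchanged."""
    id_lists = tok.encode_batch(texts)
    decoded = tok.decode_batch(id_lists)
    return [t for t, d in zip(texts, decoded) if t != d]


class WhitespaceStub:
    """Toy tokenizer (NOT iree.tokenizer): one ID per whitespace-separated word."""

    def __init__(self):
        self._ids = {}
        self._tokens = []

    def _id(self, word):
        if word not in self._ids:
            self._ids[word] = len(self._tokens)
            self._tokens.append(word)
        return self._ids[word]

    def encode_batch(self, texts):
        return [[self._id(w) for w in t.split()] for t in texts]

    def decode_batch(self, id_lists):
        return [" ".join(self._tokens[i] for i in ids) for ids in id_lists]


stub = WhitespaceStub()
failures = round_trip_failures(stub, ["Hello world", "double  space"])
print(failures)  # the stub collapses the double space, so that text is reported
```

With the real package, pass the object returned by `Tokenizer.from_file("tokenizer.json")` instead of the stub. A reported mismatch is not necessarily a bug, since some tokenizers normalize their input.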

CLI

A streaming iree-tokenize command is included. It reads from stdin, writes JSONL to stdout, and shows live throughput on stderr.

# Encode text to token IDs
echo "Hello world" | iree-tokenize encode -t tokenizer.json
# {"seq":0,"text":"Hello world","ids":[15496,995],"n_tokens":2,...}

# Decode token IDs back to text
echo '[15496, 995]' | iree-tokenize decode -t tokenizer.json
# {"seq":0,"ids":[15496,995],"text":"Hello world","n_tokens":2,...}

# Chain encode → decode (round-trip)
cat corpus.txt | iree-tokenize encode -t tokenizer.json | iree-tokenize decode -t tokenizer.json

# Tokenizer info
iree-tokenize info -t tokenizer.json

Output is chainable: encode output feeds directly into decode and vice versa. Use --compact to omit timing fields, --rich for byte offsets, or --no-progress to suppress the stderr throughput display.

Note that this tool illustrates streaming processing, but the overhead of JSON processing is significant and skews throughput. Treat it as an example of how to drive the streaming API rather than as a benchmarking tool or a tool expected to achieve maximum throughput.
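
Because the CLI emits one JSON object per line, its output is straightforward to consume from Python. The sketch below counts records and tokens in encode output. It uses only the `ids` and `n_tokens` fields shown in the examples above and ignores the elided fields; `summarize_jsonl` is a hypothetical helper name, and the sample lines are illustrative rather than captured output.

```python
import json


def summarize_jsonl(lines):
    """Return (record_count, total_tokens) for iree-tokenize style JSONL lines."""
    records = 0
    total_tokens = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Fall back to len(ids) if n_tokens is missing (a defensive assumption).
        total_tokens += record.get("n_tokens", len(record.get("ids", [])))
        records += 1
    return records, total_tokens


sample = [
    '{"seq":0,"text":"Hello world","ids":[15496,995],"n_tokens":2}',
    '{"seq":1,"text":"Hello","ids":[15496],"n_tokens":1}',
]
summary = summarize_jsonl(sample)
print(summary)
```

To process real output, pass `sys.stdin` as `lines` and pipe the CLI into the script.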

License

Apache 2.0 with LLVM Exceptions — see LICENSE.


Release history

- 0.3.0 (Mar 5, 2026)
- 0.2.0 (Mar 5, 2026), this version
- 0.1.0.dev0 (Mar 4, 2026), pre-release, yanked. Reason this release was yanked: pre-release to hold name

Download files

Download the file for your platform.

Source Distributions

No source distribution files available for this release.

Built Distributions

- iree_tokenizer-0.2.0-cp312-abi3-win_amd64.whl (273.9 kB): uploaded Mar 5, 2026; CPython 3.12+; Windows x86-64
- iree_tokenizer-0.2.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (329.3 kB): uploaded Mar 5, 2026; CPython 3.12+; manylinux: glibc 2.27+ x86-64, glibc 2.28+ x86-64
- iree_tokenizer-0.2.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (333.6 kB): uploaded Mar 5, 2026; CPython 3.11; manylinux: glibc 2.27+ x86-64, glibc 2.28+ x86-64
- iree_tokenizer-0.2.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (333.9 kB): uploaded Mar 5, 2026; CPython 3.10; manylinux: glibc 2.27+ x86-64, glibc 2.28+ x86-64
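
Each wheel file name encodes the distribution name, version, and compatibility tags, following the standard wheel layout `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl`. The sketch below splits the names listed above into those parts. `parse_wheel_name` is a hypothetical helper, and it deliberately does not handle the optional build tag, which none of these files use.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platforms: tuple


def parse_wheel_name(filename):
    """Split a wheel file name into its components (build tags not supported)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name layout: {filename}")
    name, version, python_tag, abi_tag, platform_tag = parts
    # Multiple platform tags are joined with "."; split them back out.
    return WheelName(name, version, python_tag, abi_tag, tuple(platform_tag.split(".")))


info = parse_wheel_name(
    "iree_tokenizer-0.2.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl"
)
print(info.python_tag, info.abi_tag, info.platforms)
```

The `cp312-abi3` pair marks a stable-ABI build, which is why those files are listed as CPython 3.12+, while the `cp311-cp311` and `cp310-cp310` wheels each target a single interpreter version.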

File details

Details for the file iree_tokenizer-0.2.0-cp312-abi3-win_amd64.whl.

File metadata

Download URL: iree_tokenizer-0.2.0-cp312-abi3-win_amd64.whl
Upload date: Mar 5, 2026
Size: 273.9 kB
Tags: CPython 3.12+, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for iree_tokenizer-0.2.0-cp312-abi3-win_amd64.whl
Algorithm	Hash digest
SHA256	`6d6741ac857db27df6691dc36c4b20f0c3b0ba274cb4cba0449409423dc51316`
MD5	`8b89ce11a3b338157fb7772f75f8c705`
BLAKE2b-256	`7e1095169f54606516dedd6f3456bd674ad1a7d0f09a16d128d8af0917458703`

File details

Details for the file iree_tokenizer-0.2.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: iree_tokenizer-0.2.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Upload date: Mar 5, 2026
Size: 329.3 kB
Tags: CPython 3.12+, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for iree_tokenizer-0.2.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`3cd3a6bb27cc24237580cad6d50f63cf094f3e6fac3bdefc0119db2f9f80ce3d`
MD5	`73c2b00227658bf5f69eb08da28da571`
BLAKE2b-256	`12137c82468588cdd5083d255412d2d04a9ce9cb1f2bdb58246e92a43563f117`

File details

Details for the file iree_tokenizer-0.2.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: iree_tokenizer-0.2.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Upload date: Mar 5, 2026
Size: 333.6 kB
Tags: CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for iree_tokenizer-0.2.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`0d39d83a96e8362480db68af306d6a06ed7b9bf22dd5acaabd66eea7c6f42494`
MD5	`976c804dc68bd420500a56f0eccbdc21`
BLAKE2b-256	`9ee718c5413cb19a3d7435d9960051dbcebdd7828a162adb6ee6cc40a86a00f9`

File details

Details for the file iree_tokenizer-0.2.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: iree_tokenizer-0.2.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Upload date: Mar 5, 2026
Size: 333.9 kB
Tags: CPython 3.10, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for iree_tokenizer-0.2.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`ded6ddd73e43245ea2f9f8009bdd19dd00eeae915c00729c52bf5fabd4a26676`
MD5	`ef706d8c084d0d0fa293a90eb8472c72`
BLAKE2b-256	`ec87a0cb9b14a10a1936c88ab1b888547c90e6ad5fffb669ddfb2ae18749d77d`
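
The published digests can be used to check a downloaded wheel before installing it. The sketch below computes a file's SHA256 with the standard library and compares it to an expected value. `sha256_of` and `verify` are hypothetical helper names, and the path passed in is wherever the file was saved locally.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, using the SHA256 listed above for the Windows wheel:
# verify(
#     "iree_tokenizer-0.2.0-cp312-abi3-win_amd64.whl",
#     "6d6741ac857db27df6691dc36c4b20f0c3b0ba274cb4cba0449409423dc51316",
# )
```

pip can enforce the same check at install time when a requirements file pins hashes and `--require-hashes` is used.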

iree-tokenizer 0.2.0
