Tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

These are Python bindings over the Rust implementation. If you are interested in the high-level design, you can check it out there.

Otherwise, let's dive in!

Main features:

  • Train new vocabularies and tokenize using 4 pre-made tokenizers (Bert WordPiece and the 3 most common BPE versions).
  • Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
  • Easy to use, but also extremely versatile.
  • Designed for research and production.
  • Normalization comes with alignment tracking: it's always possible to get the part of the original sentence that corresponds to a given token.
  • Does all the pre-processing: truncation, padding, and adding the special tokens your model needs.
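To make the alignment-tracking idea concrete, here is a minimal pure-Python sketch (illustration only, not the library's implementation) of a whitespace tokenizer that records character offsets, so every token can be mapped back to the span of the original sentence it came from:

```python
# Illustrative sketch: track (start, end) character offsets while
# tokenizing, so each token maps back to its original substring.

def tokenize_with_offsets(text):
    """Split on whitespace, recording (start, end) character offsets."""
    tokens = []
    start = None
    for i, ch in enumerate(text):
        if ch.isspace():
            if start is not None:
                tokens.append((text[start:i], (start, i)))
                start = None
        elif start is None:
            start = i
    if start is not None:
        tokens.append((text[start:], (start, len(text))))
    return tokens

text = "I can feel the magic"
for token, (start, end) in tokenize_with_offsets(text):
    # The offsets let us recover the original substring for each token.
    assert text[start:end] == token
```

The real library does the same bookkeeping through every normalization and pre-tokenization step, which is why offsets stay valid even after the text has been transformed.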

Installation

With pip:

pip install tokenizers

From sources:

To use this method, you need to have Rust installed:

# Install with:
curl https://sh.rustup.rs -sSf | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

Once Rust is installed, you can compile the bindings by doing the following:

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python

# Create a virtual env (you can use yours as well)
python -m venv .env
source .env/bin/activate

# Install `tokenizers` in the current virtual env
pip install -e .

Load a pretrained tokenizer from the Hub

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-cased")

Using the provided Tokenizers

We provide some pre-built tokenizers to cover the most common cases. You can easily load one of these using some vocab.json and merges.txt files:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
tokenizer = CharBPETokenizer(vocab, merges)

# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)

And you can train them just as simply:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
tokenizer = CharBPETokenizer()

# Then train it!
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])

# Now, let's use it:
encoded = tokenizer.encode("I can feel the magic, can you?")

# And finally save it somewhere
tokenizer.save("./path/to/directory/my-bpe.tokenizer.json")

Provided Tokenizers

  • CharBPETokenizer: The original BPE
  • ByteLevelBPETokenizer: The byte level version of the BPE
  • SentencePieceBPETokenizer: A BPE implementation compatible with the one used by SentencePiece
  • BertWordPieceTokenizer: The famous Bert tokenizer, using WordPiece

All of these can be used and trained as explained above!
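To see what BPE training actually does under the hood, here is a toy pure-Python sketch of a single merge step (an illustration of the algorithm only, not the library's Rust implementation): count adjacent symbol pairs across the corpus and merge the most frequent pair into a new symbol.

```python
# Minimal sketch of one BPE training step (illustration only):
# count adjacent symbol pairs and merge the most frequent one.
from collections import Counter

def most_frequent_pair(words):
    """words: dict mapping a tuple of symbols to its corpus frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with the merged symbol."""
    merged = "".join(pair)
    out = {}
    for symbols, freq in words.items():
        new_symbols, i = [], 0
        while i < len(symbols):
            nxt = symbols[i + 1] if i + 1 < len(symbols) else None
            if (symbols[i], nxt) == pair:
                new_symbols.append(merged)
                i += 2
            else:
                new_symbols.append(symbols[i])
                i += 1
        out[tuple(new_symbols)] = freq
    return out

# A toy corpus: each word is a tuple of symbols with a frequency.
words = {("h", "u", "g"): 10, ("p", "u", "g"): 5, ("h", "u", "g", "s"): 5}
pair = most_frequent_pair(words)   # ("u", "g") is seen 20 times
words = merge_pair(words, pair)    # "u" and "g" become the symbol "ug"
```

Real training repeats this until the target vocabulary size is reached; the recorded sequence of merges is what ends up in merges.txt.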

Build your own

Whenever the provided tokenizers don't give you enough freedom, you can build your own tokenizer by putting together all the different parts you need. You can check how we implemented the provided tokenizers and easily adapt them to your own needs.

Building a byte-level BPE

Here is an example showing how to build your own byte-level BPE by putting all the different pieces together, and then saving it to a single file:

from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then train
trainer = trainers.BpeTrainer(
    vocab_size=20000,
    min_frequency=2,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet()
)
tokenizer.train([
    "./path/to/dataset/1.txt",
    "./path/to/dataset/2.txt",
    "./path/to/dataset/3.txt"
], trainer=trainer)

# And save it
tokenizer.save("byte-level-bpe.tokenizer.json", pretty=True)

Now, when you want to use this tokenizer, it is as simple as:

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("byte-level-bpe.tokenizer.json")

encoded = tokenizer.encode("I can feel the magic, can you?")
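The point of saving to a single file is that the whole pipeline (model, pre-tokenizer, decoder, post-processor) is described by one JSON document. Here is a toy sketch of that idea; the real tokenizer.json schema is richer than this, and the keys below are simplified stand-ins:

```python
# Toy sketch of single-file serialization: the entire pipeline is one
# JSON document, so loading it back restores every component at once.
import json

config = {
    "model": {"type": "BPE", "vocab": {"h": 0, "ug": 1}, "merges": ["u g"]},
    "pre_tokenizer": {"type": "ByteLevel", "add_prefix_space": True},
    "decoder": {"type": "ByteLevel"},
}

serialized = json.dumps(config)    # what would be written to disk
restored = json.loads(serialized)  # what a loader would read back
assert restored == config
```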

Typing support and stub generation

The compiled PyO3 extension does not expose type annotations, so editors and type checkers would otherwise see most objects as Any. To provide full typing support, we use a two-step stub generation process:

  1. Rust introspection (tools/stub-gen/): Uses pyo3-introspection to analyze the compiled extension and generate .pyi stub files
  2. Python enrichment (stub.py): Adds docstrings from the runtime module and generates forwarding __init__.py shims
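For a sense of what the pipeline produces, here is a hypothetical, heavily simplified excerpt of the kind of .pyi content the generated stubs contain (the class and method names follow the public API, but these exact signatures are illustrative, not the generator's actual output):

```python
# Hypothetical excerpt of a generated .pyi stub (simplified).
# Bodies are `...` as usual in stub files; only signatures matter.

class Encoding:
    @property
    def ids(self) -> "list[int]": ...
    @property
    def tokens(self) -> "list[str]": ...

class Tokenizer:
    def encode(self, sequence: str) -> Encoding: ...
    @staticmethod
    def from_file(path: str) -> "Tokenizer": ...
```

With stubs like this in py_src/tokenizers/, editors and type checkers resolve concrete types instead of falling back to Any.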

Running stub generation

The easiest way to regenerate stubs is via make style:

cd bindings/python
make style

This will:

  1. Build the extension with maturin develop --release
  2. Run introspection to generate .pyi files
  3. Enrich stubs with docstrings via stub.py
  4. Format with ruff

Running manually

To run the stub generator directly:

cd bindings/python
cargo run --manifest-path tools/stub-gen/Cargo.toml
python stub.py

The stub generator automatically:

  • Builds the extension using maturin
  • Copies the built .so to the project root for introspection
  • Detects and sets PYTHONHOME for embedded Python (handles uv/venv environments)
  • Generates stubs to py_src/tokenizers/

Troubleshooting

If you encounter Python initialization errors, you can manually set PYTHONHOME:

export PYTHONHOME=$(python3 -c 'import sys; print(sys.base_prefix)')
cargo run --manifest-path tools/stub-gen/Cargo.toml



