
Project description





Tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

These are Python bindings over the Rust implementation. If you are interested in the high-level design, you can check it out in the main Rust repository (https://github.com/huggingface/tokenizers).

Otherwise, let's dive in!

Main features:

  • Train new vocabularies and tokenize using 4 pre-made tokenizers (BERT WordPiece and the 3 most common BPE versions).
  • Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
  • Easy to use, but also extremely versatile.
  • Designed for both research and production.
  • Normalization comes with alignment tracking: it is always possible to get the part of the original sentence that corresponds to a given token (see the sketch after this list).
  • Does all the pre-processing: truncation, padding, and adding the special tokens your model needs.
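
For example, the offsets stored in an Encoding let you map any token back to its span in the original text, and truncation/padding can be switched on directly on the tokenizer. Here is a minimal sketch, assuming you use the pretrained tokenizer loaded from the Hub (as shown further below; downloading it requires network access):

from tokenizers import Tokenizer

# Load a pretrained tokenizer from the Hub (see the dedicated section below)
tokenizer = Tokenizer.from_pretrained("bert-base-cased")

sentence = "I can feel the magic, can you?"
encoded = tokenizer.encode(sentence)

# Each token carries (start, end) character offsets into the original sentence.
# Special tokens such as [CLS] and [SEP] map to the empty (0, 0) span.
for token, (start, end) in zip(encoded.tokens, encoded.offsets):
    print(token, "->", sentence[start:end])

# Truncation and padding are enabled directly on the tokenizer
tokenizer.enable_truncation(max_length=128)
tokenizer.enable_padding(pad_token="[PAD]", pad_id=tokenizer.token_to_id("[PAD]"))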

Installation

With pip:

pip install tokenizers

From sources:

To use this method, you need to have Rust installed:

# Install with:
curl https://sh.rustup.rs -sSf | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

Once Rust is installed, you can compile the bindings by doing the following:

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python

# Create a virtual env (you can use your own as well)
python -m venv .env
source .env/bin/activate

# Install `tokenizers` in the current virtual env
pip install -e .
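
Once installed, a quick sanity check confirms the compiled extension imports correctly (an optional step, run inside the same virtual env):

# Verify the installation by importing the package and printing its version
import tokenizers

print(tokenizers.__version__)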

Load a pretrained tokenizer from the Hub

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-cased")
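
The returned object is a regular Tokenizer, so, continuing from the snippet above, you can encode text right away (a small usage sketch; fetching the tokenizer requires network access to the Hub):

# Encode a sentence and inspect the resulting tokens and ids
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.tokens)
print(encoded.ids)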

Using the provided Tokenizers

We provide some pre-built tokenizers to cover the most common cases. You can easily load one of these using the corresponding vocab.json and merges.txt files:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
tokenizer = CharBPETokenizer(vocab, merges)

# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)
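
An Encoding can also be turned back into text with decode. Continuing from the example above, a minimal sketch (the exact output depends on the tokenizer's decoder):

# Map the ids back to a string; special tokens are skipped by default
decoded = tokenizer.decode(encoded.ids)
print(decoded)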

And you can train them just as simply:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
tokenizer = CharBPETokenizer()

# Then train it!
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])

# Now, let's use it:
encoded = tokenizer.encode("I can feel the magic, can you?")

# And finally save it somewhere
tokenizer.save("./path/to/directory/my-bpe.tokenizer.json")
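
The saved JSON file can later be reloaded with the generic Tokenizer.from_file, the same mechanism shown for the byte-level BPE further below; a minimal sketch:

from tokenizers import Tokenizer

# Reload the tokenizer that was just saved to disk
tokenizer = Tokenizer.from_file("./path/to/directory/my-bpe.tokenizer.json")
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.tokens)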

Provided Tokenizers

  • CharBPETokenizer: The original BPE
  • ByteLevelBPETokenizer: The byte level version of the BPE
  • SentencePieceBPETokenizer: A BPE implementation compatible with the one used by SentencePiece
  • BertWordPieceTokenizer: The famous BERT tokenizer, using WordPiece

All of these can be used and trained as explained above!
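
For instance, BertWordPieceTokenizer is initialized from a single vocab.txt file rather than a vocab.json/merges.txt pair. A small sketch, using a placeholder path:

from tokenizers import BertWordPieceTokenizer

# Initialize from an existing WordPiece vocabulary (one token per line)
tokenizer = BertWordPieceTokenizer("./path/to/vocab.txt", lowercase=False)

encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.tokens)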

Build your own

Whenever the provided tokenizers don't give you enough freedom, you can build your own tokenizer by putting together all the different parts you need. You can check how we implemented the provided tokenizers and easily adapt them to your own needs.

Building a byte-level BPE

Here is an example showing how to build your own byte-level BPE by putting all the different pieces together, and then saving it to a single file:

from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then train
trainer = trainers.BpeTrainer(
    vocab_size=20000,
    min_frequency=2,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet()
)
tokenizer.train([
    "./path/to/dataset/1.txt",
    "./path/to/dataset/2.txt",
    "./path/to/dataset/3.txt"
], trainer=trainer)

# And Save it
tokenizer.save("byte-level-bpe.tokenizer.json", pretty=True)

Now, when you want to use this tokenizer, it is as simple as:

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("byte-level-bpe.tokenizer.json")

encoded = tokenizer.encode("I can feel the magic, can you?")
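
If you have many sentences to tokenize, encode_batch processes a whole list at once. Continuing from the snippet above, a brief sketch:

# Encode several sentences in one call
encodings = tokenizer.encode_batch([
    "I can feel the magic, can you?",
    "The quick brown fox jumps over the lazy dog",
])
for encoding in encodings:
    print(encoding.tokens)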


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • tokenizers-0.21.2.tar.gz (351.5 kB): Source

Built Distributions

  • tokenizers-0.21.2-cp39-abi3-win_amd64.whl (2.5 MB): CPython 3.9+, Windows x86-64
  • tokenizers-0.21.2-cp39-abi3-win32.whl (2.3 MB): CPython 3.9+, Windows x86
  • tokenizers-0.21.2-cp39-abi3-musllinux_1_2_x86_64.whl (9.5 MB): CPython 3.9+, musllinux: musl 1.2+ x86-64
  • tokenizers-0.21.2-cp39-abi3-musllinux_1_2_i686.whl (9.3 MB): CPython 3.9+, musllinux: musl 1.2+ i686
  • tokenizers-0.21.2-cp39-abi3-musllinux_1_2_armv7l.whl (9.1 MB): CPython 3.9+, musllinux: musl 1.2+ ARMv7l
  • tokenizers-0.21.2-cp39-abi3-musllinux_1_2_aarch64.whl (9.1 MB): CPython 3.9+, musllinux: musl 1.2+ ARM64
  • tokenizers-0.21.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB): CPython 3.9+, manylinux: glibc 2.17+ x86-64
  • tokenizers-0.21.2-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl (3.2 MB): CPython 3.9+, manylinux: glibc 2.17+ s390x
  • tokenizers-0.21.2-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (3.5 MB): CPython 3.9+, manylinux: glibc 2.17+ ppc64le
  • tokenizers-0.21.2-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl (3.2 MB): CPython 3.9+, manylinux: glibc 2.17+ i686
  • tokenizers-0.21.2-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (2.9 MB): CPython 3.9+, manylinux: glibc 2.17+ ARMv7l
  • tokenizers-0.21.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.0 MB): CPython 3.9+, manylinux: glibc 2.17+ ARM64
  • tokenizers-0.21.2-cp39-abi3-macosx_11_0_arm64.whl (2.7 MB): CPython 3.9+, macOS 11.0+ ARM64
  • tokenizers-0.21.2-cp39-abi3-macosx_10_12_x86_64.whl (2.9 MB): CPython 3.9+, macOS 10.12+ x86-64

File details

Details for the file tokenizers-0.21.2.tar.gz.

File metadata

  • Download URL: tokenizers-0.21.2.tar.gz
  • Upload date:
  • Size: 351.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.9.0

File hashes

  • SHA256: fdc7cffde3e2113ba0e6cc7318c40e3438a4d74bbc62bf04bcc63bdfb082ac77
  • MD5: bab3398f4c622a2628e68b2511ef6b3d
  • BLAKE2b-256: ab2db0fce2b8201635f60e8c95990080f58461cc9ca3d5026de2e900f38a7f21

File details

Details for the file tokenizers-0.21.2-cp39-abi3-win_amd64.whl.

File hashes

  • SHA256: 58747bb898acdb1007f37a7bbe614346e98dc28708ffb66a3fd50ce169ac6c98
  • MD5: 5fa3b436e0cbcf3cbd1b31aa77d062dc
  • BLAKE2b-256: 13c3cc2755ee10be859c4338c962a35b9a663788c0c0b50c0bdd8078fb6870cf

File details

Details for the file tokenizers-0.21.2-cp39-abi3-win32.whl.

File metadata

  • Download URL: tokenizers-0.21.2-cp39-abi3-win32.whl
  • Upload date:
  • Size: 2.3 MB
  • Tags: CPython 3.9+, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.9.0

File hashes

  • SHA256: cabda5a6d15d620b6dfe711e1af52205266d05b379ea85a8a301b3593c60e962
  • MD5: 716a8c81d443a5dfe1b807b145bd31ed
  • BLAKE2b-256: d8a5896e1ef0707212745ae9f37e84c7d50269411aef2e9ccd0de63623feecdf

File details

Details for the file tokenizers-0.21.2-cp39-abi3-musllinux_1_2_x86_64.whl.

File hashes

  • SHA256: 106746e8aa9014a12109e58d540ad5465b4c183768ea96c03cbc24c44d329958
  • MD5: e0918e896c63d212b027853dac8e727a
  • BLAKE2b-256: a4d2faa1acac3f96a7427866e94ed4289949b2524f0c1878512516567d80563c

File details

Details for the file tokenizers-0.21.2-cp39-abi3-musllinux_1_2_i686.whl.

File hashes

  • SHA256: 0e73770507e65a0e0e2a1affd6b03c36e3bc4377bd10c9ccf51a82c77c0fe365
  • MD5: bf65879c181f34f25f3f9b4727763636
  • BLAKE2b-256: 637b5440bf203b2a5358f074408f7f9c42884849cd9972879e10ee6b7a8c3b3d

File details

Details for the file tokenizers-0.21.2-cp39-abi3-musllinux_1_2_armv7l.whl.

File hashes

  • SHA256: ed21dc7e624e4220e21758b2e62893be7101453525e3d23264081c9ef9a6d00d
  • MD5: 91f14070c84924b6d1a81a35d27f3752
  • BLAKE2b-256: 6cbdac386d79c4ef20dc6f39c4706640c24823dca7ebb6f703bfe6b5f0292d88

File details

Details for the file tokenizers-0.21.2-cp39-abi3-musllinux_1_2_aarch64.whl.

File hashes

  • SHA256: 2c41862df3d873665ec78b6be36fcc30a26e3d4902e9dd8608ed61d49a48bc19
  • MD5: 9f8917a01196c1a317bfd5892dca7f26
  • BLAKE2b-256: 3c6abc220a11a17e5d07b0dfb3b5c628621d4dcc084bccd27cfaead659963016

File details

Details for the file tokenizers-0.21.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

  • SHA256: fed9a4d51c395103ad24f8e7eb976811c57fbec2af9f133df471afcd922e5020
  • MD5: 1687c605e98622a6c7e8ed1fa00e00bf
  • BLAKE2b-256: c574f41a432a0733f61f3d21b288de6dfa78f7acff309c6f0f323b2833e9189f

File details

Details for the file tokenizers-0.21.2-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File hashes

  • SHA256: b1b9405822527ec1e0f7d8d2fdb287a5730c3a6518189c968254a8441b21faae
  • MD5: 9c7df6206897936f07d37a258e1fbfd4
  • BLAKE2b-256: 385f959f3a8756fc9396aeb704292777b84f02a5c6f25c3fc3ba7530db5feb2c

File details

Details for the file tokenizers-0.21.2-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File hashes

  • SHA256: 514cd43045c5d546f01142ff9c79a96ea69e4b5cda09e3027708cb2e6d5762ab
  • MD5: d7b25e6086fa863a9f3d03aceba62273
  • BLAKE2b-256: 001579713359f4037aa8f4d1f06ffca35312ac83629da062670e8830917e2153

File details

Details for the file tokenizers-0.21.2-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

  • SHA256: 5e9944e61239b083a41cf8fc42802f855e1dca0f499196df37a8ce219abac6eb
  • MD5: 1a145aeaaa396b41989a1a7aeb30ac6b
  • BLAKE2b-256: a52e53e8fd053e1f3ffbe579ca5f9546f35ac67cf0039ed357ad7ec57f5f5af0

File details

Details for the file tokenizers-0.21.2-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

  • SHA256: 8bd8999538c405133c2ab999b83b17c08b7fc1b48c1ada2469964605a709ef91
  • MD5: 7115df53491d926cb64f8085b2222474
  • BLAKE2b-256: 0515fd2d8104faa9f86ac68748e6f7ece0b5eb7983c7efc3a2c197cb98c99030

File details

Details for the file tokenizers-0.21.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

  • SHA256: 4a32cd81be21168bd0d6a0f0962d60177c447a1aa1b1e48fa6ec9fc728ee0b12
  • MD5: d736db2bee4be16feca2e85f5af87362
  • BLAKE2b-256: 332b1791eb329c07122a75b01035b1a3aa22ad139f3ce0ece1b059b506d9d9de

File details

Details for the file tokenizers-0.21.2-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

  • SHA256: 126df3205d6f3a93fea80c7a8a266a78c1bd8dd2fe043386bafdd7736a23e45f
  • MD5: 5debc4095b99f43c621b2eae897294f8
  • BLAKE2b-256: 6ce633f41f2cc7861faeba8988e7a77601407bf1d9d28fc79c5903f8f77df587

File details

Details for the file tokenizers-0.21.2-cp39-abi3-macosx_10_12_x86_64.whl.

File hashes

  • SHA256: 342b5dfb75009f2255ab8dec0041287260fed5ce00c323eb6bab639066fef8ec
  • MD5: f1053c3a3bec21e779a5b9069f9cd69f
  • BLAKE2b-256: 1dcc2936e2d45ceb130a21d929743f1e9897514691bec123203e10837972296f
