Fast and Customizable Tokenizers

Maintainers

ArthurZucker danieldk McPotato Nicolas.Patry xn1t0x

Project description

Tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

This package provides Python bindings over the Rust implementation. If you are interested in the high-level design, see the documentation of the Rust implementation.

Otherwise, let's dive in!

Main features:

Train new vocabularies and tokenize using 4 pre-made tokenizers (Bert WordPiece and the 3 most common BPE versions).
Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
Easy to use, but also extremely versatile.
Designed for research and production.
Normalization comes with alignment tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
Does all the pre-processing: truncation, padding, and adding the special tokens your model needs.
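
Alignment tracking means that each character of the normalized text remembers which span of the original input it came from, so a token can always be mapped back. The following is a minimal pure-Python sketch of that idea, assuming a normalizer that lowercases and strips accents; `normalize_with_alignments` and `original_span` are hypothetical helpers for illustration, not the library's API or its Rust implementation.

```python
import unicodedata


def normalize_with_alignments(text):
    """Lowercase and strip accents, recording for every output character
    the (start, end) span of the original character it came from."""
    out_chars = []
    alignments = []
    for i, ch in enumerate(text):
        # NFD splits an accented letter into its base letter plus combining marks.
        for piece in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(piece):
                continue  # drop the accent, e.g. the acute in "é"
            for lowered in piece.lower():
                out_chars.append(lowered)
                alignments.append((i, i + 1))
    return "".join(out_chars), alignments


def original_span(alignments, start, end):
    """Map a [start, end) span of the normalized string back to the original."""
    return alignments[start][0], alignments[end - 1][1]


original = "Café Über"
normalized, aligns = normalize_with_alignments(original)
print(normalized)  # cafe uber

# Suppose a token covers "uber" in the normalized string:
start = normalized.index("uber")
s, e = original_span(aligns, start, start + len("uber"))
print(original[s:e])  # Über
```

In the library itself, these mappings surface as the `offsets` of an encoding, expressed in terms of the original string.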

Installation

With pip:

pip install tokenizers

From sources:

To use this method, you need to have Rust installed:

# Install with:
curl https://sh.rustup.rs -sSf | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

Once Rust is installed, you can compile the bindings as follows:

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python

# Create a virtual env (you can use yours as well)
python -m venv .env
source .env/bin/activate

# Install `tokenizers` in the current virtual env
pip install setuptools_rust
python setup.py install

Load a pretrained tokenizer from the Hub

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-cased")

Using the provided Tokenizers

We provide some pre-built tokenizers to cover the most common cases. You can easily load one of these from its vocab.json and merges.txt files:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
tokenizer = CharBPETokenizer(vocab, merges)

# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)
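
Roughly speaking, `merges.txt` lists learned symbol pairs in priority order and `vocab.json` maps each resulting token to an id. The sketch below shows how such merges can be applied to a single word, assuming small in-memory toy data instead of the two files; `bpe_encode_word` is a hypothetical helper, and it ignores details of `CharBPETokenizer` such as its end-of-word suffix.

```python
def bpe_encode_word(word, merges):
    """Apply BPE merges to one word, highest-priority (earliest) merge first."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Rank every adjacent pair; pairs with no learned merge get an infinite rank.
        candidates = [
            (ranks.get((left, right), float("inf")), i)
            for i, (left, right) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(candidates)
        if best_rank == float("inf"):
            break  # no learned merge applies anymore
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Toy stand-ins for merges.txt (ordered pairs) and vocab.json (token -> id).
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
vocab = {"low": 0, "er": 1}

tokens = bpe_encode_word("lower", merges)
ids = [vocab[t] for t in tokens]
print(tokens, ids)  # ['low', 'er'] [0, 1]
```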

And you can train them just as simply:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
tokenizer = CharBPETokenizer()

# Then train it!
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])

# Now, let's use it:
encoded = tokenizer.encode("I can feel the magic, can you?")

# And finally save it somewhere
tokenizer.save("./path/to/directory/my-bpe.tokenizer.json")
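
The `train` call above learns merges from the given files. To give an idea of what BPE training involves, here is a toy pure-Python version that repeatedly merges the most frequent adjacent pair. It assumes an already-split list of words, and `train_bpe` is a hypothetical helper; the real trainer runs in Rust and handles much more (special tokens, initial alphabets, vocabulary size limits). The `min_frequency` argument plays the same role as the `BpeTrainer` parameter of that name: pairs seen fewer times are not merged.

```python
from collections import Counter


def train_bpe(words, num_merges, min_frequency=2):
    """Learn up to `num_merges` BPE merges from a list of words (toy version)."""
    # Represent each distinct word as a tuple of symbols, with its count.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Most frequent pair; ties are broken deterministically by the pair itself.
        best, count = max(pair_counts.items(), key=lambda kv: (kv[1], kv[0]))
        if count < min_frequency:
            break  # the remaining pairs are too rare to merge
        merges.append(best)
        # Rewrite the corpus with the chosen pair merged everywhere it occurs.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges


words = ["hug"] * 10 + ["pug"] * 5 + ["pun"] * 12 + ["bun"] * 4 + ["hugs"] * 5
print(train_bpe(words, num_merges=3))  # [('u', 'g'), ('u', 'n'), ('h', 'ug')]
```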

Provided Tokenizers

CharBPETokenizer: The original BPE
ByteLevelBPETokenizer: The byte level version of the BPE
SentencePieceBPETokenizer: A BPE implementation compatible with the one used by SentencePiece
BertWordPieceTokenizer: The famous Bert tokenizer, using WordPiece

All of these can be used and trained as explained above!
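
`BertWordPieceTokenizer` uses WordPiece. Unlike BPE, which applies learned merges, WordPiece splits each word greedily into the longest pieces found in its vocabulary, marking word-internal pieces with a `##` prefix. A minimal sketch of that greedy step, assuming a toy vocabulary; `wordpiece_tokenize` is a hypothetical helper, not the library's implementation.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first WordPiece on a single word."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece  # continuation pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits: the whole word becomes unknown
        tokens.append(match)
        start = end
    return tokens


vocab = {"un", "##aff", "##able", "aff", "able"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("xyz", vocab))  # ['[UNK]']
```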

Build your own

Whenever these provided tokenizers don't give you enough freedom, you can build your own tokenizer by putting together all the different parts you need. You can check how we implemented the provided tokenizers and easily adapt them to your own needs.
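
Conceptually, a tokenizer assembled this way runs its parts in sequence: a normalizer, a pre-tokenizer, a model, and then post-processing such as adding special tokens, with truncation and padding on top. The toy class below only illustrates that flow in pure Python; `ToyTokenizer` and its fields are hypothetical and are not the `tokenizers.Tokenizer` API, which also tracks offsets and runs in Rust.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ToyTokenizer:
    """Hypothetical toy pipeline: normalize, pre-tokenize, model, post-process."""
    normalizer: Callable[[str], str]
    pre_tokenizer: Callable[[str], List[str]]
    model: Callable[[str], List[str]]
    vocab: Dict[str, int]
    cls_token: str = "[CLS]"
    sep_token: str = "[SEP]"
    pad_token: str = "[PAD]"

    def encode(self, text: str, max_length: Optional[int] = None,
               pad_to: Optional[int] = None) -> Tuple[List[int], List[str]]:
        words = self.pre_tokenizer(self.normalizer(text))
        tokens = [tok for word in words for tok in self.model(word)]
        if max_length is not None:
            # Truncate, leaving room for the two special tokens added next.
            tokens = tokens[: max(max_length - 2, 0)]
        tokens = [self.cls_token] + tokens + [self.sep_token]
        if pad_to is not None:
            tokens += [self.pad_token] * (pad_to - len(tokens))
        return [self.vocab[tok] for tok in tokens], tokens


vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4, "!": 5}
tok = ToyTokenizer(
    normalizer=str.lower,
    pre_tokenizer=lambda s: re.findall(r"\w+|[^\w\s]", s),  # words and punctuation
    model=lambda word: [word],  # stand-in for BPE/WordPiece: one token per word
    vocab=vocab,
)
ids, tokens = tok.encode("Hello World!", pad_to=6)
print(tokens)  # ['[CLS]', 'hello', 'world', '!', '[SEP]', '[PAD]']
print(ids)  # [1, 3, 4, 5, 2, 0]
```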

Building a byte-level BPE

Here is an example showing how to build your own byte-level BPE by putting all the different pieces together, and then saving it to a single file:

from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then train
trainer = trainers.BpeTrainer(
    vocab_size=20000,
    min_frequency=2,
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet()
)
tokenizer.train([
    "./path/to/dataset/1.txt",
    "./path/to/dataset/2.txt",
    "./path/to/dataset/3.txt"
], trainer=trainer)

# And Save it
tokenizer.save("byte-level-bpe.tokenizer.json", pretty=True)
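
Byte-level BPE operates on the UTF-8 bytes of the text rather than on Unicode characters, so any input can be represented with a base alphabet of 256 symbols; that base alphabet is what `pre_tokenizers.ByteLevel.alphabet()` supplies to the trainer above. To keep tokens printable, each byte is displayed as a visible character. The sketch below reproduces the byte-to-character table popularized by GPT-2, assuming the `ByteLevel` components use an equivalent table; `bytes_to_unicode` is a hypothetical helper name here.

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a distinct, visible Unicode character."""
    # Bytes that are already printable keep their own Latin-1 character.
    printable = set(range(0x21, 0x7F)) | set(range(0xA1, 0xAD)) | set(range(0xAE, 0x100))
    mapping = {}
    shift = 0
    for b in range(256):
        if b in printable:
            mapping[b] = chr(b)
        else:
            # Control bytes, space, etc. are moved to code points 256, 257, ...
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


byte_encoder = bytes_to_unicode()
byte_decoder = {char: byte for byte, char in byte_encoder.items()}

text = "héllo world"
mapped = "".join(byte_encoder[b] for b in text.encode("utf-8"))
print(mapped)  # hÃ©lloĠworld  (the space byte shows up as "Ġ")

# The mapping is reversible, so decoding recovers the exact original text.
restored = bytes(byte_decoder[c] for c in mapped).decode("utf-8")
print(restored == text)  # True
```

This is also why byte-level vocabularies contain tokens such as `Ġworld`: the space before a word is kept as part of its token, and `add_prefix_space=True` adds one before the first word so that it is treated like the others.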

Now, when you want to use this tokenizer, loading it is as simple as:

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("byte-level-bpe.tokenizer.json")

encoded = tokenizer.encode("I can feel the magic, can you?")

Project details

Release history

0.22.2

Jan 5, 2026

0.22.2rc0 pre-release

Dec 2, 2025

0.22.1

Sep 19, 2025

0.22.1rc0 pre-release

Sep 19, 2025

0.22.0

Aug 29, 2025

0.22.0rc0 pre-release

Aug 29, 2025

0.21.4

Jul 28, 2025

0.21.2

Jun 24, 2025

0.21.2rc0 pre-release

Jun 24, 2025

0.21.1

Mar 13, 2025

0.21.1rc0 pre-release

Mar 12, 2025

0.21.0

Nov 27, 2024

0.21.0rc0 pre-release

Nov 27, 2024

0.20.4 yanked

Nov 26, 2024

Reason this release was yanked:

Removing support for Python 3.7 and 3.8 is a breaking change and was moved to v0.21.0

0.20.4rc0 pre-release

Nov 26, 2024

0.20.3

Nov 5, 2024

0.20.2

Nov 4, 2024

0.20.1

Oct 10, 2024

0.20.0

Aug 8, 2024

0.19.1

Apr 17, 2024

0.19.0

Apr 17, 2024

0.15.2

Feb 12, 2024

0.15.1

Jan 22, 2024

0.15.0

Nov 14, 2023

0.14.1

Oct 6, 2023

0.14.0

Sep 7, 2023

0.13.3

Apr 5, 2023

0.13.2

Nov 7, 2022

0.13.1

Oct 6, 2022

0.13.0 (this version)

Sep 21, 2022

0.12.1

Apr 13, 2022

0.12.0 yanked

Mar 31, 2022

Reason this release was yanked:

Breaking change with unexpected effect

0.11.6

Feb 28, 2022

0.11.5

Feb 16, 2022

0.11.4

Jan 17, 2022

0.11.3

Jan 17, 2022

0.11.2

Jan 4, 2022

0.11.1

Dec 28, 2021

0.11.0

Dec 23, 2021

0.10.3

May 24, 2021

0.10.2

Apr 5, 2021

0.10.1

Feb 8, 2021

0.10.0

Jan 12, 2021

0.10.0rc1 pre-release

Dec 8, 2020

0.9.4

Nov 10, 2020

0.9.3

Oct 26, 2020

0.9.2

Oct 15, 2020

0.9.1

Oct 13, 2020

0.9.0

Oct 9, 2020

0.9.0rc2 pre-release

Oct 6, 2020

0.9.0rc1 pre-release

Sep 29, 2020

0.9.0.dev4 pre-release

Sep 24, 2020

0.9.0.dev3 pre-release

Sep 18, 2020

0.9.0.dev2 pre-release

Sep 14, 2020

0.9.0.dev1 pre-release

Sep 14, 2020

0.9.0.dev0 pre-release

Aug 21, 2020

0.8.1

Jul 20, 2020

0.8.1rc2 pre-release

Jul 17, 2020

0.8.1rc1 pre-release

Jul 6, 2020

0.8.0

Jun 26, 2020

0.8.0rc4 pre-release

Jun 26, 2020

0.8.0rc3 pre-release

Jun 22, 2020

0.8.0rc2 pre-release

Jun 19, 2020

0.8.0rc1 pre-release

Jun 11, 2020

0.8.0.dev2 pre-release

Jun 3, 2020

0.8.0.dev1 pre-release

May 27, 2020

0.8.0.dev0 pre-release

May 21, 2020

0.7.0

Apr 17, 2020

0.7.0rc7 pre-release

Apr 16, 2020

0.7.0rc6 pre-release

Apr 16, 2020

0.7.0rc5 pre-release

Apr 9, 2020

0.7.0rc4 pre-release

Apr 8, 2020

0.6.0

Mar 2, 2020

0.5.2

Feb 24, 2020

0.5.1

Feb 24, 2020

0.5.0

Feb 19, 2020

0.4.2

Feb 11, 2020

0.4.1

Feb 11, 2020

0.4.0

Feb 10, 2020

0.3.0

Feb 5, 2020

0.2.1

Jan 22, 2020

0.2.0

Jan 20, 2020

0.1.1

Jan 12, 2020

0.1.0

Jan 10, 2020

0.0.13

Jan 8, 2020

0.0.12

Jan 7, 2020

0.0.11

Dec 27, 2019

0.0.10

Dec 26, 2019

0.0.9

Dec 23, 2019

0.0.8

Dec 20, 2019

0.0.7

Dec 17, 2019

0.0.6

Dec 17, 2019

0.0.5

Dec 13, 2019

0.0.4

Dec 10, 2019

0.0.3

Dec 3, 2019

0.0.2

Dec 2, 2019

0.0.1

Nov 1, 2019

Download files

Download the file for your platform.

Source Distribution

tokenizers-0.13.0.tar.gz (358.7 kB)

Uploaded Sep 21, 2022; Source

Built Distributions

tokenizers-0.13.0-cp310-cp310-win_amd64.whl (3.3 MB)

Uploaded Sep 21, 2022; CPython 3.10, Windows x86-64

tokenizers-0.13.0-cp310-cp310-win32.whl (3.0 MB)

Uploaded Sep 21, 2022; CPython 3.10, Windows x86

tokenizers-0.13.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.9 MB)

Uploaded Sep 21, 2022; CPython 3.10, manylinux: glibc 2.17+ s390x

tokenizers-0.13.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (8.3 MB)

Uploaded Sep 21, 2022; CPython 3.10, manylinux: glibc 2.17+ ppc64le

tokenizers-0.13.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.2 MB)

Uploaded Sep 21, 2022; CPython 3.10, manylinux: glibc 2.17+ ARM64

tokenizers-0.13.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (7.0 MB)

Uploaded Sep 21, 2022; CPython 3.10, manylinux: glibc 2.12+ x86-64

tokenizers-0.13.0-cp310-cp310-macosx_12_0_arm64.whl (3.6 MB)

Uploaded Sep 21, 2022; CPython 3.10, macOS 12.0+ ARM64

tokenizers-0.13.0-cp310-cp310-macosx_10_11_x86_64.whl (3.8 MB)

Uploaded Sep 21, 2022; CPython 3.10, macOS 10.11+ x86-64

tokenizers-0.13.0-cp39-cp39-win_amd64.whl (3.3 MB)

Uploaded Sep 21, 2022; CPython 3.9, Windows x86-64

tokenizers-0.13.0-cp39-cp39-win32.whl (3.0 MB)

Uploaded Sep 21, 2022; CPython 3.9, Windows x86

tokenizers-0.13.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.9 MB)

Uploaded Sep 21, 2022; CPython 3.9, manylinux: glibc 2.17+ s390x

tokenizers-0.13.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (8.3 MB)

Uploaded Sep 21, 2022; CPython 3.9, manylinux: glibc 2.17+ ppc64le

tokenizers-0.13.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.2 MB)

Uploaded Sep 21, 2022; CPython 3.9, manylinux: glibc 2.17+ ARM64

tokenizers-0.13.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (7.0 MB)

Uploaded Sep 21, 2022; CPython 3.9, manylinux: glibc 2.12+ x86-64

tokenizers-0.13.0-cp39-cp39-macosx_12_0_arm64.whl (3.6 MB)

Uploaded Sep 21, 2022; CPython 3.9, macOS 12.0+ ARM64

tokenizers-0.13.0-cp39-cp39-macosx_10_11_x86_64.whl (3.8 MB)

Uploaded Sep 21, 2022; CPython 3.9, macOS 10.11+ x86-64

tokenizers-0.13.0-cp38-cp38-win_amd64.whl (3.3 MB)

Uploaded Sep 21, 2022; CPython 3.8, Windows x86-64

tokenizers-0.13.0-cp38-cp38-win32.whl (3.0 MB)

Uploaded Sep 21, 2022; CPython 3.8, Windows x86

tokenizers-0.13.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.9 MB)

Uploaded Sep 21, 2022; CPython 3.8, manylinux: glibc 2.17+ s390x

tokenizers-0.13.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (8.3 MB)

Uploaded Sep 21, 2022; CPython 3.8, manylinux: glibc 2.17+ ppc64le

tokenizers-0.13.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.2 MB)

Uploaded Sep 21, 2022; CPython 3.8, manylinux: glibc 2.17+ ARM64

tokenizers-0.13.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (7.0 MB)

Uploaded Sep 21, 2022; CPython 3.8, manylinux: glibc 2.12+ x86-64

tokenizers-0.13.0-cp38-cp38-macosx_10_11_x86_64.whl (3.8 MB)

Uploaded Sep 21, 2022; CPython 3.8, macOS 10.11+ x86-64

tokenizers-0.13.0-cp37-cp37m-win_amd64.whl (3.3 MB)

Uploaded Sep 21, 2022; CPython 3.7m, Windows x86-64

tokenizers-0.13.0-cp37-cp37m-win32.whl (3.0 MB)

Uploaded Sep 21, 2022; CPython 3.7m, Windows x86

tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl (7.9 MB)

Uploaded Sep 21, 2022; CPython 3.7m, manylinux: glibc 2.17+ s390x

tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (8.3 MB)

Uploaded Sep 21, 2022; CPython 3.7m, manylinux: glibc 2.17+ ppc64le

tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.2 MB)

Uploaded Sep 21, 2022; CPython 3.7m, manylinux: glibc 2.17+ ARM64

tokenizers-0.13.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (7.0 MB)

Uploaded Sep 21, 2022; CPython 3.7m, manylinux: glibc 2.12+ x86-64

tokenizers-0.13.0-cp37-cp37m-macosx_10_11_x86_64.whl (3.8 MB)

Uploaded Sep 21, 2022; CPython 3.7m, macOS 10.11+ x86-64
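
Each file name above follows the wheel naming convention `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl`, which is how pip picks a compatible file: `cp310-cp310-win_amd64`, for instance, targets CPython 3.10 on 64-bit Windows, and a dotted platform tag such as `manylinux_2_17_aarch64.manylinux2014_aarch64` lists two equivalent tags. The small parser below is only a sketch (`parse_wheel_filename` here is a hypothetical helper that rejects wheels with the optional build tag); for real use, the `packaging` library provides `packaging.utils.parse_wheel_filename`.

```python
from typing import NamedTuple, Tuple


class WheelName(NamedTuple):
    name: str
    version: str
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split `{name}-{version}-{python}-{abi}-{platform}.whl` into its parts.

    Simplified sketch: wheels carrying the optional build tag are rejected.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unsupported wheel file name: {filename}")
    name, version, python, abi, platform = parts
    # A dotted tag is a compressed set, e.g. two equivalent manylinux tags.
    return WheelName(name, version, tuple(python.split(".")),
                     tuple(abi.split(".")), tuple(platform.split(".")))


wheel = parse_wheel_filename(
    "tokenizers-0.13.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl"
)
print(wheel.python_tags, wheel.abi_tags)  # ('cp39',) ('cp39',)
print(wheel.platform_tags)  # ('manylinux_2_17_aarch64', 'manylinux2014_aarch64')
```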

File details

Details for the file tokenizers-0.13.0.tar.gz.

File metadata

Download URL: tokenizers-0.13.0.tar.gz
Upload date: Sep 21, 2022
Size: 358.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0.tar.gz
Algorithm	Hash digest
SHA256	`e49a137bd9321905bd37abcc77d34b7d9d6d11e09da3a901bd127e640be55985`
MD5	`d8d9ab2ea3358970b8c40eabdb1f0405`
BLAKE2b-256	`444b323787e105caddf5ace40c4007e0745abf97e00ef21554e268c6d266d64d`
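
The digests listed for each file let you check that a download is exactly the file PyPI published. A minimal sketch using Python's standard `hashlib`, assuming the file is read as a binary stream (`stream_digests` and `verify` are hypothetical helper names); BLAKE2b-256 corresponds to `hashlib.blake2b` with `digest_size=32`. MD5 is listed too, but it is not collision-resistant, so prefer SHA256 or BLAKE2b-256.

```python
import hashlib
import io


def stream_digests(stream, chunk_size=1 << 20):
    """Return the SHA256 and BLAKE2b-256 hex digests of a binary stream."""
    sha256 = hashlib.sha256()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    # Read in chunks so large files never need to fit in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        sha256.update(chunk)
        blake2b_256.update(chunk)
    return {"SHA256": sha256.hexdigest(), "BLAKE2b-256": blake2b_256.hexdigest()}


def verify(stream, expected_sha256):
    """True if the stream's SHA256 matches the published hex digest."""
    return stream_digests(stream)["SHA256"] == expected_sha256.lower()


# Demo on in-memory bytes. For a real download, pass a file opened in binary
# mode, e.g. open("tokenizers-0.13.0.tar.gz", "rb"), plus the digest listed above.
demo = stream_digests(io.BytesIO(b"hello"))
print(demo["SHA256"])
```

pip can enforce the same check: in a requirements file, a line such as `tokenizers==0.13.0 --hash=sha256:<digest>` puts pip into hash-checking mode, and it refuses any file whose digest does not match.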

File details

Details for the file tokenizers-0.13.0-cp310-cp310-win_amd64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp310-cp310-win_amd64.whl
Upload date: Sep 21, 2022
Size: 3.3 MB
Tags: CPython 3.10, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-win_amd64.whl
Algorithm	Hash digest
SHA256	`5013e8822bfef5e339d086cff96f3d69eaeed58e471ef5dd71b323470a73cd64`
MD5	`2af114a7b1b6c3712f1d3c4025b2b253`
BLAKE2b-256	`49c1f7fc459b8d3d0742f64d6dee117421eb16c3353daa65aa26b2e1f04d5c99`

File details

Details for the file tokenizers-0.13.0-cp310-cp310-win32.whl.

File metadata

Download URL: tokenizers-0.13.0-cp310-cp310-win32.whl
Upload date: Sep 21, 2022
Size: 3.0 MB
Tags: CPython 3.10, Windows x86
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-win32.whl
Algorithm	Hash digest
SHA256	`4ff01a4320dd47127fc52c242fcfa33b661f89b50ed4ad70436b06179f567f1e`
MD5	`d173123d3110d04557c2f9bc3c71ad5e`
BLAKE2b-256	`a68fdb9e8e2566ed4f48368f9181b125f48de15919581dac2564f1b9908399c9`

File details

Details for the file tokenizers-0.13.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

Download URL: tokenizers-0.13.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl
Upload date: Sep 21, 2022
Size: 7.9 MB
Tags: CPython 3.10, manylinux: glibc 2.17+ s390x
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm	Hash digest
SHA256	`ad3951e9584cd386111fd4edfdce70083f62a3e82f739347192c0c4af1288554`
MD5	`636ffbc118afac2be7924d6f456267e9`
BLAKE2b-256	`ae011be2be7a700b7a21dff4a4add07c6c2e489b28f11afa5aead41f65fd0d8a`

File details

Details for the file tokenizers-0.13.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

Download URL: tokenizers-0.13.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Upload date: Sep 21, 2022
Size: 8.3 MB
Tags: CPython 3.10, manylinux: glibc 2.17+ ppc64le
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm	Hash digest
SHA256	`db9a71168b0f12f1092752bc0c58c3d0e87e58262de3ab3424dfb052f1003def`
MD5	`30fe87bcc1b8b35634dcaf7fb3fb6c41`
BLAKE2b-256	`683587bd74adbef807368a57096af7414e4b44931a7849bfe163cd128937f32e`

File details

Details for the file tokenizers-0.13.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Upload date: Sep 21, 2022
Size: 7.2 MB
Tags: CPython 3.10, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`076659344b815464d4f2965fc90c52ebbb2df8d2dcf40ab044705cd15178c44b`
MD5	`122e21b52840d178d929de8cc88d734e`
BLAKE2b-256	`9cdc4b6a74c7205cb92c78948c2e8bb023d910de0c41974d8c122719c59fb3be`

File details

Details for the file tokenizers-0.13.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Upload date: Sep 21, 2022
Size: 7.0 MB
Tags: CPython 3.10, manylinux: glibc 2.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm	Hash digest
SHA256	`e2dbd5105adbf582fc992eaddc39b469f34f1216f9c79d5bbb0a3f2700fbc0ac`
MD5	`b6b2a462618926ec77e6b1ea662853c7`
BLAKE2b-256	`cc674c05eb8cbe8d20e52f5f47a9c591738d8cbc2a29e918813b7fcc431ec3db`

File details

Details for the file tokenizers-0.13.0-cp310-cp310-macosx_12_0_arm64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp310-cp310-macosx_12_0_arm64.whl
Upload date: Sep 21, 2022
Size: 3.6 MB
Tags: CPython 3.10, macOS 12.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-macosx_12_0_arm64.whl
Algorithm	Hash digest
SHA256	`cef0026587d0f911232b8d6d38da64ea8ea187139f2a9506efd0c422e5e49926`
MD5	`5c19e7832bd96e70a26bc3b041933677`
BLAKE2b-256	`1ec115067b1918e0d4573aead3f3ac89568514a57dd251d837917e5385a43204`

File details

Details for the file tokenizers-0.13.0-cp310-cp310-macosx_10_11_x86_64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp310-cp310-macosx_10_11_x86_64.whl
Upload date: Sep 21, 2022
Size: 3.8 MB
Tags: CPython 3.10, macOS 10.11+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp310-cp310-macosx_10_11_x86_64.whl
Algorithm	Hash digest
SHA256	`a3c69203690744fe357b95a4fbeb8207b3a3b0d565fba07ccab5f62491f8b6f0`
MD5	`a0bd1e8c2432d268326615bdaf57cad3`
BLAKE2b-256	`9a325dfa7777886a710a7e92b71985e1b8efd4f48da435eef1dae8df826fece8`

File details

Details for the file tokenizers-0.13.0-cp39-cp39-win_amd64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp39-cp39-win_amd64.whl
Upload date: Sep 21, 2022
Size: 3.3 MB
Tags: CPython 3.9, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-win_amd64.whl
Algorithm	Hash digest
SHA256	`7dcaf96ec433b944f295902af0fe9aad6724b38f55544cbd81c30e0c4b401198`
MD5	`93d900bd697da1f4180345a72c81e07b`
BLAKE2b-256	`53b5e856f2d280f5a21db4c9cfa2498b3b26555e32515bc0518d6c2388d623d2`

File details

Details for the file tokenizers-0.13.0-cp39-cp39-win32.whl.

File metadata

Download URL: tokenizers-0.13.0-cp39-cp39-win32.whl
Upload date: Sep 21, 2022
Size: 3.0 MB
Tags: CPython 3.9, Windows x86
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-win32.whl
Algorithm	Hash digest
SHA256	`74db42e50607a2a7b93beb37f6ddcbb989c895236cd472edccae0960a64c15b2`
MD5	`1cfe1425f4638cc89b40494f61b61087`
BLAKE2b-256	`a37f9bab774e840efa794290855d2487874f36b443009df5cf6faafa69ba1b8c`

File details

Details for the file tokenizers-0.13.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

Download URL: tokenizers-0.13.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl
Upload date: Sep 21, 2022
Size: 7.9 MB
Tags: CPython 3.9, manylinux: glibc 2.17+ s390x
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm	Hash digest
SHA256	`ed08aa1ed60de3ddbf88329b4567a77d72d012161c4df692ac4b5b2294933a6d`
MD5	`40c3c8065ed44eeaf0694a03a1131d07`
BLAKE2b-256	`5007727210ead354d5cfcf537244f22c173b5394605873b1f8db8d9c1cda6ede`

File details

Details for the file tokenizers-0.13.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

Download URL: tokenizers-0.13.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Upload date: Sep 21, 2022
Size: 8.3 MB
Tags: CPython 3.9, manylinux: glibc 2.17+ ppc64le
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm	Hash digest
SHA256	`0b17d9f269102241c63d4e46c4f64966fba0cc59a8c765ccba67e1873dd1f9d9`
MD5	`b0b68bdc221815b17273be8c81d27b01`
BLAKE2b-256	`d33a95f217de0d7f824ae6aabe0359dc2595d063cac65eb6aa4b5b4690fc377f`

File details

Details for the file tokenizers-0.13.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Upload date: Sep 21, 2022
Size: 7.2 MB
Tags: CPython 3.9, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`5311b14d16a8dd8f68218e2df21fc5e230871fae9a866598addd94f0fb67daa7`
MD5	`81f7bad97645d22574a9eb13ea67f51a`
BLAKE2b-256	`5c68eb821731a4704c08f9cba80ab2f37fe87a0df07bbf72dff020ac0d68b68c`

File details

Details for the file tokenizers-0.13.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Upload date: Sep 21, 2022
Size: 7.0 MB
Tags: CPython 3.9, manylinux: glibc 2.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm	Hash digest
SHA256	`61d9697d79277cf56562e22ccda3de4b9f535e6e7e38b0b099e592bf953ad168`
MD5	`635c3c1a2ae0d1c9055c3adb7f0fcc45`
BLAKE2b-256	`4756b57fcc25a8aa4921b7201187563905a57d092b0af4998ad2aa038b7224fd`

File details

Details for the file tokenizers-0.13.0-cp39-cp39-macosx_12_0_arm64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp39-cp39-macosx_12_0_arm64.whl
Upload date: Sep 21, 2022
Size: 3.6 MB
Tags: CPython 3.9, macOS 12.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-macosx_12_0_arm64.whl
Algorithm	Hash digest
SHA256	`bd437dc2d3cd847828ca437bcbd0656fe0d8b722a495f852b4625a24a666c275`
MD5	`779d329013013303c3468e5ca8b38b59`
BLAKE2b-256	`36de1d66b543f121e3a18db00a15d838f65dca71d744ff51c3ebeb0ea8db03e1`

File details

Details for the file tokenizers-0.13.0-cp39-cp39-macosx_10_11_x86_64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp39-cp39-macosx_10_11_x86_64.whl
Upload date: Sep 21, 2022
Size: 3.8 MB
Tags: CPython 3.9, macOS 10.11+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp39-cp39-macosx_10_11_x86_64.whl
Algorithm	Hash digest
SHA256	`619c002d74bfa78713aa451d5f630ea742a7628a022af858fbc29e4ac55d0be9`
MD5	`830c5c3f3c866a88462bc234b7a4dd64`
BLAKE2b-256	`111e1cf5c6fc7bb9cae4e6d89b0cbbc4814405af24f9a1db009cf528beb339e4`

File details

Details for the file tokenizers-0.13.0-cp38-cp38-win_amd64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp38-cp38-win_amd64.whl
Upload date: Sep 21, 2022
Size: 3.3 MB
Tags: CPython 3.8, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-win_amd64.whl
Algorithm	Hash digest
SHA256	`66fbf06c25b176228125ddb4c67b26c1d1cd46c2b82d1fa952cd0d2e6ed7762e`
MD5	`cfe53036235cb3c9517e5d715d44dccf`
BLAKE2b-256	`6f2d626fc1256873a80e2f126eba9d56596940a6862ec8adb7826cb8fcd2e4ea`

File details

Details for the file tokenizers-0.13.0-cp38-cp38-win32.whl.

File metadata

Download URL: tokenizers-0.13.0-cp38-cp38-win32.whl
Upload date: Sep 21, 2022
Size: 3.0 MB
Tags: CPython 3.8, Windows x86
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-win32.whl
Algorithm	Hash digest
SHA256	`9bcbd62830219cf8699d7acdbbbf8625cb65bcea8e189f14463edd3bcf021eed`
MD5	`c4dfeaafbc16e6f047ab7c57e7e0d407`
BLAKE2b-256	`fa1cd39a357560bf0e895ca580cf4774490be6c5d18f42967b88aecc52e1fbbf`

File details

Details for the file tokenizers-0.13.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

Download URL: tokenizers-0.13.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl
Upload date: Sep 21, 2022
Size: 7.9 MB
Tags: CPython 3.8, manylinux: glibc 2.17+ s390x
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm	Hash digest
SHA256	`4125db11089d923452bb47bfc050c9b42f459f978ee1a214cabecbb98ea4b2be`
MD5	`4ec65cec09e9186b8f286e4431b5aaca`
BLAKE2b-256	`97e6cd412f662ca4de2e1ca37e6c5827c61de5bdafc8051f23d52a51966010ef`

File details

Details for the file tokenizers-0.13.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

Download URL: tokenizers-0.13.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Upload date: Sep 21, 2022
Size: 8.3 MB
Tags: CPython 3.8, manylinux: glibc 2.17+ ppc64le
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm	Hash digest
SHA256	`479320e697cdf538e5d4ebffc4e1dc38b23388ca7dbb2fa7266fbd7173c84a57`
MD5	`e1018e436d4dc93a2d7958e20ac623ad`
BLAKE2b-256	`47bcb296189f1483ac38907cf2d7be82f2e3e8f4a814502b9e0e59fff0fcb245`

File details

Details for the file tokenizers-0.13.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Upload date: Sep 21, 2022
Size: 7.2 MB
Tags: CPython 3.8, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`9846d131ae4789bdce999276245d571d5abcd647ca4e913d5ff0f45b4dcf7fae`
MD5	`33d7468710e1535eb2a49bdd2d8e183a`
BLAKE2b-256	`c2d640f51550630154526cb30ed9f1c6d896d736f88198d59c471a4038087077`

File details

Details for the file tokenizers-0.13.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Upload date: Sep 21, 2022
Size: 7.0 MB
Tags: CPython 3.8, manylinux: glibc 2.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm	Hash digest
SHA256	`879585ef9b219ea0e300c670d4545e6a007363a022966a27c1dee8146666d38a`
MD5	`48fee4380c1fad08a2a544ad86b12e31`
BLAKE2b-256	`61fcd82d60ed5c7306e0b991a2966dec0b839426ea783668a9d409a32a59bbb4`

File details

Details for the file tokenizers-0.13.0-cp38-cp38-macosx_10_11_x86_64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp38-cp38-macosx_10_11_x86_64.whl
Upload date: Sep 21, 2022
Size: 3.8 MB
Tags: CPython 3.8, macOS 10.11+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp38-cp38-macosx_10_11_x86_64.whl
Algorithm	Hash digest
SHA256	`8eacb864f764ea3f12fd8735a1879ac84b9658d3083c47b0f005a5ed8c97fd5a`
MD5	`f40e346d6b61a4e953aa90a9f3155436`
BLAKE2b-256	`0c1a3b349adb50e289699096eded8e76d3b124100b68fde66455c00ff6c94283`

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-win_amd64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp37-cp37m-win_amd64.whl
Upload date: Sep 21, 2022
Size: 3.3 MB
Tags: CPython 3.7m, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-win_amd64.whl
Algorithm	Hash digest
SHA256	`2b8636aa712b2de7efccef67c63eb1e4c10df3bcbd3a467573890e134cf00ba9`
MD5	`37ee5eb4a6589cabb2ecfeada6c97ff3`
BLAKE2b-256	`8281bc120d3e466a65cb135ee8d6192b6743573ecb3550c8ab8e2cf4b6fe3f2a`

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-win32.whl.

File metadata

Download URL: tokenizers-0.13.0-cp37-cp37m-win32.whl
Upload date: Sep 21, 2022
Size: 3.0 MB
Tags: CPython 3.7m, Windows x86
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-win32.whl
Algorithm	Hash digest
SHA256	`4f88a82db27de2353ea9b3519b7bd3bb27f65fa01860a714cd959e5b574b99fc`
MD5	`a86265d81073d1ccd9e4dd903b2e1173`
BLAKE2b-256	`a15f1de38fc31054c3ff7f24c277f579632ec87dbfff2f69907ed741c970c27a`

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl.

File metadata

Download URL: tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl
Upload date: Sep 21, 2022
Size: 7.9 MB
Tags: CPython 3.7m, manylinux: glibc 2.17+ s390x
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_s390x.manylinux2014_s390x.whl
Algorithm	Hash digest
SHA256	`00c295f301568b4f77e3d8eff3dac9791b184d615d0ba0c2bc10a1e7556b3c37`
MD5	`ca255c52fc21a56a5c5aa881bfdf030a`
BLAKE2b-256	`17ed6111bd017a860c36d930527f18644d723e58aa1407facc83edbaafeabb03`

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl.

File metadata

Download URL: tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Upload date: Sep 21, 2022
Size: 8.3 MB
Tags: CPython 3.7m, manylinux: glibc 2.17+ ppc64le
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
Algorithm	Hash digest
SHA256	`4b822ab35ab6a874a5142a5e6cad703e049ec0836cc1cdc20f2c0aca4d568ec3`
MD5	`e099f3218e5d97b19cca4237799aab15`
BLAKE2b-256	`fb9c76d4428f4cdc9b1dbeb3482ea974ab94fbfbee2260b42221b2f2c6048bc2`

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Upload date: Sep 21, 2022
Size: 7.2 MB
Tags: CPython 3.7m, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`f8f6c2d896e0a59d42b58de83291f8c2c392d12737ac49484e53bd66b0fe32e1`
MD5	`393f9a3ad928d99186b72884274bdee6`
BLAKE2b-256	`4eca4ab54f8cce18ec5f5a99b6abf7cc5ff062fb2ad329d29504fb1993720ed7`

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Upload date: Sep 21, 2022
Size: 7.0 MB
Tags: CPython 3.7m, manylinux: glibc 2.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm	Hash digest
SHA256	`b73fc2f3c9dfe6df219335edafb14de9f93ffbe3916911e76030834f7d4cefcd`
MD5	`cece6c932f67488588aa80714e30c191`
BLAKE2b-256	`29b0a0d409092a885aef3a2522cd0ac1cf439a6fa9b1c91a0a1257924a838bb2`

File details

Details for the file tokenizers-0.13.0-cp37-cp37m-macosx_10_11_x86_64.whl.

File metadata

Download URL: tokenizers-0.13.0-cp37-cp37m-macosx_10_11_x86_64.whl
Upload date: Sep 21, 2022
Size: 3.8 MB
Tags: CPython 3.7m, macOS 10.11+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for tokenizers-0.13.0-cp37-cp37m-macosx_10_11_x86_64.whl
Algorithm	Hash digest
SHA256	`bd25e9ddf712f8699a29298c563340cf2d6d91e5dd0816a4bfdbb6319bc9e1b0`
MD5	`75ba7a0cfe8f0fde96e543ba72522af7`
BLAKE2b-256	`4036e050b40ae5f9c81e2826e021ada917711ecad1583338d87fdf2373269804`
