Fast and Customizable Tokenizers

Tokenizers

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

Python bindings over the Rust implementation. If you are interested in the high-level design, you can check it out there.

Otherwise, let's dive in!

Main features:

  • Train new vocabularies and tokenize using 4 pre-made tokenizers (Bert WordPiece and the 3 most common BPE versions).
  • Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
  • Easy to use, but also extremely versatile.
  • Designed for research and production.
  • Normalization comes with alignment tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
  • Does all the pre-processing: Truncate, Pad, add the special tokens your model needs.
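To make the last two features more concrete, here is a minimal pure-Python sketch of the ideas behind them. It does not use `tokenizers` and is not how the library implements them; the helper names `tokenize_with_offsets` and `pad_or_truncate` are made up for this illustration. The first function lowercases each piece (a stand-in for normalization) while keeping the offsets of that piece in the original text, and the second cuts or right-pads a list of ids to a fixed length.

```python
import re

def tokenize_with_offsets(text):
    """Lowercase each word/punctuation piece but remember where it came from."""
    pieces = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        token = match.group().lower()          # stand-in for a normalization step
        pieces.append((token, match.span()))   # (start, end) in the ORIGINAL text
    return pieces

def pad_or_truncate(ids, length, pad_id=0):
    """Cut the sequence to `length`, or right-pad it with `pad_id`."""
    ids = ids[:length]
    return ids + [pad_id] * (length - len(ids))

text = "I can feel the MAGIC, can you?"
pieces = tokenize_with_offsets(text)

# Each normalized token can be mapped back to the exact original substring.
for token, (start, end) in pieces:
    print(f"{token!r:10} <- {text[start:end]!r}")

print(pad_or_truncate([1, 2, 3], 5))
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 4))
```

Because the offsets always refer to the original string, the token `magic` still points back to `MAGIC` even though normalization changed its casing.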

Installation

With pip:

pip install tokenizers

From sources:

To use this method, you need to have Rust installed:

# Install with:
curl https://sh.rustup.rs -sSf | sh -s -- -y
export PATH="$HOME/.cargo/bin:$PATH"

Once Rust is installed, you can compile by doing the following:

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python

# Create a virtual env (you can use yours as well)
python -m venv .env
source .env/bin/activate

# Install `tokenizers` in the current virtual env
pip install setuptools_rust
python setup.py install

Using the provided Tokenizers

We provide some pre-built tokenizers to cover the most common cases. You can easily load one of these using some vocab.json and merges.txt files:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
vocab = "./path/to/vocab.json"
merges = "./path/to/merges.txt"
tokenizer = CharBPETokenizer(vocab, merges)

# And then encode:
encoded = tokenizer.encode("I can feel the magic, can you?")
print(encoded.ids)
print(encoded.tokens)

And you can train them just as simply:

from tokenizers import CharBPETokenizer

# Initialize a tokenizer
tokenizer = CharBPETokenizer()

# Then train it!
tokenizer.train([ "./path/to/files/1.txt", "./path/to/files/2.txt" ])

# Now, let's use it:
encoded = tokenizer.encode("I can feel the magic, can you?")

# And finally save it somewhere
tokenizer.save("./path/to/directory/my-bpe.tokenizer.json")

Provided Tokenizers

  • CharBPETokenizer: The original BPE
  • ByteLevelBPETokenizer: The byte level version of the BPE
  • SentencePieceBPETokenizer: A BPE implementation compatible with the one used by SentencePiece
  • BertWordPieceTokenizer: The famous Bert tokenizer, using WordPiece

All of these can be used and trained as explained above!
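If you are wondering what "training" a BPE tokenizer means, the following is a rough pure-Python sketch of the classic merge-learning loop: count adjacent symbol pairs, fuse the most frequent pair, and repeat. It is only an illustration under simplifying assumptions (no pre-tokenization, no end-of-word marker, no minimum frequency), not the Rust implementation behind these tokenizers, and `learn_bpe_merges` is a hypothetical helper name.

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each distinct word becomes a tuple of symbols, weighted by its frequency.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if symbols[i:i + 2] == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges

merges = learn_bpe_merges(["low", "low", "lower", "lowest", "newest"], num_merges=3)
print(merges)
```

The learned merge list plays the same role as a merges.txt file: applying the merges in order to a new word reproduces the segmentation learned from the corpus.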

Build your own

Whenever these provided tokenizers don't give you enough freedom, you can build your own tokenizer by putting together all the different parts you need. You can check how we implemented the provided tokenizers and adapt them easily to your own needs.

Building a byte-level BPE

Here is an example showing how to build your own byte-level BPE by putting all the different pieces together, and then saving it to a single file:

from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers, processors

# Initialize a tokenizer
tokenizer = Tokenizer(models.BPE())

# Customize pre-tokenization and decoding
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=True)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# And then train
trainer = trainers.BpeTrainer(vocab_size=20000, min_frequency=2)
tokenizer.train([
	"./path/to/dataset/1.txt",
	"./path/to/dataset/2.txt",
	"./path/to/dataset/3.txt"
], trainer=trainer)

# And save it
tokenizer.save("byte-level-bpe.tokenizer.json", pretty=True)

Now, when you want to use this tokenizer, this is as simple as:

from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("byte-level-bpe.tokenizer.json")

encoded = tokenizer.encode("I can feel the magic, can you?")
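As a side note on what "byte-level" buys you: the base alphabet is the 256 possible byte values, so any Unicode string can be represented without an unknown token. The sketch below shows that idea with plain UTF-8 encoding only. It is an illustration, not the library's ByteLevel pre-tokenizer, and the helper names are invented for the example.

```python
def to_byte_symbols(text):
    """Split text into its UTF-8 bytes, the base alphabet of a byte-level BPE."""
    return list(text.encode("utf-8"))

def from_byte_symbols(symbols):
    """Inverse of `to_byte_symbols`: reassemble the bytes and decode them."""
    return bytes(symbols).decode("utf-8")

symbols = to_byte_symbols("magic ✨")
print(symbols)                     # one integer in [0, 255] per byte
print(from_byte_symbols(symbols))  # round-trips to the original string
```

BPE merges are then learned over these byte symbols instead of over characters, which is why even emoji or rare scripts never fall outside the vocabulary.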
