Extremely fast bert tokenizer

FlashTokenizer

The world's fastest CPU tokenizer library!

EFFICIENT AND OPTIMIZED TOKENIZER ENGINE FOR LLM INFERENCE SERVING

FlashTokenizer is a high-performance C++ implementation of the BertTokenizer used for LLM inference. In the spirit of FlashAttention and FlashInfer, it aims for the highest speed and accuracy of any tokenizer, and it is about 10 times faster than BertTokenizerFast in transformers.

Why?

  • We need a tokenizer that is faster, more accurate, and easier to use than Huggingface's BertTokenizerFast. (link1, link2, link3)

  • PaddleNLP's BertTokenizerFast achieves a 1.2x performance improvement by implementing Huggingface's Rust version in C++. However, using it requires installing both the massive PaddlePaddle and PaddleNLP packages.

  • Tensorflow-text's FastBertTokenizer actually demonstrates slower performance in comparison.

  • Microsoft's Blingfire takes over 8 hours to train on custom data and shows relatively lower accuracy.

  • RAPIDS cuDF provides a GPU-based BertTokenizer, but it suffers from accuracy issues.

  • Unfortunately, FastBertTokenizer and BertTokenizers are developed in C# and cannot be used in Python. (As a side note, I don't know C#, but I believe once something is implemented in C#, it shouldn't have "Fast" in its name.)

  • This is why we developed FlashTokenizer. It can be easily installed via pip and is developed in C++ for straightforward maintenance, and it guarantees extremely fast speeds. We've created an implementation that's faster than Blingfire and easier to use. FlashTokenizer is implemented using the LinMaxMatch algorithm proposed in Fast WordPiece Tokenization, enabling tokenization in linear time. Finally, it supports parallel processing at the C++ level for batch encoding, delivering outstanding speed.


FlashTokenizer includes the following core features

  • Implemented in C++17.

    • MacOS: g++(14.2.0) or clang++(16.0.0).
    • Windows: g++(8.1.0)-MinGW64 or Visual Studio 2019.
    • Ubuntu: g++(11.4.0) or clang++(14.0.0).
  • Equally fast in Python via pybind11.

  • Support for parallel processing at the C++ level using OpenMP.

News

[Mar 22 2025]

  • Added DFA to AC Trie.

[Mar 21 2025]

  • Improving Tokenizer Accuracy

[Mar 19 2025]

  • Memory reduction and slight performance improvement by applying LinMaxMatching from Aho–Corasick algorithm.
  • Improved branch pipelining of all functions and force-inline applied.
  • Removed unnecessary operations of WordpieceTokenizer(Backward).
  • Optimizing all functions to run without caching, except for the Bloom filter, turned out to be faster than caching.
  • Punctuation, control, and whitespace characters are defined in advance as constexprs and used as Bloom filters.
  • Reduced unnecessary memory allocation with statistical memory profiling.
  • In ✨FlashTokenizer✨, bert-base-uncased can process 35K texts per second on a single core, an approximate processing time of 28µs per text.
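
The character-class Bloom filters mentioned in the list above can be pictured with a small sketch. This is not FlashTokenizer's code: the 64-bit mask, the `cp % 64` hash, and the names `build_mask` and `is_whitespace` are assumptions made for illustration, and the whitespace set is only a sample. The mask rejects most non-members with a single bit test, and an exact lookup confirms the rest.

```python
# Hypothetical sketch of a one-hash Bloom filter over Unicode code points.
WHITESPACE = frozenset({0x09, 0x0A, 0x0D, 0x20, 0xA0, 0x3000})  # sample set, not exhaustive


def build_mask(code_points):
    """Fold every member into a 64-bit mask using the hash cp % 64."""
    mask = 0
    for cp in code_points:
        mask |= 1 << (cp % 64)
    return mask


WHITESPACE_MASK = build_mask(WHITESPACE)  # computed once, in the role of a constexpr


def is_whitespace(ch):
    cp = ord(ch)
    # Fast path: a clear bit proves the character is not in the set.
    if not (WHITESPACE_MASK >> (cp % 64)) & 1:
        return False
    # A set bit may be a false positive, so confirm with an exact lookup.
    return cp in WHITESPACE
```

Most ordinary letters fail the bit test and never reach the set lookup, which is the reason for putting the filter in front.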

[Mar 18 2025]

  • Improvements to the accuracy of the BasicTokenizer have improved the overall accuracy and, in particular, produce more accurate results for Unicode input.

[Mar 14 2025]

  • The performance of the WordPieceTokenizer and WordPieceBackwordTokenizer has been improved using Trie, which was introduced in Fast WordPiece Tokenization.
  • Using FastPoolAllocator in std::list improves performance in SingleEncoding, but it is not thread-safe, so std::list<std::string> is used as is in BatchEncoding. In BatchEncoding, OPENMP is completely removed and only std::thread is used.
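
The thread-per-slice scheme described for BatchEncoding can be sketched roughly as follows. This is an illustration under assumptions, not the library's C++ code: `encode_one` is a hypothetical stand-in for a single-text encoder, and `batch_encode` and `num_threads` are names invented here. Python threads are limited by the GIL, so the sketch shows only how work is partitioned without shared mutable state, not the speedup that std::thread gives in C++.

```python
import threading


def encode_one(text):
    """Hypothetical stand-in for a real single-text encoder."""
    return [len(word) for word in text.split()]


def batch_encode(texts, num_threads=4):
    """Encode texts in parallel, one contiguous slice per thread."""
    results = [None] * len(texts)
    chunk = max(1, -(-len(texts) // num_threads))  # ceiling division

    def worker(start):
        # Each thread writes only to its own slice, so no locking is needed.
        for i in range(start, min(start + chunk, len(texts))):
            results[i] = encode_one(texts[i])

    threads = [threading.Thread(target=worker, args=(s,))
               for s in range(0, len(texts), chunk)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # same order as the input
```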

[Mar 10 2025]

  • Performance improvements through faster token mapping with robin_hood and memory copy minimization with std::list.

Token Ids Map Table Performance Test.

The token-to-ID map uses robin_hood::unordered_flat_map<std::string, int>, the fastest of the maps tested.
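
For readers unfamiliar with the token-to-ID map, here is a minimal sketch of what it holds. It assumes the usual BERT vocab.txt layout (one token per line, with the zero-based line number as the ID) and uses a plain Python dict in place of the C++ hash map; `load_vocab` is a name chosen for this example.

```python
def load_vocab(path):
    """Map each token in a BERT-style vocab.txt to its zero-based line index."""
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for index, line in enumerate(f):
            token = line.rstrip("\n")
            vocab[token] = index
    return vocab
```

Every WordPiece lookup goes through this map, which is why the choice of hash map has a measurable effect on tokenizer speed.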

[Mar 09 2025] Completed development of flash-tokenizer for BertTokenizer.

1. Installation

Requirements

  • Windows(AMD64), MacOS(ARM64), Ubuntu(x86-64).
  • g++ / clang++ / MSVC.
  • python 3.9 ~ 3.12.

Install from PIP

# Windows(Visual Studio)
pip install -U flash-tokenizer
# Ubuntu
sudo apt install gcc g++ make cmake -y
pip install setuptools wheel build pybind11
CC=gcc CXX=g++ pip install -U flash-tokenizer
# MacOS
brew install gcc
CC=gcc CXX=g++ pip install -U flash-tokenizer

Install from Source

git clone https://github.com/NLPOptimize/flash-tokenizer
cd flash-tokenizer
pip install .

2. Sample

from flash_tokenizer import BertTokenizerFlash
from transformers import BertTokenizer

titles = [
    'is there any doubt about it "None whatsoever"',
    "세상 어떤 짐승이 이를 드러내고 사냥을 해? 약한 짐승이나 몸을 부풀리지, 진짜 짐승은 누구보다 침착하지.",
    'そのように二番目に死を偽装して生き残るようになったイタドリがどうして初めて見る自分をこんなに気遣ってくれるのかと尋ねると「私が大切にする人たちがあなたを大切にするから」と答えては'
]

vocab_file = "sample/vocab.txt"

tokenizer1 = BertTokenizerFlash(vocab_file, do_lower_case=False)
tokenizer2 = BertTokenizer(vocab_file, do_lower_case=False)

for title in titles:
    print(title)
    print(tokenizer1.tokenize(title))
    print(tokenizer2.tokenize(title))
    ids1 = tokenizer1(title, max_length=512, padding="longest").input_ids[0]
    ids2 = tokenizer2(title, max_length=512, padding="longest").input_ids
    print(ids1)
    print(ids2)

is there any doubt about it "None whatsoever"
['is', 'there', 'any', 'doubt', 'about', 'it', '"', 'None', 'what', '##so', '##ever', '"']
['is', 'there', 'any', 'doubt', 'about', 'it', '"', 'None', 'what', '##so', '##ever', '"']
[101, 10124, 11155, 11178, 86697, 10978, 10271, 107, 86481, 12976, 11669, 23433, 107, 102]
[101, 10124, 11155, 11178, 86697, 10978, 10271, 107, 86481, 12976, 11669, 23433, 107, 102]

세상 어떤 짐승이 이를 드러내고 사냥을 해? 약한 짐승이나 몸을 부풀리지, 진짜 짐승은 누구보다 침착하지.
['세', '##상', '어떤', '짐', '##승', '##이', '이를', '드', '##러', '##내', '##고', '사', '##냥', '##을', '해', '?', '약', '##한', '짐', '##승', '##이나', '몸', '##을', '부', '##풀', '##리', '##지', ',', '진', '##짜', '짐', '##승', '##은', '누', '##구', '##보다', '침', '##착', '##하지', '.']
['세', '##상', '어떤', '짐', '##승', '##이', '이를', '드', '##러', '##내', '##고', '사', '##냥', '##을', '해', '?', '약', '##한', '짐', '##승', '##이나', '몸', '##을', '부', '##풀', '##리', '##지', ',', '진', '##짜', '짐', '##승', '##은', '누', '##구', '##보다', '침', '##착', '##하지', '.']
[101, 9435, 14871, 55910, 9710, 48210, 10739, 35756, 9113, 30873, 31605, 11664, 9405, 118729, 10622, 9960, 136, 9539, 11102, 9710, 48210, 43739, 9288, 10622, 9365, 119407, 12692, 12508, 117, 9708, 119235, 9710, 48210, 10892, 9032, 17196, 80001, 9783, 119248, 23665, 119, 102]
[101, 9435, 14871, 55910, 9710, 48210, 10739, 35756, 9113, 30873, 31605, 11664, 9405, 118729, 10622, 9960, 136, 9539, 11102, 9710, 48210, 43739, 9288, 10622, 9365, 119407, 12692, 12508, 117, 9708, 119235, 9710, 48210, 10892, 9032, 17196, 80001, 9783, 119248, 23665, 119, 102]

そのように二番目に死を偽装して生き残るようになったイタドリがどうして初めて見る自分をこんなに気遣ってくれるのかと尋ねると「私が大切にする人たちがあなたを大切にするから」と答えては
['その', '##ように', '二', '番', '目', 'に', '死', 'を', '偽', '装', 'して', '生', 'き', '残', 'る', '##ようになった', '##イ', '##タ', '##ド', '##リ', '##が', '##ど', '##う', '##して', '初', 'めて', '見', 'る', '自', '分', 'を', '##こ', '##んな', '##に', '気', '遣', 'って', '##く', '##れる', '##のか', '##と', '尋', 'ね', '##ると', '「', '私', 'が', '大', '切', 'にする', '人', 'たちが', '##あ', '##な', '##た', '##を', '大', '切', 'にする', '##から', '」', 'と', '答', 'えて', '##は']
['その', '##ように', '二', '番', '目', 'に', '死', 'を', '偽', '装', 'して', '生', 'き', '残', 'る', '##ようになった', '##イ', '##タ', '##ド', '##リ', '##が', '##ど', '##う', '##して', '初', 'めて', '見', 'る', '自', '分', 'を', '##こ', '##んな', '##に', '気', '遣', 'って', '##く', '##れる', '##のか', '##と', '尋', 'ね', '##ると', '「', '私', 'が', '大', '切', 'にする', '人', 'たちが', '##あ', '##な', '##た', '##を', '大', '切', 'にする', '##から', '」', 'と', '答', 'えて', '##は']
[101, 11332, 24273, 2150, 5632, 5755, 1943, 4805, 1980, 2371, 7104, 11592, 5600, 1913, 4814, 1975, 27969, 15970, 21462, 15713, 21612, 10898, 56910, 22526, 22267, 2547, 19945, 7143, 1975, 6621, 2534, 1980, 28442, 60907, 11312, 4854, 7770, 14813, 18825, 58174, 75191, 11662, 3456, 1945, 100812, 1890, 5949, 1912, 3197, 2535, 84543, 2179, 78776, 111787, 22946, 20058, 11377, 3197, 2535, 84543, 16867, 1891, 1940, 6076, 27144, 11588, 102]
[101, 11332, 24273, 2150, 5632, 5755, 1943, 4805, 1980, 2371, 7104, 11592, 5600, 1913, 4814, 1975, 27969, 15970, 21462, 15713, 21612, 10898, 56910, 22526, 22267, 2547, 19945, 7143, 1975, 6621, 2534, 1980, 28442, 60907, 11312, 4854, 7770, 14813, 18825, 58174, 75191, 11662, 3456, 1945, 100812, 1890, 5949, 1912, 3197, 2535, 84543, 2179, 78776, 111787, 22946, 20058, 11377, 3197, 2535, 84543, 16867, 1891, 1940, 6076, 27144, 11588, 102]

3. Other Implementations

Most BERT-based models use the WordPiece Tokenizer, whose code can be found here. (A simple implementation from Huggingface can be found here.)

Since BertTokenizer is a CPU-intensive algorithm, tokenization can become a bottleneck during inference, and unoptimized tokenizers can be severely slow. A good example is the BidirectionalWordpieceTokenizer introduced in KR-BERT. Most of the code is the same, but the algorithm also traverses the subtokens backwards, compares the result with the forward traversal, and writes the larger value. The paper claims accuracy improvements, but other quantitative metrics are hard to find, the improvements aren't significant, and the tokenizer is seriously slowed down.
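
As background, the forward WordPiece step that these tokenizers share can be sketched as a greedy longest-match-first loop. This is a minimal sketch of the well-known baseline, not FlashTokenizer's implementation: the function name, the `max_chars` limit, and the toy vocabulary in the usage note are illustrative assumptions. Its inner loop can rescan the same characters many times, which is the cost that the linear-time approach from Fast WordPiece Tokenization avoids.

```python
def wordpiece(word, vocab, unk_token="[UNK]", max_chars=100):
    """Greedy longest-match-first WordPiece for a single pre-split word."""
    if len(word) > max_chars:
        return [unk_token]
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no prefix matched, so the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens
```

With a toy vocabulary `{"what", "##so", "##ever"}`, calling `wordpiece("whatsoever", vocab)` splits the word into `what`, `##so`, `##ever`, the same pieces shown in the sample output above.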

  • transformers (Rust Impl, PyO3)
  • paddlenlp (C++ Impl, pybind)
  • tensorflow-text (C++ Impl, pybind)
  • blingfire (C++ Impl, Native binary call)

Most developers will either use transformers.BertTokenizer or transformers.AutoTokenizer, but using AutoTokenizer will return transformers.BertTokenizerFast.

Naturally, it's faster than BertTokenizer, but the results aren't exactly the same, which means you're already giving up 100% accuracy starting with the tokenizer.

BertTokenizer is not only provided by transformers. PaddleNLP and tensorflow-text also provide BertTokenizer.

Then there's Blingfire, which was developed by Microsoft and appears to have been abandoned.

PaddleNLP requires PaddlePaddle and provides tokenizer functionality starting with version 3.0rc. You can install it as follows:

##### Install PaddlePaddle, PaddleNLP
python -m pip install paddlepaddle==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
pip install --upgrade paddlenlp==3.0.0b3
##### Install transformers
pip install transformers==4.47.1
##### Install tf-text
pip install tensorflow-text==2.18.1
##### Install blingfire
pip install blingfire

With the exception of blingfire, vocab.txt is all you need to run the tokenizer right away. (blingfire also requires only vocab.txt, but it can be used only after 8 hours of training.)

The implementations we'll look at in detail are PaddleNLP's BertTokenizerFast and blingfire.

  • blingfire: Uses a Deterministic Finite State Machine (DFSM) to eliminate one linear scan and unnecessary comparisons, resulting in a time of O(n), which is impressive.
    • Advantages: 5-10x faster than other implementations.
    • Disadvantages: Long training time (8 hours) and lower accuracy than other implementations. (+Difficult to get help due to de facto development hiatus).
  • PaddleNLP: As shown in the experiments below, PaddleNLP is always faster than BertTokenizerFast (HF) while matching its accuracy to the same number of decimal places, on any OS and on both x86 and Arm.
    • Advantages: The internal implementation is in C++. Compared to transformers.BertTokenizerFast, which is implemented in Rust, it is 1.2x faster while outputting exactly the same values.
      • You can't specify pt(pytorch tensor) in return_tensors, but this is not a problem.
    • Disadvantages: none, other than the need to install PaddlePaddle and PaddleNLP.

4. Performance test

4.1 Performance test (Single text encoding)

Accuracy is measured using google's BertTokenizerFast as the baseline. If even one of a text's input_ids is incorrect, the whole text is counted as incorrect.
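
The all-or-nothing accuracy rule can be written down as a short sketch. The function name and the list-of-lists input format are assumptions for illustration; the benchmark scripts themselves may be organized differently.

```python
def exact_match_accuracy(predicted, reference):
    """Percentage of texts whose input_ids match the baseline exactly."""
    if len(predicted) != len(reference):
        raise ValueError("both lists must cover the same texts")
    if not reference:
        return 0.0
    # A text counts as correct only if every ID matches, in order.
    correct = sum(1 for p, r in zip(predicted, reference) if list(p) == list(r))
    return 100.0 * correct / len(reference)
```

Under this rule a single differing ID in a long text costs as much as a completely wrong tokenization, so the reported percentages are a strict lower bound on token-level agreement.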

Tokenizer Performance Comparison

google-bert/bert-base-cased

| Tokenizer | Elapsed Time | texts | Accuracy |
|---|---|---|---|
| BertTokenizerFast(Huggingface) | 84.3700s | 1,000,000 | 99.9226% |
| BertTokenizerFast(PaddleNLP) | 75.6551s | 1,000,000 | 99.9226% |
| FastBertTokenizer(Tensorflow) | 219.1259s | 1,000,000 | 99.9160% |
| Blingfire | 13.6183s | 1,000,000 | 99.8991% |
| FlashBertTokenizer | 8.1968s | 1,000,000 | 99.8216% |
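
To relate the elapsed times to the "about 10 times faster" claim, the ratios can be computed directly from the bert-base-cased figures above. The helper names `speedup` and `microseconds_per_text` are invented for this sketch; the numbers are copied from the table.

```python
# Elapsed times in seconds for 1,000,000 texts (google-bert/bert-base-cased).
elapsed = {
    "BertTokenizerFast(Huggingface)": 84.3700,
    "BertTokenizerFast(PaddleNLP)": 75.6551,
    "FastBertTokenizer(Tensorflow)": 219.1259,
    "Blingfire": 13.6183,
    "FlashBertTokenizer": 8.1968,
}
TEXTS = 1_000_000


def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    return elapsed[baseline] / elapsed[candidate]


def microseconds_per_text(name):
    """Average wall-clock time spent on one text."""
    return elapsed[name] / TEXTS * 1e6


for name in elapsed:
    ratio = speedup("BertTokenizerFast(Huggingface)", name)
    print(f"{name}: {ratio:.2f}x vs HF, {microseconds_per_text(name):.1f} us/text")
```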

google-bert/bert-base-uncased

| Tokenizer | Elapsed Time | texts | Accuracy |
|---|---|---|---|
| BertTokenizerFast(Huggingface) | 91.7882s | 1,000,000 | 99.9326% |
| BertTokenizerFast(PaddleNLP) | 83.6839s | 1,000,000 | 99.9326% |
| FastBertTokenizer(Tensorflow) | 204.2240s | 1,000,000 | 99.1379% |
| Blingfire | 13.2374s | 1,000,000 | 99.8588% |
| FlashBertTokenizer | 7.6313s | 1,000,000 | 99.6884% |

google-bert/bert-base-multilingual-cased

| Tokenizer | Elapsed Time | texts | Accuracy |
|---|---|---|---|
| BertTokenizerFast(Huggingface) | 212.1570s | 2,000,000 | 99.7964% |
| BertTokenizerFast(PaddleNLP) | 193.9921s | 2,000,000 | 99.7964% |
| FastBertTokenizer(Tensorflow) | 394.1574s | 2,000,000 | 99.7892% |
| Blingfire | 38.9013s | 2,000,000 | 99.9780% |
| FlashBertTokenizer | 20.4570s | 2,000,000 | 99.8970% |

beomi/kcbert-base

| Tokenizer | Elapsed Time | texts | Accuracy |
|---|---|---|---|
| BertTokenizerFast(Huggingface) | 52.5744s | 1,000,000 | 99.6754% |
| BertTokenizerFast(PaddleNLP) | 44.8943s | 1,000,000 | 99.6754% |
| FastBertTokenizer(Tensorflow) | 198.0270s | 1,000,000 | 99.6639% |
| Blingfire | 13.0701s | 1,000,000 | 99.9434% |
| FlashBertTokenizer | 5.2601s | 1,000,000 | 99.9484% |

microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank

| Tokenizer | Elapsed Time | texts | Accuracy |
|---|---|---|---|
| BertTokenizerFast(Huggingface) | 208.8858s | 2,000,000 | 99.7964% |
| BertTokenizerFast(PaddleNLP) | 192.6593s | 2,000,000 | 99.7964% |
| FastBertTokenizer(Tensorflow) | 413.2010s | 2,000,000 | 99.7892% |
| Blingfire | 39.3765s | 2,000,000 | 99.9780% |
| FlashBertTokenizer | 22.8820s | 2,000,000 | 99.8970% |

KR-BERT

| Tokenizer | Elapsed Time | texts | Accuracy |
|---|---|---|---|
| BertTokenizerBidirectional(KR-BERT Original) | 128.3320s | 1,000,000 | 100.0000% |
| FlashBertTokenizer(Bidirectional) | 10.4492s | 1,000,000 | 99.9631% |
%%{ init: { "er" : { "layoutDirection" : "LR" } } }%%
erDiagram
    Text ||--o{ Preprocess : tokenize
    Preprocess o{--|| Inference : memcpy_h2d
    Inference o{--|| Postprocess : memcpy_d2h

6. Compatibility

FlashBertTokenizer can be used with any framework. CUDA version compatibility for each framework is also important for fast inference of LLMs.

  • PyTorch no longer supports installation using conda.
  • ONNXRUNTIME is separated by CUDA version.
  • PyTorch is also looking to drop older CUDA 12.x releases in favor of the newer CUDA 12.8. However, the trend is to keep CUDA 11.8 in all frameworks.
    • CUDA 12.x was made for the newest GPUs, Hopper and Blackwell, and on GPUs like Volta, CUDA 11.8 is faster than CUDA 12.x.
DL Framework Version OS CPU CUDA 11.8 CUDA 12.3 CUDA 12.4 CUDA 12.6 CUDA 12.8
PyTorch 2.6 Linux, Windows
PyTorch 2.7 Linux, Windows
ONNXRUNTIME(11) 1.20.x Linux, Windows
ONNXRUNTIME(12) 1.20.x Linux, Windows
PaddlePaddle 3.0-beta Linux, Windows

7. GPU Tokenizer

Here is an example of installing and running cuDF in Run State of the Art NLP Workloads at Scale with RAPIDS, HuggingFace, and Dask. (It's incredibly fast)

You can run the WordPiece tokenizer on GPUs with RAPIDS (cuDF).

As the RAPIDS installation guide shows, it only supports Linux, and its CUDA version requirements differ from those of other frameworks, so Docker is the best choice. The GPU tokenizer is faster than the CPU for batch processing but slower than the CPU for streaming processing.

There are good code examples and explanations in the [blog](https://developer.nvidia.com/blog/run-state-of-the-art-nlp-workloads-at-scale-with-rapids-huggingface-and-dask/). To use cuDF, you must first convert vocab.txt to hash_vocab as shown below. The problem is that the hash_vocab function cannot convert multilingual vocabularies. Therefore, the WordpieceTokenizer of cuDF cannot be used if there are any characters other than English/Chinese in the vocab.

import cudf
from cudf.utils.hash_vocab_utils import hash_vocab
hash_vocab('bert-base-cased-vocab.txt', 'voc_hash.txt')

Acknowledgement

FlashTokenizer is inspired by FlashAttention, FlashInfer, FastBertTokenizer and tokenizers-cpp projects.
