FlashBertTokenizer implementation with C++ backend
Tokenizer Library for LLM Serving
EFFICIENT AND OPTIMIZED TOKENIZER ENGINE FOR LLM INFERENCE SERVING
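FlashTokenizer is a high-performance implementation of tokenizers such as BertTokenizer for use in LLM inference. In the spirit of FlashAttention and FlashInfer, it targets both speed and accuracy. In the benchmark below it runs about 4 times faster than BertTokenizerFast from transformers (19.13 s vs. 75.87 s).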
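FlashTokenizer provides the following key features:
- Implemented in C++17, and fastest when built with LLVM.
- Python bindings via pybind11 deliver the same speed from Python.
- Blingfire is fast, but its lower accuracy makes it hard to use in practice. FlashBertTokenizer combines high accuracy with high speed.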
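BertTokenizer, the baseline in the comparison, works in two stages: a basic pre-tokenizer (lowercasing and accent stripping when `do_lower_case` is set, then splitting on whitespace and punctuation), followed by greedy longest-match-first WordPiece over the vocabulary. The pure-Python sketch below illustrates that reference algorithm to show what FlashBertTokenizer has to reproduce; it is not FlashTokenizer's C++ code. It is deliberately simplified: it omits CJK character splitting, control-character cleanup, and BERT's rule that treats every non-alphanumeric ASCII symbol as punctuation. The function names are hypothetical.

```python
import unicodedata


def basic_tokenize(text: str, do_lower_case: bool = True) -> list[str]:
    """Simplified BERT pre-tokenization: optional lowercasing and accent
    stripping, then splitting on whitespace and Unicode punctuation."""
    if do_lower_case:
        text = text.lower()
        # NFD separates base letters from accents; drop combining marks (Mn).
        text = "".join(c for c in unicodedata.normalize("NFD", text)
                       if unicodedata.category(c) != "Mn")
    tokens = []
    for word in text.split():
        current = ""
        for ch in word:
            if unicodedata.category(ch).startswith("P"):
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)  # each punctuation mark is its own token
            else:
                current += ch
        if current:
            tokens.append(current)
    return tokens


def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]",
              max_chars: int = 100) -> list[str]:
    """Greedy longest-match-first WordPiece for a single word."""
    if len(word) > max_chars:
        return [unk]
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the ## prefix
            if sub in vocab:
                match = sub
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matches: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Pre-tokenize, then split every word into WordPiece tokens."""
    return [p for w in basic_tokenize(text) for p in wordpiece(w, vocab)]
```

With a toy vocabulary containing `un`, `##aff` and `##able`, `wordpiece("unaffable", vocab)` returns `["un", "##aff", "##able"]`, while a word with no matching prefix collapses to a single `[UNK]`.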
| Tokenizer | Elapsed time (s) | Number of titles | Accuracy (%) |
|---|---|---|---|
| BertTokenizer (HuggingFace) | 255.651 | 404,464 | 100 (baseline) |
| FlashBertTokenizer | 19.1325 | 404,464 | 99.3248 |
| BertTokenizerFast (HuggingFace) | 75.8732 | 404,464 | 99.8615 |
| BertTokenizerFast (PaddleNLP) | 71.5387 | 404,464 | 99.8615 |
| FastBertTokenizer (TensorFlow Text) | 82.2638 | 404,464 | 99.8507 |
| Blingfire | 12.7293 | 404,464 | 96.8979 |
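The speedups implied by the table can be recomputed directly from the elapsed-time column, as the sketch below does. It also shows one way an accuracy column like this could be produced. The README does not define its accuracy metric, so `agreement_rate` assumes it is the percentage of inputs whose token-ID sequence exactly matches the baseline tokenizer; both function names are hypothetical.

```python
# Elapsed seconds for the same 404,464 titles, copied from the table above.
ELAPSED = {
    "BertTokenizer (HuggingFace)": 255.651,
    "FlashBertTokenizer": 19.1325,
    "BertTokenizerFast (HuggingFace)": 75.8732,
    "BertTokenizerFast (PaddleNLP)": 71.5387,
    "FastBertTokenizer (TensorFlow Text)": 82.2638,
    "Blingfire": 12.7293,
}


def speedups(elapsed, target):
    """How many times longer each other tokenizer takes than `target`."""
    base = elapsed[target]
    return {name: t / base for name, t in elapsed.items() if name != target}


def agreement_rate(reference, candidate, texts):
    """Assumed metric: percentage of texts whose outputs match exactly."""
    matches = sum(reference(t) == candidate(t) for t in texts)
    return 100.0 * matches / len(texts)


for name, ratio in speedups(ELAPSED, "FlashBertTokenizer").items():
    print(f"{name}: {ratio:.2f}x")
```

Relative to FlashBertTokenizer this gives about 13.4x for BertTokenizer (HuggingFace), 3.7x to 4.3x for the three fast tokenizers, and about 0.67 for Blingfire, which finishes sooner but has the lowest accuracy.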
FlashInfer is a library and kernel generator for Large Language Models that provides high-performance implementation of LLM GPU kernels such as FlashAttention, SparseAttention, PageAttention, Sampling, and more. FlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.
Installation
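Prerequisites on macOS (LLVM and OpenMP via Homebrew):
```shell
brew install llvm libomp
```
Install the released package from PyPI:
```shell
pip install -U flash-tokenizer
```
Or build and install from source:
```shell
git clone https://github.com/springkim/flash-tokenizer.git
cd flash-tokenizer
pip install .
```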
Usage
```python
from flash_tokenizer import FlashBertTokenizer

# Path to a BERT-style vocab.txt; replace with your own file.
tokenizer = FlashBertTokenizer("path/to/vocab.txt", do_lower_case=True)

# Tokenize text
ids = tokenizer("Hello, world!")
print(ids)
```
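The `vocab.txt` passed to `FlashBertTokenizer` is presumably a standard BERT WordPiece vocabulary, the same format BertTokenizer loads: one token per line, with the zero-based line number serving as the token ID. Assuming that format, and assuming the tokenizer output is a flat list of integers, these small helpers (hypothetical, not part of flash_tokenizer) map IDs back to tokens for inspection.

```python
def parse_vocab(lines):
    """Build a token -> id mapping; the id is the zero-based line index."""
    vocab = {}
    for index, line in enumerate(lines):
        vocab[line.rstrip("\n")] = index
    return vocab


def load_vocab(path):
    """Read a BERT-style vocab.txt (UTF-8, one token per line)."""
    with open(path, encoding="utf-8") as f:
        return parse_vocab(f)


def ids_to_tokens(ids, vocab):
    """Invert the vocabulary and look up each id."""
    inverse = {index: token for token, index in vocab.items()}
    return [inverse[i] for i in ids]
```

For example, `ids_to_tokens(ids, load_vocab("path/to/vocab.txt"))` turns the printed IDs back into readable WordPiece tokens, which makes it easier to see where two tokenizers disagree.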
Download files
File details: flash_tokenizer-0.5.0.tar.gz (source distribution)
- Size: 2.3 MB
- Tags: Source
- Uploaded via: twine/6.1.0 CPython/3.12.2
- Uploaded using Trusted Publishing? No

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7acdc8143280f255fbd4be5a8a95f33561a0601ae7d0263e862f8e1bc8099742 |
| MD5 | 42567b06d0830bacc847ba2517ed52f1 |
| BLAKE2b-256 | 3cc6bd8c21a3fb75b40ea24d7507c173b5d673b82f1fecc15936887931674d61 |
File details: flash_tokenizer-0.5.0-cp312-cp312-macosx_15_0_arm64.whl (built distribution)
- Size: 82.5 kB
- Tags: CPython 3.12, macOS 15.0+ ARM64
- Uploaded via: twine/6.1.0 CPython/3.12.2
- Uploaded using Trusted Publishing? No

| Algorithm | Hash digest |
|---|---|
| SHA256 | e17bf9a4d07679d63e517124258ca77873a077fca49e3b859395a568e942dd58 |
| MD5 | 499d42d11d636b4f0c8e6ff92d2703f4 |
| BLAKE2b-256 | 091bf13ea341555da67bf03a4d49546aa658fbce795abf086f9b5eb5ea9592bc |
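Before installing a downloaded file, you can check it against the published SHA256 digest. This sketch uses only Python's standard `hashlib`; the helper names are illustrative, and the two digests are copied from the file details above.

```python
import hashlib

# SHA256 digests published for release 0.5.0 (from the file details above).
EXPECTED_SHA256 = {
    "flash_tokenizer-0.5.0.tar.gz":
        "7acdc8143280f255fbd4be5a8a95f33561a0601ae7d0263e862f8e1bc8099742",
    "flash_tokenizer-0.5.0-cp312-cp312-macosx_15_0_arm64.whl":
        "e17bf9a4d07679d63e517124258ca77873a077fca49e3b859395a568e942dd58",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("flash_tokenizer-0.5.0.tar.gz", EXPECTED_SHA256["flash_tokenizer-0.5.0.tar.gz"])` should return `True` for an intact download. pip can enforce the same check at install time with its hash-checking mode (`--hash=sha256:...` entries in a requirements file).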