Convert tokenizers into OpenVINO models
OpenVINO Tokenizers
OpenVINO Tokenizers adds text processing operations to OpenVINO.
Features
- Perform tokenization and detokenization without third-party dependencies
- Convert a HuggingFace tokenizer into an OpenVINO tokenizer and detokenizer model
- Combine OpenVINO models into a single model
- Add a greedy decoding pipeline to a text generation model
Installation
(Recommended) Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate
# or
conda create --name openvino_tokenizer
conda activate openvino_tokenizer
Minimal Installation
Use the minimal installation when you already have a converted OpenVINO tokenizer:
pip install openvino-tokenizers
# or
conda install -c conda-forge openvino openvino-tokenizers
Convert Tokenizers Installation
If you want to convert HuggingFace tokenizers into OpenVINO tokenizers:
pip install openvino-tokenizers[transformers]
# or
conda install -c conda-forge openvino openvino-tokenizers && pip install transformers[sentencepiece] tiktoken
Build and install from source after OpenVINO installation
source path/to/installed/openvino/setupvars.sh
git clone https://github.com/openvinotoolkit/openvino_contrib.git
cd openvino_contrib/modules/custom_operations/
pip install .[transformers]
Build and install for development
source path/to/installed/openvino/setupvars.sh
git clone https://github.com/openvinotoolkit/openvino_contrib.git
cd openvino_contrib/modules/custom_operations/
pip install -e .[all]
# verify installation by running tests
cd user_ie_extensions/tokenizer/python/tests/
pytest .
C++ Installation
You can use converted tokenizers in C++ pipelines with prebuilt binaries.
- Download the OpenVINO archive distribution for your OS from here and extract the archive.
- Download the OpenVINO Tokenizers prebuilt libraries from here. To ensure compatibility, the first three numbers of the OpenVINO Tokenizers version should match the OpenVINO version, and the OS must also match.
- Extract OpenVINO Tokenizers archive into OpenVINO installation directory:
- Windows:
<openvino_dir>\runtime\bin\intel64\Release\
- MacOS_x86:
<openvino_dir>/runtime/lib/intel64/Release/
- MacOS_arm64:
<openvino_dir>/runtime/lib/arm64/Release/
- Linux_x86:
<openvino_dir>/runtime/lib/intel64/
- Linux_arm64:
<openvino_dir>/runtime/lib/aarch64/
After that, you can add the binary extension in the code:
- Windows: core.add_extension("user_ov_extensions.dll")
- MacOS: core.add_extension("libuser_ov_extensions.dylib")
- Linux: core.add_extension("libuser_ov_extensions.so")
and read/compile the converted (de)tokenizer models.
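From Python, you can load the same prebuilt library explicitly instead of relying on the openvino_tokenizers package. A minimal sketch, assuming the Linux library name and an already converted openvino_tokenizer.xml:

from openvino import Core

core = Core()
# register the tokenizer operations from the prebuilt extension library
core.add_extension("libuser_ov_extensions.so")
# read and compile a converted tokenizer; tokenizers run on CPU only
ov_tokenizer = core.read_model("openvino_tokenizer.xml")
compiled_tokenizer = core.compile_model(ov_tokenizer, "CPU")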
Usage
:warning: OpenVINO Tokenizers can be inferred on a CPU device only.
Convert HuggingFace tokenizer
OpenVINO Tokenizers ships with a CLI tool that can convert tokenizers from the HuggingFace Hub or HuggingFace tokenizers saved on disk:
convert_tokenizer codellama/CodeLlama-7b-hf --with-detokenizer -o output_dir
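The command writes the converted models to output_dir. A short sketch of loading them afterwards, assuming the default output filenames openvino_tokenizer.xml and openvino_detokenizer.xml:

import openvino_tokenizers  # registers the tokenizer operations on import
from openvino import Core

core = Core()
compiled_tokenizer = core.compile_model("output_dir/openvino_tokenizer.xml")
compiled_detokenizer = core.compile_model("output_dir/openvino_detokenizer.xml")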
There is also a convert_tokenizer function that can convert a tokenizer Python object:
import numpy as np
from transformers import AutoTokenizer
from openvino import compile_model, save_model
from openvino_tokenizers import convert_tokenizer
hf_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
ov_tokenizer = convert_tokenizer(hf_tokenizer)
compiled_tokenizer = compile_model(ov_tokenizer)
text_input = ["Test string"]
hf_output = hf_tokenizer(text_input, return_tensors="np")
ov_output = compiled_tokenizer(text_input)
for output_name in hf_output:
    print(f"OpenVINO {output_name} = {ov_output[output_name]}")
    print(f"HuggingFace {output_name} = {hf_output[output_name]}")
# OpenVINO input_ids = [[ 101 3231 5164 102]]
# HuggingFace input_ids = [[ 101 3231 5164 102]]
# OpenVINO token_type_ids = [[0 0 0 0]]
# HuggingFace token_type_ids = [[0 0 0 0]]
# OpenVINO attention_mask = [[1 1 1 1]]
# HuggingFace attention_mask = [[1 1 1 1]]
# save tokenizer for later use
save_model(ov_tokenizer, "openvino_tokenizer.xml")
loaded_tokenizer = compile_model("openvino_tokenizer.xml")
loaded_ov_output = loaded_tokenizer(text_input)
for output_name in hf_output:
    assert np.all(loaded_ov_output[output_name] == ov_output[output_name])
Connect Tokenizer to a Model
To infer and convert the original model, install torch or torch-cpu in the virtual environment.
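For example, a CPU-only PyTorch build keeps the environment small; this uses PyTorch's standard CPU wheel index:

pip install torch --index-url https://download.pytorch.org/whl/cpu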
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from openvino import compile_model, convert_model
from openvino_tokenizers import convert_tokenizer, connect_models
checkpoint = "mrm8488/bert-tiny-finetuned-sms-spam-detection"
hf_tokenizer = AutoTokenizer.from_pretrained(checkpoint)
hf_model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
text_input = ["Free money!!!"]
hf_input = hf_tokenizer(text_input, return_tensors="pt")
hf_output = hf_model(**hf_input)
ov_tokenizer = convert_tokenizer(hf_tokenizer)
ov_model = convert_model(hf_model, example_input=hf_input.data)
combined_model = connect_models(ov_tokenizer, ov_model)
compiled_combined_model = compile_model(combined_model)
openvino_output = compiled_combined_model(text_input)
print(f"OpenVINO logits: {openvino_output['logits']}")
# OpenVINO logits: [[ 1.2007061 -1.4698029]]
print(f"HuggingFace logits {hf_output.logits}")
# HuggingFace logits tensor([[ 1.2007, -1.4698]], grad_fn=<AddmmBackward0>)
Use Extension With Converted (De)Tokenizer or Model With (De)Tokenizer
Importing openvino_tokenizers adds all tokenizer-related operations to OpenVINO, after which you can work with saved tokenizers and detokenizers.
import numpy as np
import openvino_tokenizers
from openvino import Core
core = Core()
# detokenizer from codellama sentencepiece model
compiled_detokenizer = core.compile_model("detokenizer.xml")
token_ids = np.random.randint(100, 1000, size=(3, 5))
openvino_output = compiled_detokenizer(token_ids)
print(openvino_output["string_output"])
# ['sc�ouition�', 'intvenord hasient', 'g shouldwer M more']
Text generation pipeline
import numpy as np
from openvino import compile_model, convert_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from openvino_tokenizers import add_greedy_decoding, convert_tokenizer
# Use a different repo for the tokenizer because the original repo doesn't have a .model file
# SentencePiece (Unigram) tokenizers are supported only with a .model file
tokenizer_checkpoint = "microsoft/Llama2-7b-WhoIsHarryPotter"
model_checkpoint = "nickypro/tinyllama-15M"
hf_tokenizer = AutoTokenizer.from_pretrained(tokenizer_checkpoint)
hf_model = AutoModelForCausalLM.from_pretrained(model_checkpoint, use_cache=False)
# convert hf tokenizer
text_input = ["Quick brown fox was"]
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)
compiled_tokenizer = compile_model(ov_tokenizer)
# transform input text into tokens
ov_input = compiled_tokenizer(text_input)
hf_input = hf_tokenizer(text_input, return_tensors="pt")
# convert the PyTorch model to OpenVINO IR and add a greedy decoding pipeline to it
ov_model = convert_model(hf_model, example_input=hf_input.data)
ov_model_with_greedy_decoding = add_greedy_decoding(ov_model)
compiled_model = compile_model(ov_model_with_greedy_decoding)
# generate new tokens
new_tokens_size = 10
prompt_size = ov_input["input_ids"].shape[-1]
input_dict = {
    output.any_name: np.hstack([tensor, np.zeros(shape=(1, new_tokens_size), dtype=np.int_)])
    for output, tensor in ov_input.items()
}
for idx in range(prompt_size, prompt_size + new_tokens_size):
    # the greedy decoding head returns argmax token ids; take the prediction
    # for the last filled position and write it into the next slot
    output = compiled_model(input_dict)["token_ids"]
    input_dict["input_ids"][:, idx] = output[:, idx - 1]
    input_dict["attention_mask"][:, idx] = 1
ov_token_ids = input_dict["input_ids"]
hf_token_ids = hf_model.generate(
    **hf_input,
    min_new_tokens=new_tokens_size,
    max_new_tokens=new_tokens_size,
    temperature=0,  # greedy decoding
)
# decode model output
compiled_detokenizer = compile_model(ov_detokenizer)
ov_output = compiled_detokenizer(ov_token_ids)["string_output"]
hf_output = hf_tokenizer.batch_decode(hf_token_ids, skip_special_tokens=True)
print(f"OpenVINO output string: `{ov_output}`")
# OpenVINO output string: `['Quick brown fox was walking through the forest. He was looking for something']`
print(f"HuggingFace output string: `{hf_output}`")
# HuggingFace output string: `['Quick brown fox was walking through the forest. He was looking for something']`
Supported Tokenizer Types
Huggingface Tokenizer Type | Tokenizer Model Type | Tokenizer | Detokenizer |
---|---|---|---|
Fast | WordPiece | ✅ | ❌ |
Fast | BPE | ✅ | ✅ |
Fast | Unigram | ❌ | ❌ |
Legacy | SentencePiece .model | ✅ | ✅ |
Custom | tiktoken | ✅ | ✅ |
Test Results
This report is autogenerated and includes tokenizer and detokenizer tests. The "Output Matched, %" column shows the percentage of test strings for which the OpenVINO and Huggingface tokenizer results are the same. To update the report, run pytest tokenizers_test.py --update_readme in the modules/custom_operations/user_ie_extensions/tokenizer/python/tests directory.
Output Match by Tokenizer Type
Tokenizer Type | Output Matched, % | Number of Tests |
---|---|---|
BPE | 95.82 | 3420 |
SentencePiece | 86.28 | 2880 |
Tiktoken | 97.69 | 216 |
WordPiece | 82.12 | 520 |
Output Match by Model
Tokenizer Type | Model | Output Matched, % | Number of Tests |
---|---|---|---|
BPE | EleutherAI/gpt-j-6b | 98.33 | 180 |
BPE | EleutherAI/gpt-neo-125m | 98.33 | 180 |
BPE | EleutherAI/gpt-neox-20b | 97.78 | 180 |
BPE | EleutherAI/pythia-12b-deduped | 97.78 | 180 |
BPE | KoboldAI/fairseq-dense-13B | 98.89 | 180 |
BPE | Salesforce/codegen-16B-multi | 97.22 | 180 |
BPE | ai-forever/rugpt3large_based_on_gpt2 | 97.78 | 180 |
BPE | bigscience/bloom | 99.44 | 180 |
BPE | databricks/dolly-v2-3b | 97.78 | 180 |
BPE | facebook/bart-large-mnli | 97.22 | 180 |
BPE | facebook/galactica-120b | 98.33 | 180 |
BPE | facebook/opt-66b | 98.89 | 180 |
BPE | gpt2 | 97.22 | 180 |
BPE | laion/CLIP-ViT-bigG-14-laion2B-39B-b160k | 61.11 | 180 |
BPE | microsoft/deberta-base | 96.11 | 180 |
BPE | roberta-base | 96.11 | 180 |
BPE | sentence-transformers/all-roberta-large-v1 | 96.11 | 180 |
BPE | stabilityai/stablecode-completion-alpha-3b-4k | 98.33 | 180 |
BPE | stabilityai/stablelm-tuned-alpha-7b | 97.78 | 180 |
SentencePiece | NousResearch/Llama-2-13b-hf | 100.00 | 180 |
SentencePiece | NousResearch/Llama-2-13b-hf_slow | 100.00 | 180 |
SentencePiece | THUDM/chatglm2-6b | 100.00 | 180 |
SentencePiece | THUDM/chatglm2-6b_slow | 100.00 | 180 |
SentencePiece | THUDM/chatglm3-6b | 100.00 | 180 |
SentencePiece | THUDM/chatglm3-6b_slow | 100.00 | 180 |
SentencePiece | camembert-base | 0.00 | 180 |
SentencePiece | camembert-base_slow | 75.00 | 180 |
SentencePiece | codellama/CodeLlama-7b-hf | 100.00 | 180 |
SentencePiece | codellama/CodeLlama-7b-hf_slow | 100.00 | 180 |
SentencePiece | microsoft/deberta-v3-base | 93.33 | 180 |
SentencePiece | microsoft/deberta-v3-base_slow | 100.00 | 180 |
SentencePiece | xlm-roberta-base | 98.89 | 180 |
SentencePiece | xlm-roberta-base_slow | 98.89 | 180 |
SentencePiece | xlnet-base-cased | 61.11 | 180 |
SentencePiece | xlnet-base-cased_slow | 53.33 | 180 |
Tiktoken | Qwen/Qwen-14B-Chat | 98.15 | 108 |
Tiktoken | Salesforce/xgen-7b-8k-base | 97.22 | 108 |
WordPiece | ProsusAI/finbert | 80.00 | 40 |
WordPiece | bert-base-multilingual-cased | 80.00 | 40 |
WordPiece | bert-large-cased | 80.00 | 40 |
WordPiece | cointegrated/rubert-tiny2 | 80.00 | 40 |
WordPiece | distilbert-base-uncased-finetuned-sst-2-english | 80.00 | 40 |
WordPiece | google/electra-base-discriminator | 80.00 | 40 |
WordPiece | google/mobilebert-uncased | 95.00 | 40 |
WordPiece | jhgan/ko-sbert-sts | 75.00 | 40 |
WordPiece | prajjwal1/bert-mini | 95.00 | 40 |
WordPiece | rajiv003/ernie-finetuned-qqp | 95.00 | 40 |
WordPiece | rasa/LaBSE | 72.50 | 40 |
WordPiece | sentence-transformers/all-MiniLM-L6-v2 | 75.00 | 40 |
WordPiece | squeezebert/squeezebert-uncased | 80.00 | 40 |
Recreating Tokenizers From Tests
In some tokenizers, you need to select certain settings so that their output is closer to the HuggingFace tokenizers (a conversion sketch follows the list):
- THUDM/chatglm2-6b detokenizer always skips special tokens. Use skip_special_tokens=True during conversion.
- THUDM/chatglm3-6b detokenizer does not skip special tokens. Use skip_special_tokens=False during conversion.
- All tested tiktoken-based detokenizers leave extra spaces. Use clean_up_tokenization_spaces=False during conversion.
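Applied to the conversion call, these settings would look like the sketch below; it assumes skip_special_tokens and clean_up_tokenization_spaces are accepted as keyword arguments by convert_tokenizer, as the notes above imply:

from transformers import AutoTokenizer
from openvino_tokenizers import convert_tokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
ov_tokenizer, ov_detokenizer = convert_tokenizer(
    hf_tokenizer,
    with_detokenizer=True,
    skip_special_tokens=True,  # assumed kwarg: chatglm2 detokenizer always skips special tokens
)
# for tiktoken-based detokenizers, pass clean_up_tokenization_spaces=False instead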