LLMVision
Visualize how LLMs tokenize text.
from llmvision import tokenize_and_visualize, GPT4Tokenizer
text = "Hello world! 👋🌍"
print(tokenize_and_visualize(text, GPT4Tokenizer()))
# Output: Hello│ world│!│<bytes:20f09f>│<bytes:91>│<bytes:8b>│<bytes:f09f>│<bytes:8c>│<bytes:8d>
Features
- Multiple tokenizers: GPT-2, GPT-4, byte-level, character-level
- Visual token boundaries
- Unicode/emoji handling
- Real GPT-2 and GPT-4 tokenization via tiktoken, the tokenizer library used for OpenAI models
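To make the "visual token boundaries" and emoji handling concrete, here is a minimal sketch of how such a rendering could work. It is not LLMVision's implementation: `render_token` and `show_boundaries` are hypothetical helpers, and the token split is written by hand to mirror the `<bytes:...>` style of the example output at the top of this page.

```python
def render_token(token: bytes) -> str:
    """Show a token as text if it is valid UTF-8, else as a hex placeholder."""
    try:
        return token.decode("utf-8")
    except UnicodeDecodeError:
        # A token that holds only part of a multi-byte character
        # cannot be decoded on its own, so show its raw bytes.
        return f"<bytes:{token.hex()}>"


def show_boundaries(tokens: list[bytes], sep: str = "│") -> str:
    """Join rendered tokens with a visible separator."""
    return sep.join(render_token(t) for t in tokens)


# Hand-written split (an assumption); a real tokenizer chooses these boundaries.
# The last three pieces together are the UTF-8 bytes of " 👋".
tokens = [b"Hello", b" world", b"!", b" \xf0\x9f", b"\x91", b"\x8b"]
print(show_boundaries(tokens))
# Hello│ world│!│<bytes:20f09f>│<bytes:91>│<bytes:8b>
```

The hex fallback is what keeps partial emoji visible: a single emoji can be split across several tokens, none of which is printable by itself.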
Installation
pip install llmvision
Usage
llmvision "Hello world!"
llmvision "Hello world!" --tokenizer gpt4
llmvision "Hello world!" --indices
from llmvision import tokenize_and_visualize, GPT4Tokenizer
# Default tokenizer
print(tokenize_and_visualize("Hello world!"))
# Specific tokenizer
print(tokenize_and_visualize("Hello world!", GPT4Tokenizer()))
Examples
from llmvision import GPT4Tokenizer
tokenizer = GPT4Tokenizer()
tokens = tokenizer.tokenize("Hello world!")
print(tokens) # ['Hello', ' world', '!']
Tokenizers
- SimpleTokenizer - word/punctuation/space
- WordTokenizer - whitespace-based
- CharTokenizer - character-level
- GraphemeTokenizer - Unicode grapheme clusters
- ByteLevelTokenizer - UTF-8 bytes
- GPT2Tokenizer - GPT-2 (via tiktoken)
- GPT4Tokenizer - GPT-4 (via tiktoken)
- SubwordTokenizer - basic subword splitting
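The granularities in this list can be compared using only the Python standard library. The snippet below is a rough illustration of what each level means, not the classes' actual splitting rules; the regular expression in particular is an assumed stand-in for word/punctuation/space splitting.

```python
import re

text = "Hi 👋!"

# Character-level: one token per Unicode code point.
chars = list(text)

# Whitespace-based: split on runs of whitespace.
words = text.split()

# Word/punctuation/space: words, whitespace runs, and single
# punctuation or symbol characters become separate pieces.
pieces = re.findall(r"\w+|\s+|[^\w\s]", text)

# Byte-level: one token per UTF-8 byte (shown here as integers).
byte_tokens = list(text.encode("utf-8"))

print(len(chars), chars)        # 5 ['H', 'i', ' ', '👋', '!']
print(len(words), words)        # 2 ['Hi', '👋!']
print(len(pieces), pieces)      # 4 ['Hi', ' ', '👋', '!']
print(len(byte_tokens))         # 8
```

Grapheme clusters are omitted because the standard library has no grapheme segmentation; that level needs a Unicode segmentation library.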
Token Costs
from llmvision import GPT4Tokenizer
tokenizer = GPT4Tokenizer()
examples = [
"Hello world!", # 3 tokens
"Hello 世界!", # 5 tokens
"Hello 👋🌍!", # 8 tokens
"👨👩👧👦", # 18 tokens
]
for text in examples:
print(f"{text:15} → {len(tokenizer.tokenize(text))} tokens")
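One likely reason the non-English and emoji examples cost more is that byte-level BPE tokenizers, such as the tiktoken encodings, work on UTF-8 bytes. Non-ASCII characters start out as several bytes each and only collapse into fewer tokens where the vocabulary has learned the merge. The sketch below uses only the standard library; `utf8_profile` is a hypothetical helper, not part of LLMVision.

```python
def utf8_profile(text: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


examples = ["Hello world!", "Hello 世界!", "Hello 👋🌍!"]

for text in examples:
    code_points, n_bytes = utf8_profile(text)
    print(f"{text!r}: {code_points} code points, {n_bytes} UTF-8 bytes")
# 'Hello world!': 12 code points, 12 UTF-8 bytes
# 'Hello 世界!': 9 code points, 13 UTF-8 bytes
# 'Hello 👋🌍!': 9 code points, 15 UTF-8 bytes
```

For ordinary text, every token covers at least one byte, so the byte count is an upper bound on the token count; the token counts quoted above (3, 5, 8) all sit below these byte counts.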
License
MIT
Download files
Source Distribution
llmvision-0.1.1.tar.gz (3.6 MB)
Built Distribution
llmvision-0.1.1-py3-none-any.whl (10.4 kB)
File details
Details for the file llmvision-0.1.1.tar.gz.
File metadata
- Download URL: llmvision-0.1.1.tar.gz
- Upload date:
- Size: 3.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8eede57c89d8960e2a16cfb65d65a0841a697076f64ad2bbc6288f0a762a990b |
| MD5 | aee95464fc7c0d360b885b857336f462 |
| BLAKE2b-256 | 0b4bf829e91e1b8e0a5fd4fd416b69df6b0dbe710da62e964f87838621eee731 |
File details
Details for the file llmvision-0.1.1-py3-none-any.whl.
File metadata
- Download URL: llmvision-0.1.1-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 230d261fa1402e9c63be1cd01d16377c26c59fa1e32366ed1f95a6b921317a2a |
| MD5 | d6f3b553362e0e5be91db143a779bf8c |
| BLAKE2b-256 | 987565cc6dbb92346b6c400d19d0873253e0fd3b0e80d32dd658443f5045285c |