
LLMVision

Visualize how LLMs tokenize text.

from llmvision import tokenize_and_visualize, GPT4Tokenizer

text = "Hello world! 👋🌍"
print(tokenize_and_visualize(text, GPT4Tokenizer()))
# Output: Hello│ world│!│<bytes:20f09f>│<bytes:91>│<bytes:8b>│<bytes:f09f>│<bytes:8c>│<bytes:8d>
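The `<bytes:…>` entries appear because GPT-4's tokenizer works on UTF-8 bytes, so some tokens hold only part of a multi-byte character such as an emoji. Below is a minimal sketch of that display rule. It assumes each token is available as raw bytes and that any token which is not valid UTF-8 on its own is printed as hex. `render_tokens` is a hypothetical helper, not part of llmvision's API, and the token split is copied from the output above.

```python
def render_tokens(token_bytes: list[bytes], sep: str = "│") -> str:
    """Join tokens with a visible separator (hypothetical helper, not llmvision's API).

    Tokens that decode as UTF-8 are shown as text; tokens holding only part
    of a multi-byte character are shown as <bytes:HEX>.
    """
    parts = []
    for tok in token_bytes:
        try:
            parts.append(tok.decode("utf-8"))
        except UnicodeDecodeError:
            # Partial character: show the raw bytes instead.
            parts.append(f"<bytes:{tok.hex()}>")
    return sep.join(parts)


# Token split copied from the GPT-4 example output above.
tokens = [b"Hello", b" world", b"!", b" \xf0\x9f", b"\x91", b"\x8b",
          b"\xf0\x9f", b"\x8c", b"\x8d"]

print(render_tokens(tokens))
# Concatenating the raw bytes restores the original text.
print(b"".join(tokens).decode("utf-8"))  # → Hello world! 👋🌍
```

Rendering whole tokens, rather than splitting them at character boundaries, keeps the real token count visible even when a single emoji spans several tokens.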

Features

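  • Multiple tokenizers: GPT-2, GPT-4, byte-level, character-level
  • Token boundaries marked visually with a │ separator
  • Unicode/emoji handling: tokens holding partial characters are shown as <bytes:HEX>
  • GPT-2 and GPT-4 tokenization computed with tiktoken, matching what OpenAI models use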

Installation

pip install llmvision

Usage

Command line:

llmvision "Hello world!"
llmvision "Hello world!" --tokenizer gpt4
llmvision "Hello world!" --indices

Python:

from llmvision import tokenize_and_visualize, GPT4Tokenizer

# Default tokenizer
print(tokenize_and_visualize("Hello world!"))

# Specific tokenizer
print(tokenize_and_visualize("Hello world!", GPT4Tokenizer()))

Examples

from llmvision import GPT4Tokenizer

tokenizer = GPT4Tokenizer()
tokens = tokenizer.tokenize("Hello world!")
print(tokens)  # ['Hello', ' world', '!']
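In this example the tokens concatenate back to the input, so each one can be mapped to a character range. That is one way to draw boundaries or highlight tokens in a UI. The sketch assumes `tokenize()` returns strings that join to the original text, as in the output above. `token_spans` is a hypothetical helper, not part of llmvision. Byte tokens that split a character, as in the emoji example, would need byte offsets instead.

```python
def token_spans(text: str, tokens: list[str]) -> list[tuple[int, int]]:
    """Return (start, end) character offsets for each token (hypothetical helper).

    Assumes the tokens concatenate exactly to `text`; raises ValueError otherwise.
    """
    spans = []
    start = 0
    for tok in tokens:
        end = start + len(tok)
        if text[start:end] != tok:
            raise ValueError(f"token {tok!r} does not match text at offset {start}")
        spans.append((start, end))
        start = end
    if start != len(text):
        raise ValueError("tokens do not cover the whole text")
    return spans


text = "Hello world!"
tokens = ["Hello", " world", "!"]  # GPT-4 tokens from the example above
for (start, end), tok in zip(token_spans(text, tokens), tokens):
    print(f"{start:>2}-{end:<2} {tok!r}")
```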

Tokenizers

  • SimpleTokenizer - word/punctuation/space
  • WordTokenizer - whitespace-based
  • CharTokenizer - character-level
  • GraphemeTokenizer - Unicode grapheme clusters
  • ByteLevelTokenizer - UTF-8 bytes
  • GPT2Tokenizer - GPT-2 (via tiktoken)
  • GPT4Tokenizer - GPT-4 (via tiktoken)
  • SubwordTokenizer - basic subword splitting
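The simpler tokenizers differ mainly in what they treat as a unit. The stdlib-only sketch below shows one plausible version each of character, byte, whitespace, and word/punctuation/space splitting. These are illustrations of the general techniques, not llmvision's implementations, and the regular expression in `simple_tokens` is only a guess at "word/punctuation/space". Grapheme clustering is left out because the standard `re` module has no grapheme-cluster pattern.

```python
import re

def char_tokens(text: str) -> list[str]:
    # One token per Unicode code point.
    return list(text)

def byte_tokens(text: str) -> list[int]:
    # One token per UTF-8 byte.
    return list(text.encode("utf-8"))

def whitespace_tokens(text: str) -> list[str]:
    # Split on runs of whitespace; the whitespace itself is dropped.
    return text.split()

def simple_tokens(text: str) -> list[str]:
    # Runs of word characters, single punctuation/symbol characters, runs of whitespace.
    return re.findall(r"\w+|[^\w\s]|\s+", text)

# 👨‍👩‍👧‍👦 written with escapes: four people joined by three zero-width joiners.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

for name, fn in [("char", char_tokens), ("byte", byte_tokens),
                 ("whitespace", whitespace_tokens), ("simple", simple_tokens)]:
    print(f"{name:10} {len(fn('Hello world!')):3} {len(fn(family)):3}")
```

The family emoji shows the gap between units: it is one visible symbol but 7 code points and 25 UTF-8 bytes.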

Token Costs

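from llmvision import GPT4Tokenizer

tokenizer = GPT4Tokenizer()
examples = [
    "Hello world!",    # 3 tokens
    "Hello 世界!",     # 5 tokens
    "Hello 👋🌍!",     # 8 tokens
    "👨‍👩‍👧‍👦",            # 18 tokens
]
# Print each example padded to 15 columns, followed by its GPT-4 token count
for text in examples:
    print(f"{text:15}{len(tokenizer.tokenize(text))} tokens")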
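Byte-level tokenizers such as GPT-2's and GPT-4's start from UTF-8 bytes, so text whose characters need several bytes each tends to cost more tokens. The sketch below counts code points and UTF-8 bytes for the same examples using only the standard library. It does not call a tokenizer; the token counts listed above remain the library's figures.

```python
examples = [
    "Hello world!",
    "Hello \u4e16\u754c!",          # Hello 世界!
    "Hello \U0001F44B\U0001F30D!",  # Hello 👋🌍!
    "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466",  # 👨‍👩‍👧‍👦
]

def size_report(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

for text in examples:
    chars, nbytes = size_report(text)
    print(f"{text!r:45} {chars:2} code points  {nbytes:2} UTF-8 bytes")
```

ASCII text uses one byte per character, while each CJK character here uses 3 bytes and each emoji 4. This is consistent with the higher token counts for the non-ASCII examples.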

License

MIT

Download files

Source Distribution

llmvision-0.1.1.tar.gz (3.6 MB)

Built Distribution

llmvision-0.1.1-py3-none-any.whl (10.4 kB)

File details

Details for the file llmvision-0.1.1.tar.gz.

File metadata

  • Download URL: llmvision-0.1.1.tar.gz
  • Upload date:
  • Size: 3.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.20

File hashes

Hashes for llmvision-0.1.1.tar.gz

Algorithm     Hash digest
SHA256        8eede57c89d8960e2a16cfb65d65a0841a697076f64ad2bbc6288f0a762a990b
MD5           aee95464fc7c0d360b885b857336f462
BLAKE2b-256   0b4bf829e91e1b8e0a5fd4fd416b69df6b0dbe710da62e964f87838621eee731


File details

Details for the file llmvision-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: llmvision-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.20

File hashes

Hashes for llmvision-0.1.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        230d261fa1402e9c63be1cd01d16377c26c59fa1e32366ed1f95a6b921317a2a
MD5           d6f3b553362e0e5be91db143a779bf8c
BLAKE2b-256   987565cc6dbb92346b6c400d19d0873253e0fd3b0e80d32dd658443f5045285c

