Calculate the number of tokens used for images in VLMs

Vision Token Calculator

A Python tool for calculating the number of tokens generated when processing images with Vision Language Models (VLMs).

Features

  • Calculate image tokens for VLMs
  • Support both existing images and dummy images
  • Support remote images via URL (http/https)
  • Simple command line interface (CLI)

Installation

Option 1: PyPI (recommended)

pip install vt-calc

Option 2: From source (editable for development)

pip install -e .

Usage

Using the vt-calc command

After installing with either option above, the vt-calc command is available directly:

# Single image
vt-calc --image path/to/your/image.jpg

# Image from URL
vt-calc --image https://example.com/image.jpg

# Directory (batch processing)
vt-calc --image path/to/your/images_dir

# Dummy image with specific dimensions (Width x Height)
vt-calc --size 1920 1080

# Choose a short model name (default: qwen2.5-vl)
vt-calc --image path/to/your/image.jpg -m qwen2.5-vl

# Calculate tokens for a video file
vt-calc --video path/to/video.mp4 -m qwen2.5-vl

# Specify frame sampling rate (FPS)
vt-calc --video video.mp4 --fps 2.0

# Limit maximum number of frames
vt-calc --video video.mp4 --max-frames 100

# Show help
vt-calc --help

CLI options

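  • -i, --image: Path to an image file, a directory of images, or an image URL
  • -s, --size WIDTH HEIGHT: Create a dummy image of the given size
  • -m, --model-name: Short model name to use (default: qwen2.5-vl)
  • --video: Path to a video file
  • --fps: Frame sampling rate (frames per second) for video input
  • --max-frames: Maximum number of frames to sample from a video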

Supported input formats for directory processing: .jpg, .jpeg, .png, .webp (case-insensitive).
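The case-insensitive extension rule above can be illustrated with a short sketch. This is not taken from the vt-calc source; the names SUPPORTED_EXTENSIONS, is_supported_image, and list_images are hypothetical and only show one way such a filter might work.

```python
from pathlib import Path

# Extensions listed in the README; the constant name is hypothetical.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def is_supported_image(path):
    """Return True if the file extension matches, ignoring case."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def list_images(directory):
    """Return the supported image files directly inside a directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and is_supported_image(p)
    )


if __name__ == "__main__":
    for name in ["photo.JPG", "scan.webp", "notes.txt"]:
        print(name, is_supported_image(name))
```

Lower-casing the suffix before the set lookup is what makes photo.JPG and photo.jpg behave the same.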

Example output (single image)

Using dummy image: 1024 x 768
                        ╔══════════════════════════════╗
                        ║ VISION TOKEN ANALYSIS REPORT ║
                        ╚══════════════════════════════╝
╭───────────────────────────────── MODEL INFO ─────────────────────────────────╮
│                                                                              │
│   Model Name                qwen2.5-vl                                       │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────── IMAGE INFO ─────────────────────────────────╮
│                                                                              │
│   Image Source              Dummy image                                      │
│   Original Size (H x W)     1024 x 768                                       │
│   Resized Size (H x W)      1036 x 756                                       │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────── PATCH INFO ─────────────────────────────────╮
│                                                                              │
│   Patch Size (ViT)          14                                               │
│   Grid Size (H x W)         74 x 54                                          │
│   Number of Patches         3996                                             │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────── TOKEN INFO ─────────────────────────────────╮
│                                                                              │
│   Image Token               999                                              │
│   (<|image_pad|>)                                                            │
│   Image Start Token         1                                                │
│   (<|vision_start|>)                                                         │
│   Image End Token           1                                                │
│   (<|vision_end|>)                                                           │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────── TOKEN FORMAT ────────────────────────────────╮
│               <|vision_start|><|image_pad|>*999<|vision_end|>                │
╰──────────────────────────────────────────────────────────────────────────────╯
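The numbers in the report above are consistent with a simple piece of arithmetic, sketched below. This is an inference from the printed values, not the package's actual implementation: it assumes each side is rounded to the nearest multiple of 28 (patch size 14 times a 2 x 2 merge), that every 2 x 2 group of patches becomes one image token, and it ignores any minimum or maximum pixel limits a real processor may apply. The function name and the merge_size parameter are hypothetical.

```python
def estimate_qwen_image_tokens(height, width, patch_size=14, merge_size=2):
    """Rough token estimate inferred from the example report.

    Assumption: no min/max pixel clamping is applied.
    """
    factor = patch_size * merge_size  # 28 for the values shown in the report

    # Round each side to the nearest multiple of the factor (at least one factor).
    resized_h = max(factor, round(height / factor) * factor)
    resized_w = max(factor, round(width / factor) * factor)

    # The ViT sees a grid of patch_size x patch_size patches.
    grid_h = resized_h // patch_size
    grid_w = resized_w // patch_size
    num_patches = grid_h * grid_w

    # Each merge_size x merge_size block of patches yields one image token.
    image_tokens = num_patches // (merge_size ** 2)

    return {
        "resized": (resized_h, resized_w),
        "grid": (grid_h, grid_w),
        "num_patches": num_patches,
        "image_tokens": image_tokens,
        # Plus one start token and one end token.
        "total_tokens": image_tokens + 2,
    }


if __name__ == "__main__":
    print(estimate_qwen_image_tokens(1024, 768))
```

For a 1024 x 768 (H x W) input this gives a 1036 x 756 resized image, a 74 x 54 grid, 3996 patches, and 999 image tokens, matching the report. Very large or very small images may differ in practice if the model's processor clamps the pixel count.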

Example output (multiple images)

Processing directory: test_images/
Found 8 images to process...

[1/8] Processing: test_1_640x480.jpg ✓ (393 tokens)
[2/8] Processing: test_2_800x600.jpg ✓ (611 tokens)
[3/8] Processing: test_3_1024x768.jpg ✓ (1001 tokens)
[4/8] Processing: test_4_1280x720.jpg ✓ (1198 tokens)
[5/8] Processing: test_5_1920x1080.jpg ✓ (2693 tokens)
[6/8] Processing: test_6_512x512.jpg ✓ (326 tokens)
[7/8] Processing: test_7_256x256.jpg ✓ (83 tokens)
[8/8] Processing: test_8_2048x1536.jpg ✓ (4017 tokens)

       BATCH ANALYSIS REPORT
╭────────────────────────┬────────────╮
│ Model                  │ qwen2.5-vl │
│ Total Images Processed │ 8          │
│ Average Vision Tokens  │ 1290.2     │
│ Minimum Vision Tokens  │ 83         │
│ Maximum Vision Tokens  │ 4017       │
│ Standard Deviation     │ 1370.5     │
╰────────────────────────┴────────────╯
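The summary figures in the batch report can be reproduced from the per-image counts with the standard library. The sketch below is an illustration only; the summarize function is hypothetical, and the use of the sample standard deviation (statistics.stdev) is inferred from the reported value rather than confirmed by the source.

```python
import statistics

# Per-image totals copied from the batch output above.
token_counts = [393, 611, 1001, 1198, 2693, 326, 83, 4017]


def summarize(counts):
    """Compute the batch statistics shown in the report."""
    return {
        "count": len(counts),
        "mean": statistics.mean(counts),
        "min": min(counts),
        "max": max(counts),
        # Sample standard deviation (n - 1 in the denominator); an assumption.
        "stdev": statistics.stdev(counts),
    }


if __name__ == "__main__":
    summary = summarize(token_counts)
    print(f"Average: {summary['mean']:.1f}")
    print(f"Min: {summary['min']}  Max: {summary['max']}")
    print(f"Std dev: {summary['stdev']:.1f}")
```

Note that the per-image totals include the start and end tokens, so each is two higher than the image token count alone (for example, 1001 for the 1024 x 768 image).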

Supported Models

Model         Option
Qwen2-VL      qwen2-vl
Qwen2.5-VL    qwen2.5-vl
Qwen3-VL      qwen3-vl
InternVL3     internvl3
LLaVA         llava

License

This project is licensed under the MIT License — see the LICENSE file for details.
