A command-line tool to visualize tokenized text.

Token CLI

A command-line tool to visualize tokenized text using different encodings.

Installation

  1. Clone this repository:
    git clone https://github.com/taha-yassine/token-cli.git
    cd token-cli
    
  2. Install dependencies (preferably in a virtual environment):
    # Using pip
    python -m venv .venv
    source .venv/bin/activate # On Windows use `.venv\Scripts\activate`
    pip install -r requirements.txt
    
    # Or using uv
    uv venv
    uv pip install -r requirements.txt
    

Usage

python main.py [OPTIONS] [INPUT_FILE]

From standard input:

echo "This is sample text." | python main.py --hide-stats

From a file:

python main.py --tokenizer gpt-4o your_text_file.txt
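
For orientation, here is a minimal sketch of what highlight mode and the closing stats line might compute. It is not the tool's actual code: the colour palette, the function names `highlight` and `stats`, and the hand-written token boundaries are all assumptions made for illustration. A real run would take its pieces from the selected tokenizer.

```python
from itertools import cycle

# Hypothetical palette: ANSI 256-colour background codes, picked for illustration only.
PALETTE = (153, 223, 194, 225)
RESET = "\x1b[0m"


def highlight(pieces):
    """Wrap each token's text in a background colour, cycling through PALETTE."""
    out = []
    for piece, colour in zip(pieces, cycle(PALETTE)):
        out.append(f"\x1b[48;5;{colour}m{piece}{RESET}")
    return "".join(out)


def stats(pieces):
    """Return (token_count, character_count) for a list of token strings."""
    return len(pieces), sum(len(p) for p in pieces)


# Stand-in token boundaries (assumed, not produced by a real tokenizer).
pieces = ["This", " is", " sample", " text", "."]
print(highlight(pieces))
print("tokens: %d, characters: %d" % stats(pieces))
```

Alternating colours make adjacent token boundaries visible even where a token starts with a space.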

Interactive Preview:

Use the -p or --preview-files flag to launch an interactive fzf session in which you can browse files in the current directory and its subdirectories, with a live preview of each file's tokenization. fzf must be installed and available on your PATH.

python main.py -p
# Or with a specific tokenizer/mode for the previews
python main.py --tokenizer cl100k_base --mode text -p
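
The preview mode can be pictured as a thin wrapper around fzf's `--preview` option, where `{}` is replaced by the currently selected entry. The sketch below is an assumption about how such a wrapper could look, not the project's implementation; `build_fzf_command` and `run_preview` are hypothetical names. Whether `--force-terminal` is needed to keep colours inside the preview pane is not stated in the source.

```python
import shlex
import shutil
import subprocess


def build_fzf_command(tokenizer=None, mode=None):
    """Build an fzf invocation whose preview pane runs main.py on the selected file."""
    preview = ["python", "main.py"]
    if tokenizer:
        preview += ["--tokenizer", tokenizer]
    if mode:
        preview += ["--mode", mode]
    # fzf substitutes {} with the (quoted) entry under the cursor.
    preview_cmd = shlex.join(preview) + " {}"
    return ["fzf", "--preview", preview_cmd]


def run_preview(tokenizer=None, mode=None):
    """Launch fzf; with no piped input it lists files under the current directory."""
    if shutil.which("fzf") is None:
        raise SystemExit("fzf not found on PATH")
    return subprocess.run(build_fzf_command(tokenizer, mode)).returncode


# Only print the command here; call run_preview() to actually start fzf.
print(build_fzf_command("cl100k_base", "text"))
```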

Options

usage: main.py [-h] [--tokenizer TOKENIZER] [--mode {text,highlight}]
               [--hide-text] [--hide-stats] [--force-terminal] [-p]
               [input_file]

Visualize tokenized text.

positional arguments:
  input_file            Path to the input text file. Reads from stdin if not
                        provided.

options:
  -h, --help            show this help message and exit
  --tokenizer TOKENIZER
                        Tokenizer to use for tokenization. Possible values:
                        gpt-4o, o200k_base, cl100k_base, p50k_base, p50k_edit,
                        r50k_base, gpt2 (default: o200k_base)
  --mode {text,highlight}
                        Mode for displaying tokens: 'text' or 'highlight'.
                        (default: highlight)
  --hide-text           Hide the tokenized text.
  --hide-stats          Hide the token and character counts at the end.
  --force-terminal      Force terminal output.
  -p, --preview-files   Use fzf to preview tokenization of files in the
                        current directory.
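
The help output above is consistent with an argparse parser along the following lines. This is a reconstruction from the help text only, so the actual source may differ; `build_parser` is a hypothetical name. Note that the usage line shows `--tokenizer TOKENIZER` rather than a brace-enclosed choice list, so the sketch does not restrict the tokenizer value.

```python
import argparse


def build_parser():
    """Rebuild a parser matching the documented options and defaults."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Visualize tokenized text."
    )
    parser.add_argument(
        "input_file",
        nargs="?",  # optional positional: stdin is used when it is absent
        help="Path to the input text file. Reads from stdin if not provided.",
    )
    parser.add_argument(
        "--tokenizer",
        default="o200k_base",
        help="Tokenizer to use for tokenization.",
    )
    parser.add_argument(
        "--mode",
        choices=["text", "highlight"],
        default="highlight",
        help="Mode for displaying tokens: 'text' or 'highlight'.",
    )
    parser.add_argument("--hide-text", action="store_true",
                        help="Hide the tokenized text.")
    parser.add_argument("--hide-stats", action="store_true",
                        help="Hide the token and character counts at the end.")
    parser.add_argument("--force-terminal", action="store_true",
                        help="Force terminal output.")
    parser.add_argument("-p", "--preview-files", action="store_true",
                        help="Use fzf to preview tokenization of files.")
    return parser


args = build_parser().parse_args(["--tokenizer", "cl100k_base", "--mode", "text", "-p"])
print(args.tokenizer, args.mode, args.preview_files)
```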

Download files

Source Distribution

token_cli-0.1.0.tar.gz (3.5 kB)

Uploaded Source

Built Distribution

token_cli-0.1.0-py3-none-any.whl (3.8 kB)

Uploaded Python 3

File details

Details for the file token_cli-0.1.0.tar.gz.

File metadata

  • Download URL: token_cli-0.1.0.tar.gz
  • Upload date:
  • Size: 3.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.12

File hashes

Hashes for token_cli-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       64e2c6fdb485af89d4b508d2ef27e13db4bb53c9b7c0c5f67f8d3d1d6b6ac2a6
MD5          d461b2374932ca6cf882ee095ef23521
BLAKE2b-256  60a63c5cf358c1dd398b56f11cd752bfd68ebc50b6ad1dd24847a1c78f9461b0

File details

Details for the file token_cli-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: token_cli-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 3.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.12

File hashes

Hashes for token_cli-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       88d31e7ace7084889e0ee25b46e37e92a51101b5a98b68eee3d56dbff30a90fd
MD5          245826c5210c1caf0780efbfcbbbcf46
BLAKE2b-256  06e562f28c83288c44282a19f5dc5cffe01a87da3df98eaa767ebe311995862b
