A command-line tool to visualize tokenized text.
Token CLI
A command-line tool to visualize tokenized text using different encodings.
Installation
- Clone this repository:
git clone https://github.com/taha-yassine/token-cli.git
cd token-cli
- Install dependencies (preferably in a virtual environment):
# Using pip
python -m venv .venv
source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`
pip install -r requirements.txt

# Or using uv
uv venv
uv pip install -r requirements.txt
Usage
python main.py [OPTIONS] [INPUT_FILE]
From standard input:
echo "This is sample text." | python main.py --hide-stats
From a file:
python main.py --tokenizer gpt-4o your_text_file.txt
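To give a rough idea of what the highlight mode and the stats line convey, here is a minimal sketch. It is not the tool's implementation: `toy_tokenize` is a hypothetical regex-based stand-in for a real encoding, and the colour palette and the format of the stats line are assumptions made for illustration.

```python
import re
from itertools import cycle

# ANSI background colour codes, cycled so that adjacent tokens differ visually.
# (Assumed palette; the tool's actual colours may differ.)
BACKGROUNDS = [41, 42, 43, 44, 45, 46]
RESET = "\x1b[0m"


def toy_tokenize(text):
    """Stand-in tokenizer, NOT a real BPE encoding: each word keeps its leading whitespace."""
    return re.findall(r"\s*\S+|\s+", text)


def highlight(tokens):
    """Wrap each token in a background colour, cycling through BACKGROUNDS."""
    colours = cycle(BACKGROUNDS)
    return "".join(f"\x1b[{next(colours)}m{tok}{RESET}" for tok in tokens)


def stats(text, tokens):
    """Return a summary line with token and character counts (hypothetical format)."""
    return f"Tokens: {len(tokens)}  Characters: {len(text)}"


if __name__ == "__main__":
    sample = "This is sample text."
    toks = toy_tokenize(sample)
    print(highlight(toks))
    print(stats(sample, toks))
```

With a real encoding the token boundaries would usually fall differently, which is what the visualization is meant to make visible.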
Interactive Preview:
Use the -p or --preview-files flag to launch an interactive fzf session. This allows you to browse files in the current directory and its subdirectories, showing a live preview of the tokenization.
python main.py -p
# Or with a specific tokenizer/mode for the previews
python main.py --tokenizer cl100k_base --mode text -p
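One way such a preview could be wired up is to hand fzf a `--preview` command that re-runs the script on the highlighted file. The sketch below is an assumption about the general approach, not the project's actual code: `build_fzf_command` and `run_preview` are hypothetical names, and passing `--force-terminal` inside the preview is a guess at how colours would be kept when output is not a TTY.

```python
import shlex
import subprocess


def build_fzf_command(tokenizer="o200k_base", mode="highlight"):
    """Build an fzf invocation whose preview pane runs main.py on the selected file."""
    preview = " ".join(
        [
            "python",
            "main.py",
            "--tokenizer", shlex.quote(tokenizer),
            "--mode", shlex.quote(mode),
            "--force-terminal",  # assumption: keeps colour output in the preview pane
            "{}",  # fzf replaces this placeholder with the highlighted entry
        ]
    )
    return ["fzf", "--preview", preview]


def run_preview(tokenizer="o200k_base", mode="highlight"):
    """Launch fzf (must be installed and on PATH) and return its exit code."""
    return subprocess.run(build_fzf_command(tokenizer, mode)).returncode
```

Building the command as a separate step keeps the quoting logic easy to inspect without having to launch fzf.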
Options
usage: main.py [-h] [--tokenizer TOKENIZER] [--mode {text,highlight}]
               [--hide-text] [--hide-stats] [--force-terminal] [-p]
               [input_file]

Visualize tokenized text.

positional arguments:
  input_file            Path to the input text file. Reads from stdin if not
                        provided.

options:
  -h, --help            show this help message and exit
  --tokenizer TOKENIZER
                        Tokenizer to use for tokenization. Possible values:
                        gpt-4o, o200k_base, cl100k_base, p50k_base, p50k_edit,
                        r50k_base, gpt2 (default: o200k_base)
  --mode {text,highlight}
                        Mode for displaying tokens: 'text' or 'highlight'.
                        (default: highlight)
  --hide-text           Hide the tokenized text.
  --hide-stats          Hide the token and character counts at the end.
  --force-terminal      Force terminal output.
  -p, --preview-files   Use fzf to preview tokenization of files in the
                        current directory.
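For readers who want to script against this interface, the help text above can be approximated with `argparse`. This is a reconstruction from the help output only, not the project's source: `build_parser` and `read_input` are hypothetical names, and the UTF-8 file encoding is an assumption.

```python
import argparse
import sys


def build_parser():
    """Recreate the documented options; defaults are taken from the help text."""
    parser = argparse.ArgumentParser(prog="main.py", description="Visualize tokenized text.")
    parser.add_argument("input_file", nargs="?",
                        help="Path to the input text file. Reads from stdin if not provided.")
    # The help shows a free-form TOKENIZER value, so no `choices` restriction is applied here.
    parser.add_argument("--tokenizer", default="o200k_base",
                        help="Tokenizer to use for tokenization.")
    parser.add_argument("--mode", choices=["text", "highlight"], default="highlight",
                        help="Mode for displaying tokens.")
    parser.add_argument("--hide-text", action="store_true", help="Hide the tokenized text.")
    parser.add_argument("--hide-stats", action="store_true",
                        help="Hide the token and character counts at the end.")
    parser.add_argument("--force-terminal", action="store_true", help="Force terminal output.")
    parser.add_argument("-p", "--preview-files", action="store_true",
                        help="Use fzf to preview tokenization of files.")
    return parser


def read_input(path):
    """Read text from the given file, or from stdin when no path is provided."""
    if path is None:
        return sys.stdin.read()
    with open(path, encoding="utf-8") as handle:  # encoding is an assumption
        return handle.read()
```

For example, `build_parser().parse_args(["--mode", "text", "notes.txt"])` yields a namespace with `mode` set to `text` and `input_file` set to `notes.txt`.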
File details
Details for the file token_cli-0.1.0.tar.gz.
File metadata
- Download URL: token_cli-0.1.0.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 64e2c6fdb485af89d4b508d2ef27e13db4bb53c9b7c0c5f67f8d3d1d6b6ac2a6 |
| MD5 | d461b2374932ca6cf882ee095ef23521 |
| BLAKE2b-256 | 60a63c5cf358c1dd398b56f11cd752bfd68ebc50b6ad1dd24847a1c78f9461b0 |
File details
Details for the file token_cli-0.1.0-py3-none-any.whl.
File metadata
- Download URL: token_cli-0.1.0-py3-none-any.whl
- Upload date:
- Size: 3.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 88d31e7ace7084889e0ee25b46e37e92a51101b5a98b68eee3d56dbff30a90fd |
| MD5 | 245826c5210c1caf0780efbfcbbbcf46 |
| BLAKE2b-256 | 06e562f28c83288c44282a19f5dc5cffe01a87da3df98eaa767ebe311995862b |