Tokker: a fast local-first CLI tokenizer with all the best models in one place
Tokker
Tokker 0.3.9: a fast local-first CLI tokenizer with all the best models in one place.
Features
- Simple Usage: Just `tok 'your text'`, that's it!
- Models:
  - OpenAI: GPT-OSS, o-family (o1, o3, o4), GPT-4o, GPT-4, GPT-3.5, GPT-3
  - Google: the entire Gemini family
  - HuggingFace: popular models like DeepSeek-R1, Qwen-3, GLM-4.5, and many others in the transformers library (some may not be supported yet)
- Output Formats: color, count, JSON, pivot, and del (delimited)
- Text Analysis: Token count, word count, character count, and token frequency
- Model History: See your recently used models
- Local-first: Works locally on device (except Google)
Installation
# Install tokker without model provider packages (optional)
pip install tokker
# Install at least one model provider package:
pip install 'tokker[all]' # for all models at once
pip install 'tokker[tiktoken]' # for models from OpenAI
pip install 'tokker[google-genai]' # for models from Google
pip install 'tokker[transformers]' # for models from HuggingFace
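To see which provider packages are already present before installing an extra, a small check like the one below may help. It is a minimal sketch, not part of Tokker: the helper names are hypothetical, and the import names (`tiktoken`, `google.genai`, `transformers`) are assumed to be the modules that the extras above install.

```python
from importlib.util import find_spec

# Extra name -> import name. The import names are assumptions about what
# each provider package exposes; they are not taken from Tokker itself.
PROVIDER_MODULES = {
    "tiktoken": "tiktoken",
    "google-genai": "google.genai",
    "transformers": "transformers",
}


def is_installed(module: str) -> bool:
    """Return True if `module` can be located without importing it fully."""
    try:
        return find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `google`) is missing.
        return False


def installed_providers() -> list[str]:
    """List the extras whose provider module is importable."""
    return [extra for extra, module in PROVIDER_MODULES.items() if is_installed(module)]


if __name__ == "__main__":
    print("Installed providers:", installed_providers() or "none")
```

If the list comes back empty, installing `tokker[all]` or one of the individual extras above is the next step.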
Command Reference
usage: tok [--help] [-w MODEL] [-o {color,count,json,pivot,del}] [-m] [-c]
           [-dm MODEL] [-do OUTPUT] [-h] [-x]
           [text]
Tokker 0.3.9: a fast local-first CLI tokenizer with all the best models in one place
positional arguments:
  text                  text to tokenize (or read from stdin)
options:
  --help                (or just `tok`) to show this help message
  -w, --with MODEL      with specific (non-default) model
  -o, --output {color,count,json,pivot,del}
                        output format
  -m, --models          list all models
  -c, --config          show config with settings
  -dm, --default-model MODEL
                        set default model
  -do, --default-output OUTPUT
                        set default output
  -h, --history         show history of used models
  -x, --history-clear   clear history
Usage
Tokenization
When using bash or zsh, wrap input text in single quotes ('like this') to avoid conflicts with special characters like !.
# Tokenize with default model (o200k_base) and output (color)
$ tok 'Hello world!'
# Get pivot summary of token frequencies
$ tok 'Hello world!' -o pivot
# Tokenize with GLM-4.5
$ tok 'Hello world!' -w zai-org/GLM-4.5
# Get just the count with Gemini-2.5-pro
$ tok 'Hello world!' -w gemini-2.5-pro -o count
Pipeline Usage
# Process files
$ cat document.txt | tok -w deepseek-ai/DeepSeek-R1 -o count
# Chain with other tools
$ curl -s https://example.com | tok -w openai/gpt-oss-120b
# Compare models
$ echo "I'm tired boss, I can't do matmul anymore" | tok -w gemini-2.5-flash
$ echo "I'm tired boss, I can't do matmul anymore" | tok -w gemini-2.0-flash
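The model comparison above can also be scripted. The following is a minimal sketch, assuming `tok` is installed and on your PATH and that `-o count` prints the JSON object described under Count Output. The helper names (`build_tok_command`, `count_tokens`, `compare_models`) are hypothetical and not part of Tokker.

```python
import json
import subprocess


def build_tok_command(model: str, output: str = "count") -> list[str]:
    """Build the argument list for one `tok` call, using the -w and -o flags."""
    return ["tok", "-w", model, "-o", output]


def count_tokens(text: str, model: str) -> int:
    """Pipe `text` to `tok` on stdin and return the reported token count.

    Assumes `tok` is on PATH and prints JSON containing a "token_count" key.
    """
    result = subprocess.run(
        build_tok_command(model),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise if tok exits with an error
    )
    return json.loads(result.stdout)["token_count"]


def compare_models(text: str, models: list[str]) -> dict[str, int]:
    """Return a {model: token count} mapping for the same input text."""
    return {model: count_tokens(text, model) for model in models}
```

For example, `compare_models("I'm tired boss, I can't do matmul anymore", ["gemini-2.5-flash", "gemini-2.0-flash"])` would run both commands above and return the two counts side by side.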
Models
# List all available models
$ tok -m
Output:
============
OpenAI:
o200k_base - for GPT-OSS, o-family (o1, o3, o4) and GPT-4o
cl100k_base - for GPT-3.5 (late), GPT-4
p50k_base - for GPT-3.5 (early)
p50k_edit - for GPT-3 edit models (text-davinci, code-davinci)
r50k_base - for GPT-3 base models (davinci, curie, babbage, ada)
------------
Google:
gemini-2.5-pro
gemini-2.5-flash-lite
gemini-2.5-flash
gemini-2.0-flash-lite
gemini-2.0-flash
Auth setup required -> https://github.com/igoakulov/tokker/blob/main/google-auth-guide.md
------------
HuggingFace:
1. Go to -> https://huggingface.co/models?library=transformers
2. Search models within TRANSFORMERS library (some not supported yet)
3. Copy its `USER/MODEL` into your command, for example:
openai/gpt-oss-120b
Qwen/Qwen3-Coder-480B-A35B-Instruct
zai-org/GLM-4.5
deepseek-ai/DeepSeek-R1
facebook/bart-base
google-bert/bert-base-uncased
google/electra-base-discriminator
microsoft/phi-4
============
Config
Stored locally in ~/.config/tokker/config.json.
Show config:
# Show config with settings
$ tok -c
Returns:
{
  "default_model": "o200k_base",
  "default_output": "color",
  "delimiter": "⎮"
}
Set defaults:
# Set DeepSeek-R1 as the default model
$ tok -dm deepseek-ai/DeepSeek-R1
# Set count as the default output
$ tok -do count
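If you want to read these settings from your own scripts, the config file can be loaded directly. This is a minimal sketch: the path and keys come from the section above, but the fallback behavior (using defaults when the file or a key is missing) is an assumption made for illustration, not a description of how Tokker itself handles it. `load_config` is a hypothetical helper.

```python
import json
from pathlib import Path

# Values copied from the `tok -c` output shown above.
DEFAULTS = {
    "default_model": "o200k_base",
    "default_output": "color",
    "delimiter": "\u23ae",
}
CONFIG_PATH = Path.home() / ".config" / "tokker" / "config.json"


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return settings from `path`, using DEFAULTS for a missing file or missing keys."""
    try:
        stored = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        stored = {}
    # Stored values take precedence over the defaults.
    return {**DEFAULTS, **stored}


if __name__ == "__main__":
    print(load_config()["default_model"])
```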
History
Stored locally in ~/.config/tokker/history.json.
Show history:
$ tok -h
Returns:
============
History:
gemini-2.5-pro 2025-08-09 19:58
cl100k_base 2025-08-09 19:52
gpt2 2025-08-08 16:23
============
Clear history:
# Does not ask for confirmation
$ tok -x
Output Formats
Color Output (Default)
- Marks each token with an alternating color.
- Color formatting does not render in the example below.
- Color formatting is not preserved when copying the CLI output.
Command:
$ tok 'Hello world!'
Returns:
Hello world!
3 tokens, 2 words, 12 chars
Del (=Delimited) Output
- Preserves visual token separation when you copy (unlike color)
- After pasting, remove "⎮" easily with Find & Replace if needed
- "⎮" (U+23AE INTEGRAL EXTENSION) is a rare symbol and will not interfere with the standard "|" (U+007C VERTICAL LINE)
Command:
$ tok 'Hello world!' -o del
Returns:
Hello⎮ world⎮!
3 tokens, 2 words, 12 chars
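As an alternative to Find & Replace, the delimiter can be stripped or used to split the text programmatically. The sketch below assumes the default "⎮" (U+23AE) delimiter and that the original text does not itself contain that character. Both helper names are hypothetical, and they apply to the text line only, not the stats line.

```python
DELIMITER = "\u23ae"  # the default delimiter from the config


def strip_delimiters(delimited: str, delimiter: str = DELIMITER) -> str:
    """Remove token delimiters to recover the original text."""
    return delimited.replace(delimiter, "")


def split_tokens(delimited: str, delimiter: str = DELIMITER) -> list[str]:
    """Split delimited output back into its token strings."""
    return delimited.split(delimiter)


if __name__ == "__main__":
    line = "Hello\u23ae world\u23ae!"
    print(strip_delimiters(line))
    print(split_tokens(line))
```

Because the delimiter is a different code point from "|", text containing ordinary pipes passes through unchanged.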
Count Output
Command:
$ tok 'Hello world!' -o count
Returns:
{
  "token_count": 3,
  "word_count": 2,
  "char_count": 12
}
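The three counts are easy to combine into ratios once parsed. This is a minimal sketch that assumes the JSON shape shown above and non-empty input (so no count is zero); `summarize` is a hypothetical helper, not part of Tokker.

```python
import json


def summarize(count_json: str) -> dict[str, float]:
    """Derive simple ratios from the JSON printed by `tok ... -o count`."""
    counts = json.loads(count_json)
    return {
        "tokens_per_word": counts["token_count"] / counts["word_count"],
        "chars_per_token": counts["char_count"] / counts["token_count"],
    }


if __name__ == "__main__":
    sample = '{"token_count": 3, "word_count": 2, "char_count": 12}'
    print(summarize(sample))
```

For the sample above, this gives 1.5 tokens per word and 4.0 characters per token.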
Pivot Output
The pivot output prints a JSON object with token frequencies, sorted by highest count first, then by token (A–Z).
Command:
$ tok 'never gonna give you up neve gonna let you down' -o pivot
Returns:
{
  " gonna": 2,
  " you": 2,
  " down": 1,
  " give": 1,
  " let": 1,
  " ne": 1,
  " up": 1,
  "never": 1,
  "ve": 1
}
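The ordering described above (highest count first, then by token) can be reproduced from a list of token strings. This is a minimal sketch, not Tokker's actual implementation: it assumes ties are broken by plain string comparison, which places tokens with a leading space before those without, as in the output above. The `pivot` function name is hypothetical.

```python
from collections import Counter


def pivot(token_strings: list[str]) -> dict[str, int]:
    """Count token frequencies, sorted by count (descending), then token (ascending)."""
    counts = Counter(token_strings)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ordered)  # dicts preserve insertion order


if __name__ == "__main__":
    tokens = ["never", " gonna", " give", " you", " up",
              " ne", "ve", " gonna", " let", " you", " down"]
    for token, count in pivot(tokens).items():
        print(repr(token), count)
```

Negating the count in the sort key gives descending counts while keeping the token comparison ascending in a single pass.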
Full JSON Output
Command:
$ tok 'Hello world!' -o json
Returns:
{
  "delimited_text": "Hello⎮ world⎮!",
  "token_strings": ["Hello", " world", "!"],
  "token_ids": [9906, 1917, 0],
  "token_count": 3,
  "word_count": 2,
  "char_count": 12
}
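The fields in this JSON are related to one another, which makes it straightforward to sanity-check the output in a script. The sketch below assumes the field names shown above and the default "⎮" (U+23AE) delimiter; `find_problems` is a hypothetical helper, not part of Tokker.

```python
import json


def find_problems(raw: str, delimiter: str = "\u23ae") -> list[str]:
    """Return a list of inconsistencies found in `tok ... -o json` output."""
    data = json.loads(raw)
    problems = []
    if len(data["token_strings"]) != data["token_count"]:
        problems.append("token_strings length differs from token_count")
    if len(data["token_ids"]) != data["token_count"]:
        problems.append("token_ids length differs from token_count")
    if delimiter.join(data["token_strings"]) != data["delimited_text"]:
        problems.append("delimited_text does not match joined token_strings")
    return problems


if __name__ == "__main__":
    sample = json.dumps({
        "delimited_text": "Hello\u23ae world\u23ae!",
        "token_strings": ["Hello", " world", "!"],
        "token_ids": [9906, 1917, 0],
        "token_count": 3,
        "word_count": 2,
        "char_count": 12,
    })
    print(find_problems(sample) or "consistent")
```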
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Issues and pull requests are welcome! Visit the GitHub repository.
Acknowledgments
- OpenAI for the tiktoken library
- HuggingFace for the transformers library
- Google for the Gemini models and APIs
File details
Details for the file tokker-0.3.9.tar.gz.
File metadata
- Download URL: tokker-0.3.9.tar.gz
- Upload date:
- Size: 30.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eb2676268c543bacac5c84ef251beb3ea3eb9b59e3b71678968713d9047e7f83 |
| MD5 | cd625d5405aeaa7b02a06d24ada335b5 |
| BLAKE2b-256 | d8106463416a692ca64b5babc4529adda91a981c159ee39897883d678cd0852c |
File details
Details for the file tokker-0.3.9-py3-none-any.whl.
File metadata
- Download URL: tokker-0.3.9-py3-none-any.whl
- Upload date:
- Size: 36.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf2299fe144087f99f59998d6d9ed5a9f6d59b8967dfe722be3dea18f536a831 |
| MD5 | 543dded05e1c6fa7dd6f0a9a74c5f691 |
| BLAKE2b-256 | 3d3069c6f1aa136bdeb4a6df25d41137ca06e8196b6dbd6cc8c0cf8bcacdda74 |