Tokker
Tokker is a fast local-first CLI tool for tokenizing text with all the best models in one place.
Features
- Simple Usage: Just run tok 'your text', and that's it!
- Models:
  - OpenAI: GPT-3, GPT-3.5, GPT-4, GPT-4o, o-family (o1, o3, o4)
  - Google: the entire Gemini family
  - HuggingFace: select literally any model that supports the transformers library
- Flexible Output: JSON, plain text, count, and pivot output formats
- Model History: Track your usage with --history and --history-clear
- Configuration: Persistent configuration for default model and settings
- Text Analysis: Token count, word count, character count, and token frequency
- Cross-platform: Works on Windows, macOS, and Linux
- Local-first: Works locally on device (except Google)
Installation
pip install tokker
That's it! The tok command is now available in your terminal.
Command Reference
usage: tok [-h] [-m MODEL] [-o {json,plain,count,pivot}]
           [-D MODEL_DEFAULT] [-M]
           [-H] [-X]
           [text]

positional arguments:
  text                  text to tokenize (or read from stdin)

options:
  -h, --help            show this help message and exit
  -m, --model MODEL     model to use (overrides default)
  -o, --output {json,plain,count,pivot}
                        output format (default: json)
  -D, --model-default MODEL_DEFAULT
                        set default model
  -M, --models          list all available models
  -H, --history         show history of used models
  -X, --history-clear   clear history
Usage
Tokenize Text
Tip: When using bash or zsh, wrap input text in single quotes ('like this'). Inside double quotes the shell still interprets special characters such as ! (history expansion) and $ (variable expansion), so the text may be altered before tok receives it.
# Tokenize with default model
tok 'Hello world'
# Get a specific output format
tok 'Hello world' -o plain
# Use a specific model
tok 'Hello world' -m deepseek-ai/DeepSeek-R1
# Get just the counts
tok 'Hello world' -m gemini-2.5-pro -o count
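The quoting tip above also applies when a script assembles the command line. The following is a minimal Python sketch that relies on the standard library's shlex.quote to produce single-quoted arguments. The helper name build_tok_command is hypothetical and not part of Tokker; only the tok command and its -m flag come from the command reference.

```python
import shlex
from typing import Optional


def build_tok_command(text: str, model: Optional[str] = None) -> str:
    """Return a shell-safe `tok` command line.

    Hypothetical helper for illustration; it is not part of Tokker.
    """
    # shlex.quote wraps the text in single quotes when it contains
    # characters the shell would otherwise interpret (spaces, !, $, ...).
    parts = ["tok", shlex.quote(text)]
    if model is not None:
        # -m selects the model, as documented in the command reference.
        parts += ["-m", shlex.quote(model)]
    return " ".join(parts)


print(build_tok_command("Hello world!"))
print(build_tok_command("Hello world!", model="gpt2"))
```

Text that contains no special characters is passed through unquoted, which is still safe for the shell.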
Pipeline Usage
# Process files
cat document.txt | tok -m gpt2 -o count
# Chain with other tools
curl -s https://example.com | tok -m bert-base-uncased
# Compare models
echo "Machine learning is awesome" | tok -m gpt2
echo "Machine learning is awesome" | tok -m bert-base-uncased
List Available Models
# See all available models
tok -M
Output:
============
OpenAI:
cl100k_base used in GPT-3.5 (late), GPT-4
o200k_base used in GPT-4o, o-family (o1, o3, o4)
p50k_base used in GPT-3.5 (early)
p50k_edit used in GPT-3 edit models (text-davinci, code-davinci)
r50k_base used in GPT-3 base models (davinci, curie, babbage, ada)
------------
Google:
gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.0-flash
gemini-2.0-flash-lite
Auth setup required -> https://github.com/igoakulov/tokker/blob/main/tokker/google-auth-guide.md
------------
HuggingFace (BYOM - Bring Your Own Model):
1. Go to -> https://huggingface.co/models?library=transformers
2. Search any model with TRANSFORMERS library support
3. Copy its `USER/MODEL` into your command like:
deepseek-ai/DeepSeek-R1
google-bert/bert-base-uncased
google/gemma-3n-E4B-it
meta-llama/Meta-Llama-3.1-405B
mistralai/Devstral-Small-2507
moonshotai/Kimi-K2-Instruct
Qwen/Qwen3-Coder-480B-A35B-Instruct
============
Set Default Model
# Set your preferred model
tok -D o200k_base
History
# View your model usage history with date/time
tok -H
# Clear your history (will prompt for confirmation)
tok -X
History is stored locally in ~/.config/tokker/history.json.
Output Formats
Full JSON Output (Default)
$ tok 'Hello world'
{
"converted": "Hello⎮ world",
"token_strings": ["Hello", " world"],
"token_ids": [24912, 2375],
"token_count": 2,
"word_count": 2,
"char_count": 11
}
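To show how the fields in this output relate to each other, here is a small Python sketch that rebuilds the same object from a list of token strings and IDs. It is not Tokker's implementation: the summarize function is hypothetical, the rule that converted is the token strings joined by the delimiter is inferred from the example above, and counting words by splitting on whitespace is an assumption.

```python
import json

DELIMITER = "⎮"  # delimiter value shown in the Configuration section


def summarize(text, token_strings, token_ids):
    """Build a dict shaped like the full JSON output (illustrative only)."""
    return {
        # Inferred from the example: token strings joined by the delimiter.
        "converted": DELIMITER.join(token_strings),
        "token_strings": token_strings,
        "token_ids": token_ids,
        "token_count": len(token_ids),
        # Assumption: words are whitespace-separated chunks.
        "word_count": len(text.split()),
        "char_count": len(text),
    }


# Token strings and IDs copied from the 'Hello world' example above.
result = summarize("Hello world", ["Hello", " world"], [24912, 2375])
print(json.dumps(result, ensure_ascii=False, indent=2))
```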
Plain Text Output
$ tok 'Hello world' -o plain
Hello⎮ world
Count Output
$ tok 'Hello world' -o count
{
"token_count": 2,
"word_count": 2,
"char_count": 11
}
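Because the count output is plain JSON, a script can consume it directly. The Python sketch below parses the example output shown above and derives a tokens-per-word ratio. The function name tokens_per_word is hypothetical, and the input is hard-coded here rather than read from tok.

```python
import json

# The count output for 'Hello world', copied from the example above.
count_output = '{"token_count": 2, "word_count": 2, "char_count": 11}'


def tokens_per_word(raw_json: str) -> float:
    """Return token_count / word_count from a count-format JSON string."""
    counts = json.loads(raw_json)
    if counts["word_count"] == 0:
        # Avoid dividing by zero for empty input.
        return 0.0
    return counts["token_count"] / counts["word_count"]


print(tokens_per_word(count_output))
```

In a pipeline, the same string could come from standard input instead, for example by reading sys.stdin after tok ... -o count.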
Pivot Output
The pivot output prints a JSON object with token frequencies, sorted by highest count first, then by token (A–Z).
Example:
$ tok 'never gonna give you up neve gonna let you down' -m cl100k_base -o pivot
{
" gonna": 2,
" you": 2,
" down": 1,
" give": 1,
" let": 1,
" ne": 1,
" up": 1,
"never": 1,
"ve": 1
}
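The sort order described above (highest count first, then token A–Z) can be sketched in a few lines of Python. This is an illustration, not Tokker's code: the pivot function is hypothetical, and the token list is reconstructed from the example output, so its order is an assumption.

```python
from collections import Counter


def pivot(token_strings):
    """Count tokens, then sort by count (descending) and token (ascending)."""
    counts = Counter(token_strings)
    # Negating the count sorts high counts first; the token string breaks ties.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ordered)  # dicts keep insertion order in Python 3.7+


# Assumed tokenization of the example sentence, inferred from the output above.
tokens = ["never", " gonna", " give", " you", " up", " ne", "ve",
          " gonna", " let", " you", " down"]

for token, count in pivot(tokens).items():
    print(repr(token), count)
```

Tokens with a leading space sort before those without one, because the space character compares lower than any letter. That is why "never" and "ve" appear last in the example.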
Configuration
Your configuration is stored locally in ~/.config/tokker/config.json:
{
"default_model": "o200k_base",
"delimiter": "⎮"
}
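If another script needs the same settings, the file can be read with the standard library. The following Python sketch assumes the path and keys shown above. The load_config function and the fallback to the values from this example are hypothetical and do not reflect how Tokker itself loads its configuration.

```python
import json
from pathlib import Path

# Values taken from the example config above; treating them as
# fallback defaults is an assumption made for this sketch.
DEFAULTS = {"default_model": "o200k_base", "delimiter": "⎮"}
CONFIG_PATH = Path.home() / ".config" / "tokker" / "config.json"


def load_config(path=CONFIG_PATH):
    """Read the config file, falling back to DEFAULTS for missing keys."""
    try:
        stored = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Missing, unreadable, or malformed file: use the defaults only.
        stored = {}
    if not isinstance(stored, dict):
        stored = {}
    return {**DEFAULTS, **stored}


config = load_config()
print(config["default_model"], config["delimiter"])
```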
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Issues and pull requests are welcome! Visit the GitHub repository.
Acknowledgments
- OpenAI for the tiktoken library
- HuggingFace for the transformers library
- Google for the Gemini models and APIs
File details
Details for the file tokker-0.3.4.tar.gz.
File metadata
- Download URL: tokker-0.3.4.tar.gz
- Upload date:
- Size: 27.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 239738d72f3382246f963a7a4074c05d6c50f87370d958decc7896bd1833c9e6 |
| MD5 | 7fa861fd585c6813702237421f214975 |
| BLAKE2b-256 | b20c8c46374b1abe6e548a7d44762397cbbba8803c0f41220efbb1abee00a1e7 |
File details
Details for the file tokker-0.3.4-py3-none-any.whl.
File metadata
- Download URL: tokker-0.3.4-py3-none-any.whl
- Upload date:
- Size: 26.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c1ed1c2ec0acf685c4c432b8520ae356fe634359e1923582431287ae3108360 |
| MD5 | f1b3f3e5e86e907acac4d0ada65c3863 |
| BLAKE2b-256 | a305784ecc3a554b64f7c93d3b73b49003ba6927e83628cf1a0f04130fb84b1b |