Tokker
Tokker is a fast local-first CLI tool for tokenizing text with all the best models in one place.
Features
- Simple Usage: Just tok 'your text' - that's it!
- Models:
  - OpenAI: GPT-3, GPT-3.5, GPT-4, GPT-4o, o-family (o1, o3, o4)
  - HuggingFace: any model that supports the transformers library
- Flexible Output: JSON, plain text, and count output formats
- Model History: Track your usage with --history and --history-clear
- Configuration: Persistent configuration for default model and settings
- Text Analysis: Token count, word count, character count, and token frequency
- Cross-platform: Works on Windows, macOS, and Linux
- 99% local: Works fully locally on device (besides initial model load)
Installation
pip install tokker
That's it! The tok command is now available in your terminal.
Command Reference
usage: tok [-h] [--model MODEL] [--output {json,plain,count,table}]
           [--model-default MODEL_DEFAULT] [--models]
           [--history] [--history-clear]
           [text]

positional arguments:
  text                  Text to tokenize (or read from stdin if not provided)

options:
  -h, --help            Show this help message and exit
  --model MODEL         Model to use (overrides default). Use --models to see available options
  --output {json,plain,count,table}
                        Output format (default: json)
  --model-default MODEL_DEFAULT
                        Set the default model in configuration. Use --models to see available options
  --models              List all available models with descriptions
  --history             Show history of used models, with most recent on top
  --history-clear       Clear model usage history
Usage
Tip: When using bash or zsh, wrap input text in single quotes ('like this'). Double quotes cause issues with special characters such as ! (used for history expansion).
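When the command line is assembled by a script rather than typed by hand, the same quoting rule can be applied automatically. The following is a minimal Python sketch, assuming the command is built as a string for a POSIX shell. The helper name tok_command is hypothetical and not part of tokker; only the standard-library shlex.quote does the actual quoting.

```python
import shlex
from typing import Optional


def tok_command(text: str, model: Optional[str] = None) -> str:
    """Build a POSIX-shell-safe `tok` command line (hypothetical helper)."""
    parts = ["tok", text]
    if model is not None:
        parts += ["--model", model]
    # shlex.quote wraps any argument containing shell-special characters
    # (spaces, !, quotes, ...) in single quotes and leaves safe ones as-is.
    return " ".join(shlex.quote(part) for part in parts)


print(tok_command("Hello world!"))  # tok 'Hello world!'
```

Because the text ends up in single quotes, the shell does not apply history expansion to the exclamation mark.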
Tokenize Text
# Tokenize with default model
tok 'Hello world'
# Get a specific output format
tok 'Hello world' --output plain
# Use a specific model
tok 'Hello world' --model gpt2
# Get just the counts
tok 'Hello world' --output count
Pipeline Usage
# Process files
cat document.txt | tok --model gpt2 --output count
# Chain with other tools
curl -s https://example.com | tok --model bert-base-uncased
# Compare models
echo "Machine learning is awesome" | tok --model gpt2
echo "Machine learning is awesome" | tok --model bert-base-uncased
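The same pipelines can be driven from Python by sending text to tok on stdin. This is a rough sketch, assuming tok is installed and on PATH, and that the default json output parses as a single JSON document as in the examples in this README. The function names tok_argv and tokenize_via_cli are hypothetical; only the --model and --output flags from the command reference are used.

```python
import json
import subprocess
from typing import List, Optional


def tok_argv(model: Optional[str] = None, output: str = "json") -> List[str]:
    """Assemble the argument list for `tok` (hypothetical helper)."""
    argv = ["tok", "--output", output]
    if model is not None:
        argv += ["--model", model]
    return argv


def tokenize_via_cli(text: str, model: Optional[str] = None) -> dict:
    """Pipe `text` to `tok` on stdin and parse its JSON output.

    Assumes the `tok` executable is available; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    proc = subprocess.run(
        tok_argv(model),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


# Example (requires tokker to be installed):
#   result = tokenize_via_cli("Machine learning is awesome", model="gpt2")
#   print(result["token_count"])
```

Passing the arguments as a list avoids shell quoting entirely, since no shell is involved.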
List Available Models
# See all available models
tok --models
Output:
--- OpenAI ---
cl100k_base -> used in GPT-3.5 (late), GPT-4
o200k_base -> used in GPT-4o, o-family (o1, o3, o4)
p50k_base -> used in GPT-3.5 (early)
p50k_edit -> used in GPT-3 edit models for text and code (text-davinci, code-davinci)
r50k_base -> used in GPT-3 base models (davinci, curie, babbage, ada)
--- HuggingFace ---
BYOM - Bring Your Own Model:
1. Go to -> https://huggingface.co/models?library=transformers
2. Search any model with TRANSFORMERS library support
3. Copy its `USER/MODEL-NAME` into your command:
deepseek-ai/DeepSeek-R1
google-bert/bert-base-uncased
google/gemma-3n-E4B-it
meta-llama/Meta-Llama-3.1-405B
mistralai/Devstral-Small-2507
moonshotai/Kimi-K2-Instruct
Qwen/Qwen3-Coder-480B-A35B-Instruct
Etc.
--- Related Commands ---
`--model-default 'model-name'` to set default model
`--history` to view all models you have used
Set Default Model
# Set your preferred model
tok --model-default o200k_base
History
# View your model usage history with date/time
tok --history
# Clear your history
tok --history-clear
History is stored locally in ~/.config/tokker/history.json.
Output Formats
Full JSON Output (Default)
$ tok 'Hello world'
{
"converted": "Hello⎮ world",
"token_strings": ["Hello", " world"],
"token_ids": [24912, 2375],
"token_count": 2,
"word_count": 2,
"char_count": 11,
"pivot": {
"Hello": 1,
" world": 1
},
"model": "o200k_base",
"provider": "OpenAI"
}
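To make the relationship between these fields concrete, here is a small Python sketch that rebuilds the derived fields from the token strings in the sample above. It is inferred from the sample output only and is not tokker's implementation. In particular, counting words by splitting on whitespace is an assumption, and the function name summarize is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def summarize(text: str, token_strings: List[str], delimiter: str = "⎮") -> Dict:
    """Recompute the derived JSON fields from text and its token strings.

    Hypothetical helper; the whitespace-based word count is an assumption.
    """
    return {
        # Tokens joined with the delimiter, as in the "converted" field
        "converted": delimiter.join(token_strings),
        "token_count": len(token_strings),
        "word_count": len(text.split()),
        "char_count": len(text),
        # Frequency of each distinct token string, as in the "pivot" field
        "pivot": dict(Counter(token_strings)),
    }


summary = summarize("Hello world", ["Hello", " world"])
print(summary["converted"])  # Hello⎮ world
```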
Plain Text Output
$ tok 'Hello world' --output plain
Hello⎮ world
Count Output
$ tok 'Hello world' --output count
{
"token_count": 2,
"word_count": 2,
"char_count": 11,
"model": "o200k_base"
}
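The count output is convenient for quick metrics in scripts. As an illustration, the Python sketch below parses the sample shown above and computes a tokens-per-word ratio. The function name tokens_per_word is hypothetical, and the sample string is copied from this README rather than produced by running the tool.

```python
import json


def tokens_per_word(count_json: str) -> float:
    """Return token_count / word_count from `tok --output count` JSON.

    Hypothetical helper; returns 0.0 when the text contains no words.
    """
    data = json.loads(count_json)
    words = data["word_count"]
    return data["token_count"] / words if words else 0.0


sample = '{"token_count": 2, "word_count": 2, "char_count": 11, "model": "o200k_base"}'
print(tokens_per_word(sample))  # 1.0
```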
Configuration
Your configuration is stored locally in ~/.config/tokker/config.json:
{
"default_model": "o200k_base",
"delimiter": "⎮"
}
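Other scripts can read this file to stay in sync with the CLI's settings. The sketch below is one way to do that in Python, assuming the two keys shown above. Falling back to these values when the file is missing is an assumption made for the example, not documented tokker behavior, and load_config is a hypothetical helper.

```python
import json
from pathlib import Path

# Values taken from the example config above; using them as fallbacks
# is an assumption of this sketch.
DEFAULTS = {"default_model": "o200k_base", "delimiter": "⎮"}
CONFIG_PATH = Path.home() / ".config" / "tokker" / "config.json"


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Read the config file, filling in DEFAULTS for missing keys."""
    try:
        stored = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        stored = {}
    # Keys present in the file override the fallback values
    return {**DEFAULTS, **stored}
```

Calling load_config() with no arguments reads ~/.config/tokker/config.json; passing a different path is useful for testing.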
Programmatic Usage
You can also use tokker in your Python code:
import tokker
# Count tokens
count = tokker.count_tokens("Hello world", "o200k_base")
print(f"Token count: {count}")
# Full tokenization
result = tokker.tokenize_text("Hello world", "gpt2")
print(result["token_count"])
# Word and character counts
words = tokker.count_words("Hello world")
chars = tokker.count_characters("Hello world")
print(f"Words: {words}, Characters: {chars}")
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Issues and pull requests are welcome! Visit the GitHub repository.
Acknowledgments
- OpenAI for the tiktoken library
- HuggingFace for the transformers library