Tokker: a fast local-first CLI tokenizer with all the best models in one place

Tokker

Tokker 0.3.9: a fast local-first CLI tokenizer with all the best models in one place.

Features

  • Simple Usage: Just tok 'your text' - that's it!
  • Models:
    • OpenAI: GPT-OSS, o-family (o1, o3, o4), GPT-4o, GPT-4, GPT-3.5, GPT-3
    • Google: the entire Gemini family
    • HuggingFace: popular models like DeepSeek-R1, Qwen-3, GLM-4.5 and many others within the transformers library (some may not be supported yet)
  • Output Formats: color, count, JSON, pivot, and del (delimited)
  • Text Analysis: Token count, word count, character count, and token frequency
  • Model History: See your recently used models
  • Local-first: Works locally on device (except Google)

Installation

# Install tokker without model provider packages (optional)
pip install tokker

# Install at least one model provider package:
pip install 'tokker[all]' # for all models at once
pip install 'tokker[tiktoken]' # for models from OpenAI
pip install 'tokker[google-genai]' # for models from Google
pip install 'tokker[transformers]' # for models from HuggingFace

Command Reference

usage: tok [--help] [-w MODEL] [-o {color,count,json,pivot,del}] [-m] [-c]
           [-dm MODEL] [-do OUTPUT] [-h] [-x]
           [text]

Tokker 0.3.9: a fast local-first CLI tokenizer with all the best models in one place

positional arguments:
  text                  text to tokenize (or read from stdin)

options:
  --help                (or just `tok`) to show this help message
  -w, --with MODEL      with specific (non-default) model
  -o, --output {color,count,json,pivot,del}
                        output format
  -m, --models          list all models
  -c, --config          show config with settings
  -dm, --default-model MODEL
                        set default model
  -do, --default-output OUTPUT
                        set default output
  -h, --history         show history of used models
  -x, --history-clear   clear history

Usage

Tokenization

When using bash or zsh, wrap input text in single quotes ('like this') to avoid conflicts with special characters like !.

# Tokenize with default model (o200k_base) and output (color)
$ tok 'Hello world!'
# Get pivot summary of token frequencies
$ tok 'Hello world!' -o pivot
# Tokenize with GLM-4.5
$ tok 'Hello world!' -w zai-org/GLM-4.5
# Get just the count with Gemini-2.5-pro
$ tok 'Hello world!' -w gemini-2.5-pro -o count
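
The quoting advice above also applies when a script assembles a tok command line. A minimal sketch, assuming Python's standard-library shlex module is used to quote arbitrary text for a POSIX shell. The helper name build_tok_command is hypothetical and not part of Tokker; only the -o flag from the command reference is used.

```python
import shlex


def build_tok_command(text, output="count"):
    """Return a shell-safe `tok` command line for the given text.

    Hypothetical helper for illustration, not part of Tokker.
    """
    # shlex.quote wraps the text in single quotes when it contains
    # characters the shell would otherwise interpret (spaces, !, etc.).
    return f"tok {shlex.quote(text)} -o {shlex.quote(output)}"


print(build_tok_command("Hello world!"))
# tok 'Hello world!' -o count
```

Text that itself contains an apostrophe (such as "I'm") is escaped by shlex.quote as well, so it survives the shell unchanged.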

Pipeline Usage

# Process files
$ cat document.txt | tok -w deepseek-ai/DeepSeek-R1 -o count

# Chain with other tools
$ curl -s https://example.com | tok -w openai/gpt-oss-120b

# Compare models
$ echo "I'm tired boss, I can't do matmul anymore" | tok -w gemini-2.5-flash
$ echo "I'm tired boss, I can't do matmul anymore" | tok -w gemini-2.0-flash
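
The model comparison above can also be scripted. A minimal sketch, assuming tok is on PATH, that it reads text from stdin as in the pipeline examples, and that -o count prints the JSON object shown under Count Output. The function names tok_count and compare_models are hypothetical.

```python
import json
import subprocess


def tok_count(text, model):
    """Pipe text to `tok` on stdin and return the reported token count.

    Assumes `tok -o count` prints a JSON object with a "token_count" key.
    """
    proc = subprocess.run(
        ["tok", "-w", model, "-o", "count"],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)["token_count"]


def compare_models(text, models, counter=tok_count):
    """Map each model name to its token count, smallest count first."""
    counts = {model: counter(text, model) for model in models}
    return dict(sorted(counts.items(), key=lambda item: item[1]))
```

For example, compare_models("I'm tired boss, I can't do matmul anymore", ["gemini-2.5-flash", "gemini-2.0-flash"]) would run tok once per model. The counter parameter exists only so the comparison logic can be exercised without calling the CLI.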

Models

# List all available models
$ tok -m

Output:

============
OpenAI:

o200k_base            - for GPT-OSS, o-family (o1, o3, o4) and GPT-4o
cl100k_base           - for GPT-3.5 (late), GPT-4
p50k_base             - for GPT-3.5 (early)
p50k_edit             - for GPT-3 edit models (text-davinci, code-davinci)
r50k_base             - for GPT-3 base models (davinci, curie, babbage, ada)
------------
Google:

gemini-2.5-pro
gemini-2.5-flash-lite
gemini-2.5-flash
gemini-2.0-flash-lite
gemini-2.0-flash

Auth setup required   ->   https://github.com/igoakulov/tokker/blob/main/google-auth-guide.md
------------
HuggingFace:

  1. Go to   ->   https://huggingface.co/models?library=transformers
  2. Search models within TRANSFORMERS library (some not supported yet)
  3. Copy its `USER/MODEL` into your command, for example:

openai/gpt-oss-120b
Qwen/Qwen3-Coder-480B-A35B-Instruct
zai-org/GLM-4.5
deepseek-ai/DeepSeek-R1
facebook/bart-base
google-bert/bert-base-uncased
google/electra-base-discriminator
microsoft/phi-4
============

Config

Stored locally in ~/.config/tokker/config.json.

Show config:

# Show config with settings
$ tok -c

Returns:

{
  "default_model": "o200k_base",
  "default_output": "color",
  "delimiter": "⎮"
}
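
Other scripts can read the same settings. A minimal sketch, assuming the config file is plain JSON with the three keys shown above. Falling back to these defaults when the file is missing is an assumption made for illustration, not a description of Tokker's own behavior, and load_config is a hypothetical helper.

```python
import json
from pathlib import Path

# Path taken from the section above; the defaults mirror the sample output.
CONFIG_PATH = Path.home() / ".config" / "tokker" / "config.json"
DEFAULTS = {
    "default_model": "o200k_base",
    "default_output": "color",
    "delimiter": "⎮",
}


def load_config(path=CONFIG_PATH):
    """Return the settings from the config file, layered over DEFAULTS."""
    config = dict(DEFAULTS)
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config
```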

Set defaults:

# Set DeepSeek-R1 as the default model
$ tok -dm deepseek-ai/DeepSeek-R1
# Set count as the default output
$ tok -do count

History

Stored locally in ~/.config/tokker/history.json.

Show history:

$ tok -h

Returns:

============
History:

gemini-2.5-pro                  2025-08-09 19:58
cl100k_base                     2025-08-09 19:52
gpt2                            2025-08-08 16:23
============

Clear history:

# Does not ask for confirmation
$ tok -x

Output Formats

Color Output (Default)

  • Marks each token with an alternating color.
  • Color formatting does not render in the example below.
  • Color formatting is not preserved when copying the CLI output.

Command:

$ tok 'Hello world!'

Returns:

Hello world!
3 tokens, 2 words, 12 chars
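
To illustrate how alternating colors can mark token boundaries, here is a minimal sketch using ANSI escape sequences. The two background colors are chosen arbitrarily for the example and the function names are hypothetical; Tokker's actual palette and implementation may differ.

```python
import re

# Two ANSI background colors picked for illustration only.
COLORS = ("\x1b[44m", "\x1b[42m")
RESET = "\x1b[0m"


def colorize(tokens):
    """Wrap each token in one of two alternating background colors."""
    return "".join(
        f"{COLORS[i % 2]}{token}{RESET}" for i, token in enumerate(tokens)
    )


def strip_ansi(text):
    """Remove ANSI color sequences, leaving only the plain text."""
    return re.sub(r"\x1b\[[0-9;]*m", "", text)


colored = colorize(["Hello", " world", "!"])
print(colored)
print(strip_ansi(colored))  # Hello world!
```

The colors live only in the escape sequences, so once they are stripped nothing marks where one token ends and the next begins.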

Del (=Delimited) Output

  • Preserves visual token separation when you copy (unlike color)
  • After pasting, remove "⎮" easily with Find & Replace if needed
  • "⎮" (U+23AE INTEGRAL EXTENSION) is a rare symbol, and will not interfere with the standard "|" (U+007C VERTICAL LINE)

Command:

$ tok 'Hello world!' -o del

Returns:

Hello⎮ world⎮!
3 tokens, 2 words, 12 chars
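
Delimited output is easy to post-process. A minimal sketch, assuming the default delimiter from the config is in use; the helper names are hypothetical.

```python
DELIMITER = "⎮"  # default delimiter shown in the config


def split_delimited(line, delimiter=DELIMITER):
    """Split a delimited line back into its token strings."""
    return line.split(delimiter)


def strip_delimiter(line, delimiter=DELIMITER):
    """Remove the delimiter to recover the original text."""
    return line.replace(delimiter, "")


print(split_delimited("Hello⎮ world⎮!"))  # ['Hello', ' world', '!']
print(strip_delimiter("Hello⎮ world⎮!"))  # Hello world!
```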

Count Output

Command:

$ tok 'Hello world!' -o count

Returns:

{
  "token_count": 3,
  "word_count": 2,
  "char_count": 12
}
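
The word and character counts can be cross-checked without a tokenizer. A minimal sketch, assuming words are whitespace-separated and characters are counted as Python string length. These counting rules are assumptions that happen to match the sample above and may not be exactly how Tokker counts; text_stats is a hypothetical helper.

```python
def text_stats(text):
    """Return word and character counts under simple assumed rules."""
    return {
        "word_count": len(text.split()),  # whitespace-separated words
        "char_count": len(text),          # Unicode code points
    }


print(text_stats("Hello world!"))
# {'word_count': 2, 'char_count': 12}
```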

Pivot Output

The pivot output prints a JSON object with token frequencies, sorted by highest count first, then by token (A–Z).

Command:

$ tok 'never gonna give you up neve gonna let you down' -o pivot

Returns:

{
  " gonna": 2,
  " you": 2,
  " down": 1,
  " give": 1,
  " let": 1,
  " ne": 1,
  " up": 1,
  "never": 1,
  "ve": 1
}
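
The ordering described above (highest count first, then by token) can be reproduced from a list of token strings. A minimal sketch, assuming ties are broken by plain string comparison, which matches the sample because a leading space sorts before letters. The pivot function here is a hypothetical stand-in, not Tokker's own code.

```python
from collections import Counter


def pivot(tokens):
    """Count tokens and order them by descending count, then by token."""
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ordered)


# Token strings taken from the sample output above.
tokens = ["never", " gonna", " give", " you", " up",
          " ne", "ve", " gonna", " let", " you", " down"]
for token, count in pivot(tokens).items():
    print(repr(token), count)
```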

Full JSON Output

Command:

$ tok 'Hello world!' -o json

Returns:

{
  "delimited_text": "Hello⎮ world⎮!",
  "token_strings": ["Hello", " world", "!"],
  "token_ids": [9906, 1917, 0],
  "token_count": 3,
  "word_count": 2,
  "char_count": 12
}
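
The JSON output is convenient to consume from scripts. A minimal sketch that parses the sample above, pairs each token ID with its string, and rebuilds the original text; the summarize function is hypothetical.

```python
import json

# Sample copied from the output above.
RAW = """
{
  "delimited_text": "Hello⎮ world⎮!",
  "token_strings": ["Hello", " world", "!"],
  "token_ids": [9906, 1917, 0],
  "token_count": 3,
  "word_count": 2,
  "char_count": 12
}
"""


def summarize(raw):
    """Return (id, string) pairs and the text rebuilt from the tokens."""
    data = json.loads(raw)
    pairs = list(zip(data["token_ids"], data["token_strings"]))
    rebuilt = "".join(data["token_strings"])
    return pairs, rebuilt


pairs, rebuilt = summarize(RAW)
print(pairs)    # [(9906, 'Hello'), (1917, ' world'), (0, '!')]
print(rebuilt)  # Hello world!
```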

License

This project is licensed under the MIT License - see the LICENSE file for details.


Contributing

Issues and pull requests are welcome! Visit the GitHub repository.


Acknowledgments

  • OpenAI for the tiktoken library
  • HuggingFace for the transformers library
  • Google for the Gemini models and APIs
