Tokker: a fast local-first CLI tokenizer with all the best models in one place

Tokker

Tokker 0.3.9: a fast local-first CLI tokenizer with all the best models in one place.

Features

Simple Usage: Just tok 'your text' - that's it!
Models:
- OpenAI: GPT-OSS, o-family (o1, o3, o4), GPT-4o, GPT-4, GPT-3.5, GPT-3
- Google: the entire Gemini family
- HuggingFace: popular models like DeepSeek-R1, Qwen3, GLM-4.5 and many others from the transformers library (some may not be supported yet)
Output Formats: color (default), delimited, count, pivot, and full JSON
Text Analysis: Token count, word count, character count, and token frequency
Model History: See your recently used models
Local-first: Works locally on your device (except Google models, which require auth setup)

Installation

# Optional: install the core tokker package alone (no model provider packages)
pip install tokker

# Install at least one model provider package:
pip install 'tokker[all]' # for all models at once
pip install 'tokker[tiktoken]' # for models from OpenAI
pip install 'tokker[google-genai]' # for models from Google
pip install 'tokker[transformers]' # for models from HuggingFace
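To check which of the optional provider packages are importable in the current environment, a small standard-library script can probe for them. This is a sketch, not a tokker feature: the helper name `installed_providers` is hypothetical, and the import names (`tiktoken`, `google.genai`, `transformers`) are assumed from each package's usual import path.

```python
import importlib.util

# Extra name (as used in pip install 'tokker[...]') -> assumed import name.
PROVIDER_MODULES = {
    "tiktoken": "tiktoken",
    "google-genai": "google.genai",
    "transformers": "transformers",
}


def installed_providers() -> dict:
    """Return {extra_name: True/False} depending on whether its package can be imported."""
    status = {}
    for extra, module in PROVIDER_MODULES.items():
        try:
            status[extra] = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # find_spec("google.genai") raises if the parent "google" package is absent
            status[extra] = False
    return status


if __name__ == "__main__":
    for extra, ok in installed_providers().items():
        note = "installed" if ok else f"missing, try: pip install 'tokker[{extra}]'"
        print(f"{extra:<14} {note}")
```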

Command Reference

usage: tok [--help] [-w MODEL] [-o {color,count,json,pivot,del}] [-m] [-c]
           [-dm MODEL] [-do OUTPUT] [-h] [-x]
           [text]

Tokker 0.3.9: a fast local-first CLI tokenizer with all the best models in one place

positional arguments:
  text                  text to tokenize (or read from stdin)

options:
  --help                (or just `tok`) to show this help message
  -w, --with MODEL      with specific (non-default) model
  -o, --output {color,count,json,pivot,del}
                        output format
  -m, --models          list all models
  -c, --config          show config with settings
  -dm, --default-model MODEL
                        set default model
  -do, --default-output OUTPUT
                        set default output
  -h, --history         show history of used models
  -x, --history-clear   clear history

Usage

Tokenization

When using bash or zsh, wrap input text in single quotes ('like this') to avoid conflicts with special characters like !.

# Tokenize with default model (o200k_base) and output (color)
$ tok 'Hello world!'
# Get pivot summary of token frequencies
$ tok 'Hello world!' -o pivot
# Tokenize with GLM-4.5
$ tok 'Hello world!' -w zai-org/GLM-4.5
# Get just the count with Gemini-2.5-pro
$ tok 'Hello world!' -w gemini-2.5-pro -o count
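When the command is built by a script rather than typed by hand, Python's standard `shlex.quote` applies the same single-quote rule and also handles text containing an apostrophe, which a plain pair of single quotes cannot hold. The helper `tok_command` below is hypothetical and uses only the `-w` flag documented above.

```python
import shlex
from typing import Optional


def tok_command(text: str, model: Optional[str] = None) -> str:
    """Build a shell-safe tok command line (hypothetical helper)."""
    parts = ["tok", text]
    if model is not None:
        parts += ["-w", model]  # -w / --with: use a specific (non-default) model
    return " ".join(shlex.quote(part) for part in parts)


print(tok_command("Hello world!"))
# prints: tok 'Hello world!'
print(tok_command("I'm tired boss", model="gemini-2.5-flash"))
# prints: tok 'I'"'"'m tired boss' -w gemini-2.5-flash
```

The odd-looking `'"'"'` sequence closes the single-quoted string, adds a double-quoted apostrophe, and reopens single quotes, so `!` and other special characters stay protected throughout.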

Pipeline Usage

# Process files
$ cat document.txt | tok -w deepseek-ai/DeepSeek-R1 -o count

# Chain with other tools
$ curl -s https://example.com | tok -w openai/gpt-oss-120b

# Compare models
$ echo "I'm tired boss, I can't do matmul anymore" | tok -w gemini-2.5-flash
$ echo "I'm tired boss, I can't do matmul anymore" | tok -w gemini-2.0-flash
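The same pipe can be driven from Python to compare several models in one run. This sketch assumes that `-o count` prints only the JSON object shown under Count Output to stdout; `count_tokens` and `compare_models` are hypothetical helpers, and the `tok_cmd` parameter exists only so the command can be swapped out (for example in tests).

```python
import json
import subprocess
from typing import Dict, Sequence


def count_tokens(text: str, model: str, tok_cmd: Sequence[str] = ("tok",)) -> int:
    """Pipe text into tok on stdin and return the token_count it reports."""
    result = subprocess.run(
        [*tok_cmd, "-w", model, "-o", "count"],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tok exits with an error
    )
    return json.loads(result.stdout)["token_count"]


def compare_models(text: str, models: Sequence[str], tok_cmd: Sequence[str] = ("tok",)) -> Dict[str, int]:
    """Map each model name to its token count for the same input text."""
    return {model: count_tokens(text, model, tok_cmd) for model in models}
```

For example, `compare_models("I'm tired boss, I can't do matmul anymore", ["gemini-2.5-flash", "gemini-2.0-flash"])` would return one count per model. Passing the arguments as a list (no shell) also sidesteps the quoting issues described under Tokenization.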

Models

# List all available models
$ tok -m

Output:

============
OpenAI:

o200k_base            - for GPT-OSS, o-family (o1, o3, o4) and GPT-4o
cl100k_base           - for GPT-3.5 (late), GPT-4
p50k_base             - for GPT-3.5 (early)
p50k_edit             - for GPT-3 edit models (text-davinci, code-davinci)
r50k_base             - for GPT-3 base models (davinci, curie, babbage, ada)
------------
Google:

gemini-2.5-pro
gemini-2.5-flash-lite
gemini-2.5-flash
gemini-2.0-flash-lite
gemini-2.0-flash

Auth setup required   ->   https://github.com/igoakulov/tokker/blob/main/google-auth-guide.md
------------
HuggingFace:

  1. Go to   ->   https://huggingface.co/models?library=transformers
  2. Search models within TRANSFORMERS library (some not supported yet)
  3. Copy its `USER/MODEL` into your command, for example:

openai/gpt-oss-120b
Qwen/Qwen3-Coder-480B-A35B-Instruct
zai-org/GLM-4.5
deepseek-ai/DeepSeek-R1
facebook/bart-base
google-bert/bert-base-uncased
google/electra-base-discriminator
microsoft/phi-4
============

Config

Stored locally in ~/.config/tokker/config.json.

Show config:

# Show config with settings
$ tok -c

Returns:

{
  "default_model": "o200k_base",
  "default_output": "color",
  "delimiter": "⎮"
}

Set defaults:

# Set DeepSeek-R1 as the default model
$ tok -dm deepseek-ai/DeepSeek-R1
# Set count as the default output
$ tok -do count
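Because the config is plain JSON, other scripts can read the current defaults directly. A minimal read-only sketch follows; `read_tokker_config` is a hypothetical helper, and falling back to the values shown by `tok -c` above when the file does not exist is an assumption, not documented tokker behavior. Changing settings is still best left to `tok -dm` and `tok -do`.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "tokker" / "config.json"

# Values from the `tok -c` example above; used here as an assumed fallback.
FALLBACK = {"default_model": "o200k_base", "default_output": "color", "delimiter": "⎮"}


def read_tokker_config(path: Path = CONFIG_PATH) -> dict:
    """Return tokker's settings, overlaying the file's keys on the fallback values."""
    config = dict(FALLBACK)
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config
```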

History

Stored locally in ~/.config/tokker/history.json.

Show history:

$ tok -h

Returns:

============
History:

gemini-2.5-pro                  2025-08-09 19:58
cl100k_base                     2025-08-09 19:52
gpt2                            2025-08-08 16:23
============

Clear history:

# Does not ask for confirmation
$ tok -x

Output Formats

Color Output (Default)

Marks each token with an alternating color.
The colors do not render in the example below, and they are not preserved when you copy the CLI output.

Command:

$ tok 'Hello world!'

Returns:

Hello world!
3 tokens, 2 words, 12 chars
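Terminal colors are usually produced with ANSI escape sequences, which change how the terminal draws the text but are not part of the characters you select and copy; that is why token boundaries disappear on paste. A sketch of alternating colors, assuming two arbitrary ANSI background colors (tokker's actual palette and implementation are not documented here):

```python
import re
from typing import List

# Two ANSI background colors to alternate (arbitrary choice for illustration).
COLORS = ("\x1b[44m", "\x1b[45m")  # blue, magenta
RESET = "\x1b[0m"
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def colorize(tokens: List[str]) -> str:
    """Wrap each token in an alternating background color."""
    return "".join(
        f"{COLORS[i % len(COLORS)]}{token}{RESET}" for i, token in enumerate(tokens)
    )


def strip_colors(text: str) -> str:
    """Remove SGR color codes, leaving only the plain characters."""
    return ANSI_SGR.sub("", text)


colored = colorize(["Hello", " world", "!"])
print(colored)
```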

Del (=Delimited) Output

Preserves visual token separation when you copy (unlike color).
After pasting, remove "⎮" with Find & Replace if needed.
"⎮" (U+23AE INTEGRAL EXTENSION) is a rare symbol, so it is unlikely to clash with the standard "|" (U+007C VERTICAL LINE).

Command:

$ tok 'Hello world!' -o del

Returns:

Hello⎮ world⎮!
3 tokens, 2 words, 12 chars
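The delimited form is just as easy to handle in code. A sketch of joining, splitting, and the Find & Replace step, assuming the tokens themselves never contain "⎮" (if one did, splitting would no longer recover the original tokens); the helper names are hypothetical.

```python
from typing import List

DELIM = "⎮"  # the delimiter shown in tokker's output and config


def delimit(tokens: List[str]) -> str:
    """Join tokens the way the del output displays them."""
    return DELIM.join(tokens)


def split_tokens(delimited: str) -> List[str]:
    """Recover the token list from a pasted delimited line."""
    return delimited.split(DELIM)


def undelimit(delimited: str) -> str:
    """Find & Replace "⎮" with nothing to get the original text back."""
    return delimited.replace(DELIM, "")


line = delimit(["Hello", " world", "!"])
print(line)             # prints: Hello⎮ world⎮!
print(undelimit(line))  # prints: Hello world!
```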

Count Output

Command:

$ tok 'Hello world!' -o count

Returns:

{
  "token_count": 3,
  "word_count": 2,
  "char_count": 12
}
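The word and character counts depend only on the input text, not on the model. tokker's exact counting rules are not documented here; a sketch assuming whitespace-separated words and characters counted as Unicode code points reproduces the numbers above for 'Hello world!'. The token count has to come from a tokenizer, so it is passed in.

```python
from typing import Dict


def text_stats(text: str, token_count: int) -> Dict[str, int]:
    """Assemble count-style stats; the word and char rules are assumptions."""
    return {
        "token_count": token_count,       # supplied by the tokenizer
        "word_count": len(text.split()),  # runs of non-whitespace
        "char_count": len(text),          # Unicode code points, not bytes
    }


def summary_line(stats: Dict[str, int]) -> str:
    """Format stats like the footer of the color and del outputs."""
    return f"{stats['token_count']} tokens, {stats['word_count']} words, {stats['char_count']} chars"


stats = text_stats("Hello world!", token_count=3)
print(summary_line(stats))  # prints: 3 tokens, 2 words, 12 chars
```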

Pivot Output

The pivot output prints a JSON object with token frequencies, sorted by highest count first, then by token (A–Z).

Command:

$ tok 'never gonna give you up neve gonna let you down' -o pivot

Returns:

{
  " gonna": 2,
  " you": 2,
  " down": 1,
  " give": 1,
  " let": 1,
  " ne": 1,
  " up": 1,
  "never": 1,
  "ve": 1
}
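The ordering rule fits in a single sort key: negative count first, then the token string. The sketch below (a hypothetical `pivot` helper, not tokker's own code) reproduces the output above from the 11 token strings implied by it. Plain string comparison orders by code point, so tokens with a leading space sort before letters, and uppercase sorts before lowercase.

```python
import json
from collections import Counter
from typing import Dict, List


def pivot(tokens: List[str]) -> Dict[str, int]:
    """Count token frequencies: highest count first, ties broken by token string."""
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ordered)  # dicts keep insertion order (Python 3.7+)


tokens = ["never", " gonna", " give", " you", " up", " ne", "ve",
          " gonna", " let", " you", " down"]
print(json.dumps(pivot(tokens), indent=2, ensure_ascii=False))
```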

Full JSON Output

Command:

$ tok 'Hello world!' -w cl100k_base -o json

Returns:

{
  "delimited_text": "Hello⎮ world⎮!",
  "token_strings": ["Hello", " world", "!"],
  "token_ids": [9906, 1917, 0],
  "token_count": 3,
  "word_count": 2,
  "char_count": 12
}
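In the example above the fields overlap: "delimited_text" is the token strings joined with "⎮", and the three counts match the list lengths. That makes the output easy to sanity-check from a script. A standard-library sketch, where `check_tok_json` is hypothetical; the assumption that token strings concatenate back to the input holds for byte-level BPE encodings like tiktoken's but may not for every HuggingFace tokenizer (WordPiece models, for example, mark word pieces with `##`).

```python
import json

DELIM = "⎮"


def check_tok_json(raw: str) -> dict:
    """Parse `tok -o json` output and verify that its fields agree with each other."""
    data = json.loads(raw)
    tokens = data["token_strings"]
    if not len(tokens) == len(data["token_ids"]) == data["token_count"]:
        raise ValueError("token_strings, token_ids and token_count disagree")
    if DELIM.join(tokens) != data["delimited_text"]:
        raise ValueError("delimited_text does not match token_strings")
    return data


sample = """{
  "delimited_text": "Hello⎮ world⎮!",
  "token_strings": ["Hello", " world", "!"],
  "token_ids": [9906, 1917, 0],
  "token_count": 3,
  "word_count": 2,
  "char_count": 12
}"""
data = check_tok_json(sample)
print("".join(data["token_strings"]))  # prints: Hello world!
```

In practice `raw` would be the stdout of a `tok ... -o json` call, captured for example with `subprocess.run`.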

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Issues and pull requests are welcome! Visit the GitHub repository.

Acknowledgments

OpenAI for the tiktoken library
HuggingFace for the transformers library
Google for the Gemini models and APIs

Project details

Release history

0.3.9 (this version): Aug 9, 2025
0.3.8: Aug 7, 2025
0.3.7: Aug 7, 2025
0.3.6: Aug 7, 2025
0.3.5: Aug 6, 2025
0.3.4: Aug 1, 2025
0.2.1: Jul 31, 2025
0.2.0: Jul 29, 2025
0.1.2: Jul 28, 2025
0.1.1: Jul 28, 2025
0.1.0: Jul 28, 2025

Download files

Source Distribution

tokker-0.3.9.tar.gz (30.6 kB)

Uploaded Aug 9, 2025 Source

Built Distribution

tokker-0.3.9-py3-none-any.whl (36.3 kB)

Uploaded Aug 9, 2025 Python 3

File details

Details for the file tokker-0.3.9.tar.gz.

File metadata

Download URL: tokker-0.3.9.tar.gz
Upload date: Aug 9, 2025
Size: 30.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for tokker-0.3.9.tar.gz
Algorithm	Hash digest
SHA256	`eb2676268c543bacac5c84ef251beb3ea3eb9b59e3b71678968713d9047e7f83`
MD5	`cd625d5405aeaa7b02a06d24ada335b5`
BLAKE2b-256	`d8106463416a692ca64b5babc4529adda91a981c159ee39897883d678cd0852c`

File details

Details for the file tokker-0.3.9-py3-none-any.whl.

File metadata

Download URL: tokker-0.3.9-py3-none-any.whl
Upload date: Aug 9, 2025
Size: 36.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for tokker-0.3.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bf2299fe144087f99f59998d6d9ed5a9f6d59b8967dfe722be3dea18f536a831`
MD5	`543dded05e1c6fa7dd6f0a9a74c5f691`
BLAKE2b-256	`3d3069c6f1aa136bdeb4a6df25d41137ca06e8196b6dbd6cc8c0cf8bcacdda74`
