Skip to main content

Tokker: a fast local-first CLI tokenizer with all the best models in one place

Project description

Tokker

Tokker: a fast local-first CLI tokenizer with all the best models in one place.

Features

  • Simple Usage: Just tok 'your text' - that's it!
  • Models:
    • OpenAI: GPT-3, GPT-3.5, GPT-4, GPT-4o, o-family (o1, o3, o4)
    • Google: the entire Gemini family
    • HuggingFace: select literally any model that supports transformers library
  • Flexible Output: JSON, plain text, count, and pivot output formats
  • Text Analysis: Token count, word count, character count, and token frequency
  • Model History: Track your usage with --history and --history-clear
  • Local-first: Works locally on device (except Google)

Installation

# Install tokker without model provider packages (optional)
pip install tokker

# Install at least one model provider package:
pip install 'tokker[all]' # for all models at once
pip install 'tokker[tiktoken]' # for models from OpenAI
pip install 'tokker[google-genai]' # for models from Google
pip install 'tokker[transformers]' # for models from HuggingFace

Command Reference

usage: tok [-h] [-m MODEL] [-o {json,plain,count,pivot}] [-D MODEL_DEFAULT] [-M]
           [-H] [-X]
           [text]

Tokker: a fast local-first CLI tokenizer with all the best models in one place

positional arguments:
  text                  text to tokenize (or read from stdin if not provided)

options:
  -h, --help            show this help message and exit
  -m, --model MODEL     model to use (overrides default)
  -o, --output {json,plain,count,pivot}
                        output format (default: json)
  -D, --model-default MODEL_DEFAULT
                        set default model
  -M, --models          list all models
  -H, --history         show history of used models
  -X, --history-clear   clear history

============
Examples:
  echo 'Hello world' | tok
  tok 'Hello world'
  tok 'Hello world' -m deepseek-ai/DeepSeek-R1
  tok 'Hello world' -m gemini-2.5-pro -o count
  tok 'Hello world' -o pivot
  tok -D cl100k_base
============
Install model providers:
  pip install 'tokker[all]'                 - all at once
  pip install 'tokker[tiktoken]'            - OpenAI
  pip install 'tokker[transformers]'        - HuggingFace
  pip install 'tokker[google-genai]'        - Google

Google auth setup   →   https://github.com/igoakulov/tokker/blob/main/google-auth-guide.md

Usage

Tokenize Text

When using bash or zsh, wrap input text in single quotes ('like this') to avoid conflicts with special characters like !.

# Tokenize with default model
tok 'Hello world'

# Get a specific output format
tok 'Hello world' -o plain

# Use a specific model
tok 'Hello world' -m openai/gpt-oss-120b

# Get just the counts
tok 'Hello world' -m gemini-2.5-pro -o count

Pipeline Usage

# Process files
cat document.txt | tok -m deepseek-ai/DeepSeek-R1 -o count

# Chain with other tools
curl -s https://example.com | tok -m bert-base-uncased

# Compare models
echo "Machine learning is awesome" | tok -m openai/gpt-oss-120b
echo "Machine learning is awesome" | tok -m bert-base-uncased

List Available Models

# See all available models
tok -M

Output:

============
OpenAI (installed)

  o200k_base            - for GPT-4o, o-family (o1, o3, o4)
  cl100k_base           - for GPT-3.5 (late), GPT-4
  p50k_base             - for GPT-3.5 (early)
  p50k_edit             - for GPT-3 edit models (text-davinci, code-davinci)
  r50k_base             - for GPT-3 base models (davinci, curie, babbage, ada)
------------
Google (installed)

  gemini-2.5-pro
  gemini-2.5-flash-lite
  gemini-2.5-flash
  gemini-2.0-flash-lite
  gemini-2.0-flash

Auth setup required   ->   https://github.com/igoakulov/tokker/blob/main/google-auth-guide.md
------------
HuggingFace (installed)

  1. Go to   ->   https://huggingface.co/models?library=transformers
  2. Search any model with TRANSFORMERS library support
  3. Copy its `USER/MODEL` into your command like:

  openai/gpt-oss-120b
  Qwen/Qwen3-Coder-480B-A35B-Instruct
  moonshotai/Kimi-K2-Instruct
  deepseek-ai/DeepSeek-R1
  google/gemma-3n-E4B-it
  google-bert/bert-base-uncased
  meta-llama/Meta-Llama-3.1-405B
  mistralai/Devstral-Small-2507
============

Set Default Model

# Set your preferred model
tok -D o200k_base

History

# View your model usage history with date/time
tok -H

# Clear your history (will prompt for confirmation)
tok -X

History is stored locally in ~/.config/tokker/history.json.


Output Formats

Full JSON Output (Default)

tok 'Hello world'
{
  "delimited_text": "Hello⎮ world",
  "token_strings": ["Hello", " world"],
  "token_ids": [24912, 2375],
  "token_count": 2,
  "word_count": 2,
  "char_count": 11
}

Plain Text Output

tok 'Hello world' -o plain
Hello⎮ world

Count Output

tok 'Hello world' -o count
{
  "token_count": 2,
  "word_count": 2,
  "char_count": 11
}

Pivot Output

The pivot output prints a JSON object with token frequencies, sorted by highest count first, then by token (A–Z).

Example:

tok 'never gonna give you up neve gonna let you down' -m cl100k_base -o pivot
{
  " gonna": 2,
  " you": 2,
  " down": 1,
  " give": 1,
  " let": 1,
  " ne": 1,
  " up": 1,
  "never": 1,
  "ve": 1
}

Configuration

Your configuration is stored locally in ~/.config/tokker/config.json:

{
  "default_model": "o200k_base",
  "delimiter": "⎮"
}

License

This project is licensed under the MIT License - see the LICENSE file for details.


Contributing

Issues and pull requests are welcome! Visit the GitHub repository.


Acknowledgments

  • OpenAI for the tiktoken library
  • HuggingFace for the transformers library
  • Google for the Gemini models and APIs

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tokker-0.3.8.tar.gz (23.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tokker-0.3.8-py3-none-any.whl (27.3 kB view details)

Uploaded Python 3

File details

Details for the file tokker-0.3.8.tar.gz.

File metadata

  • Download URL: tokker-0.3.8.tar.gz
  • Upload date:
  • Size: 23.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for tokker-0.3.8.tar.gz
Algorithm Hash digest
SHA256 d7a133b3a3ae461d80e4d8085ac9157a0406d13455d1b9319d532102a9ca0964
MD5 2afa3d3017d90b585bb9450fbbd1ea61
BLAKE2b-256 9e956621675af981eb7e27b0ba33fc54eba6eedd3e96514ce25b3042e801ca78

See more details on using hashes here.

File details

Details for the file tokker-0.3.8-py3-none-any.whl.

File metadata

  • Download URL: tokker-0.3.8-py3-none-any.whl
  • Upload date:
  • Size: 27.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for tokker-0.3.8-py3-none-any.whl
Algorithm Hash digest
SHA256 54ef65d4948872845954e530d879f7cd73eaf574ad3c4aff472da398d981779a
MD5 9fd3c0d3f0b84b87488364c2f0702697
BLAKE2b-256 392aec7ee6c61314e844fa8eceb679fd9a5d2ad3b7a9ed64d89132b6f6cf6a4d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page