Tokker

Tokker is a fast local-first CLI tool for tokenizing text with all the best models in one place.

Features

- Simple Usage: just run tok 'your text'
- Models:
  - OpenAI: GPT-3, GPT-3.5, GPT-4, GPT-4o, o-family (o1, o3, o4)
  - Google: the entire Gemini family
  - HuggingFace: any model that supports the transformers library
- Flexible Output: JSON, plain text, count, and pivot output formats
- Model History: review your model usage with --history and clear it with --history-clear
- Configuration: persistent configuration for the default model and settings
- Text Analysis: token count, word count, character count, and token frequency
- Cross-platform: works on Windows, macOS, and Linux
- Local-first: tokenization runs on your device (except for Google models)

Installation

pip install tokker

That's it! The tok command is now available in your terminal.

Command Reference

usage: tok [-h] [-m MODEL] [-o {json,plain,count,pivot}]
           [-D MODEL_DEFAULT] [-M]
           [-H] [-X]
           [text]

positional arguments:
  text                                    text to tokenize (or read from stdin)

options:
  -h, --help                              show this help message and exit
  -m, --model MODEL                       model to use (overrides default)
  -o, --output {json,plain,count,pivot}   output format (default: json)
  -D, --model-default MODEL_DEFAULT       set default model
  -M, --models                            list all available models
  -H, --history                           show history of used models
  -X, --history-clear                     clear history
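
For readers who want to script against this interface, the help text above can be mirrored with Python's argparse. This is only a sketch reconstructed from the usage message; `build_parser` is a hypothetical name, and Tokker's actual parser may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the `tok` interface from its help text."""
    parser = argparse.ArgumentParser(prog="tok")
    # Optional positional: when omitted, tok reads the text from stdin.
    parser.add_argument("text", nargs="?",
                        help="text to tokenize (or read from stdin)")
    parser.add_argument("-m", "--model",
                        help="model to use (overrides default)")
    parser.add_argument("-o", "--output",
                        choices=["json", "plain", "count", "pivot"],
                        default="json",
                        help="output format (default: json)")
    parser.add_argument("-D", "--model-default", help="set default model")
    parser.add_argument("-M", "--models", action="store_true",
                        help="list all available models")
    parser.add_argument("-H", "--history", action="store_true",
                        help="show history of used models")
    parser.add_argument("-X", "--history-clear", action="store_true",
                        help="clear history")
    return parser


args = build_parser().parse_args(["Hello world", "-m", "cl100k_base", "-o", "count"])
print(args.text, args.model, args.output)  # Hello world cl100k_base count
```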

Usage

Tokenize Text

Tip: When using bash or zsh, wrap input text in single quotes ('like this'). Double quotes cause issues with special characters such as ! (used for history expansion).

# Tokenize with default model
tok 'Hello world'

# Get a specific output format
tok 'Hello world' -o plain

# Use a specific model
tok 'Hello world' -m deepseek-ai/DeepSeek-R1

# Get just the counts
tok 'Hello world' -m gemini-2.5-pro -o count

Pipeline Usage

# Process files
cat document.txt | tok -m gpt2 -o count

# Chain with other tools
curl -s https://example.com | tok -m bert-base-uncased

# Compare models
echo "Machine learning is awesome" | tok -m gpt2
echo "Machine learning is awesome" | tok -m bert-base-uncased
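
The same pipeline can be driven from Python. The sketch below pipes text to tok over stdin and parses the JSON it prints. It assumes tok is on your PATH and writes nothing but JSON to stdout for the json, count, and pivot formats; `build_tok_command`, `run_tok`, and `compare_models` are hypothetical helper names, not part of Tokker.

```python
import json
import subprocess


def build_tok_command(model=None, output="json"):
    """Build the argument list for a tok call that reads text from stdin."""
    cmd = ["tok", "-o", output]
    if model is not None:
        cmd += ["-m", model]
    return cmd


def run_tok(text, model=None, output="json"):
    """Pipe `text` to tok and return its parsed JSON output."""
    if output == "plain":
        # Plain output is delimited text, not JSON, so it cannot be parsed here.
        raise ValueError("use 'json', 'count', or 'pivot' with run_tok")
    result = subprocess.run(
        build_tok_command(model, output),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tok exits with an error
    )
    return json.loads(result.stdout)


def compare_models(text, models):
    """Return {model: token_count} for the same text across several models."""
    return {m: run_tok(text, m, "count")["token_count"] for m in models}
```

With tok installed, `compare_models("Machine learning is awesome", ["gpt2", "bert-base-uncased"])` would return one token count per model.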

List Available Models

# See all available models
tok -M

Output:

============
OpenAI:

  cl100k_base           used in GPT-3.5 (late), GPT-4
  o200k_base            used in GPT-4o, o-family (o1, o3, o4)
  p50k_base             used in GPT-3.5 (early)
  p50k_edit             used in GPT-3 edit models (text-davinci, code-davinci)
  r50k_base             used in GPT-3 base models (davinci, curie, babbage, ada)
------------
Google:

  gemini-2.5-pro
  gemini-2.5-flash
  gemini-2.5-flash-lite
  gemini-2.0-flash
  gemini-2.0-flash-lite

Auth setup required   ->   https://github.com/igoakulov/tokker/blob/main/tokker/google-auth-guide.md
------------
HuggingFace (BYOM - Bring You Own Model):

  1. Go to   ->   https://huggingface.co/models?library=transformers
  2. Search any model with TRANSFORMERS library support
  3. Copy its `USER/MODEL` into your command like:

  deepseek-ai/DeepSeek-R1
  google-bert/bert-base-uncased
  google/gemma-3n-E4B-it
  meta-llama/Meta-Llama-3.1-405B
  mistralai/Devstral-Small-2507
  moonshotai/Kimi-K2-Instruct
  Qwen/Qwen3-Coder-480B-A35B-Instruct
============

Set Default Model

# Set your preferred model
tok -D o200k_base

History

# View your model usage history with date/time
tok -H

# Clear your history (will prompt for confirmation)
tok -X

History is stored locally in ~/.config/tokker/history.json.

Output Formats

Full JSON Output (Default)

$ tok 'Hello world'
{
  "converted": "Hello⎮ world",
  "token_strings": ["Hello", " world"],
  "token_ids": [24912, 2375],
  "token_count": 2,
  "word_count": 2,
  "char_count": 11
}
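
The fields in this output appear to relate to one another in a simple way. The sketch below checks those relationships against the sample above. It assumes that the token strings concatenate back to the input text, that words are counted by splitting on whitespace, and that characters are counted with `len`; none of this is confirmed by the source, and `check_consistency` is a hypothetical helper.

```python
DELIMITER = "⎮"  # delimiter shown in the sample output

# Sample result copied from the example above.
sample = {
    "converted": "Hello⎮ world",
    "token_strings": ["Hello", " world"],
    "token_ids": [24912, 2375],
    "token_count": 2,
    "word_count": 2,
    "char_count": 11,
}


def check_consistency(result, delimiter=DELIMITER):
    """Report which fields match the assumed relationships."""
    # Assumption: joining the token strings reproduces the original input.
    text = "".join(result["token_strings"])
    return {
        "converted": result["converted"] == delimiter.join(result["token_strings"]),
        "token_count": result["token_count"] == len(result["token_ids"]),
        "word_count": result["word_count"] == len(text.split()),
        "char_count": result["char_count"] == len(text),
    }


print(check_consistency(sample))
```

The concatenation assumption holds for this sample but may not hold for tokenizers that normalize text or add special tokens.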

Plain Text Output

$ tok 'Hello world' -o plain
Hello⎮ world

Count Output

$ tok 'Hello world' -o count
{
  "token_count": 2,
  "word_count": 2,
  "char_count": 11
}

Pivot Output

The pivot output prints a JSON object with token frequencies, sorted by highest count first, then by token (A–Z).

Example:

$ tok 'never gonna give you up neve gonna let you down' -m cl100k_base -o pivot
{
  " gonna": 2,
  " you": 2,
  " down": 1,
  " give": 1,
  " let": 1,
  " ne": 1,
  " up": 1,
  "never": 1,
  "ve": 1
}
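
The ordering described above can be sketched with `collections.Counter` and a two-part sort key. This is an illustration of the stated ordering, not Tokker's own code; the token list is reconstructed from the frequencies in the example, and the comparison is plain code-point order, which puts tokens with a leading space first.

```python
from collections import Counter


def pivot(token_strings):
    """Count tokens, sorted by count (descending), then by token (ascending)."""
    counts = Counter(token_strings)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ordered)  # dicts preserve insertion order


# Token strings assumed for the example input, inferred from its pivot output.
tokens = ["never", " gonna", " give", " you", " up",
          " ne", "ve", " gonna", " let", " you", " down"]

for token, count in pivot(tokens).items():
    print(repr(token), count)
```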

Configuration

Your configuration is stored locally in ~/.config/tokker/config.json:

{
  "default_model": "o200k_base",
  "delimiter": "⎮"
}
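
If you want to read this file from your own scripts, a minimal sketch follows. It assumes the two keys shown above are the only ones you need and falls back to the values from the example when the file is missing; `load_config` and `DEFAULTS` are hypothetical names, and Tokker's real defaults may differ.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "tokker" / "config.json"

# Illustrative fallback values taken from the example above (an assumption).
DEFAULTS = {"default_model": "o200k_base", "delimiter": "⎮"}


def load_config(path=CONFIG_PATH):
    """Return the config as a dict, using DEFAULTS for any missing key."""
    config = dict(DEFAULTS)
    try:
        config.update(json.loads(Path(path).read_text(encoding="utf-8")))
    except FileNotFoundError:
        pass  # no config file yet: keep the fallback values
    return config
```

Calling `load_config()` with no arguments reads the path given above.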

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Issues and pull requests are welcome! Visit the GitHub repository.

Acknowledgments

- OpenAI for the tiktoken library
- HuggingFace for the transformers library
- Google for the Gemini models and APIs

Release history

0.3.9   Aug 9, 2025
0.3.8   Aug 7, 2025
0.3.7   Aug 7, 2025
0.3.6   Aug 7, 2025
0.3.5   Aug 6, 2025
0.3.4   Aug 1, 2025 (this version)
0.2.1   Jul 31, 2025
0.2.0   Jul 29, 2025
0.1.2   Jul 28, 2025
0.1.1   Jul 28, 2025
0.1.0   Jul 28, 2025