Tokker

Tokker is a fast local-first CLI tool for tokenizing text with all the best models in one place.

Features

- Simple Usage: Just run tok 'your text'. That's it!
- Models:
  - OpenAI: GPT-3, GPT-3.5, GPT-4, GPT-4o, o-family (o1, o3, o4)
  - HuggingFace: any model that supports the transformers library
- Flexible Output: JSON, plain text, and count output formats
- Model History: Track your usage with --history and --history-clear
- Configuration: Persistent configuration for default model and settings
- Text Analysis: Token count, word count, character count, and token frequency
- Cross-platform: Works on Windows, macOS, and Linux
- 99% local: Runs fully on device (apart from the initial model load)

Installation

pip install tokker

That's it! The tok command is now available in your terminal.

Command Reference

usage: tok [-h] [--model MODEL] [--output {json,plain,count,table}]
           [--model-default MODEL_DEFAULT] [--models]
           [--history] [--history-clear]
           [text]

positional arguments:
  text                  Text to tokenize (or read from stdin if not provided)

options:
  -h, --help           Show this help message and exit
  --model MODEL        Model to use (overrides default). Use --models to see available options
  --output {json,plain,count,table}
                       Output format (default: json)
  --model-default MODEL_DEFAULT
                       Set the default model in configuration. Use --models to see available options
  --models             List all available models with descriptions
  --history            Show history of used models, with most recent on top
  --history-clear      Clear model usage history

Usage

Tip: When using bash or zsh, wrap input text in single quotes ('like this'). Double quotes cause issues with special characters such as ! (used for history expansion).
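
The same quoting concern applies when a tok command line is assembled from a script rather than typed by hand. A minimal sketch, assuming a POSIX shell and using the standard library's shlex.quote; the helper name tok_command_line is hypothetical and not part of tokker:

```python
import shlex


def tok_command_line(text: str) -> str:
    """Return a POSIX-shell command line that passes `text` to tok as one argument."""
    # shlex.quote wraps the text in single quotes whenever it contains
    # characters the shell would otherwise interpret (spaces, !, $, ...).
    return f"tok {shlex.quote(text)}"


print(tok_command_line("Hello world!"))
```

shlex.quote targets POSIX shells only, so this does not cover cmd.exe or PowerShell on Windows.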

Tokenize Text

# Tokenize with default model
tok 'Hello world'

# Get a specific output format
tok 'Hello world' --output plain

# Use a specific model
tok 'Hello world' --model gpt2

# Get just the counts
tok 'Hello world' --output count

Pipeline Usage

# Process files
cat document.txt | tok --model gpt2 --output count

# Chain with other tools
curl -s https://example.com | tok --model bert-base-uncased

# Compare models
echo "Machine learning is awesome" | tok --model gpt2
echo "Machine learning is awesome" | tok --model bert-base-uncased
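
The pipelines above can also be driven from Python. The following is a rough sketch, assuming tok is installed and on PATH; it uses only the --model and --output flags from the command reference, and the helper names build_tok_command and run_tok_on_file are hypothetical:

```python
import subprocess


def build_tok_command(model: str, output: str = "count") -> list[str]:
    """Build the argv list for tok using flags from the command reference."""
    return ["tok", "--model", model, "--output", output]


def run_tok_on_file(path: str, model: str, output: str = "count") -> str:
    """Feed a file to tok on stdin and return whatever tok prints."""
    with open(path, "rb") as handle:
        completed = subprocess.run(
            build_tok_command(model, output),
            stdin=handle,          # equivalent to: cat path | tok ...
            capture_output=True,   # collect stdout and stderr
            check=True,            # raise CalledProcessError on a non-zero exit
        )
    return completed.stdout.decode("utf-8")
```

For example, run_tok_on_file("document.txt", "gpt2") mirrors the first pipeline above. Passing the arguments as a list avoids shell quoting altogether.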

List Available Models

# See all available models
tok --models

Output:

--- OpenAI ---
cl100k_base     ->   used in GPT-3.5 (late), GPT-4
o200k_base      ->   used in GPT-4o, o-family (o1, o3, o4)
p50k_base       ->   used in GPT-3.5 (early)
p50k_edit       ->   used in GPT-3 edit models for text and code (text-davinci, code-davinci)
r50k_base       ->   used in GPT-3 base models (davinci, curie, babbage, ada)

--- HuggingFace ---
BYOM - Bring Your Own Model:
1. Go to   ->   https://huggingface.co/models?library=transformers
2. Search any model with TRANSFORMERS library support
3. Copy its `USER/MODEL-NAME` into your command:

deepseek-ai/DeepSeek-R1
google-bert/bert-base-uncased
google/gemma-3n-E4B-it
meta-llama/Meta-Llama-3.1-405B
mistralai/Devstral-Small-2507
moonshotai/Kimi-K2-Instruct
Qwen/Qwen3-Coder-480B-A35B-Instruct
Etc.

--- Related Commands ---
`--model-default 'model-name'` to set default model
`--history` to view all models you have used

Set Default Model

# Set your preferred model
tok --model-default o200k_base

History

# View your model usage history with date/time
tok --history

# Clear your history
tok --history-clear

History is stored locally in ~/.config/tokker/history.json.

Output Formats

Full JSON Output (Default)

$ tok 'Hello world'
{
  "converted": "Hello⎮ world",
  "token_strings": ["Hello", " world"],
  "token_ids": [24912, 2375],
  "token_count": 2,
  "word_count": 2,
  "char_count": 11,
  "pivot": {
    "Hello": 1,
    " world": 1
  },
  "model": "o200k_base",
  "provider": "OpenAI"
}
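
To illustrate how the fields in this output appear to relate to each other, here is a small sketch that parses the sample above and checks it for internal consistency. The relationships it tests (pivot as a token frequency table, converted as the token strings joined by the delimiter) are inferred from this one example rather than from a documented contract, and check_result is a hypothetical helper:

```python
import json
from collections import Counter

# The sample output shown above, copied verbatim.
SAMPLE = """
{
  "converted": "Hello⎮ world",
  "token_strings": ["Hello", " world"],
  "token_ids": [24912, 2375],
  "token_count": 2,
  "word_count": 2,
  "char_count": 11,
  "pivot": {"Hello": 1, " world": 1},
  "model": "o200k_base",
  "provider": "OpenAI"
}
"""


def check_result(result: dict, delimiter: str = "⎮") -> dict:
    """Return a named True/False verdict for each assumed relationship."""
    tokens = result["token_strings"]
    return {
        "count_matches_ids": result["token_count"] == len(result["token_ids"]),
        "count_matches_strings": result["token_count"] == len(tokens),
        "pivot_is_frequency_table": result["pivot"] == dict(Counter(tokens)),
        "converted_is_joined_tokens": result["converted"] == delimiter.join(tokens),
    }


for name, ok in check_result(json.loads(SAMPLE)).items():
    print(f"{name}: {ok}")
```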

Plain Text Output

$ tok 'Hello world' --output plain
Hello⎮ world

Count Output

$ tok 'Hello world' --output count
{
  "token_count": 2,
  "word_count": 2,
  "char_count": 11,
  "model": "o200k_base"
}

Configuration

Your configuration is stored locally in ~/.config/tokker/config.json:

{
  "default_model": "o200k_base",
  "delimiter": "⎮"
}
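
A script that wants to honor the same settings could read this file directly. The sketch below assumes the two keys shown above are the only ones that matter and falls back to the example values when the file is missing or unreadable; load_config and DEFAULTS are hypothetical names, and this is not necessarily how tokker itself loads its configuration:

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "tokker" / "config.json"

# Assumed fallbacks, taken from the example configuration above.
DEFAULTS = {"default_model": "o200k_base", "delimiter": "⎮"}


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Read the config file, filling in assumed defaults for missing keys."""
    try:
        loaded = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        loaded = {}
    if not isinstance(loaded, dict):
        # Ignore a file whose top level is not a JSON object.
        loaded = {}
    return {**DEFAULTS, **loaded}
```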

Programmatic Usage

You can also use tokker in your Python code:

import tokker

# Count tokens
count = tokker.count_tokens("Hello world", "o200k_base")
print(f"Token count: {count}")

# Full tokenization
result = tokker.tokenize_text("Hello world", "gpt2")
print(result["token_count"])

# Word and character counts
words = tokker.count_words("Hello world")
chars = tokker.count_characters("Hello world")
print(f"Words: {words}, Characters: {chars}")
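
Building on the calls above, one way to compare several models on the same text is to loop over model names. In this sketch the counting function is passed in as a parameter, so the helper does not depend on tokker being importable; compare_models is a hypothetical name, and it assumes the counting function takes (text, model) and returns an integer, as tokker.count_tokens does in the example above:

```python
from typing import Callable, Iterable


def compare_models(
    text: str,
    models: Iterable[str],
    count_fn: Callable[[str, str], int],
) -> dict[str, int]:
    """Return a {model name: token count} mapping for the given text."""
    return {model: count_fn(text, model) for model in models}
```

With tokker installed, a call such as compare_models("Machine learning is awesome", ["gpt2", "o200k_base"], tokker.count_tokens) would give one count per model.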

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Issues and pull requests are welcome! Visit the GitHub repository.

Acknowledgments

OpenAI for the tiktoken library
HuggingFace for the transformers library

Release history

0.3.9: Aug 9, 2025
0.3.8: Aug 7, 2025
0.3.7: Aug 7, 2025
0.3.6: Aug 7, 2025
0.3.5: Aug 6, 2025
0.3.4: Aug 1, 2025
0.2.1: Jul 31, 2025 (this version)
0.2.0: Jul 29, 2025
0.1.2: Jul 28, 2025
0.1.1: Jul 28, 2025
0.1.0: Jul 28, 2025

Download files

Source Distribution: tokker-0.2.1.tar.gz (22.2 kB), uploaded Jul 31, 2025
Built Distribution: tokker-0.2.1-py3-none-any.whl (21.9 kB), uploaded Jul 31, 2025, Python 3

File details

Details for the file tokker-0.2.1.tar.gz.

File metadata

Download URL: tokker-0.2.1.tar.gz
Upload date: Jul 31, 2025
Size: 22.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for tokker-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`de55af5911b71da87040680fd08a1dd360b828187ab44d865a03ae1f0ea3c94f`
MD5	`1ba3c8b39b61b17d01083bf2827684e9`
BLAKE2b-256	`0b223de46fb059dbadb0186f34a3c3af09c2377ce62d4e754d55f74f7f5591cf`

File details

Details for the file tokker-0.2.1-py3-none-any.whl.

File metadata

Download URL: tokker-0.2.1-py3-none-any.whl
Upload date: Jul 31, 2025
Size: 21.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for tokker-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`38f4ffa4a17765d6a99b5b8d3b23d1f69122e6eb31e9a28302539a8c7f63eef2`
MD5	`ba250b2e2e67f5585b53b95f6267ce5a`
BLAKE2b-256	`7683a34ffa26534ad1cc179e314581839d1d8b534e9cfed30769709e54353722`
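
A downloaded file can be checked against the SHA256 digests listed above. This is a minimal sketch using the standard library's hashlib; the helper names sha256_of_file and matches_published_hash are hypothetical, and the file path is whatever location the archive was saved to:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and padding."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("tokker-0.2.1.tar.gz", "de55af5911b71da87040680fd08a1dd360b828187ab44d865a03ae1f0ea3c94f") should return True for an intact copy of the source distribution.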
