Tokker

A fast, simple CLI tool for tokenizing text using OpenAI's tiktoken library. Get accurate token counts for GPT models with a single command.

Features

Simple Usage: Just tok 'your text' - that's it!
Multiple Tokenizers: Support for o200k_base (GPT-4o) and cl100k_base (GPT-4) tokenizers
Flexible Output: JSON, plain text, and summary output formats
Configuration: Persistent configuration for default tokenizer settings
Text Analysis: Token count, word count, character count, and token frequency analysis
Cross-platform: Works on Windows, macOS, and Linux

Installation

Install from PyPI with pip:

pip install tokker

That's it! The tok command is now available in your terminal.

Main commands

Quick Tips:

Use single quotes so the shell does not interpret characters such as ! or $: tok 'Hello world!'
Pipe text from other commands: echo "Hello world" | tok
Process files: cat file.txt | tok --format summary
Chain with other tools: curl -s https://example.com | tok
Set your preferred tokenizer once: tok --set-default-tokenizer o200k_base
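
The piping tips above can also be driven from a Python script. The following is a minimal sketch, assuming tok is on your PATH and prints JSON for the json and summary formats as this README shows. The helper names build_tok_command and run_tok are hypothetical and not part of tokker.

```python
import json
import shutil
import subprocess


def build_tok_command(fmt="summary", tokenizer=None):
    """Build the argument list for the tok CLI, using flags listed in this README."""
    cmd = ["tok", "--format", fmt]
    if tokenizer is not None:
        cmd += ["--tokenizer", tokenizer]
    return cmd


def run_tok(text, fmt="summary", tokenizer=None):
    """Pipe text to tok on stdin and return the parsed JSON output.

    Only meaningful for the json and summary formats; plain output is not JSON.
    """
    if shutil.which("tok") is None:
        raise RuntimeError("tok not found on PATH; install it with: pip install tokker")
    proc = subprocess.run(
        build_tok_command(fmt, tokenizer),
        input=text,           # same effect as: echo "..." | tok
        capture_output=True,
        text=True,
        check=True,           # raise if tok exits with a non-zero status
    )
    return json.loads(proc.stdout)
```

With tokker installed, run_tok("Hello world")["token_count"] should give the same count as the shell pipeline.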

Full output

$ tok 'Hello world'
{
  "converted": "Hello⎮ world",
  "token_strings": ["Hello", " world"],
  "token_ids": [24912, 2375],
  "token_count": 2,
  "word_count": 2,
  "char_count": 11,
  "pivot": {
    "Hello": 1,
    " world": 1
  },
  "tokenizer": "o200k_base"
}
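
To make the fields of the full output easier to read, here is a rough sketch of how they appear to relate to one another. It takes the token strings and IDs from the example above as given rather than calling a tokenizer, and the describe helper is hypothetical. Counting words by splitting on whitespace is an assumption, not necessarily what tokker does.

```python
from collections import Counter

DELIMITER = "⎮"  # delimiter shown in the README output


def describe(text, token_strings, token_ids, tokenizer):
    """Assemble a report shaped like tok's JSON output from pre-tokenized input."""
    return {
        "converted": DELIMITER.join(token_strings),
        "token_strings": token_strings,
        "token_ids": token_ids,
        "token_count": len(token_ids),
        "word_count": len(text.split()),  # assumption: whitespace-separated words
        "char_count": len(text),
        "pivot": dict(Counter(token_strings)),  # frequency of each token string
        "tokenizer": tokenizer,
    }


report = describe("Hello world", ["Hello", " world"], [24912, 2375], "o200k_base")
print(report["converted"], report["token_count"], report["pivot"])
```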

Plain Text Output

$ tok 'Hello world' --format plain
Hello⎮ world
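
If you capture the plain output in a script, the delimiter lets you recover the individual tokens. A small sketch, assuming the default ⎮ delimiter and that the input text does not itself contain that character or end with newlines; split_plain is a hypothetical helper.

```python
DELIMITER = "⎮"  # default delimiter shown in the README


def split_plain(plain_output, delimiter=DELIMITER):
    """Split tok's plain output back into token strings."""
    # Drop the trailing newline the terminal output ends with, then split.
    return plain_output.rstrip("\n").split(delimiter)


tokens = split_plain("Hello⎮ world\n")
print(tokens)
print("".join(tokens))  # joining the tokens restores the original text
```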

Summary Statistics

$ tok 'Hello world' --format summary
{
  "token_count": 2,
  "word_count": 2,
  "char_count": 11,
  "tokenizer": "o200k_base"
}
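
The summary fields are convenient for quick derived metrics. The sketch below parses the example output above and computes two ratios; the ratios helper and the metric names are illustrative only and are not produced by tokker.

```python
import json

# Summary output copied from the example above.
summary_json = """
{
  "token_count": 2,
  "word_count": 2,
  "char_count": 11,
  "tokenizer": "o200k_base"
}
"""


def ratios(summary):
    """Compute characters per token and tokens per word, guarding against zero."""
    tokens = summary["token_count"]
    words = summary["word_count"]
    return {
        "chars_per_token": summary["char_count"] / tokens if tokens else 0.0,
        "tokens_per_word": tokens / words if words else 0.0,
    }


summary = json.loads(summary_json)
print(ratios(summary))
```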

Other Commands

Using Different Tokenizers

$ tok 'Hello world' --tokenizer cl100k_base

Set Default Tokenizer

$ tok --set-default-tokenizer o200k_base
✓ Default tokenizer set to: o200k_base
Configuration saved to: ~/.config/tokker/tokenizer_config.json

Help

usage: tok [-h] [--tokenizer {o200k_base,cl100k_base}]
           [--format {json,plain,summary}]
           [--set-default-tokenizer {o200k_base,cl100k_base}]
           [text]

positional arguments:
  text                  Text to tokenize (or read from stdin if not provided)

options:
  --tokenizer              Tokenizer to use (o200k_base, cl100k_base)
  --format                 Output format (json, plain, summary)
  --set-default-tokenizer  Set default tokenizer
  -h, --help               Show help message
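
For readers who want to see how such an interface fits together, the help text above can be reproduced with argparse. This is a reconstruction from the help output, not tokker's actual source. Leaving the flags without parser-level defaults, so that a stored configuration can supply them later, is an assumption.

```python
import argparse

TOKENIZERS = ["o200k_base", "cl100k_base"]
FORMATS = ["json", "plain", "summary"]


def build_parser():
    """Build a parser that mirrors the usage text shown in this README."""
    parser = argparse.ArgumentParser(prog="tok")
    parser.add_argument(
        "text", nargs="?",
        help="Text to tokenize (or read from stdin if not provided)",
    )
    parser.add_argument("--tokenizer", choices=TOKENIZERS, help="Tokenizer to use")
    parser.add_argument("--format", choices=FORMATS, help="Output format")
    parser.add_argument(
        "--set-default-tokenizer", choices=TOKENIZERS,
        help="Set default tokenizer",
    )
    return parser


args = build_parser().parse_args(["Hello world", "--format", "plain"])
print(args.text, args.format, args.tokenizer)
```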

Tokenizers

o200k_base (default): used by GPT-4o and GPT-4o-mini; vocabulary of roughly 200K tokens
cl100k_base: used by GPT-4 and GPT-3.5; vocabulary of roughly 100K tokens
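
If your scripts work with model names rather than tokenizer names, a small lookup table keeps the choice in one place. The mapping below follows the list above; the exact model ID strings, the tokenizer_for helper, and the fallback to the default tokenizer for unknown models are assumptions made for illustration.

```python
# Model IDs are assumed spellings of the models named in this README.
MODEL_TO_TOKENIZER = {
    "gpt-4o": "o200k_base",
    "gpt-4o-mini": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}
DEFAULT_TOKENIZER = "o200k_base"


def tokenizer_for(model):
    """Return the tokenizer name for a model, falling back to the default."""
    return MODEL_TO_TOKENIZER.get(model.lower(), DEFAULT_TOKENIZER)


print(tokenizer_for("gpt-4"))
print(tokenizer_for("GPT-4o-mini"))
```

The result can be passed to tok via --tokenizer.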

Configuration

Tokker stores your preferences in ~/.config/tokker/tokenizer_config.json:

{
  "default_tokenizer": "o200k_base",
  "delimiter": "⎮"
}
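
Other scripts can read the same file to stay consistent with your tok settings. A minimal sketch, assuming the path and keys shown above; load_config is a hypothetical helper, and falling back to the documented defaults when the file is missing or malformed is an assumption about sensible behaviour, not a description of tokker's own loader.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "tokker" / "tokenizer_config.json"
DEFAULTS = {"default_tokenizer": "o200k_base", "delimiter": "⎮"}


def load_config(path=CONFIG_PATH):
    """Read the config file, filling in defaults for anything missing."""
    try:
        stored = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULTS)
    if not isinstance(stored, dict):
        return dict(DEFAULTS)
    # Stored values override the defaults key by key.
    return {**DEFAULTS, **stored}


config = load_config()
print(config["default_tokenizer"])
```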

Programmatic Usage

You can also use tokker in your Python code:

import tokker

# Count tokens
count = tokker.count_tokens("Hello world")
print(f"Token count: {count}")

# Full tokenization
result = tokker.tokenize_text("Hello world", "o200k_base")
print(result["token_count"])
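
A common follow-up is checking whether several texts fit within a token budget. The sketch below builds on count_tokens from the example above. The total_tokens and fits_budget helpers are hypothetical, and the counter is injectable so the logic can be exercised without tokker installed.

```python
def total_tokens(texts, count=None):
    """Sum token counts over several texts.

    count is any callable mapping str -> int. When omitted, tokker.count_tokens
    is imported lazily, so tokker is only required in that case.
    """
    if count is None:
        import tokker
        count = tokker.count_tokens
    return sum(count(text) for text in texts)


def fits_budget(texts, limit, count=None):
    """Return True if the combined token count does not exceed limit."""
    return total_tokens(texts, count) <= limit


# Stand-in counter for demonstration only: one "token" per whitespace-separated word.
word_counter = lambda text: len(text.split())
print(fits_budget(["Hello world", "How are you"], limit=5, count=word_counter))
```

With tokker installed, drop the count argument to use real token counts.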

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Issues and pull requests are welcome! Visit the GitHub repository.

Acknowledgments

OpenAI for the tiktoken library

Release history

0.3.9: Aug 9, 2025
0.3.8: Aug 7, 2025
0.3.7: Aug 7, 2025
0.3.6: Aug 7, 2025
0.3.5: Aug 6, 2025
0.3.4: Aug 1, 2025
0.2.1: Jul 31, 2025
0.2.0: Jul 29, 2025
0.1.2: Jul 28, 2025 (this version)
0.1.1: Jul 28, 2025
0.1.0: Jul 28, 2025

