Count number of tokens in text files using tiktoken tokenizer from OpenAI

Count tokens

A versatile tool for counting tokens in text files, directories, and strings with support for streaming large files, batching, and more.

Requirements

This package uses the tiktoken library for tokenization.

Installation

For command-line use, install the package in an isolated environment with pipx:

pipx install count-tokens

or run it with uv without installing:

uvx count-tokens document.txt

or install it in your current environment with pip:

pip install count-tokens

Usage

Basic Usage

Open a terminal and run:

count-tokens document.txt

You should see something like this:

File: document.txt
Encoding: cl100k_base
Number of tokens: 67

If you want to see only the token count, run:

count-tokens document.txt --quiet

The output is then just the number of tokens.

To use an encoding other than the default cl100k_base, pass the -e or --encoding argument:

count-tokens document.txt -e r50k_base

NOTE: tiktoken supports the following encodings used by OpenAI models:

Encoding name	OpenAI models
`o200k_base`	`gpt-4o`, `gpt-4o-mini`
`cl100k_base`	`gpt-4`, `gpt-3.5-turbo`, `text-embedding-ada-002`
`p50k_base`	Codex models, `text-davinci-002`, `text-davinci-003`
`r50k_base` (or `gpt2`)	GPT-3 models like `davinci`

(source: OpenAI Cookbook)
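
When scripting around the CLI, the table above can be kept as a small lookup so the right value for -e is chosen per model. The following is a minimal sketch, not part of count-tokens: the names ENCODING_BY_MODEL and pick_encoding are hypothetical, and the fallback to cl100k_base simply mirrors the tool's documented default.

```python
# Lookup transcribed from the encoding table above (illustrative, not exhaustive).
ENCODING_BY_MODEL = {
    "gpt-4o": "o200k_base",
    "gpt-4o-mini": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-003": "p50k_base",
    "davinci": "r50k_base",
}

# count-tokens uses cl100k_base when no encoding is given.
DEFAULT_ENCODING = "cl100k_base"


def pick_encoding(model: str) -> str:
    """Return the encoding name for a model, or the default if it is not listed."""
    return ENCODING_BY_MODEL.get(model, DEFAULT_ENCODING)


if __name__ == "__main__":
    for name in ("gpt-4o", "davinci", "some-unlisted-model"):
        print(f"{name}: {pick_encoding(name)}")
```

The returned name can be passed to -e on the command line. If tiktoken is installed, its own tiktoken.encoding_for_model function is likely the more complete source for this mapping.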

Directory Processing

Process all files in a directory matching specific patterns:

count-tokens -d ./docs -p "*.md,*.txt"

If -p is not specified, the default patterns are *.txt,*.py,*.md.

Process directories recursively:

count-tokens -d ./project -r -p "*.py"
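
To make the pattern and recursion semantics concrete, here is a rough sketch of how such file selection could work using only the standard library. It is an illustration under assumptions, not the package's actual implementation: collect_files is a hypothetical name, and matching on the file name only (rather than the full path) is an assumption.

```python
import fnmatch
from pathlib import Path


def collect_files(directory, patterns="*.txt,*.py,*.md", recursive=False):
    """Return sorted file paths under `directory` whose names match any pattern.

    `patterns` is a comma-separated string, as accepted by the -p option.
    """
    pattern_list = [p.strip() for p in patterns.split(",") if p.strip()]
    root = Path(directory)
    # rglob descends into subdirectories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        path
        for path in candidates
        if path.is_file()
        and any(fnmatch.fnmatch(path.name, pattern) for pattern in pattern_list)
    )


if __name__ == "__main__":
    for path in collect_files(".", "*.md,*.txt", recursive=True):
        print(path)
```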

Large File Support

Use streaming mode for large files to avoid memory issues:

count-tokens large_file.txt --stream

Customize the chunk size for streaming (the default is 1 MB; 2097152 below is 2 MB):

count-tokens large_file.txt --stream --chunk-size 2097152
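
The idea behind streaming can be sketched as reading the file piece by piece and summing a per-chunk count, so that only one chunk is in memory at a time. This is a simplified illustration, not the package's code: read_chunks and count_streaming are hypothetical names, the counting function is passed in by the caller, and chunks here are measured in characters, whereas the CLI option is presumably in bytes.

```python
from typing import Callable, Iterator


def read_chunks(path: str, chunk_size: int = 1024 * 1024) -> Iterator[str]:
    """Yield successive pieces of a text file, at most `chunk_size` characters each."""
    with open(path, "r", encoding="utf-8") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty string means end of file
                break
            yield chunk


def count_streaming(
    path: str,
    count_fn: Callable[[str], int],
    chunk_size: int = 1024 * 1024,
) -> int:
    """Sum `count_fn` over all chunks without loading the whole file."""
    return sum(count_fn(chunk) for chunk in read_chunks(path, chunk_size))


if __name__ == "__main__":
    import sys

    # Stand-in counter: whitespace-separated words, not real tokens.
    print(count_streaming(sys.argv[1], lambda chunk: len(chunk.split())))
```

One caveat of any chunked approach is that a word or token straddling a chunk boundary may be counted differently than in a single pass, so streamed totals can differ slightly from non-streamed ones.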

Output Formats

Get results in different formats:

# JSON format
count-tokens -d ./docs -p "*.md" --format json

# CSV format
count-tokens -d ./docs -p "*.md" --format csv
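
The exact layout of the JSON and CSV output is not shown here. As a rough illustration of what such formats involve, the sketch below serializes a mapping of file paths to token counts with the standard library. The function names and the "file" and "tokens" column headers are assumptions made for this example and may not match what count-tokens emits.

```python
import csv
import io
import json


def to_json(results: dict) -> str:
    """Serialize a {file_path: token_count} mapping as pretty-printed JSON."""
    return json.dumps(results, indent=2, sort_keys=True)


def to_csv(results: dict) -> str:
    """Serialize the same mapping as CSV with a header row (hypothetical columns)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "tokens"])
    for path, tokens in sorted(results.items()):
        writer.writerow([path, tokens])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = {"docs/intro.md": 120, "docs/api.md": 342}
    print(to_json(sample))
    print(to_csv(sample), end="")
```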

Token Limit Checking

Check if files exceed a specific token limit:

count-tokens document.txt --max-tokens 4096

When files exceed the limit, you'll see a warning:

File: document.txt
Encoding: cl100k_base
⚠️ Token limit exceeded: 5120 > 4096
Number of tokens: 5120

Approximate number of tokens

If you need results faster and do not need an exact count, use the --approx parameter: w approximates from the number of words, and c approximates from the number of characters.

count-tokens document.txt --approx w

The approximation assumes 4/3 (about 1.33) tokens per word and 4 characters per token.
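
The arithmetic behind these two estimates can be sketched as follows. This is an illustration of the stated ratios rather than the package's code: the function names are hypothetical, words are taken to be whitespace-separated, and rounding to the nearest integer is an assumption.

```python
def approx_tokens_by_words(text: str, tokens_per_word: float = 4 / 3) -> int:
    """Estimate tokens as word count times the tokens-per-word ratio."""
    word_count = len(text.split())  # whitespace-separated words (assumption)
    return round(word_count * tokens_per_word)


def approx_tokens_by_chars(text: str, characters_per_token: float = 4.0) -> int:
    """Estimate tokens as character count divided by characters per token."""
    return round(len(text) / characters_per_token)


if __name__ == "__main__":
    sample = "one two three four five six"
    print(approx_tokens_by_words(sample))
    print(approx_tokens_by_chars(sample))
```

For a six-word text, the word-based estimate is 6 × 4/3 = 8 tokens; for a 20-character text, the character-based estimate is 20 / 4 = 5 tokens.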

Adjusting estimation rules

You can tune the estimate by overriding the default tokens-per-word and characters-per-token ratios:

# Adjust the tokens per word ratio (default is 1.33)
count-tokens document.txt --approx w --tokens-per-word 1.5

# Adjust the characters per token ratio (default is 4.0)
count-tokens document.txt --approx c --characters-per-token 3.5

These options allow you to fine-tune the approximation based on your specific content characteristics.

Programmatic usage

Simple API

The package now provides a simplified API for all token counting operations:

from count_tokens import count

# Count tokens in a string
result = count(text="This is a string")

# Count tokens in a file
result = count(file="document.txt", encoding="cl100k_base")

# Count tokens with approximation
result = count(file="document.txt", approximate="w", tokens_per_word=1.5)

Directory Processing

Process all files in a directory that match specific patterns:

from count_tokens import count

# Process a directory
results = count(
    directory="./docs",
    file_patterns=["*.md", "*.txt"],
    recursive=True
)

# Print results
for file_path, token_count in results.items():
    print(f"{file_path}: {token_count} tokens")

Streaming Large Files

Process large files without loading the entire file into memory:

from count_tokens import count

# Process a large file with streaming
tokens = count(
    file="large_dataset.txt", 
    use_streaming=True,
    chunk_size=1024*1024  # 1MB chunks
)

Check Token Limits

Check if content exceeds token limits:

from count_tokens import count

# Check if a file exceeds token limit
result = count(file="document.txt", max_tokens=4096)

if isinstance(result, dict) and result.get("limit_exceeded"):
    print(f"⚠️ Token limit exceeded: {result['tokens']} > {result['max_tokens']}")
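
Because the check above tests whether the result is a dict, calling code may want one helper that handles both shapes. The sketch below is a hypothetical wrapper, not part of the package: describe_result is an invented name, and it assumes the result is either a plain integer or a dict with the "tokens", "max_tokens", and "limit_exceeded" keys used above.

```python
def describe_result(result) -> str:
    """Turn a count result (int or dict) into a one-line message."""
    if isinstance(result, dict):
        tokens = result["tokens"]
        if result.get("limit_exceeded"):
            return f"Token limit exceeded: {tokens} > {result['max_tokens']}"
        return f"Number of tokens: {tokens}"
    # Plain integer result (assumed shape when no dict is returned).
    return f"Number of tokens: {result}"


if __name__ == "__main__":
    print(describe_result(67))
    print(describe_result({"tokens": 5120, "max_tokens": 4096, "limit_exceeded": True}))
```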

Original API

The original functions are still available for backward compatibility:

from count_tokens.count import count_tokens_in_file, count_tokens_in_string

# Count tokens in a file
num_tokens = count_tokens_in_file("document.txt")

# Count tokens in a string
num_tokens = count_tokens_in_string("This is a string.")

# Use specific encoding
num_tokens = count_tokens_in_string("This is a string.", encoding_name="cl100k_base")

# Word-based approximation with custom tokens per word ratio
num_tokens = count_tokens_in_file("document.txt", approximate="w", tokens_per_word=1.5)

# Character-based approximation with custom characters per token ratio
num_tokens = count_tokens_in_file("document.txt", approximate="c", characters_per_token=3.5)

Related Projects

tiktoken - tokenization library used by this package
ttok - count and truncate text based on tokens

Credits

Thanks to the authors of the tiktoken library for open sourcing their work.

License

MIT © Krystian Safjan.

Release history

0.8.3 (this version) - Jun 18, 2026
0.8.2 - Apr 19, 2026
0.8.1 - Feb 3, 2026
0.7.3 - Dec 12, 2025
0.7.2 - Jan 9, 2025
0.7.0 - Sep 26, 2023
0.6.0 - Sep 26, 2023
0.5.0 - Sep 26, 2023
0.4.0 - Jul 1, 2023
0.3.0 - Jun 28, 2023
0.2.0 - Jun 28, 2023
0.1.0 - Jun 28, 2023

Download files

Source Distribution

count_tokens-0.8.3.tar.gz (41.4 kB)

Uploaded Jun 18, 2026 Source

Built Distribution

count_tokens-0.8.3-py3-none-any.whl (10.8 kB)

Uploaded Jun 18, 2026 Python 3

File details

Details for the file count_tokens-0.8.3.tar.gz.

File metadata

Download URL: count_tokens-0.8.3.tar.gz
Upload date: Jun 18, 2026
Size: 41.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for count_tokens-0.8.3.tar.gz
Algorithm	Hash digest
SHA256	`0305add445812b9777a3fe758e5bd3d0d5da1a39557916079e14ab12169c4236`
MD5	`a92d8134e73234796fc373ec14231387`
BLAKE2b-256	`c58970e78ea36533f98c41b21f63dea822bf0bd91f67ad80429c9f657280d8d7`

File details

Details for the file count_tokens-0.8.3-py3-none-any.whl.

File metadata

Download URL: count_tokens-0.8.3-py3-none-any.whl
Upload date: Jun 18, 2026
Size: 10.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for count_tokens-0.8.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`addd734f1a4c1478e8b0aec75d3aae65c8a3a7343f47e81ed8f2b1aacbc98d99`
MD5	`bd909f84ad02a2306f9628153f564d79`
BLAKE2b-256	`2a6843f625febdcb97b2d8f286e3ee98fd503669cb89433ab4c5307b58a77779`
