Count AI tokens in files and directories using tiktoken

aitokencount

A Python utility for counting AI tokens in all files within a folder and its subfolders using OpenAI's tiktoken library with the cl100k_base encoding.

License: MIT

Features

  • Recursively processes all files in a directory and its subdirectories
  • Automatically skips .git folders to avoid processing repository metadata
  • Respects .gitignore files if present, skipping ignored files and directories
  • Uses the cl100k_base encoding by default (same encoding used by models like GPT-4 and GPT-3.5-Turbo)
  • Estimates costs for processing the tokens with different AI models (GPT-4o, Claude 3.7 Sonnet, etc.)
  • Provides a summary of total tokens and processed files
  • Handles errors gracefully
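
The recursive walk, the .git skip, and the error handling described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the function names `count_tokens_in_tree` and `tiktoken_counter` and the `error_files` key are hypothetical, and the counter is passed in as a parameter so the walker itself only needs the standard library.

```python
import os


def tiktoken_counter(encoding_name="cl100k_base"):
    """Build a text -> token-count function backed by tiktoken (hypothetical helper)."""
    import tiktoken  # third-party; imported lazily so the walker works without it

    encoding = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() makes strings such as "<|endoftext|>" encode as
    # ordinary text instead of raising ValueError.
    return lambda text: len(encoding.encode(text, disallowed_special=()))


def count_tokens_in_tree(root, count_fn):
    """Walk root recursively, skip .git directories, and sum count_fn over text files."""
    totals = {"total_tokens": 0, "processed_files": 0, "error_files": 0}
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into .git.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    text = handle.read()
            except (UnicodeDecodeError, OSError):
                # Binary or unreadable files are counted as errors, not tokens.
                totals["error_files"] += 1
                continue
            totals["total_tokens"] += count_fn(text)
            totals["processed_files"] += 1
    return totals
```

With tiktoken installed, `count_tokens_in_tree("/path/to/folder", tiktoken_counter())` would produce real token counts. Any other callable that maps a string to an integer works as well, which keeps the walker easy to test.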

Installation

From PyPI (Recommended)

pip install aitokencount

From Source

  1. Clone this repository or download the files
  2. Set up a virtual environment:
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
  3. Install the package in development mode:
pip install -e .

Usage

Command Line

aitokencount /path/to/folder

Optional arguments:

# Specify a different encoding
aitokencount /path/to/folder --encoding cl100k_base

# Ignore .gitignore patterns even if a .gitignore file exists
aitokencount /path/to/folder --ignore-gitignore

# Suppress progress output, only show summary
aitokencount /path/to/folder --quiet

Python API

from aitokencount.core import count_tokens_in_folder

# Count tokens in a folder
results = count_tokens_in_folder('/path/to/folder', encoding_name='cl100k_base')

# Access the results
print(f"Total tokens: {results['total_tokens']}")
print(f"Files processed: {results['processed_files']}")

# Access cost estimates
for model, cost in results['cost_estimates'].items():
    print(f"Cost for {model}: ${cost:.4f}")

Cost Estimation

The tool provides cost estimates for processing the tokens with different AI models:

Model              Price per Million Tokens
GPT-4o             $10.00
Claude 3.7 Sonnet  $15.00

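These estimates give a rough sense of what it would cost to process your content with each model. The counts come from the selected tiktoken encoding, so they are approximate for models that use a different tokenizer (GPT-4o uses o200k_base, and Claude models do not use tiktoken).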
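
The arithmetic behind these figures is the token count divided by one million, multiplied by the per-million price. A minimal sketch, assuming the prices in the table above: the names `PRICES_PER_MILLION` and `estimate_costs` are illustrative and not part of the package's API, and the model keys are taken from the example output.

```python
# Prices in USD per one million tokens, copied from the table above.
PRICES_PER_MILLION = {
    "gpt-4o": 10.00,
    "claude-3-7-sonnet": 15.00,
}


def estimate_costs(total_tokens, prices=PRICES_PER_MILLION):
    """Return a {model: cost_in_usd} mapping for the given token count."""
    return {
        model: total_tokens / 1_000_000 * price
        for model, price in prices.items()
    }


if __name__ == "__main__":
    for model, cost in estimate_costs(1250).items():
        print(f"  {model}: ${cost:.4f}")
```

For 1,250 tokens this works out to 1250 / 1,000,000 × $10.00 = $0.0125 for GPT-4o and 1250 / 1,000,000 × $15.00 = $0.01875 for Claude 3.7 Sonnet.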

Example

aitokencount ./my_project

Output:

Counting tokens in /absolute/path/to/my_project using cl100k_base encoding...
Found .gitignore with 5 patterns. Will ignore matching files.
Skipping .git directory: /absolute/path/to/my_project/.git
Ignored (gitignore): /absolute/path/to/my_project/node_modules/package.json
Processed: /absolute/path/to/my_project/file1.txt - 150 tokens
Processed: /absolute/path/to/my_project/file2.py - 320 tokens
...

Summary:
Total tokens: 1250
Files processed: 15
Files with errors: 2
Files skipped (gitignore): 8
Files skipped (.git): 42

Estimated costs:
  gpt-4o: $0.0125
  claude-3-7-sonnet: $0.0188

Notes

  • The script attempts to read all files as text files. Binary files or files with encoding issues may be skipped.
  • The token count is based on the specified tiktoken encoding (cl100k_base by default).
  • .git directories are automatically skipped to avoid processing repository metadata.
  • If a .gitignore file exists in the target directory, the script will automatically respect its patterns and skip ignored files and directories.
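
To make the .gitignore behaviour above more concrete, here is a deliberately simplified sketch of pattern matching using only the standard library. It is an assumption about how such filtering could work, not the package's own code: `parse_gitignore` and `is_ignored` are hypothetical names, and the sketch covers only plain name globs and trailing-slash directory patterns (no negation with `!`, no `**`, no leading-slash anchoring).

```python
import fnmatch
from pathlib import PurePosixPath


def parse_gitignore(text):
    """Return the pattern lines of a .gitignore, dropping blanks and comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path, patterns):
    """Check a file path (POSIX-style, relative to the .gitignore) against patterns."""
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore the file if any parent directory matches.
            name = pattern.rstrip("/")
            if any(fnmatch.fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatch.fnmatchcase(part, pattern) for part in parts):
            # Plain glob: matches any single path component, file or directory.
            return True
    return False
```

For example, with the patterns `node_modules/` and `*.log`, the path `node_modules/package.json` is ignored while `src/app.py` is not. Full gitignore semantics are considerably richer than this.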

Requirements

  • Python 3.6+
  • tiktoken library
