Count AI tokens in files and directories using tiktoken
aitokencount
A Python utility for counting AI tokens in all files within a folder and its subfolders using OpenAI's tiktoken library with the cl100k_base encoding.
Features
- Recursively processes all files in a directory and its subdirectories
- Automatically skips `.git` folders to avoid processing repository metadata
- Respects `.gitignore` files if present, skipping ignored files and directories
- Uses the cl100k_base encoding by default (same encoding used by models like GPT-4 and GPT-3.5-Turbo)
- Estimates costs for processing the tokens with different AI models (GPT-4o, Claude 3.7 Sonnet, etc.)
- Provides a summary of total tokens and processed files
- Handles errors gracefully: files that cannot be read are reported in the summary instead of stopping the run
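The features above can be sketched roughly as follows. This is a minimal illustration and not the package's actual implementation: the function name `count_tokens_in_tree`, the `count` parameter, and the `error_files` key are assumptions made for this example. The `tiktoken` calls are only reached when no custom counter is supplied.

```python
import os


def count_tokens_in_tree(root, count=None, encoding_name="cl100k_base"):
    """Walk `root` recursively, skip .git directories, and sum token counts.

    `count` is a hypothetical hook: any callable mapping text to a token count.
    """
    if count is None:
        # Imported lazily so a custom counter works without tiktoken installed.
        import tiktoken

        encoder = tiktoken.get_encoding(encoding_name)

        def count(text):
            # disallowed_special=() treats special-token strings as plain text.
            return len(encoder.encode(text, disallowed_special=()))

    totals = {"total_tokens": 0, "processed_files": 0, "error_files": 0}
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning in place stops os.walk from descending into .git.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    text = handle.read()
            except (UnicodeDecodeError, OSError):
                # Binary or unreadable files are counted, not fatal.
                totals["error_files"] += 1
                continue
            totals["total_tokens"] += count(text)
            totals["processed_files"] += 1
    return totals
```

Passing a simple callable such as `lambda text: len(text.split())` as `count` makes it possible to try the traversal logic without downloading an encoding.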
Installation
From PyPI (Recommended)
pip install aitokencount
From Source
- Clone this repository or download the files
- Set up a virtual environment:
# Create a virtual environment
python -m venv venv
# Activate the virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
- Install the package in development mode:
pip install -e .
Usage
Command Line
aitokencount /path/to/folder
Optional arguments:
# Specify a different encoding
aitokencount /path/to/folder --encoding cl100k_base
# Ignore .gitignore patterns even if a .gitignore file exists
aitokencount /path/to/folder --ignore-gitignore
# Suppress progress output, only show summary
aitokencount /path/to/folder --quiet
Python API
from aitokencount.core import count_tokens_in_folder
# Count tokens in a folder
results = count_tokens_in_folder('/path/to/folder', encoding_name='cl100k_base')
# Access the results
print(f"Total tokens: {results['total_tokens']}")
print(f"Files processed: {results['processed_files']}")
# Access cost estimates
for model, cost in results['cost_estimates'].items():
    print(f"Cost for {model}: ${cost:.4f}")
Cost Estimation
The tool provides cost estimates for processing the tokens with different AI models:
| Model | Price per Million Tokens |
|---|---|
| GPT-4o | $10.00 |
| Claude 3.7 Sonnet | $15.00 |
These estimates help you understand the potential cost of processing your content with various AI models.
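The arithmetic behind these estimates is a straightforward proportion: token count times price, divided by one million. The sketch below assumes the prices in the table and the model keys shown in the sample output; `PRICE_PER_MILLION` and `estimate_costs` are hypothetical names for this example, not part of the package's documented API.

```python
# Prices in USD per million tokens, copied from the table above.
PRICE_PER_MILLION = {
    "gpt-4o": 10.00,
    "claude-3-7-sonnet": 15.00,
}


def estimate_costs(total_tokens, prices=PRICE_PER_MILLION):
    """Return an estimated cost in USD for each model."""
    return {
        model: total_tokens * price / 1_000_000
        for model, price in prices.items()
    }


costs = estimate_costs(1250)
```

For 1,250 tokens this works out to $0.0125 for GPT-4o and $0.01875 for Claude 3.7 Sonnet before rounding to four decimal places.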
Example
aitokencount ./my_project
Output:
Counting tokens in /absolute/path/to/my_project using cl100k_base encoding...
Found .gitignore with 5 patterns. Will ignore matching files.
Skipping .git directory: /absolute/path/to/my_project/.git
Ignored (gitignore): /absolute/path/to/my_project/node_modules/package.json
Processed: /absolute/path/to/my_project/file1.txt - 150 tokens
Processed: /absolute/path/to/my_project/file2.py - 320 tokens
...
Summary:
Total tokens: 1250
Files processed: 15
Files with errors: 2
Files skipped (gitignore): 8
Files skipped (.git): 42
Estimated costs:
gpt-4o: $0.0125
claude-3-7-sonnet: $0.0188
Notes
- The script attempts to read all files as text files. Binary files or files with encoding issues may be skipped.
- The token count is based on the specified tiktoken encoding (cl100k_base by default).
- `.git` directories are automatically skipped to avoid processing repository metadata.
- If a `.gitignore` file exists in the target directory, the script will automatically respect its patterns and skip ignored files and directories.
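To illustrate the `.gitignore` behavior described above, here is a simplified sketch built on the standard library's `fnmatch`. It is an assumption about how such matching could work, not the package's own code, and the helper names `load_patterns` and `is_ignored` are hypothetical. It deliberately ignores parts of the gitignore syntax such as negation (`!`), leading-slash anchoring, and `**`.

```python
import fnmatch
import os


def load_patterns(gitignore_path):
    """Read non-empty, non-comment lines from a .gitignore file."""
    patterns = []
    with open(gitignore_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                # A trailing slash marks a directory; drop it for matching.
                patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(relative_path, patterns):
    """Return True if the path or any of its components matches a pattern."""
    normalized = relative_path.replace(os.sep, "/")
    parts = normalized.split("/")
    for pattern in patterns:
        if fnmatch.fnmatch(normalized, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False
```

With the patterns `node_modules` and `*.log`, a path like `node_modules/package.json` is ignored because one of its components matches, while `src/main.py` is not.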
Requirements
- Python 3.6+
- tiktoken library