
A professional CLI tool for counting AI model tokens in code projects

Project description

Code Tokenizer

Language: English | 中文


A simple command-line tool that quickly calculates AI-model token usage for an entire project, helping you determine whether your project is suitable for direct AI analysis.

Modern large language models (such as GPT-4 Turbo and Claude) have context windows of 128k to 200k+ tokens, large enough to load an entire project codebase at once. If your project's total code token count is below the model's context limit, you can submit the whole project for analysis in a single request, rather than having the model read files one by one. This tool provides a one-click feature to package all code into a single file, making that workflow easy.
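The "does my project fit?" check described above can be sketched as follows. This is a stdlib-only illustration, not the tool's actual code: the 4-characters-per-token heuristic stands in for tiktoken's exact counts, and the names `estimate_tokens` and `fits_in_context` are my own.

```python
CONTEXT_WINDOW = 200_000  # tokens; e.g. a Claude-class model

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English/code."""
    return max(1, len(text) // 4)

def fits_in_context(total_tokens: int, window: int = CONTEXT_WINDOW) -> bool:
    """True if the whole project can be submitted in a single request."""
    return total_tokens <= window

source = "def hello():\n    print('hello')\n" * 1000
tokens = estimate_tokens(source)
print(tokens, fits_in_context(tokens))
```

The real tool sums per-file counts across the project and reports the percentage of each model's window used.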

🎯 Features

  • Token Statistics - Accurately calculate Token counts for your entire project's code across different AI models
  • Context Analysis - Display the percentage of each AI model's context window used by your project to determine if it exceeds limits
  • One-Click Packaging - Merge all code files into a single file for easy one-time submission to AI models
  • Smart Filtering - Automatically exclude irrelevant files (node_modules, .git, etc.) and keep only core code
  • GitIgnore Integration - Automatically read and apply .gitignore rules to filter out ignored files and folders
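The smart-filtering and .gitignore steps above might look roughly like this. A stdlib-only sketch under stated assumptions: the exclude set and function names are illustrative, and real .gitignore semantics (negation, anchoring, `**`) are richer than plain globs — a library like pathspec handles those.

```python
import fnmatch
from pathlib import Path

# Directories skipped by default (illustrative subset).
DEFAULT_EXCLUDES = {"node_modules", ".git", "__pycache__", "dist", "build"}

def load_gitignore_patterns(project_root: Path) -> list[str]:
    """Read simple glob patterns from .gitignore, skipping comments and blanks."""
    gitignore = project_root / ".gitignore"
    if not gitignore.exists():
        return []
    patterns = []
    for line in gitignore.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns

def is_excluded(path: Path, root: Path, patterns: list[str]) -> bool:
    """True if any path component matches a default exclude or an ignore pattern."""
    for part in path.relative_to(root).parts:
        if part in DEFAULT_EXCLUDES:
            return True
        if any(fnmatch.fnmatch(part, pat) for pat in patterns):
            return True
    return False
```

Checking every path component (rather than only the filename) is what lets a single `node_modules` entry exclude everything beneath it.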

📦 Installation

pip install code-tokenizer

🚀 Usage

# Count Tokens for current project (automatically applies .gitignore rules)
code-tokenizer

# Count Tokens for specified project
code-tokenizer /path/to/project

# Count and package all code into a single file
code-tokenizer --package my_project.txt

# Show only the top 5 largest files
code-tokenizer --max-show 5

# Disable automatic .gitignore rule integration
code-tokenizer --no-gitignore
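Conceptually, the --package step concatenates every kept source file into one output file, each preceded by a header naming its relative path, along these lines. A simplified sketch: the header format, suffix set, and function name are my own, not necessarily what the tool emits.

```python
from pathlib import Path

CODE_SUFFIXES = {".py", ".js", ".go", ".java", ".md"}  # illustrative subset

def package_project(root: Path, out_file: Path) -> int:
    """Concatenate all recognized code files under root into one file,
    each preceded by a header naming its relative path.
    Returns the number of files merged."""
    count = 0
    with out_file.open("w", encoding="utf-8") as out:
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.suffix in CODE_SUFFIXES:
                rel = path.relative_to(root)
                out.write(f"\n===== {rel} =====\n")
                out.write(path.read_text(encoding="utf-8", errors="replace"))
                count += 1
    return count
```

The path headers matter: they let the model attribute each chunk of the merged file back to its original location in the project.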

📊 Example Output

(screenshot: sample Code Tokenizer output)

🔧 Supported File Types

Go, Python, JavaScript, TypeScript, Java, C/C++, Swift, Kotlin, PHP, Ruby, Vue, HTML, CSS, YAML, JSON, XML, SQL, Shell scripts, Markdown, and more

⚠️ Disclaimer

This project is built on OpenAI's tiktoken. Token counts are estimates for reference only and may differ from other AI models' exact counts, since each model uses its own tokenizer.
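For context, counting with tiktoken itself looks roughly like this. `tiktoken.get_encoding` is the library's real API; the fallback heuristic is my own addition for environments where the library or its encoding data is unavailable, and is exactly the kind of approximation the disclaimer warns about.

```python
def count_tokens(text: str) -> int:
    """Count tokens with tiktoken when available; otherwise fall back to a
    rough 4-characters-per-token estimate (so results may differ)."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-family encoding
        return len(enc.encode(text))
    except Exception:  # tiktoken missing, or encoding data not downloadable
        return max(1, len(text) // 4)

print(count_tokens("def main():\n    pass\n"))
```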

Privacy Protection: This tool runs entirely locally and never uploads any code to external servers, protecting the privacy and security of your code.

📄 License

MIT License

Download files

Download the file for your platform.

Source Distribution

code_tokenizer-1.0.3.tar.gz (246.1 kB)

Uploaded Source

Built Distribution


code_tokenizer-1.0.3-py3-none-any.whl (22.3 kB)

Uploaded Python 3

File details

Details for the file code_tokenizer-1.0.3.tar.gz.

File metadata

  • Download URL: code_tokenizer-1.0.3.tar.gz
  • Upload date:
  • Size: 246.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.7

File hashes

Hashes for code_tokenizer-1.0.3.tar.gz

  • SHA256: 910ccfa51a48979ea93881d56c95ef339d4676e0e1d267fc64ac311f1d08b2c1
  • MD5: 20c9f76a6391f1991a0689ef86260d98
  • BLAKE2b-256: 9a364b4d090a76e69e8be764a80d81e47ecb5aa75e376f554573a018758577fd


File details

Details for the file code_tokenizer-1.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for code_tokenizer-1.0.3-py3-none-any.whl

  • SHA256: 7624a7ba1f9472c4b6967106be64c0603a89def19e7fdf632527510aceb8b049
  • MD5: c7e7ed0b756d8135a38e0da8f117ddd6
  • BLAKE2b-256: d5e029e4f25b4eb0f40b5c6be7e181017312a4bc8337dc3cc51b8756285d04de

