
A professional CLI tool for counting AI model tokens in code projects

Project description

Code Tokenizer

Language: English | 中文


A simple command-line tool that quickly calculates AI-model token usage for an entire project, helping you determine whether your project is suitable for direct AI analysis.

Modern large language models (such as GPT-4 Turbo and Claude) have context windows of 128k to 200k+ tokens, large enough to load an entire project codebase at once. If your project's total code token count is below the model's context limit, you can submit the whole project for analysis in a single request, rather than having the model read files one by one. This tool provides a one-click feature to package all code into a single file, making that workflow easy.
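The "does my project fit?" check described above can be sketched as follows. This is a stdlib-only illustration, not the tool's actual code: the 4-characters-per-token heuristic stands in for tiktoken's exact counts, and the names `estimate_tokens` and `fits_in_context` are my own.

```python
CONTEXT_WINDOW = 200_000  # tokens; e.g. a Claude-class model

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English/code."""
    return max(1, len(text) // 4)

def fits_in_context(total_tokens: int, window: int = CONTEXT_WINDOW) -> bool:
    """True if the whole project can be submitted in a single request."""
    return total_tokens <= window

source = "def hello():\n    print('hello')\n" * 1000
tokens = estimate_tokens(source)
print(tokens, fits_in_context(tokens))
```

The real tool sums per-file counts across the project and reports the percentage of each model's window used.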

🎯 Features

  • Token Statistics - Accurately calculate Token counts for your entire project's code across different AI models
  • Context Analysis - Display the percentage of each AI model's context window used by your project to determine if it exceeds limits
  • One-Click Packaging - Merge all code files into a single file for easy one-time submission to AI models
  • Smart Filtering - Automatically exclude irrelevant files (node_modules, .git, etc.) and keep only core code
  • GitIgnore Integration - Automatically read and apply .gitignore rules to filter out ignored files and folders
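The smart-filtering and .gitignore steps above might look roughly like this. A stdlib-only sketch under stated assumptions: the exclude set and function names are illustrative, and real .gitignore semantics (negation, anchoring, `**`) are richer than plain globs — a library like pathspec handles those.

```python
import fnmatch
from pathlib import Path

# Directories skipped by default (illustrative subset).
DEFAULT_EXCLUDES = {"node_modules", ".git", "__pycache__", "dist", "build"}

def load_gitignore_patterns(project_root: Path) -> list[str]:
    """Read simple glob patterns from .gitignore, skipping comments and blanks."""
    gitignore = project_root / ".gitignore"
    if not gitignore.exists():
        return []
    patterns = []
    for line in gitignore.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns

def is_excluded(path: Path, root: Path, patterns: list[str]) -> bool:
    """True if any path component matches a default exclude or an ignore pattern."""
    for part in path.relative_to(root).parts:
        if part in DEFAULT_EXCLUDES:
            return True
        if any(fnmatch.fnmatch(part, pat) for pat in patterns):
            return True
    return False
```

Checking every path component (rather than only the filename) is what lets a single `node_modules` entry exclude everything beneath it.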

📦 Installation

pip install code-tokenizer

🚀 Usage

# Count Tokens for current project (automatically applies .gitignore rules)
code-tokenizer

# Count Tokens for specified project
code-tokenizer /path/to/project

# Count and package all code into a single file
code-tokenizer --package my_project.txt

# Show only the top 5 largest files
code-tokenizer --max-show 5

# Disable automatic .gitignore rule integration
code-tokenizer --no-gitignore
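Conceptually, the --package step concatenates every kept source file into one output file, each preceded by a header naming its relative path, along these lines. A simplified sketch: the header format, suffix set, and function name are my own, not necessarily what the tool emits.

```python
from pathlib import Path

CODE_SUFFIXES = {".py", ".js", ".go", ".java", ".md"}  # illustrative subset

def package_project(root: Path, out_file: Path) -> int:
    """Concatenate all recognized code files under root into one file,
    each preceded by a header naming its relative path.
    Returns the number of files merged."""
    count = 0
    with out_file.open("w", encoding="utf-8") as out:
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.suffix in CODE_SUFFIXES:
                rel = path.relative_to(root)
                out.write(f"\n===== {rel} =====\n")
                out.write(path.read_text(encoding="utf-8", errors="replace"))
                count += 1
    return count
```

The path headers matter: they let the model attribute each chunk of the merged file back to its original location in the project.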

📊 Example Output

(screenshot: sample Code Tokenizer output)

🔧 Supported File Types

Go, Python, JavaScript, TypeScript, Java, C/C++, Swift, Kotlin, PHP, Ruby, Vue, HTML, CSS, YAML, JSON, XML, SQL, Shell scripts, Markdown, and more

⚠️ Disclaimer

This project is built on OpenAI's tiktoken. Token counts are estimates for reference only and may differ from other AI models' exact counts, since each model uses its own tokenizer.
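For context, counting with tiktoken itself looks roughly like this. `tiktoken.get_encoding` is the library's real API; the fallback heuristic is my own addition for environments where the library or its encoding data is unavailable, and is exactly the kind of approximation the disclaimer warns about.

```python
def count_tokens(text: str) -> int:
    """Count tokens with tiktoken when available; otherwise fall back to a
    rough 4-characters-per-token estimate (so results may differ)."""
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-family encoding
        return len(enc.encode(text))
    except Exception:  # tiktoken missing, or encoding data not downloadable
        return max(1, len(text) // 4)

print(count_tokens("def main():\n    pass\n"))
```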

Privacy Protection: This tool runs entirely locally and never uploads any code to external servers, protecting the privacy and security of your code.

📄 License

MIT License

Download files

Download the file for your platform.

Source Distribution

code_tokenizer-1.0.3.tar.gz (246.1 kB)

Uploaded Source

Built Distribution


code_tokenizer-1.0.3-py3-none-any.whl (22.3 kB)

Uploaded Python 3

File details

Details for the file code_tokenizer-1.0.3.tar.gz.

File metadata

  • Download URL: code_tokenizer-1.0.3.tar.gz
  • Upload date:
  • Size: 246.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.7

File hashes

Hashes for code_tokenizer-1.0.3.tar.gz

  • SHA256: 910ccfa51a48979ea93881d56c95ef339d4676e0e1d267fc64ac311f1d08b2c1
  • MD5: 20c9f76a6391f1991a0689ef86260d98
  • BLAKE2b-256: 9a364b4d090a76e69e8be764a80d81e47ecb5aa75e376f554573a018758577fd


File details

Details for the file code_tokenizer-1.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for code_tokenizer-1.0.3-py3-none-any.whl

  • SHA256: 7624a7ba1f9472c4b6967106be64c0603a89def19e7fdf632527510aceb8b049
  • MD5: c7e7ed0b756d8135a38e0da8f117ddd6
  • BLAKE2b-256: d5e029e4f25b4eb0f40b5c6be7e181017312a4bc8337dc3cc51b8756285d04de

