A simple utility to estimate token counts using tiktoken o200k_base
Tokentik
A lightweight Python utility for estimating token counts in text. It defaults to the o200k_base encoding used by modern LLMs such as GPT-4o.
Installation
pip install tokentik
Usage
from tokentik import count_tokens
text = "Hello, world!"
# Estimate tokens using the default 'o200k_base' encoding (GPT-4o)
token_count = count_tokens(text)
print(f"Token count: {token_count}")
# Pass a different encoding name through the model parameter (e.g., 'cl100k_base' for GPT-4/GPT-3.5)
token_count_v2 = count_tokens(text, model="cl100k_base")
print(f"Token count (cl100k_base): {token_count_v2}")
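For readers curious about what happens under the hood, the following is a minimal sketch of how a function like count_tokens could be built on tiktoken. It is an illustration under stated assumptions, not tokentik's actual source: the names count_tokens_sketch and load_encoding are hypothetical, and only tiktoken.get_encoding and Encoding.encode are real tiktoken calls.

```python
def _load_tiktoken_encoding(name):
    """Load a tiktoken encoding by name, e.g. 'o200k_base' or 'cl100k_base'."""
    # Imported lazily so the sketch can be imported without tiktoken installed.
    import tiktoken

    return tiktoken.get_encoding(name)


def count_tokens_sketch(text, model="o200k_base", load_encoding=_load_tiktoken_encoding):
    """Hypothetical stand-in for tokentik.count_tokens (an assumption, not the real code).

    `model` mirrors the keyword used in the usage example above and is treated
    here as a tiktoken encoding name. `load_encoding` is an illustrative hook
    that lets a caller swap in another encoder, for example in tests.
    """
    encoding = load_encoding(model)
    # encode() returns a list of integer token ids; its length is the token count.
    return len(encoding.encode(text))
```

With tiktoken installed, count_tokens_sketch("Hello, world!") loads o200k_base and returns the number of token ids. Note that tiktoken's encode raises a ValueError by default when the input contains special-token text such as <|endoftext|>.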
Configuration
Environment Variables
tiktoken downloads the BPE (byte pair encoding) vocabulary files on first use and caches them. By default, the cache lives in a temporary directory. To store these files in a persistent location, set the TIKTOKEN_CACHE_DIR environment variable:
export TIKTOKEN_CACHE_DIR="/path/to/your/models/tiktoken"
This is highly recommended for production environments, including Cloud Run deployments where persistent storage is mounted (e.g., at /mnt/models/tiktoken).
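When exporting the variable in the shell is inconvenient, the same setting can be applied from Python before the first encoding is loaded. The helper below is a sketch under that assumption; configure_tiktoken_cache and its override parameter are hypothetical names, not part of tokentik or tiktoken. TIKTOKEN_CACHE_DIR is the only real name it relies on.

```python
import os
from pathlib import Path


def configure_tiktoken_cache(cache_dir, override=False):
    """Point TIKTOKEN_CACHE_DIR at cache_dir and return the value in effect.

    Hypothetical helper for illustration. An existing value of the variable
    is kept unless override=True, so a deployment-level setting still wins.
    """
    if override or "TIKTOKEN_CACHE_DIR" not in os.environ:
        path = Path(cache_dir).expanduser()
        # Create the directory up front so a bad path fails early.
        path.mkdir(parents=True, exist_ok=True)
        os.environ["TIKTOKEN_CACHE_DIR"] = str(path)
    return os.environ["TIKTOKEN_CACHE_DIR"]
```

Call it once at startup, for example configure_tiktoken_cache("/mnt/models/tiktoken"), before the first call to count_tokens, so that the vocabulary files are read from and written to that directory.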
Acknowledgments
Special thanks to OpenAI for their tiktoken library, which this utility is built upon.
License
MIT
File details
Details for the file tokentik-0.1.1.tar.gz.
File metadata
- Download URL: tokentik-0.1.1.tar.gz
- Upload date:
- Size: 3.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 046d945b10dbe2aba6de128bd6cdc36b084a860f0b3db7265d74df3a9aac8703 |
| MD5 | 5f9cce51b4fac66187b3908aa90e1604 |
| BLAKE2b-256 | 1e5e4038ddc8415cf9a8c636275b4e4e8f5e501763182e22c2ac02d894711780 |
File details
Details for the file tokentik-0.1.1-py3-none-any.whl.
File metadata
- Download URL: tokentik-0.1.1-py3-none-any.whl
- Upload date:
- Size: 3.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb37c1ec51ca739f207de548cfd6b35e7ca53f85eb4b60aac24742a16b4004b3 |
| MD5 | 38aa3a3c4cb33f84b48d2dcb8716e631 |
| BLAKE2b-256 | a52605823d3af02fda51fba744b54e3810edb7d855251fffe1dd212b7752d354 |