A simple utility to estimate token counts using tiktoken o200k_base

Tokentik

A lightweight Python utility for estimating token counts in text. It is built on tiktoken and defaults to the o200k_base encoding used by GPT-4o.

Installation

pip install tokentik

Usage

from tokentik import count_tokens

text = "Hello, world!"

# Estimate tokens using the default 'o200k_base' encoding (GPT-4o)
token_count = count_tokens(text)
print(f"Token count: {token_count}")

# Pass a different encoding name via `model` (e.g., 'cl100k_base' for GPT-4/GPT-3.5)
token_count_v2 = count_tokens(text, model="cl100k_base")
print(f"Token count (cl100k_base): {token_count_v2}")
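
A token count is often used to check text against a limit before sending it to a model. The following is a minimal sketch of that check, not part of tokentik: `fits_budget`, `max_tokens`, and `whitespace_counter` are hypothetical names introduced here. The counter is passed in as a callable so the example runs without tokentik installed; in practice you would pass `count_tokens` instead of the stand-in.

```python
from typing import Callable


def fits_budget(text: str, max_tokens: int, counter: Callable[[str], int]) -> bool:
    """Return True if `text` needs at most `max_tokens` tokens according to `counter`.

    Hypothetical helper for illustration; `counter` is any function that maps
    a string to a token count (for example, tokentik's `count_tokens`).
    """
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    return counter(text) <= max_tokens


def whitespace_counter(text: str) -> int:
    """Stand-in counter that splits on whitespace. This is NOT a real tokenizer."""
    return len(text.split())


# With the stand-in counter, "Hello, world!" counts as 2 pieces.
print(fits_budget("Hello, world!", 8, whitespace_counter))  # True
```

Real BPE counts will differ from the whitespace stand-in, so any limit should be checked with the same encoding the target model uses.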

Configuration

Environment Variables

tiktoken downloads the BPE (Byte Pair Encoding) vocabulary files on first use and caches them. By default, the cache lives in a temporary directory, so it may not survive a reboot or container restart. To keep the files in a persistent location, set the TIKTOKEN_CACHE_DIR environment variable:

export TIKTOKEN_CACHE_DIR="/path/to/your/models/tiktoken"

This is recommended for production environments, and for Cloud Run or similar container environments where persistent storage is mounted (e.g., at /mnt/models/tiktoken), so that the vocabulary files need not be downloaded again after each restart.
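
If setting the variable in the shell is inconvenient, the same configuration can be applied from Python at startup. The sketch below is an assumption-laden illustration, not part of tokentik: `configure_tiktoken_cache` is a hypothetical helper. It creates the directory if needed and sets TIKTOKEN_CACHE_DIR only when it is not already set, so a value exported in the environment still takes precedence.

```python
import os
from pathlib import Path


def configure_tiktoken_cache(cache_dir: str) -> Path:
    """Point tiktoken's cache at a persistent directory.

    Hypothetical helper for illustration. Call it early in startup,
    before any token counting happens.
    """
    path = Path(cache_dir).expanduser()
    # Create the directory (and any parents) if it does not exist yet.
    path.mkdir(parents=True, exist_ok=True)
    # Keep an existing value, e.g. one exported in the shell or set by the platform.
    os.environ.setdefault("TIKTOKEN_CACHE_DIR", str(path))
    return Path(os.environ["TIKTOKEN_CACHE_DIR"])
```

For example, calling `configure_tiktoken_cache("/mnt/models/tiktoken")` at the top of an application entry point would use the mounted path mentioned above, provided the process has write access to it.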

Acknowledgments

Special thanks to OpenAI for their tiktoken library, which this utility is built upon.

License

MIT
