Count the number of tokens in a text file using the tiktoken tokenizer from OpenAI.

Count tokens

A simple tool with one purpose: counting the tokens in a text file.

Requirements

This package uses the tiktoken library for tokenization.

Installation

To use it from the command line, install the package in an isolated environment with pipx:

$ pipx install count-tokens

Alternatively, install it into your current environment with pip.

Usage

Open a terminal and run:

$ count-tokens document.txt

You should see something like this:

File: document.txt
Encoding: cl100k_base
Number of tokens: 67

If you want to see only the token count, run:

$ count-tokens document.txt --quiet

The output will be:

67
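
For readers who want to reproduce the count from Python, the following is a minimal sketch that calls tiktoken directly. It is not the package's own implementation; the function names count_tokens_in_text and count_tokens_in_file are hypothetical and introduced only for illustration. It assumes tiktoken is installed and that the file is UTF-8 encoded.

```python
from pathlib import Path
from typing import Callable, Sequence


def count_tokens_in_text(text: str, encode: Callable[[str], Sequence]) -> int:
    """Return the number of tokens that `encode` produces for `text`.

    `encode` is any callable mapping a string to a sequence of tokens,
    for example the `encode` method of a tiktoken encoding.
    """
    return len(encode(text))


def count_tokens_in_file(path: str, encoding_name: str = "cl100k_base") -> int:
    """Read a UTF-8 text file and count its tokens with a tiktoken encoding.

    Hypothetical helper: the real count-tokens tool may work differently.
    """
    # Imported lazily so the pure-Python helper above works without tiktoken.
    import tiktoken

    encoding = tiktoken.get_encoding(encoding_name)
    text = Path(path).read_text(encoding="utf-8")
    return count_tokens_in_text(text, encoding.encode)


# Example call (requires tiktoken and an existing file):
#   count_tokens_in_file("document.txt")
```

Passing the encoder as a callable keeps the counting step independent of tiktoken, so it can be exercised with any stand-in tokenizer.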

NOTE: tiktoken supports three encodings used by OpenAI models:

Encoding name        OpenAI models
cl100k_base          gpt-4, gpt-3.5-turbo, text-embedding-ada-002
p50k_base            Codex models, text-davinci-002, text-davinci-003
r50k_base (or gpt2)  GPT-3 models like davinci

To use count-tokens with an encoding other than the default cl100k_base, pass the additional argument -e or --encoding:

$ count-tokens document.txt -e r50k_base
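
The command-line interface described above (a file argument, -e/--encoding, and --quiet) could be modelled roughly as follows. This is a sketch under assumptions, not the tool's actual source: build_parser and format_report are hypothetical names, and the report layout simply mirrors the sample output shown earlier.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(
        prog="count-tokens",
        description="Count tokens in a text file.",
    )
    parser.add_argument("file", help="path to the text file")
    parser.add_argument(
        "-e",
        "--encoding",
        default="cl100k_base",  # default encoding named in the README
        help="tiktoken encoding name, e.g. p50k_base or r50k_base",
    )
    parser.add_argument(
        "--quiet",
        action="store_true",
        help="print only the token count",
    )
    return parser


def format_report(file: str, encoding: str, n_tokens: int, quiet: bool = False) -> str:
    """Format the result as either the bare count or the three-line report."""
    if quiet:
        return str(n_tokens)
    return (
        f"File: {file}\n"
        f"Encoding: {encoding}\n"
        f"Number of tokens: {n_tokens}"
    )
```

For example, build_parser().parse_args(["document.txt", "-e", "r50k_base"]) yields a namespace whose encoding attribute is "r50k_base" and whose quiet attribute is False.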

Related Projects

  • tiktoken - tokenization library used by this package

Credits

Thanks to the authors of the tiktoken library for open-sourcing their work.

License

MIT © Krystian Safjan.

