
A package to count tokens in input text using OpenAI's tiktoken library.


gptwc: wc for GPT tokens

The wc utility counts words or characters. gptwc works the same way but counts tokens. Tokens are typically longer than a single character but shorter than a word, and they are the compact representation of text that large language models operate on.

Use gptwc to check how many tokens a string contains, so you can stay under your large language model API's token limit (e.g. 4097 tokens). gptwc uses OpenAI's tiktoken library for tokenization.
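To turn a token count into a go/no-go decision, here is a minimal Python sketch. It assumes that the limit is shared between the prompt and the requested completion, as with OpenAI's older completion models such as text-davinci-003. Check your API's documentation. The function names and the 4097 default are illustrative and are not part of gptwc.

```python
def completion_budget(prompt_tokens: int, context_limit: int = 4097) -> int:
    """Return how many tokens remain for the reply after the prompt.

    Assumption: the context limit covers prompt + completion together.
    Hypothetical helper; not part of gptwc.
    """
    if prompt_tokens < 0 or context_limit <= 0:
        raise ValueError("need prompt_tokens >= 0 and context_limit > 0")
    # Clamp at zero: an oversized prompt leaves no room for a reply.
    return max(context_limit - prompt_tokens, 0)


def fits(prompt_tokens: int, max_completion_tokens: int, context_limit: int = 4097) -> bool:
    """True if the prompt plus the requested completion stays within the limit."""
    return prompt_tokens + max_completion_tokens <= context_limit


# Plug in a count reported by gptwc (the LICENSE example in this README reports 257).
print(completion_budget(257))  # 3840
print(fits(257, 1000))         # True
```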

Installation

$ pip install gptwc

$ echo "Simple is better than complex." | gptwc
7

Example Usage

$ cat LICENSE | gptwc
257
$ cat LICENSE | wc -c
1059
$ cat LICENSE | wc -w
165


$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | wc -w
26470

$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | gptwc
40085
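The counts above give a rough sense of how tokens relate to words and bytes. The snippet below only does arithmetic on those reported numbers. The resulting ratios are specific to these two English texts and to the encoding that produced the counts, so treat them as ballpark figures rather than a general rule.

```python
# Counts reported in the examples above (gptwc, wc -w and wc -c output).
LICENSE_TOKENS, LICENSE_WORDS, LICENSE_BYTES = 257, 165, 1059
ALICE_TOKENS, ALICE_WORDS = 40085, 26470


def tokens_per_word(tokens: int, words: int) -> float:
    """Average number of tokens per whitespace-separated word."""
    return tokens / words


def bytes_per_token(num_bytes: int, tokens: int) -> float:
    """Average number of bytes of input text covered by one token."""
    return num_bytes / tokens


print(f"LICENSE: {tokens_per_word(LICENSE_TOKENS, LICENSE_WORDS):.2f} tokens/word")
print(f"Alice:   {tokens_per_word(ALICE_TOKENS, ALICE_WORDS):.2f} tokens/word")
print(f"LICENSE: {bytes_per_token(LICENSE_BYTES, LICENSE_TOKENS):.2f} bytes/token")
```

This prints 1.56 and 1.51 tokens per word and 4.12 bytes per token, so these samples work out to roughly 1.5 tokens per word and about 4 bytes per token.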


$ cat LICENSE | gptwc --model text-davinci-003
257
$ cat LICENSE | gptwc --model gpt-3.5-turbo
201


$ cat README.md | pbcopy
$ gptwc -c
517

Options

usage: gptwc [-h] [--files0-from F] [--model MODEL] [-c] [--version] [FILE ...]

Count tokens in text files using OpenAI's tiktoken library.

positional arguments:
  FILE             Text files to count tokens in

options:
  -h, --help       show this help message and exit
  --files0-from F  Read input from the files specified by NUL-terminated names in file F
  --model MODEL    Model name to use for tokenization (default: gpt-4)
  -c, --clipboard  Read input from the system clipboard
  --version        show program's version number and exit
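The --files0-from option appears to mirror the option of the same name in GNU wc. File F holds file names, each terminated by a NUL byte. This is the format that `find -print0` writes, so names containing spaces or newlines survive intact. Below is a minimal Python sketch of parsing such a list. parse_files0 and read_files0 are hypothetical helpers, not gptwc's own code.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_files0(data: bytes) -> list[str]:
    """Split a NUL-terminated list of file names (e.g. from `find -print0`)."""
    # The final terminator produces an empty trailing chunk; drop empty names.
    # os.fsdecode turns raw bytes into str using the filesystem encoding.
    return [os.fsdecode(name) for name in data.split(b"\0") if name]


def read_files0(list_path: str) -> list[str]:
    """Read and parse the NUL-terminated names stored in list_path."""
    return parse_files0(Path(list_path).read_bytes())


print(parse_files0(b"README.md\0docs/notes with spaces.txt\0"))
# ['README.md', 'docs/notes with spaces.txt']
```

On the command line, such a list could come from find . -name '*.md' -print0 > md-files, followed by gptwc --files0-from md-files.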

Which Tokenizer Does Each Model Use?

Excerpt from tiktoken/model.py (the exact list depends on the installed tiktoken version):

"gpt-4o": "o200k_base",
"gpt-4": "cl100k_base",
"gpt-3.5-turbo": "cl100k_base",
"text-embedding-ada-002": "cl100k_base",

"text-davinci-003": "p50k_base",
"text-davinci-002": "p50k_base",
"code-davinci-002": "p50k_base",
"code-davinci-001": "p50k_base",
"code-cushman-002": "p50k_base",
"code-cushman-001": "p50k_base",
"davinci-codex": "p50k_base",
"cushman-codex": "p50k_base",

"text-davinci-001": "r50k_base",
"text-curie-001": "r50k_base",
"text-babbage-001": "r50k_base",
"text-ada-001": "r50k_base",
"davinci": "r50k_base",
"curie": "r50k_base",
"babbage": "r50k_base",
"ada": "r50k_base",
"text-similarity-davinci-001": "r50k_base",
"text-similarity-curie-001": "r50k_base",
"text-similarity-babbage-001": "r50k_base",
"text-similarity-ada-001": "r50k_base",
"text-search-davinci-doc-001": "r50k_base",
"text-search-curie-doc-001": "r50k_base",
"text-search-babbage-doc-001": "r50k_base",
"text-search-ada-doc-001": "r50k_base",
"code-search-babbage-code-001": "r50k_base",
"code-search-ada-code-001": "r50k_base",

"text-davinci-edit-001": "p50k_edit",
"code-davinci-edit-001": "p50k_edit",

"gpt2": "gpt2",

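A token count depends on the encoding, not on the model name. Models that share an encoding always agree. This is why gpt-4 and gpt-3.5-turbo (both cl100k_base) report identical counts, while text-davinci-003 (p50k_base) gave 257 versus 201 for the same LICENSE file. Below is a stdlib-only sketch of the lookup, using a few rows copied from the table above. MODEL_ENCODINGS and encoding_name_for are illustrative names. With tiktoken installed, tiktoken.encoding_for_model(model) performs the real lookup and returns an Encoding. The length of its encode(text) result is the token count.

```python
# A few rows copied from the table above; tiktoken ships the full mapping.
MODEL_ENCODINGS = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "davinci": "r50k_base",
    "gpt2": "gpt2",
}


def encoding_name_for(model: str) -> str:
    """Return the encoding name for an exact model name (illustrative helper)."""
    try:
        return MODEL_ENCODINGS[model]
    except KeyError:
        raise KeyError(f"no encoding known for model {model!r}") from None


def counts_always_match(model_a: str, model_b: str) -> bool:
    """Models sharing an encoding tokenize any text identically."""
    return encoding_name_for(model_a) == encoding_name_for(model_b)


print(counts_always_match("gpt-4", "gpt-3.5-turbo"))     # True
print(counts_always_match("gpt-4", "text-davinci-003"))  # False
```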

Download files


Source Distribution

gptwc-1.3.0.tar.gz (3.9 kB)

Uploaded Source

Built Distributions

gptwc-1.3.0-py3-none-any.whl (4.4 kB)

Uploaded Python 3

gptwc-1.3.0-py2.py3-none-any.whl (4.4 kB)

Uploaded Python 2 Python 3

File details

Details for the file gptwc-1.3.0.tar.gz.

File metadata

  • Download URL: gptwc-1.3.0.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.12

File hashes

Hashes for gptwc-1.3.0.tar.gz
Algorithm Hash digest
SHA256 fddd6258451d38a12d22167bbc6bcc185a0d178a3ff7ab28b6d54b3cebfc15e7
MD5 c04be8ad14e0784f19bedb1acaa45c29
BLAKE2b-256 58a33364c0fa3ebe4655488d12708829de957bfe34dc324a9807de6f14012d98


File details

Details for the file gptwc-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: gptwc-1.3.0-py3-none-any.whl
  • Upload date:
  • Size: 4.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.12

File hashes

Hashes for gptwc-1.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c60fc581a2bcfa4c6f3ee16db985655e46cacbab90904946b6cbeb5bf355c000
MD5 bcb4875e9a244e9ac076d20ed27ea91e
BLAKE2b-256 bd6f17a64213187fbb1bdd83271979c49f763a04274e86d7eab7afba9da92fb0


File details

Details for the file gptwc-1.3.0-py2.py3-none-any.whl.

File metadata

  • Download URL: gptwc-1.3.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 4.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.12

File hashes

Hashes for gptwc-1.3.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 a792b89f37e7370e3d367873c00a7b3babdb9b7ba0307a93d6b84f7357afed31
MD5 c98ac7fc7319301b366b0d308308695e
BLAKE2b-256 fb64864d73c0ef4f18add4850c7245767b5f4b99e02756b0542171676b5bc9db

