Skip to main content

A package to count tokens in input text using OpenAI's tiktoken library.

Project description

gptwc: wc for GPT tokens

The wc utility counts words or characters. The gptwc utility functions similarly but counts tokens. Tokens are smaller than words but larger than characters, and are a more compact representation of text used by large language models.

Use gptwc to check the number of tokens in a string, in order to remain under the token limit (eg. 4097) for your large language model API. Uses tiktoken.

Installation

$ pip install gptwc

$ echo "Simple is better than complex." | gptwc
7

Example Usage

$ cat LICENSE  | gptwc
257
$ cat LICENSE | wc -c
1059
$ cat LICENSE | wc -w
165


$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | wc -w
26470

curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | gptwc
40085


$ cat LICENSE | gptwc --model text-davinci-003
257
$ cat LICENSE | gptwc --model gpt-3.5-turbo
201


$ cat README.md | pbcopy
$ gptwc -c
517

Options

usage: gptwc [-h] [--files0-from F] [--model MODEL] [-c] [--version] [FILE ...]

Count tokens in text files using OpenAI's tiktoken library.

positional arguments:
  FILE             Text files to count tokens in

options:
  -h, --help       show this help message and exit
  --files0-from F  Read input from the files specified by NUL-terminated names in file F
  --model MODEL    Model name to use for tokenization (default: text-davinci-003)
  -c, --clipboard  Read input from the system clipboard
  --version        show program's version number and exit

Which Tokenizer Does Each Model Use?

From tiktoken/model.py

"gpt-4": "cl100k_base",
"gpt-3.5-turbo": "cl100k_base",
"text-embedding-ada-002": "cl100k_base",

"text-davinci-003": "p50k_base",
"text-davinci-002": "p50k_base",
"code-davinci-002": "p50k_base",
"code-davinci-001": "p50k_base",
"code-cushman-002": "p50k_base",
"code-cushman-001": "p50k_base",
"davinci-codex": "p50k_base",
"cushman-codex": "p50k_base",

"text-davinci-001": "r50k_base",
"text-curie-001": "r50k_base",
"text-babbage-001": "r50k_base",
"text-ada-001": "r50k_base",
"davinci": "r50k_base",
"curie": "r50k_base",
"babbage": "r50k_base",
"ada": "r50k_base",
"text-similarity-davinci-001": "r50k_base",
"text-similarity-curie-001": "r50k_base",
"text-similarity-babbage-001": "r50k_base",
"text-similarity-ada-001": "r50k_base",
"text-search-davinci-doc-001": "r50k_base",
"text-search-curie-doc-001": "r50k_base",
"text-search-babbage-doc-001": "r50k_base",
"text-search-ada-doc-001": "r50k_base",
"code-search-babbage-code-001": "r50k_base",
"code-search-ada-code-001": "r50k_base",

"text-davinci-edit-001": "p50k_edit",
"code-davinci-edit-001": "p50k_edit",

"gpt2": "gpt2",

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gptwc-1.2.6.tar.gz (3.9 kB view details)

Uploaded Source

Built Distributions

gptwc-1.2.6-py3-none-any.whl (4.4 kB view details)

Uploaded Python 3

gptwc-1.2.6-py2.py3-none-any.whl (4.4 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file gptwc-1.2.6.tar.gz.

File metadata

  • Download URL: gptwc-1.2.6.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for gptwc-1.2.6.tar.gz
Algorithm Hash digest
SHA256 9ebcddf4419eb7ed66a804997dbee0f4482fef017e36904b171e539e83a5662c
MD5 52b5c66fad28523f15d70d78ea926e32
BLAKE2b-256 bd7f01c3441ff7d6c627f33bb41d3268c8671bf1a862b1a363e26e6a2ba287e2

See more details on using hashes here.

File details

Details for the file gptwc-1.2.6-py3-none-any.whl.

File metadata

  • Download URL: gptwc-1.2.6-py3-none-any.whl
  • Upload date:
  • Size: 4.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for gptwc-1.2.6-py3-none-any.whl
Algorithm Hash digest
SHA256 db5bf468da8e6223c3f03938f97f047b6d4bf4bc0c851c35274ea713cccb06bc
MD5 6aa182c5b0976d510c06b03cc844da5b
BLAKE2b-256 3dbe3484e558febe5c10e585e086205af56d87ef8af12c06bc0f8f105ed80164

See more details on using hashes here.

File details

Details for the file gptwc-1.2.6-py2.py3-none-any.whl.

File metadata

  • Download URL: gptwc-1.2.6-py2.py3-none-any.whl
  • Upload date:
  • Size: 4.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for gptwc-1.2.6-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 d29520030470cfde4c5eb60edf7d2e9fe5b855a4dab4f2f3867c282c8d23626a
MD5 670079ca3fb185ef8e621d778a7d1412
BLAKE2b-256 6dba352e6a69cb6e51f8e393fba488b19be74f5c43b87f348a10d4614bd30b4a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page