A package to count tokens in input text using OpenAI's tiktoken library.
gptwc: wc for GPT tokens
The wc utility counts words or characters. The gptwc utility functions similarly but counts tokens.
Tokens are typically smaller than words but larger than characters; they are the compact units of text that large language models read and generate.
Use gptwc to check the number of tokens in a string so that you stay under the token limit (e.g. 4097) of your large language model API. gptwc uses tiktoken for tokenization.
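As a rough illustration of what checking a token count against a limit involves, the sketch below wraps an encoder in two small helpers. It is not gptwc's source: `count_tokens`, `fits_within_limit`, and `tiktoken_encoder` are hypothetical names, and the default of 4097 only mirrors the example limit mentioned above. tiktoken is imported lazily, so the helpers can also be tried with any stand-in encoder.

```python
from typing import Callable, Sequence

# An encoder turns a string into a sequence of integer token ids.
Encoder = Callable[[str], Sequence[int]]


def count_tokens(text: str, encode: Encoder) -> int:
    """Return how many tokens `encode` produces for `text`."""
    return len(encode(text))


def fits_within_limit(text: str, encode: Encoder, limit: int = 4097) -> bool:
    """True if `text` encodes to at most `limit` tokens.

    The default of 4097 is only the example limit; use your model's real limit.
    """
    return count_tokens(text, encode) <= limit


def tiktoken_encoder(model: str = "gpt-4") -> Encoder:
    """Build an encoder from tiktoken (third-party: `pip install tiktoken`)."""
    import tiktoken  # imported here so the helpers above work without it

    return tiktoken.encoding_for_model(model).encode
```

With tiktoken installed, `count_tokens("Simple is better than complex.", tiktoken_encoder())` would return the count for the chosen model's encoding.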
Installation
$ pip install gptwc
$ echo "Simple is better than complex." | gptwc
7
Example Usage
$ cat LICENSE | gptwc
257
$ cat LICENSE | wc -c
1059
$ cat LICENSE | wc -w
165
$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | wc -w
26470
$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | gptwc
40085
$ cat LICENSE | gptwc --model text-davinci-003
257
$ cat LICENSE | gptwc --model gpt-3.5-turbo
201
$ cat README.md | pbcopy
$ gptwc -c
517
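The counts in these examples also give a rough feel for how tokens relate to words and characters. The snippet below only does arithmetic on the numbers printed above; the ratios are specific to these two files and to whichever tokenizer produced the counts, so treat them as illustrations rather than general constants.

```python
# Counts copied from the example output above.
license_counts = {"tokens": 257, "words": 165, "chars": 1059}
alice_counts = {"tokens": 40085, "words": 26470}


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator rounded to two decimal places."""
    return round(numerator / denominator, 2)


license_tokens_per_word = ratio(license_counts["tokens"], license_counts["words"])
license_chars_per_token = ratio(license_counts["chars"], license_counts["tokens"])
alice_tokens_per_word = ratio(alice_counts["tokens"], alice_counts["words"])

print(license_tokens_per_word)  # 1.56
print(license_chars_per_token)  # 4.12
print(alice_tokens_per_word)    # 1.51
```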
Options
usage: gptwc [-h] [--files0-from F] [--model MODEL] [-c] [--version] [FILE ...]

Count tokens in text files using OpenAI's tiktoken library.

positional arguments:
  FILE             Text files to count tokens in

options:
  -h, --help       show this help message and exit
  --files0-from F  Read input from the files specified by NUL-terminated names in file F
  --model MODEL    Model name to use for tokenization (default: gpt-4)
  -c, --clipboard  Read input from the system clipboard
  --version        show program's version number and exit
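For readers who want to see how such an interface fits together, here is a minimal argparse sketch that accepts the same options. It is a reconstruction from the help text, not gptwc's actual source: `build_parser`, `split_nul_names`, and the `VERSION` placeholder are hypothetical. The helper shows one plausible way to split the NUL-terminated names that `--files0-from` expects (the format produced by tools such as `find -print0`).

```python
import argparse

VERSION = "0.0.0"  # placeholder, not the real package version


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="gptwc",
        description="Count tokens in text files using OpenAI's tiktoken library.",
    )
    parser.add_argument("files", metavar="FILE", nargs="*",
                        help="Text files to count tokens in")
    parser.add_argument("--files0-from", metavar="F",
                        help="Read input from the files specified by "
                             "NUL-terminated names in file F")
    parser.add_argument("--model", default="gpt-4",
                        help="Model name to use for tokenization "
                             "(default: %(default)s)")
    parser.add_argument("-c", "--clipboard", action="store_true",
                        help="Read input from the system clipboard")
    parser.add_argument("--version", action="version",
                        version=f"%(prog)s {VERSION}")
    return parser


def split_nul_names(data: bytes) -> list[str]:
    """Split NUL-terminated file names, skipping empty entries."""
    return [name.decode() for name in data.split(b"\0") if name]
```

For example, `build_parser().parse_args(["--model", "gpt-3.5-turbo", "LICENSE"])` yields a namespace with `model` set to `"gpt-3.5-turbo"` and `files` set to `["LICENSE"]`.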
Which Tokenizer Does Each Model Use?
From tiktoken/model.py
"gpt-4o": "o200k_base",
"gpt-4": "cl100k_base",
"gpt-3.5-turbo": "cl100k_base",
"text-embedding-ada-002": "cl100k_base",
"text-davinci-003": "p50k_base",
"text-davinci-002": "p50k_base",
"code-davinci-002": "p50k_base",
"code-davinci-001": "p50k_base",
"code-cushman-002": "p50k_base",
"code-cushman-001": "p50k_base",
"davinci-codex": "p50k_base",
"cushman-codex": "p50k_base",
"text-davinci-001": "r50k_base",
"text-curie-001": "r50k_base",
"text-babbage-001": "r50k_base",
"text-ada-001": "r50k_base",
"davinci": "r50k_base",
"curie": "r50k_base",
"babbage": "r50k_base",
"ada": "r50k_base",
"text-similarity-davinci-001": "r50k_base",
"text-similarity-curie-001": "r50k_base",
"text-similarity-babbage-001": "r50k_base",
"text-similarity-ada-001": "r50k_base",
"text-search-davinci-doc-001": "r50k_base",
"text-search-curie-doc-001": "r50k_base",
"text-search-babbage-doc-001": "r50k_base",
"text-search-ada-doc-001": "r50k_base",
"code-search-babbage-code-001": "r50k_base",
"code-search-ada-code-001": "r50k_base",
"text-davinci-edit-001": "p50k_edit",
"code-davinci-edit-001": "p50k_edit",
"gpt2": "gpt2",
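The model-to-encoding pairs listed under "Which Tokenizer Does Each Model Use?" can be treated as a plain lookup table, which also explains why the same LICENSE file yields different counts for text-davinci-003 (p50k_base) and gpt-3.5-turbo (cl100k_base) in the usage examples. The sketch below copies a subset of those pairs. `encoding_name_for` is a hypothetical helper, and its fallback for dated or suffixed model names is an assumption of this sketch rather than a description of tiktoken's own lookup.

```python
# A subset of the model-to-encoding pairs from
# "Which Tokenizer Does Each Model Use?".
MODEL_TO_ENCODING = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "text-davinci-001": "r50k_base",
    "text-davinci-edit-001": "p50k_edit",
    "gpt2": "gpt2",
}


def encoding_name_for(model: str) -> str:
    """Return the encoding name for `model`.

    Exact names are looked up first. As an assumption of this sketch,
    suffixed variants such as "gpt-4-0613" fall back to the longest
    listed name that is followed by a hyphen.
    """
    if model in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model]
    # Try longer names first so "gpt-4o-..." is not mistaken for "gpt-4-...".
    for known in sorted(MODEL_TO_ENCODING, key=len, reverse=True):
        if model.startswith(known + "-"):
            return MODEL_TO_ENCODING[known]
    raise KeyError(f"no encoding listed for model {model!r}")
```

The resulting name could then be passed to `tiktoken.get_encoding(...)` to obtain the tokenizer itself.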