A package to count tokens in input text using OpenAI's tiktoken library.
gptwc: wc for GPT tokens
The wc utility counts words or characters. The gptwc utility works the same way but counts tokens.
Tokens are typically smaller than words but larger than single characters; they are the compact representation of text that large language models operate on.
Use gptwc to check the number of tokens in a string so that you stay under the token limit (e.g. 4097) of your large language model API. gptwc uses tiktoken for tokenization.
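The limit check described above can be sketched in Python roughly as follows. This is a minimal illustration, not gptwc's actual source: the helper names count_tokens and fits_within_limit and the reserve parameter are hypothetical, and 4097 is only the example limit mentioned above. It assumes the tiktoken package is installed; tiktoken.encoding_for_model and Encoding.encode are part of tiktoken's public API.

```python
def count_tokens(text: str, model: str = "text-davinci-003") -> int:
    """Return the number of tokens `text` encodes to for `model`."""
    # Imported lazily so the limit helper below also works without tiktoken.
    import tiktoken

    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))


def fits_within_limit(n_tokens: int, limit: int = 4097, reserve: int = 0) -> bool:
    """Check that `n_tokens`, plus `reserve` tokens kept free for the
    model's reply, stays within `limit` (hypothetical helper)."""
    return n_tokens + reserve <= limit
```

A caller might write fits_within_limit(count_tokens(prompt), reserve=500) to leave room for a 500-token completion; the reserve value is an arbitrary example.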
Installation
$ pip install gptwc
$ echo "Simple is better than complex." | gptwc
7
Example Usage
$ cat LICENSE | gptwc
257
$ cat LICENSE | wc -c
1059
$ cat LICENSE | wc -w
165
$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | wc -w
26470
$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | gptwc
40085
$ cat LICENSE | gptwc --model text-davinci-003
257
$ cat LICENSE | gptwc --model gpt-3.5-turbo
201
$ cat README.md | pbcopy
$ gptwc -c
517
Options
usage: gptwc [-h] [--files0-from F] [--model MODEL] [-c] [--version] [FILE ...]

Count tokens in text files using OpenAI's tiktoken library.

positional arguments:
  FILE             Text files to count tokens in

options:
  -h, --help       show this help message and exit
  --files0-from F  Read input from the files specified by NUL-terminated names in file F
  --model MODEL    Model name to use for tokenization (default: text-davinci-003)
  -c, --clipboard  Read input from the system clipboard
  --version        show program's version number and exit
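For readers who want to see how such an interface could be declared, here is a sketch of an argparse parser that accepts the same options. It is reconstructed from the help text above and is not taken from gptwc's source; build_parser and split_nul_names are hypothetical names, and the version string is a placeholder.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="gptwc",
        description="Count tokens in text files using OpenAI's tiktoken library.",
    )
    parser.add_argument("files", metavar="FILE", nargs="*",
                        help="Text files to count tokens in")
    parser.add_argument("--files0-from", metavar="F",
                        help="Read input from the files specified by "
                             "NUL-terminated names in file F")
    parser.add_argument("--model", default="text-davinci-003",
                        help="Model name to use for tokenization "
                             "(default: %(default)s)")
    parser.add_argument("-c", "--clipboard", action="store_true",
                        help="Read input from the system clipboard")
    # Placeholder version string; a real tool would read it from the package.
    parser.add_argument("--version", action="version",
                        version="%(prog)s 0.0.0")
    return parser


def split_nul_names(data: bytes) -> List[str]:
    """Split the contents of a --files0-from file into file names,
    ignoring the empty entry left by a trailing NUL."""
    return [name.decode() for name in data.split(b"\0") if name]
```

A NUL-terminated name list of the kind --files0-from expects is what find produces with its -print0 flag, which keeps file names containing spaces or newlines intact.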
Which Tokenizer Does Each Model Use?
From tiktoken/model.py
"gpt-4": "cl100k_base",
"gpt-3.5-turbo": "cl100k_base",
"text-embedding-ada-002": "cl100k_base",
"text-davinci-003": "p50k_base",
"text-davinci-002": "p50k_base",
"code-davinci-002": "p50k_base",
"code-davinci-001": "p50k_base",
"code-cushman-002": "p50k_base",
"code-cushman-001": "p50k_base",
"davinci-codex": "p50k_base",
"cushman-codex": "p50k_base",
"text-davinci-001": "r50k_base",
"text-curie-001": "r50k_base",
"text-babbage-001": "r50k_base",
"text-ada-001": "r50k_base",
"davinci": "r50k_base",
"curie": "r50k_base",
"babbage": "r50k_base",
"ada": "r50k_base",
"text-similarity-davinci-001": "r50k_base",
"text-similarity-curie-001": "r50k_base",
"text-similarity-babbage-001": "r50k_base",
"text-similarity-ada-001": "r50k_base",
"text-search-davinci-doc-001": "r50k_base",
"text-search-curie-doc-001": "r50k_base",
"text-search-babbage-doc-001": "r50k_base",
"text-search-ada-doc-001": "r50k_base",
"code-search-babbage-code-001": "r50k_base",
"code-search-ada-code-001": "r50k_base",
"text-davinci-edit-001": "p50k_edit",
"code-davinci-edit-001": "p50k_edit",
"gpt2": "gpt2",