A package to count tokens in input text using OpenAI's tiktoken library.
gptwc: wc for GPT tokens
The wc utility counts words or characters. The gptwc utility functions similarly but counts tokens.
Tokens are typically smaller than words but larger than single characters; they are the compact units into which large language models split text.
Use gptwc to check the number of tokens in a string so that you stay under the token limit (e.g., 4097) of your large language model API. gptwc uses tiktoken for tokenization.
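The same check can be done from Python. The following is a minimal sketch, not gptwc's actual source: the helper names `count_tokens`, `fits_limit`, and `tiktoken_encoder` are made up for illustration, and the default limit of 4097 is simply the example figure quoted above. Only `tiktoken.encoding_for_model` and `Encoding.encode` come from tiktoken itself.

```python
from typing import Callable, List

Encoder = Callable[[str], List[int]]


def count_tokens(text: str, encode: Encoder) -> int:
    """Return the number of tokens that `encode` produces for `text`."""
    return len(encode(text))


def fits_limit(text: str, encode: Encoder, limit: int = 4097) -> bool:
    """Return True if `text` encodes to at most `limit` tokens."""
    return count_tokens(text, encode) <= limit


def tiktoken_encoder(model: str = "text-davinci-003") -> Encoder:
    """Build an encoder for `model` (hypothetical helper; needs tiktoken)."""
    import tiktoken  # imported lazily so the helpers above work without it

    return tiktoken.encoding_for_model(model).encode
```

With tiktoken installed, `fits_limit(prompt, tiktoken_encoder("gpt-3.5-turbo"))` would report whether `prompt` stays within the limit for that model's encoding. Passing the encoder in as a callable keeps the counting logic independent of any particular tokenizer.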
Installation
$ pip install gptwc
$ echo "Simple is better than complex." | gptwc
7
Example Usage
$ cat LICENSE | gptwc
257
$ cat LICENSE | wc -c
1059
$ cat LICENSE | wc -w
165
$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | wc -w
26470
$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | gptwc
40085
$ cat LICENSE | gptwc --model text-davinci-003
257
$ cat LICENSE | gptwc --model gpt-3.5-turbo
201
$ cat README.md | pbcopy
$ gptwc -c
517
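The counts printed above give a rough sense of how tokens relate to words and characters. The short calculation below only reuses those figures; the helper name `ratio` is introduced here for illustration, and the ratios apply to these two texts, not to text in general.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


# Counts copied from the example commands above.
# Note: `wc -c` counts bytes, which equals characters for ASCII text.
LICENSE_TOKENS, LICENSE_CHARS, LICENSE_WORDS = 257, 1059, 165
ALICE_TOKENS, ALICE_WORDS = 40085, 26470

tokens_per_word_license = ratio(LICENSE_TOKENS, LICENSE_WORDS)
tokens_per_word_alice = ratio(ALICE_TOKENS, ALICE_WORDS)
chars_per_token_license = ratio(LICENSE_CHARS, LICENSE_TOKENS)

print(tokens_per_word_license, tokens_per_word_alice, chars_per_token_license)
# prints: 1.56 1.51 4.12
```

In both samples a word costs roughly one and a half tokens, and a token in the LICENSE file covers about four characters.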
Options
usage: gptwc [-h] [--files0-from F] [--model MODEL] [-c] [--version] [FILE ...]

Count tokens in text files using OpenAI's tiktoken library.

positional arguments:
  FILE              Text files to count tokens in

options:
  -h, --help        show this help message and exit
  --files0-from F   Read input from the files specified by NUL-terminated names in file F
  --model MODEL     Model name to use for tokenization (default: text-davinci-003)
  -c, --clipboard   Read input from the system clipboard
  --version         show program's version number and exit
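For readers who want to build a similar command line, the options above can be approximated with Python's standard `argparse` module. This is a sketch under assumptions, not gptwc's real implementation: the function names `build_parser` and `split_nul_names` are hypothetical, the version string is a placeholder, and the way NUL-terminated names are split is only one plausible reading of `--files0-from`.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the usage text shown above."""
    parser = argparse.ArgumentParser(
        prog="gptwc",
        description="Count tokens in text files using OpenAI's tiktoken library.",
    )
    parser.add_argument("files", metavar="FILE", nargs="*",
                        help="Text files to count tokens in")
    parser.add_argument("--files0-from", metavar="F",
                        help="Read input from the files specified by "
                             "NUL-terminated names in file F")
    parser.add_argument("--model", default="text-davinci-003",
                        help="Model name to use for tokenization "
                             "(default: %(default)s)")
    parser.add_argument("-c", "--clipboard", action="store_true",
                        help="Read input from the system clipboard")
    # Placeholder version text; the real tool reports its own version.
    parser.add_argument("--version", action="version",
                        version="%(prog)s (version placeholder)")
    return parser


def split_nul_names(data: bytes) -> List[str]:
    """Split NUL-terminated file names, skipping empty entries."""
    return [name.decode() for name in data.split(b"\0") if name]
```

NUL-terminated lists are what `find . -print0` produces, which is why such an option is useful for file names that contain spaces or newlines.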
Which Tokenizer Does Each Model Use?
From tiktoken/model.py
"gpt-4": "cl100k_base",
"gpt-3.5-turbo": "cl100k_base",
"text-embedding-ada-002": "cl100k_base",
"text-davinci-003": "p50k_base",
"text-davinci-002": "p50k_base",
"code-davinci-002": "p50k_base",
"code-davinci-001": "p50k_base",
"code-cushman-002": "p50k_base",
"code-cushman-001": "p50k_base",
"davinci-codex": "p50k_base",
"cushman-codex": "p50k_base",
"text-davinci-001": "r50k_base",
"text-curie-001": "r50k_base",
"text-babbage-001": "r50k_base",
"text-ada-001": "r50k_base",
"davinci": "r50k_base",
"curie": "r50k_base",
"babbage": "r50k_base",
"ada": "r50k_base",
"text-similarity-davinci-001": "r50k_base",
"text-similarity-curie-001": "r50k_base",
"text-similarity-babbage-001": "r50k_base",
"text-similarity-ada-001": "r50k_base",
"text-search-davinci-doc-001": "r50k_base",
"text-search-curie-doc-001": "r50k_base",
"text-search-babbage-doc-001": "r50k_base",
"text-search-ada-doc-001": "r50k_base",
"code-search-babbage-code-001": "r50k_base",
"code-search-ada-code-001": "r50k_base",
"text-davinci-edit-001": "p50k_edit",
"code-davinci-edit-001": "p50k_edit",
"gpt2": "gpt2",
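The mapping above suggests why the same file can yield different counts under different models: in the LICENSE example, text-davinci-003 maps to p50k_base while gpt-3.5-turbo maps to cl100k_base. The sketch below looks up an encoding name from a small excerpt of that table; `MODEL_TO_ENCODING`, `encoding_name`, and `group_by_encoding` are names chosen for this illustration and are not tiktoken's API.

```python
from typing import Dict, List

# Excerpt of the table above; extend it with further rows as needed.
MODEL_TO_ENCODING: Dict[str, str] = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "text-davinci-001": "r50k_base",
    "text-davinci-edit-001": "p50k_edit",
    "gpt2": "gpt2",
}


def encoding_name(model: str) -> str:
    """Return the encoding name for `model`, or raise ValueError."""
    try:
        return MODEL_TO_ENCODING[model]
    except KeyError:
        raise ValueError(f"no known encoding for model {model!r}") from None


def group_by_encoding(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the mapping: encoding name -> models that use it."""
    groups: Dict[str, List[str]] = {}
    for model, encoding in mapping.items():
        groups.setdefault(encoding, []).append(model)
    return groups
```

Models that share an encoding should produce the same token count for the same text, so grouping by encoding shows which `--model` values are interchangeable for counting purposes. With tiktoken installed, `tiktoken.encoding_for_model(model).name` returns the same kind of name from the library's own table.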