A package to count tokens in input text using OpenAI's tiktoken library.
Project description
gptwc: wc for GPT tokens
The wc
utility counts words or characters. The gptwc
utility functions similarly but counts tokens.
Tokens are smaller than words but larger than characters, and are a more compact representation of text used by large language models.
Use gptwc
to check the number of tokens in a string, in order to remain under the token limit (eg. 4097) for your large language model API. Uses tiktoken
.
Installation
$ pip install gptwc
$ echo "Simple is better than complex." | gptwc
7
Example Usage
$ cat LICENSE | gptwc
257
$ cat LICENSE | wc -c
1059
$ cat LICENSE | wc -w
165
$ curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | wc -w
26470
curl -s 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt' | gptwc
40085
$ cat LICENSE | gptwc --model text-davinci-003
257
$ cat LICENSE | gptwc --model gpt-3.5-turbo
201
$ cat README.md | pbcopy
$ gptwc -c
517
Options
usage: gptwc [-h] [--files0-from F] [--model MODEL] [-c] [--version] [FILE ...]
Count tokens in text files using OpenAI's tiktoken library.
positional arguments:
FILE Text files to count tokens in
options:
-h, --help show this help message and exit
--files0-from F Read input from the files specified by NUL-terminated names in file F
--model MODEL Model name to use for tokenization (default: text-davinci-003)
-c, --clipboard Read input from the system clipboard
--version show program's version number and exit
Which Tokenizer Does Each Model Use?
From tiktoken/model.py
"gpt-4": "cl100k_base",
"gpt-3.5-turbo": "cl100k_base",
"text-embedding-ada-002": "cl100k_base",
"text-davinci-003": "p50k_base",
"text-davinci-002": "p50k_base",
"code-davinci-002": "p50k_base",
"code-davinci-001": "p50k_base",
"code-cushman-002": "p50k_base",
"code-cushman-001": "p50k_base",
"davinci-codex": "p50k_base",
"cushman-codex": "p50k_base",
"text-davinci-001": "r50k_base",
"text-curie-001": "r50k_base",
"text-babbage-001": "r50k_base",
"text-ada-001": "r50k_base",
"davinci": "r50k_base",
"curie": "r50k_base",
"babbage": "r50k_base",
"ada": "r50k_base",
"text-similarity-davinci-001": "r50k_base",
"text-similarity-curie-001": "r50k_base",
"text-similarity-babbage-001": "r50k_base",
"text-similarity-ada-001": "r50k_base",
"text-search-davinci-doc-001": "r50k_base",
"text-search-curie-doc-001": "r50k_base",
"text-search-babbage-doc-001": "r50k_base",
"text-search-ada-doc-001": "r50k_base",
"code-search-babbage-code-001": "r50k_base",
"code-search-ada-code-001": "r50k_base",
"text-davinci-edit-001": "p50k_edit",
"code-davinci-edit-001": "p50k_edit",
"gpt2": "gpt2",
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
gptwc-1.2.5.tar.gz
(3.7 kB
view details)
Built Distributions
gptwc-1.2.5-py3-none-any.whl
(4.2 kB
view details)
File details
Details for the file gptwc-1.2.5.tar.gz
.
File metadata
- Download URL: gptwc-1.2.5.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 58a3a4efba982c29527e1f2ffc3e530652640d7bbb7773c96deb0173eeadcf6d |
|
MD5 | 3dd7b326612c9c11dff702ca9a2c7214 |
|
BLAKE2b-256 | c7d844fc2014440dd0e1de2b3138d7dc74f0773cb9fbea2312b79912d1680ae0 |
File details
Details for the file gptwc-1.2.5-py3-none-any.whl
.
File metadata
- Download URL: gptwc-1.2.5-py3-none-any.whl
- Upload date:
- Size: 4.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | abd0698f6ac1a485ea60ae829e28a05e6d3212cd1251c89460af99426b0b994c |
|
MD5 | 1458f91c8b3eaf4ca36bb4b279a0f246 |
|
BLAKE2b-256 | e75a9c32f9dfb9b9a4702f5125939f1acc52d9f0b7dacb9d76f30bad2e4fbc4b |
File details
Details for the file gptwc-1.2.5-py2.py3-none-any.whl
.
File metadata
- Download URL: gptwc-1.2.5-py2.py3-none-any.whl
- Upload date:
- Size: 4.2 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 61fcc90a694bf05fa6f9c1cc1ca6c915c9a04331dcbe2d783b1bda13c0423976 |
|
MD5 | 4904f453a51aa9847c2bcfd8dd13650d |
|
BLAKE2b-256 | 562cd67ffa2a42b60ce279b68ae3014c10a33a0c0ed45035b49d7ae270342e37 |