ttok

Count and truncate text based on tokens

Background

Large language models such as GPT-3.5 and GPT-4 work in terms of tokens.

This tool can count tokens, using OpenAI's tiktoken library.

It can also truncate text to a specified number of tokens.

Installation

Install this tool using pip:

pip install ttok

Counting tokens

Provide text as arguments to this tool to count tokens:

ttok Hello world
2

You can also pipe text into the tool:

echo -n "Hello world" | ttok
2

Here the -n option prevents echo from appending a newline. Without it you would get a token count of 3.

To pipe in text and then append extra tokens from arguments, use the -i - option:

echo -n "Hello world" | ttok more text -i -
6
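
The same kind of count can be reproduced from Python by calling tiktoken directly. The sketch below is an illustration, not ttok's actual source: load_encoding and count_tokens are hypothetical helper names, and gpt-3.5-turbo is assumed to be the model name that selects the default tokenizer.

```python
def load_encoding(model: str = "gpt-3.5-turbo"):
    """Look up the tiktoken encoding for a model name.

    Requires `pip install tiktoken`. The import is done lazily so that
    count_tokens() below works with any object that has an encode() method.
    The default model name is an assumption, not taken from ttok's source.
    """
    import tiktoken

    return tiktoken.encoding_for_model(model)


def count_tokens(text: str, encoding) -> int:
    """Return the number of tokens `encoding` produces for `text`."""
    return len(encoding.encode(text))
```

With tiktoken installed, count_tokens("Hello world", load_encoding()) corresponds to running ttok Hello world. Appending "\n" to the string mirrors piping from echo without -n.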

Different models

By default, the tokenizer model for GPT-3.5 and GPT-4 is used.

To use the tokenizer for GPT-2 and GPT-3 instead, add -m gpt2 or --model gpt2:

ttok boo Hello there this is -m gpt2
6

Compare this with the default GPT-3.5 tokenizer:

ttok boo Hello there this is
5

Further model options are documented here.
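
To see how the choice of tokenizer changes a count, the following sketch runs the same text through several encodings. The helper names are hypothetical. tiktoken.encoding_for_model is tiktoken's lookup function, and the model names passed to it are assumptions based on the options shown above.

```python
def load_encodings(models=("gpt2", "gpt-3.5-turbo")) -> dict:
    """Return {model name: tiktoken encoding}; requires `pip install tiktoken`."""
    import tiktoken  # imported lazily; only needed for real tokenizers

    return {model: tiktoken.encoding_for_model(model) for model in models}


def compare_token_counts(text: str, encodings: dict) -> dict:
    """Map each label in `encodings` to the number of tokens in `text`."""
    return {label: len(enc.encode(text)) for label, enc in encodings.items()}
```

Calling compare_token_counts("boo Hello there this is", load_encodings()) should mirror the two ttok runs above, one count per model name.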

Truncating text

Use the -t 10 or --truncate 10 option to truncate text to a specified number of tokens:

ttok This is too many tokens -t 3
This is too
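
Conceptually, truncation amounts to encoding the text, keeping the first N token IDs, and decoding them back to a string. A minimal sketch of that idea follows. truncate_to_tokens is a hypothetical name, and encoding is assumed to be any object with encode() and decode() methods, such as a tiktoken encoding.

```python
def truncate_to_tokens(text: str, max_tokens: int, encoding) -> str:
    """Return the text decoded from at most `max_tokens` leading tokens."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    tokens = encoding.encode(text)
    # Slicing past the end is safe: inputs shorter than the limit come back
    # unchanged, assuming decode(encode(text)) round-trips.
    return encoding.decode(tokens[:max_tokens])


# Example with tiktoken (assumes `pip install tiktoken`):
#   import tiktoken
#   enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
#   truncate_to_tokens("This is too many tokens", 3, enc)
```

Because the cut happens at a token boundary rather than a character boundary, the result can end partway through a word, and with byte-level tokenizers possibly partway through a multi-byte character.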

Viewing tokens

The --tokens option can be used to view the integer token IDs for the incoming text:

ttok Hello world --tokens
9906 1917
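
Viewing token IDs is essentially the encode step with the integers joined by spaces. The sketch below shows that formatting and its inverse. format_token_ids and parse_token_ids are hypothetical helpers, not part of ttok or tiktoken, and encoding is assumed to be any object with an encode() method.

```python
def format_token_ids(text: str, encoding) -> str:
    """Encode `text` and return its token IDs as a space-separated string."""
    return " ".join(str(token) for token in encoding.encode(text))


def parse_token_ids(line: str) -> list[int]:
    """Inverse of the formatting step: turn "9906 1917" back into integers."""
    return [int(part) for part in line.split()]
```

Passing the parsed list to a tiktoken encoding's decode() method should recover the original text.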

ttok --help

Usage: ttok [OPTIONS] [PROMPT]...

  Count and truncate text based on tokens

  To count tokens for text passed as arguments:

      ttok one two three

  To count tokens from stdin:

      cat input.txt | ttok

  To truncate to 100 tokens:

      cat input.txt | ttok -t 100

  To truncate to 100 tokens using the gpt2 model:

      cat input.txt | ttok -t 100 -m gpt2

  To view tokens:

      cat input.txt | ttok --tokens

Options:
  --version               Show the version and exit.
  -i, --input FILENAME
  -t, --truncate INTEGER  Truncate to this many tokens
  -m, --model TEXT        Which model to use
  --tokens                Output token integers
  --help                  Show this message and exit.

You can also run this command using:

python -m ttok --help

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd ttok
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest
