Count and truncate Whisper text based on tokens

wtok: rebalance training sets for whisper

Count and truncate text based on tokens, one sentence at a time.

Background

Whisper models the conditional distribution of each token given the sequence of tokens that precede it.

This tool can count tokens, using OpenAI's tiktoken library.

It can also truncate text to a specified number of tokens.
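The two operations above, counting and truncating one sentence at a time, can be sketched as follows. This is a minimal illustration under stated assumptions, not wtok's actual implementation: the function names (`count_tokens`, `split_sentences`, `truncate_by_sentence`) and the whitespace stand-in tokenizer `toy_encode` are hypothetical, chosen so the example runs without downloading anything. To count real tokens, one could pass an encoder from tiktoken instead, for example `tiktoken.get_encoding("gpt2").encode`; whether that encoding matches the one wtok uses is an assumption.

```python
import re
from typing import Callable, List


def toy_encode(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()


def count_tokens(text: str, encode: Callable[[str], list] = toy_encode) -> int:
    """Return the number of tokens the encoder produces for the text."""
    return len(encode(text))


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def truncate_by_sentence(
    text: str, max_tokens: int, encode: Callable[[str], list] = toy_encode
) -> str:
    """Keep whole sentences, in order, until the next one would exceed max_tokens.

    Token counts are summed per sentence. With a real subword tokenizer this
    can differ slightly from tokenizing the joined text in one pass.
    """
    kept: List[str] = []
    used = 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence, encode)
        if used + n > max_tokens:
            break
        kept.append(sentence)
        used += n
    return " ".join(kept)


sample = (
    "Whisper transcribes speech. "
    "It predicts one token at a time. "
    "Long prompts get cut."
)
print(count_tokens(sample))              # 14 with the toy tokenizer
print(truncate_by_sentence(sample, 10))  # keeps the first two sentences
```

Truncating at sentence boundaries means the result never ends mid-sentence, at the cost of sometimes using fewer tokens than the budget allows.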

Installation

Install this tool using pip:

pip install wtok

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd wtok
python -m venv venv
source venv/bin/activate

Now install the package in editable mode:

pip install -e .
