Count and truncate Whisper text based on tokens
Project description
wtok: rebalance training sets for Whisper
Count and truncate text based on tokens one sentence at a time
Background
Whisper models the conditional distribution of each token given the sequence of preceding tokens.
This tool can count tokens, using OpenAI's tiktoken library.
It can also truncate text to a specified number of tokens.
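The two operations above can be sketched directly in Python. This is a minimal illustration, not wtok's actual implementation: the function names are hypothetical, the "gpt2" encoding is an assumed stand-in for the Whisper vocabulary, and the sentence-wise variant is only one possible reading of "one sentence at a time". The helpers accept any object with `encode` and `decode` methods, such as the one returned by `tiktoken.get_encoding`.

```python
import re


def load_encoding(name="gpt2"):
    """Return a tiktoken encoding; "gpt2" is an assumed stand-in vocabulary."""
    import tiktoken  # imported here so the helpers below work with any encoder

    return tiktoken.get_encoding(name)


def count_tokens(text, enc):
    """Number of tokens `enc` produces for `text`."""
    return len(enc.encode(text))


def truncate_tokens(text, max_tokens, enc):
    """Keep only the first `max_tokens` tokens of `text` and decode them back."""
    return enc.decode(enc.encode(text)[:max_tokens])


def truncate_sentences(text, max_tokens, enc):
    """Keep leading whole sentences while their summed token count fits the budget.

    Hypothetical helper: sentences are counted separately, so the total only
    approximates the token count of the joined text.
    """
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept, used = [], 0
    for sentence in sentences:
        n = count_tokens(sentence, enc)
        if used + n > max_tokens:
            break  # the next sentence would exceed the budget
        kept.append(sentence)
        used += n
    return " ".join(kept)
```

With tiktoken installed, `enc = load_encoding()` followed by `count_tokens("some text", enc)` gives a token count, and `truncate_tokens("some text", 1, enc)` returns the decoded first token. Truncating by whole sentences avoids cutting a training example mid-sentence, at the cost of sometimes leaving part of the token budget unused.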
Installation
Install this tool using pip:
pip install wtok
Development
To contribute to this tool, first check out the code. Then create a new virtual environment:
cd wtok
python -m venv venv
source venv/bin/activate
Now install for editing:
pip install -e .
Download files
Source Distribution
wtok-0.6.1.tar.gz (8.5 kB)
Built Distribution
wtok-0.6.1-py3-none-any.whl (9.0 kB)