Count and truncate whisper text based on tokens

wtok: rebalance training sets for whisper

Count and truncate text based on tokens, one sentence at a time.

Background

Whisper models the conditional distribution of a token given a sequence of past tokens.

This tool can count tokens, using OpenAI's tiktoken library.

It can also truncate text to a specified number of tokens.
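
The counting and truncation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not wtok's actual implementation: the names `count_tokens`, `truncate_tokens`, `toy_encode`, and `toy_decode` are hypothetical, and a whitespace tokenizer stands in for a real one so that the block runs without any dependencies.

```python
from typing import Callable, List, Sequence


# Stand-in tokenizer (an assumption for illustration only): one token per
# whitespace-separated word. A real tokenizer returns integer token ids.
def toy_encode(text: str) -> List[str]:
    return text.split()


def toy_decode(tokens: Sequence[str]) -> str:
    return " ".join(tokens)


def count_tokens(text: str, encode: Callable = toy_encode) -> int:
    """Return the number of tokens the encoder produces for `text`."""
    return len(encode(text))


def truncate_tokens(
    text: str,
    max_tokens: int,
    encode: Callable = toy_encode,
    decode: Callable = toy_decode,
) -> str:
    """Keep at most `max_tokens` leading tokens and decode them back to text."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    return decode(encode(text)[:max_tokens])


if __name__ == "__main__":
    sample = "Whisper predicts the next token from past tokens."
    print(count_tokens(sample))        # 8 with the toy tokenizer
    print(truncate_tokens(sample, 3))  # Whisper predicts the
```

With tiktoken installed, the `encode` and `decode` methods of an encoding object (for example the one returned by `tiktoken.get_encoding("gpt2")`) could be passed in place of the toy functions. Which encoding wtok itself uses is not stated here, so that choice is an assumption.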

Installation

Install this tool using pip:

pip install wtok

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd wtok
python -m venv venv
source venv/bin/activate

Now install for editing:

pip install -e .
