ttstokenizer: Tokenizer for Text to Speech (TTS) models

See the original repository https://github.com/Kyubyong/g2p for more information on English Grapheme to Phoneme conversion.

Other than removing unused dependencies and reorganizing the files, the original g2p logic remains unchanged.

ttstokenizer makes it easy to feed text to Text to Speech (TTS) models. It has minimal dependencies, all of which are Apache 2.0 compatible.

The standard preprocessing logic for many English Text to Speech (TTS) models is as follows:

  • Apply Tacotron text normalization rules
    • This project replicates the logic found in ESPnet
  • Convert Graphemes to Phonemes
  • Build an integer array mapping Phonemes to their integer token positions

This project adds a new tokenizer that runs the logic above. The output is consumable by machine learning models.
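The three steps above can be sketched in plain Python. This is a minimal illustration, not the ttstokenizer implementation: the abbreviation table, the lexicon, the token list and all function names are hypothetical stand-ins. The real normalization rules and Grapheme to Phoneme model are far more complete.

```python
# Hypothetical toy data, for illustration only; these are NOT the tables
# used by ttstokenizer, ESPnet or g2p.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister"}
LEXICON = {
    "doctor": ["D", "AA1", "K", "T", "ER0"],
    "who": ["HH", "UW1"],
}
TOKENS = ["<blank>", "<unk>", "AA1", "D", "ER0", "HH", "K", "T", "UW1", " "]
TOKEN_IDS = {token: index for index, token in enumerate(TOKENS)}


def normalize(text):
    """Step 1: lowercase, expand abbreviations, collapse whitespace."""
    words = [ABBREVIATIONS.get(word, word) for word in text.lower().split()]
    return " ".join(words)


def to_phonemes(text):
    """Step 2: look up each word in the toy lexicon, with a space token between words."""
    phonemes = []
    for position, word in enumerate(text.split()):
        if position > 0:
            phonemes.append(" ")
        # Words missing from the toy lexicon become a single unknown token
        phonemes.extend(LEXICON.get(word, ["<unk>"]))
    return phonemes


def to_ids(phonemes):
    """Step 3: map each phoneme to its position in the token list."""
    return [TOKEN_IDS.get(phoneme, TOKEN_IDS["<unk>"]) for phoneme in phonemes]


def tokenize(text):
    """Run the three steps in order."""
    return to_ids(to_phonemes(normalize(text)))


print(tokenize("Dr.  Who"))  # [3, 2, 6, 7, 4, 9, 5, 8]
```

Keeping each step as its own function mirrors the list above: the normalizer, the phoneme lookup and the token table can each be swapped without touching the others.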

Installation

The easiest way to install is via pip and PyPI:

pip install ttstokenizer

Usage

An example of tokenizing text for TTS models is shown below.

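from ttstokenizer import TTSTokenizer

# `tokens` is not defined in this snippet; supply the token set
# that matches the target TTS model before running it.
tokenizer = TTSTokenizer(tokens)
print(tokenizer("Text to tokenize"))

>>> array([ 4, 15, 10,  6,  4,  4, 28,  4, 34, 10,  2,  3, 51, 11])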
