TTS

kokoro

This WIP repository is intended to be an inference library for the Kokoro-82M model: https://hf.co/hexgrad/Kokoro-82M

It is under construction and likely will not be useful until the next base model release.

The goal is to make pip install kokoro work and to deliver on some of the design goals and features laid out below.

G2P will be imported from Misaki

Misaki is a G2P engine with language-specific solutions:

pip install misaki[en] # installs English
pip install misaki[ja] # installs Japanese

Users who don't peek under the hood may not care, since import kokoro will simply import misaki and life goes on. Keeping G2P in a separate package is likely the proper separation of responsibilities, since not all users will want or need all languages.

Smarter LF chunking

Kokoro models have a 512-token context window, which usually amounts to about 30 seconds of audio. Finding natural stopping points at which to split your text is key to smooth long-form (LF) generation. This should be much easier with token-level traces in misaki[en] (hopefully other languages to follow).
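As a rough illustration of the chunking idea, here is a minimal sketch that greedily packs whole sentences into chunks under a token budget. It is not this library's implementation. The function name chunk_text, the count_tokens callback (defaulting to character count as a crude stand-in for phoneme tokens), and the default budget of 510 (leaving headroom below 512) are all assumptions made for the example.

```python
import re
from typing import Callable, List


def chunk_text(
    text: str,
    max_tokens: int = 510,                    # assumed budget, just under the 512-token window
    count_tokens: Callable[[str], int] = len  # assumed proxy: one character ~ one token
) -> List[str]:
    """Greedily pack sentences into chunks that fit the token budget."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if count_tokens(candidate) <= max_tokens:
            current = candidate
            continue
        # The sentence does not fit: close the current chunk first.
        if current:
            chunks.append(current)
        current = ""
        # Rebuild the sentence word by word, so that a sentence that is
        # itself over budget gets split at word boundaries as a fallback.
        for word in sentence.split():
            candidate = f"{current} {word}" if current else word
            if current and count_tokens(candidate) > max_tokens:
                chunks.append(current)
                current = word
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single word longer than the budget is still emitted as its own oversized chunk. A real implementation would count actual model tokens (for example from G2P output) rather than characters, and could prefer softer boundaries such as commas before falling back to word splits.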

Cleaner modeling code

The modeling code could benefit from a touch-up, which as a side effect should make it ONNX-exportable and hopefully slightly faster.

Experimental features (TBD)

Today, voicepacks are essentially (510, 256)-shaped tensors, compiled as average styles per utterance length, with 510 possible lengths. Since most style vectors are computed on synthetic data, each style is effectively a "mean of means", which may explain why the voices are somewhat flat-sounding. It also implies that, for any given utterance, the only features currently used to choose how the voice sounds are (1) the user-selected voice name, like af, and (2) the length of the utterance. Features like the punctuation texture (.?!) or the text sentiment are not yet being used. Potential solutions could be neural or even classical, e.g. using vector DBs. This, among other things, is still an area of research.
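To make the "average style per utterance length" idea concrete, here is a small dependency-free sketch. It is an illustration under assumptions, not the code used to build the released voicepacks: build_voicepack and select_style are hypothetical names, plain lists stand in for tensors, and a dict keyed by length stands in for the rows of the (510, 256) tensor.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

MAX_LEN = 510    # number of utterance lengths a voicepack covers
STYLE_DIM = 256  # size of each style vector


def build_voicepack(
    samples: Sequence[Tuple[int, Sequence[float]]],
    style_dim: int = STYLE_DIM,
) -> Dict[int, List[float]]:
    """Average the style vectors of all utterances that share a token length.

    `samples` holds (utterance_length, style_vector) pairs (hypothetical input).
    """
    by_length: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for length, style in samples:
        if not 1 <= length <= MAX_LEN:
            raise ValueError(f"length {length} is outside 1..{MAX_LEN}")
        if len(style) != style_dim:
            raise ValueError(f"expected {style_dim} dimensions, got {len(style)}")
        by_length[length].append(style)
    # zip(*styles) walks the vectors dimension by dimension, so each output
    # entry is the element-wise mean over utterances of that length.
    return {
        length: [fmean(dim) for dim in zip(*styles)]
        for length, styles in by_length.items()
    }


def select_style(voicepack: Dict[int, List[float]], n_tokens: int) -> List[float]:
    """Pick a style from the utterance length alone; no other feature is used."""
    return voicepack[n_tokens]
```

For example, build_voicepack([(3, [1.0, 2.0]), (3, [3.0, 4.0])], style_dim=2) yields a single entry for length 3 whose vector is the element-wise mean of the two inputs. The sketch also makes the limitation visible: select_style sees nothing about punctuation or sentiment, only the length.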

Community contributions welcome

Within a couple of weeks of Kokoro's Christmas 2024 release, talented people had already built great things. If you want to build something, go for it! Kokoro is permissively licensed software (Apache). If you also want to add or improve something here (or in misaki), hopefully Kokoro can earn your commit; feel free to open a PR if so.

