
G2P engine for TTS

Project description

misaki

Misaki is a G2P engine designed for Kokoro models.

Hosted demo: https://hf.co/spaces/hexgrad/Misaki-G2P

English Usage

You can run this in one cell on Google Colab:

!pip install -q "misaki[en]"

from misaki import en

g2p = en.G2P(trf=False, british=False, fallback=None) # no transformer, American English

text = '[Misaki](/misˈɑki/) is a G2P engine designed for [Kokoro](/kˈOkəɹO/) models.'

phonemes, tokens = g2p(text)

print(phonemes) # misˈɑki ɪz ə ʤˈitəpˈi ˈɛnʤən dəzˈInd fɔɹ kˈOkəɹO mˈɑdᵊlz.

To fall back to espeak for out-of-dictionary words:

# Installing espeak varies across platforms, this silent install works on Colab:
!apt-get -qq -y install espeak-ng > /dev/null 2>&1

!pip install -q "misaki[en]" phonemizer-fork

from misaki import en, espeak

fallback = espeak.EspeakFallback(british=False) # en-us

g2p = en.G2P(trf=False, british=False, fallback=fallback) # no transformer, American English

text = 'Now outofdictionary words are handled by espeak.'

phonemes, tokens = g2p(text)

print(phonemes) # nˈW Wɾɑfdˈɪkʃənˌɛɹi wˈɜɹdz ɑɹ hˈændəld bI ˈispik.

English

Japanese

The second-gen Japanese tokenizer now uses pyopenjtalk with full unidic, enabling pitch accent marks and improved phrase merging. Deep gratitude to @sophiefy for invaluable recommendations and nuanced help with pitch accent.
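For a rough sense of that stack, here is a minimal sketch calling pyopenjtalk directly. This is the underlying library, not misaki's own API, and it uses pyopenjtalk's default dictionary rather than the full-unidic setup described above; the sample text is arbitrary.

# Minimal pyopenjtalk sketch (underlying library only, not misaki's API)
import pyopenjtalk

text = 'こんにちは、世界'
print(pyopenjtalk.g2p(text))  # romanized phoneme string
labels = pyopenjtalk.extract_fullcontext(text)  # full-context labels; the A: fields carry pitch accent info
print(labels[1])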

The first-gen Japanese tokenizer mainly relies on cutlet => fugashi => mecab => unidic-lite, with each being a wrapper around the next. Deep gratitude to @Respaired for helping me learn the ropes of Japanese tokenization before any Kokoro model had started training.
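As a rough illustration of that chain, here is a sketch of the wrapped libraries used directly (not misaki's code; output comments are approximate):

# Sketch of the cutlet => fugashi => mecab => unidic-lite chain, used directly
import cutlet
import fugashi

katsu = cutlet.Cutlet()  # cutlet wraps fugashi, which wraps MeCab with unidic-lite
print(katsu.romaji('日本語のテキスト'))  # approx. 'Nihongo no tekisuto'

tagger = fugashi.Tagger()  # the same fugashi/MeCab layer, one level down
for word in tagger('日本語のテキスト'):
    print(word.surface, word.feature.pron)  # surface form and unidic pronunciation field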

Korean

The Korean tokenizer is copied from 5Hyeons's g2pkc fork of Kyubyong's widely used g2pK library. Deep gratitude to @5Hyeons for kindly helping with Korean and extending the original code by @Kyubyong.
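The calling convention below is from Kyubyong's original g2pK; the g2pkc fork is assumed to keep the same interface. A sketch of the underlying library, not misaki's own API:

# Sketch using g2pK's interface (the g2pkc fork is assumed to work the same way)
from g2pk import G2p

g2p = G2p()
print(g2p('어제는 날씨가 맑았는데, 오늘은 흐리다'))  # Korean grapheme-to-phoneme conversion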

Chinese

The second-gen Chinese tokenizer adapts better logic from paddlespeech's frontend: jieba now cuts and tags, and pinyin-to-ipa is no longer used.
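The cut-and-tag step can be sketched with jieba directly; misaki's actual frontend, adapted from paddlespeech, layers more logic on top of this:

# Sketch of jieba cutting and POS-tagging (the step described above, not misaki's full frontend)
import jieba.posseg as pseg

for pair in pseg.cut('今天天气很好'):
    print(pair.word, pair.flag)  # segmented word and its part-of-speech tag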

The first-gen Chinese tokenizer uses jieba to cut the text, then pypinyin and pinyin-to-ipa to turn the result into phonemes.
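And a sketch of that first-gen pipeline's first two steps; the final mapping from pinyin syllables to IPA via pinyin-to-ipa is left out here:

# Sketch of jieba segmentation followed by pypinyin; the pinyin-to-ipa step is omitted
import jieba
from pypinyin import lazy_pinyin, Style

text = '今天天气很好'
print(list(jieba.cut(text)))                 # segmented words
print(lazy_pinyin(text, style=Style.TONE3))  # pinyin with numeric tones, e.g. ['jin1', 'tian1', ...]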

Vietnamese

TODO

  • Data: Compress data (no need for indented json) and eliminate redundancy between gold and silver dictionaries.
  • Fallbacks: Train seq2seq fallback models on dictionaries using this notebook.
  • Homographs: Escalate hard words like axes, bass, bow, lead, tear, and wind using BERT contextual word embeddings (CWEs) and logistic regression (LR) models (nn.Linear followed by sigmoid) as described in this paper; see the sketch after this list. Assuming trf=True, BERT CWEs can be accessed via doc._.trf_data, see en.py#L479. Per-word LR models can be trained on WikipediaHomographData, llama-hd-dataset, and LLM-generated data.
  • More languages: Add ko.py, ja.py, zh.py.
  • Per-language pip installs
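A minimal sketch of the per-word homograph classifier described in the Homographs item above: a BERT contextual word embedding goes through nn.Linear and a sigmoid to choose between two pronunciations. Dimensions, names, and the random input are illustrative assumptions, not misaki code.

# Sketch of one per-word logistic-regression homograph model (illustrative only)
import torch
import torch.nn as nn

class HomographLR(nn.Module):
    # One LR head per homograph: nn.Linear followed by sigmoid
    def __init__(self, hidden_size=768):
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, cwe):
        # cwe: (batch, hidden_size) contextual word embedding of the homograph token
        return torch.sigmoid(self.linear(cwe))

model = HomographLR()
fake_cwe = torch.randn(1, 768)  # stand-in for a BERT CWE (e.g. pulled from doc._.trf_data)
prob = model(fake_cwe)          # probability of pronunciation A over B
print('lˈid' if prob.item() > 0.5 else 'lˈɛd')  # e.g. the two readings of "lead"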

Acknowledgements




Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

misaki-0.9.4.tar.gz (3.8 MB)

Uploaded Source

Built Distribution

misaki-0.9.4-py3-none-any.whl (3.6 MB)

Uploaded Python 3

File details

Details for the file misaki-0.9.4.tar.gz.

File metadata

  • Download URL: misaki-0.9.4.tar.gz
  • Upload date:
  • Size: 3.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for misaki-0.9.4.tar.gz

  • SHA256: 3960fa3e6de179a90ee8e628446a4a4f6b8c730b6e3410999cf396189f4d9c40
  • MD5: 049a1f0d4a7458adf7e6ca7ea5d4d070
  • BLAKE2b-256: 1ac7fb01370a76585b46595a01b52f18e65c8ba6d7a313a05e5d9fff0a8e1c69

See more details on using hashes here.

File details

Details for the file misaki-0.9.4-py3-none-any.whl.

File metadata

  • Download URL: misaki-0.9.4-py3-none-any.whl
  • Upload date:
  • Size: 3.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for misaki-0.9.4-py3-none-any.whl

  • SHA256: 90e2eeb169786c014c429e5058d2ea6bcd02d651f2a24450ba6c9ffc0f8da15a
  • MD5: f6ead95868e18ee05b02ac3519c2552f
  • BLAKE2b-256: 82ec0ee4110ddb54278b8f21c40a140370ae8f687036c4edf578316602697c56

See more details on using hashes here.
