Web app, command-line interface and Python library for synthesizing Chinese texts into speech.

Project description

zho-tts

Installation

pip install zho-tts --user

Usage as web app

Visit 🤗 Hugging Face for a live demo.

You can also run it locally by executing zho-tts-web in the CLI and opening http://127.0.0.1:7860 in your browser.

Usage as CLI

zho-tts-cli synthesize "长江 航务 管理局 和 长江 轮船 总公司 最近 决定 安排 一百三十三 艘 客轮 迎接 长江 干线 春运。"

You can listen to the output here.

# Same example using IPA input
zho-tts-cli synthesize-ipa "ʈʂː|a˧˩˧˘|ŋ|tɕ˘|j|a˥˘|ŋ˘|SIL0|x|a˧˥˘|ŋ|u˥˩|SIL0|k|w|a˧˩˧|n|l˘|i˧˩˧|tɕː|y˧˥ˑ|SIL0|x|ɤ˧˥|SIL0|ʈʂː|a˧˩˧˘|ŋ|tɕ˘|j|a˥˘|ŋ|SIL0|l|w|ə˧˥|n|ʈʂʰ˘|w|a˧˥|n|SIL0|ts˘|ʊ˧˩˧|ŋ˘|kː|ʊ˥|ŋ|s|ɹ̩˥ˑ|SIL0|ts|w˘|ei̯˥˩|tɕ|i˥˩˘|n|SIL0|tɕ|ɥ|e˧˥|t|i˥˩|ŋ|SIL3|a˥|n|pʰ|ai̯˧˥|SIL0|i˥ˑ|p|ai̯˧˩˧|s|a˥˘|n|ʂ˘|ɻ̩˧˥|s|a˥|n|SIL0|s˘|ou̯˥|SIL0|kʰˑ|ɤ˥˩|lː|wˑ|ə˧˥ˑ|n|SIL0|i˧˥ː|ŋ|tɕ˘|j˘|e˥|SIL0|ʈʂː|a˧˩˧|ŋ|tɕ˘|j|a˥˘|ŋ|SIL0|k˘|a˥˩|n|ɕ|j˘|ɛ˥˩|n˘|SIL0|ʈʂʰˑ|w˘|ə˥˘|nː|y˥˩ˑ|nː|。"

You can listen to the output here.
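Judging from the example above, the IPA input is a sequence of symbols separated by `|`, with SIL markers between words. The following is a minimal sketch for inspecting such a string before passing it to the CLI. It does not use zho-tts itself; the helper names `split_ipa` and `group_by_break` are hypothetical, and the `|` separator is an assumption taken from the example rather than from documented behavior.

```python
from typing import List, Optional, Tuple

# Break markers as listed in the phoneme set of this README.
BREAKS = {"SIL0", "SIL1", "SIL2", "SIL3"}


def split_ipa(ipa: str, sep: str = "|") -> List[str]:
    """Split a separator-delimited IPA string into symbols, dropping empty fields."""
    return [symbol for symbol in ipa.split(sep) if symbol]


def group_by_break(symbols: List[str]) -> List[Tuple[List[str], Optional[str]]]:
    """Group symbols into chunks, each paired with the break marker that follows it.

    A trailing chunk without a break marker is paired with None.
    """
    groups: List[Tuple[List[str], Optional[str]]] = []
    current: List[str] = []
    for symbol in symbols:
        if symbol in BREAKS:
            groups.append((current, symbol))
            current = []
        else:
            current.append(symbol)
    if current:
        groups.append((current, None))
    return groups


# Excerpt from the beginning of the IPA example above.
excerpt = "ʈʂː|a˧˩˧˘|ŋ|tɕ˘|j|a˥˘|ŋ˘|SIL0|x|a˧˥˘|ŋ|u˥˩|SIL0"
for chunk, marker in group_by_break(split_ipa(excerpt)):
    print(len(chunk), "symbols, followed by", marker)
```

Grouping by break marker makes it easier to spot a missing or misplaced SIL marker in a hand-written transcription.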

Usage as library

from pathlib import Path
from tempfile import gettempdir

from zho_tts import Synthesizer, Transcriber, normalize_audio, save_audio

text = "长江 航务 管理局 和 长江 轮船 总公司 最近 决定 安排 一百三十三 艘 客轮 迎接 长江 干线 春运。"

transcriber = Transcriber()
synthesizer = Synthesizer()

text_ipa = transcriber.transcribe_to_ipa(text)
audio = synthesizer.synthesize(text_ipa)

tmp_dir = Path(gettempdir())
save_audio(audio, tmp_dir / "output.wav")

# Optional: normalize output
normalize_audio(tmp_dir / "output.wav", tmp_dir / "output_norm.wav")

Model info

The TTS model used is published here.

Phoneme set

  • Vowels: a ɛ e ə ɚ ɤ i o u ʊ y
  • Diphthongs: ai̯ au̯ ei̯ ou̯
  • Consonants: f j k kʰ l m n p pʰ ɹ̩¹ ɻ¹ ɻ̩¹ s t ts tsʰ tɕ tɕʰ tʰ w x ŋ ɕ ɥ ʂ ʈʂ ʈʂʰ
  • Breaks:
    • SIL0 (no break)
    • SIL1 (short break)
    • SIL2 (break)
    • SIL3 (long break)
  • Special characters: 。 ?

Vowels and diphthongs carry one of these tones:

  • ˥ (first tone)
  • ˧˥ (second tone)
  • ˧˩˧ (third tone)
  • ˥˩ (fourth tone)
  • (none)

¹ These consonants also carry tones.

Vowels, diphthongs and consonants carry one of these duration markers:

  • ˘ -> very short, e.g., ou̯˘
  • nothing -> normal, e.g., ou̯
  • ˑ -> half long, e.g., ou̯ˑ
  • ː -> long, e.g., ou̯ː

Tones and duration markers can be combined, e.g., ə˧˥ː
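To make the notation concrete, here is a minimal sketch that splits a single symbol into its base phoneme, tone and duration marker. It is not part of zho-tts: the names `Symbol` and `split_symbol` are hypothetical, and the sketch assumes the tone precedes the duration marker, as in the example ə˧˥ː above.

```python
from typing import NamedTuple

# Tone and duration marks as listed in the phoneme set above.
TONES = {
    "˥": "first tone",
    "˧˥": "second tone",
    "˧˩˧": "third tone",
    "˥˩": "fourth tone",
}
DURATIONS = {"˘": "very short", "ˑ": "half long", "ː": "long"}


class Symbol(NamedTuple):
    base: str      # phoneme without tone or duration marks
    tone: str      # one of the TONES keys, or "" for no tone
    duration: str  # one of the DURATIONS keys, or "" for normal duration


def split_symbol(symbol: str) -> Symbol:
    """Split a symbol such as 'ə˧˥ː' into base, tone and duration."""
    duration = ""
    if symbol and symbol[-1] in DURATIONS:
        duration = symbol[-1]
        symbol = symbol[:-1]
    tone = ""
    # Try longer tone marks first so that '˧˥' is not mistaken for '˥'.
    for candidate in sorted(TONES, key=len, reverse=True):
        if symbol.endswith(candidate):
            tone = candidate
            symbol = symbol[: -len(candidate)]
            break
    return Symbol(symbol, tone, duration)


parsed = split_symbol("ə˧˥ː")
print(parsed.base, TONES.get(parsed.tone, "no tone"), DURATIONS.get(parsed.duration, "normal"))
```

Symbols without any marks, such as the break markers, come back unchanged with an empty tone and duration.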

Speakers

Objective Evaluation

Citation

If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see About => Cite this repository).

Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410.

The authors gratefully acknowledge the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.

The authors are grateful to the Center for Information Services and High Performance Computing [Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)] at TU Dresden for providing its facilities for high throughput calculations.

Download files

Source Distribution

zho_tts-0.0.2.tar.gz (1.1 MB)

Uploaded Source

Built Distribution

zho_tts-0.0.2-py3-none-any.whl (56.5 kB)

Uploaded Python 3

File details

Details for the file zho_tts-0.0.2.tar.gz.

File metadata

  • Download URL: zho_tts-0.0.2.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.7

File hashes

Hashes for zho_tts-0.0.2.tar.gz
Algorithm    Hash digest
SHA256       5d307fc84d0d6634e6e4ce406a2f7a843cb27a8f397807d919c6295ee921b4ec
MD5          9f967b335ab54ebd74687ce6a48c153f
BLAKE2b-256  0cc31f84c3b66afca7bec93554961bfe069d6c82ae8b95b6e0e4cbcc9891052c
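
A downloaded file can be checked against the digests listed here. The following is a minimal sketch using only Python's standard `hashlib` module; the helper name `sha256_of` is hypothetical, and the script assumes the archive was saved under its original file name in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed for zho_tts-0.0.2.tar.gz.
EXPECTED = "5d307fc84d0d6634e6e4ce406a2f7a843cb27a8f397807d919c6295ee921b4ec"

archive = Path("zho_tts-0.0.2.tar.gz")  # assumed download location
if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

The wheel can be checked the same way by swapping in its file name and SHA256 digest.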

File details

Details for the file zho_tts-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: zho_tts-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 56.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.7

File hashes

Hashes for zho_tts-0.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       d144187f82bec5026b927c16dbb67c32759477083a480de9ce6a93906937e2a1
MD5          1fcf4e4c9c36b123414250c77b7f3124
BLAKE2b-256  b9c68b09ae86b884d73dba0d73b89e327b1364c5f5baf4794bb59f529e0782bf
