Text to speech using the Zonos model

Sinapsis Zonos

Templates for advanced speech synthesis using Zonos

🐍 Installation • 🚀 Features • 📚 Usage example • 🌐 Webapp • 📙 Documentation • 🔍 License

This Sinapsis Zonos package provides a single template for integrating, configuring, and running text-to-speech (TTS) and voice cloning functionalities powered by Zonos. It supports multilingual speech, emotional modulation, and real-time audio generation.

🐍 Installation

[!IMPORTANT] Sinapsis projects require Python 3.10 or higher.

Install using your preferred package manager. We strongly recommend using uv. To install uv, refer to the official documentation.

Install with uv:

  uv pip install sinapsis-zonos --extra-index-url https://pypi.sinapsis.tech

Or with raw pip:

  pip install sinapsis-zonos --extra-index-url https://pypi.sinapsis.tech

[!IMPORTANT] Templates in each package may require additional dependencies. For development, we recommend installing the package with all the optional dependencies:

With uv:

  uv pip install sinapsis-zonos[all] --extra-index-url https://pypi.sinapsis.tech

Or with raw pip:

  pip install sinapsis-zonos[all] --extra-index-url https://pypi.sinapsis.tech

[!NOTE] Zonos depends on the eSpeak library for phonemization. How you install it depends on your OS. On Linux:

apt install -y espeak-ng
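
Before running a pipeline, it can help to confirm that the phonemizer backend is visible on the PATH. The following is a minimal sketch using only the Python standard library. The executable names `espeak-ng` and `espeak` are assumptions about how the binary is exposed on your system, and the helper `find_espeak` is hypothetical; it is not part of sinapsis-zonos.

```python
import shutil
from typing import Optional


def find_espeak(candidates: tuple[str, ...] = ("espeak-ng", "espeak")) -> Optional[str]:
    """Return the path of the first candidate executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_espeak()
    if location is None:
        print("eSpeak not found on PATH; install it before using ZonosTTS.")
    else:
        print(f"eSpeak found at {location}")
```

This only checks that the executable can be located; it does not verify that Zonos can load the library at runtime.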

🚀 Features

Templates Supported

  • ZonosTTS: Template for converting text to speech or performing voice cloning based on the presence of an audio sample.

    Attributes
    • cfg_scale(Optional): Controls randomness and creativity in speech generation (default: 2.0, range: 1.0–5.0). Higher values introduce more variation in speech output.
    • denoised_speaker(Optional): If True, applies denoising to the speaker embedding to reduce background noise (default: False).
    • dnsmos(Optional): Denoising strength for hybrid models (default: 4.0, range: 1.0–5.0).
    • emotions(Optional): Emotion configuration to fine-tune the emotional tone of the generated speech (default: {}). Accepts an Emotions object with weights for various emotions.
    • fmax(Optional): Maximum frequency cutoff in Hz for audio generation (default: 22050, range: 0–24000).
    • language(Optional): Language code used for synthesis (default: en-us).
    • model(Optional): The Zonos model identifier to use (default: Zyphra/Zonos-v0.1-transformer). Options: Zyphra/Zonos-v0.1-transformer and Zyphra/Zonos-v0.1-hybrid.
    • output_folder(Optional): The folder where generated audio files will be saved (default: SINAPSIS_CACHE_DIR/zonos/audios).
    • pitch_std(Optional): Standard deviation for pitch variation, which influences pitch naturalness (default: 20.0, range: 0–300).
    • prefix_audio(Optional): Path to an audio file used for prefix conditioning (e.g., whispering or prosody control) (default: None).
    • randomized_seed(Optional): If True, a random seed is used for each generation (default: True).
    • sampling_params(Optional): Controls sampling behavior for speech synthesis. Accepts a SamplingParams object with fields like top_p, top_k, min_p, linear, conf, and quad.
    • seed(Optional): Random seed used for deterministic generation. If randomized_seed is False, this value ensures repeatable output (default: 420).
    • speaker_audio(Optional): Path to a reference audio file used to extract speaker characteristics for voice cloning (default: None).
    • speaking_rate(Optional): Speaking rate in syllables per second (default: 15.0, range: 5–30).
    • unconditional_keys(Optional): A set of keys (e.g., {vqscore_8, dnsmos_ovrl}) that disable speaker conditioning when generating speech.
    • vq_score(Optional): VQ score threshold used by hybrid models to determine decoding style (default: 0.7, range: 0.5–0.8).
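
The numeric ranges listed above can be checked before a config is handed to the agent. The sketch below is a hypothetical pre-flight helper, not part of sinapsis-zonos: the name `out_of_range` and the `DOCUMENTED_RANGES` table are assumptions built only from the ranges in the attribute list. Whether the template itself rejects out-of-range values is not stated here.

```python
# Documented ranges from the attribute list above: (minimum, maximum).
DOCUMENTED_RANGES = {
    "cfg_scale": (1.0, 5.0),
    "dnsmos": (1.0, 5.0),
    "fmax": (0.0, 24000.0),
    "pitch_std": (0.0, 300.0),
    "speaking_rate": (5.0, 30.0),
    "vq_score": (0.5, 0.8),
}


def out_of_range(attributes: dict) -> dict:
    """Return {name: (value, (low, high))} for each attribute outside its documented range."""
    problems = {}
    for name, (low, high) in DOCUMENTED_RANGES.items():
        if name not in attributes:
            continue  # attribute omitted, so the template default applies
        value = attributes[name]
        if not low <= value <= high:
            problems[name] = (value, (low, high))
    return problems


if __name__ == "__main__":
    attrs = {"cfg_scale": 2.0, "fmax": 24000, "pitch_std": 45.0, "speaking_rate": 40.0}
    print(out_of_range(attrs))  # prints {'speaking_rate': (40.0, (5.0, 30.0))}
```

An empty result means every supplied value sits inside its documented range; attributes without a documented range (for example `seed` or `language`) are ignored.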

[!TIP] Use the CLI command sinapsis info --example-template-config TEMPLATE_NAME to produce an example Agent config for the Template specified in TEMPLATE_NAME.

For example, for ZonosTTS use sinapsis info --example-template-config ZonosTTS to produce an example config like:

Config
agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: ZonosTTS
  class_name: ZonosTTS
  template_input: InputTemplate
  attributes:
    cfg_scale: 2.0
    denoised_speaker: false
    dnsmos: 4.0
    emotions:
      happiness: 0
      sadness: 0
      disgust: 0
      fear: 0
      surprise: 0
      anger: 0
      other: 0
      neutral: 0
    fmax: 22050.0
    language: en-us
    model: Zyphra/Zonos-v0.1-transformer
    output_folder: ~/.cache/sinapsis/zonos/audios
    pitch_std: 20.0
    prefix_audio: null
    randomized_seed: true
    sampling_params:
      min_p: 0.0
      top_k: 0
      top_p: 0.0
      linear: 0.0
      conf: 0.0
      quad: 0.0
    seed: 420
    speaker_audio: null
    speaking_rate: 15.0
    unconditional_keys: !!set
      dnsmos_ovrl: null
      vqscore_8: null
    vq_score: 0.7

📚 Usage example

This example shows how to use the ZonosTTS template to convert text into speech. The generated audio is based on the input text and is saved locally as a file.

Config
agent:
  name: text_to_speech
  description: text to speech agent using Zonos

templates:

- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}

- template_name: TextInput
  class_name: TextInput
  template_input: InputTemplate
  attributes:
    text:  This is a test of Sinapsis Zonos text-to-speech template.

- template_name: ZonosTTS
  class_name: ZonosTTS
  template_input: TextInput
  attributes:
    model: Zyphra/Zonos-v0.1-transformer
    language: en-us
    emotions:
      happiness: 0.3077
      sadness: 0.0256
      disgust: 0.0256
      fear: 0.0256
      surprise: 0.0256
      anger: 0.0256
      other: 0.2564
      neutral: 0.3077
    fmax: 24000
    pitch_std: 45.0
    speaking_rate: 15.0
    cfg_scale: 2.0
    sampling_params:
      linear: 0.5
      conf: 0.4
      quad: 0
    randomized_seed: True
    denoised_speaker: False
    unconditional_keys:
      - dnsmos_ovrl
      - vqscore_8

This configuration defines an agent and a sequence of templates for speech synthesis, using Zonos.

[!IMPORTANT] The TextInput template belongs to the sinapsis-data-readers package. To run this example, make sure that package is installed.

To run the config, use the CLI:

sinapsis run name_of_config.yml
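
The emotion weights in the usage example sum to roughly 1 (0.9998 after rounding). Whether ZonosTTS requires normalized weights is not stated here, so treat that as an assumption. Under that assumption, the sketch below shows one way to turn relative intensities into weights of the same form; `normalize_emotions` and the raw intensities are hypothetical and chosen for illustration only.

```python
EMOTION_KEYS = (
    "happiness", "sadness", "disgust", "fear",
    "surprise", "anger", "other", "neutral",
)


def normalize_emotions(raw: dict, digits: int = 4) -> dict:
    """Scale non-negative emotion weights so they sum to 1, rounded to `digits` places."""
    weights = {key: float(raw.get(key, 0.0)) for key in EMOTION_KEYS}
    if any(value < 0 for value in weights.values()):
        raise ValueError("emotion weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one emotion weight must be positive")
    return {key: round(value / total, digits) for key, value in weights.items()}


if __name__ == "__main__":
    # Relative intensities chosen for illustration (they sum to 39).
    raw = {"happiness": 12, "sadness": 1, "disgust": 1, "fear": 1,
           "surprise": 1, "anger": 1, "other": 10, "neutral": 12}
    # Print in the indentation used under `emotions:` in the YAML config.
    for key, value in normalize_emotions(raw).items():
        print(f"      {key}: {value}")
```

With these intensities the output reproduces the weights used in the example config (0.3077, 0.0256, 0.2564), which can be pasted under the `emotions` attribute.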

🌐 Webapp

The webapps included in this project showcase the modularity of the templates, in this case for speech generation tasks.

[!IMPORTANT] To run the app you first need to clone this repository:

git clone git@github.com:Sinapsis-ai/sinapsis-speech.git
cd sinapsis-speech

[!NOTE] If you'd like to enable external app sharing in Gradio, export GRADIO_SHARE_APP=True

🐳 Build with Docker

IMPORTANT: This Docker image depends on the sinapsis-nvidia:base image. For detailed instructions, please refer to the Sinapsis README.

  1. Build the Docker image:
docker compose -f docker/compose.yaml build
  2. Start the app container:
docker compose -f docker/compose_apps.yaml up -d sinapsis-zonos
  3. Check the logs:
docker logs -f sinapsis-zonos
  4. The logs will display the URL to access the webapp, e.g.:
Running on local URL:  http://127.0.0.1:7860

NOTE: The URL may be different; check the output of the logs.

  5. To stop the app:
docker compose -f docker/compose_apps.yaml down

💻 UV

To run the webapp using the uv package manager, follow these steps:

  1. Sync the virtual environment:
uv sync --frozen
  2. Install the wheel:
uv pip install sinapsis-speech[all] --extra-index-url https://pypi.sinapsis.tech
  3. Run the webapp:
uv run webapps/generic_tts_apps/zonos_tts_app.py
  4. The terminal will display the URL to access the webapp, e.g.:
Running on local URL:  http://127.0.0.1:7860

NOTE: The URL may vary; check the terminal output for the correct address.

📙 Documentation

Documentation is available on the Sinapsis website.

Tutorials for different projects within Sinapsis are available on the Sinapsis tutorials page.

🔍 License

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.

For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.
