Text to speech using the Zonos model
Sinapsis Zonos
Templates for advanced speech synthesis using Zonos
🐍 Installation • 🚀 Features • 📚 Usage example • 🌐 Webapp • 📙 Documentation • 🔍 License
This Sinapsis Zonos package provides a single template for integrating, configuring, and running text-to-speech (TTS) and voice cloning functionalities powered by Zonos. It supports multilingual speech, emotional modulation, and real-time audio generation.
🐍 Installation
[!IMPORTANT] The Sinapsis project requires Python 3.10 or higher.
Install using your preferred package manager. We strongly recommend using uv. To install uv, refer to the official documentation.
Install with uv:
uv pip install sinapsis-zonos --extra-index-url https://pypi.sinapsis.tech
Or with raw pip:
pip install sinapsis-zonos --extra-index-url https://pypi.sinapsis.tech
[!IMPORTANT] Templates in each package may require additional dependencies. For development, we recommend installing the package with all the optional dependencies:
With uv:
uv pip install sinapsis-zonos[all] --extra-index-url https://pypi.sinapsis.tech
Or with raw pip:
pip install sinapsis-zonos[all] --extra-index-url https://pypi.sinapsis.tech
[!NOTE] Zonos depends on the eSpeak library for phonemization. How you install it depends on your OS. For Linux:
apt install -y espeak-ng
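The two prerequisites above (Python 3.10 or higher and the espeak-ng binary) can be checked before installing. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `espeak_is_installed` are hypothetical and not part of sinapsis-zonos.

```python
import shutil
import sys


def python_is_supported(version=None, minimum=(3, 10)):
    """Return True when the interpreter version meets the documented minimum."""
    # Fall back to the running interpreter when no version is passed in.
    current = tuple(version) if version is not None else tuple(sys.version_info[:2])
    return current[:2] >= minimum


def espeak_is_installed(binary="espeak-ng"):
    """Return True when the espeak-ng executable can be found on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    print("Python >= 3.10:", python_is_supported())
    print("espeak-ng on PATH:", espeak_is_installed())
```

If the second check reports False on Linux, installing espeak-ng with the apt command above should resolve it.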
🚀 Features
Templates Supported
- ZonosTTS: Template for converting text to speech or performing voice cloning based on the presence of an audio sample.
Attributes
- cfg_scale (Optional): Controls randomness and creativity in speech generation (default: 2.0, range: 1.0–5.0). Higher values introduce more variation in speech output.
- denoised_speaker (Optional): If True, applies denoising to the speaker embedding to reduce background noise (default: False).
- dnsmos (Optional): Denoising strength for hybrid models (default: 4.0, range: 1.0–5.0).
- emotions (Optional): Emotion configuration to fine-tune the emotional tone of the generated speech (default: {}). Accepts an Emotions object with weights for various emotions.
- fmax (Optional): Maximum frequency cutoff in Hz for audio generation (default: 22050, range: 0–24000).
- language (Optional): Language code used for synthesis (default: en-us).
- model (Optional): The Zonos model identifier to use (default: Zyphra/Zonos-v0.1-transformer). Options: Zyphra/Zonos-v0.1-transformer and Zyphra/Zonos-v0.1-hybrid.
- output_folder (Optional): The folder where generated audio files will be saved (default: SINAPSIS_CACHE_DIR/zonos/audios).
- pitch_std (Optional): Standard deviation for pitch variation, which influences pitch naturalness (default: 20.0, range: 0–300).
- prefix_audio (Optional): Path to an audio file used for prefix conditioning (e.g., whispering or prosody control) (default: None).
- randomized_seed (Optional): If True, a random seed is used for each generation (default: True).
- sampling_params (Optional): Controls sampling behavior for speech synthesis. Accepts a SamplingParams object with fields like top_p, top_k, min_p, linear, conf, and quad.
- seed (Optional): Random seed used for deterministic generation. If randomized_seed is False, this value ensures repeatable output (default: 420).
- speaker_audio (Optional): Path to a reference audio file used to extract speaker characteristics for voice cloning (default: None).
- speaking_rate (Optional): Speaking rate in syllables per second (default: 15.0, range: 5–30).
- unconditional_keys (Optional): A set of keys (e.g., {vqscore_8, dnsmos_ovrl}) that disable speaker conditioning when generating speech.
- vq_score (Optional): VQ score threshold used by hybrid models to determine decoding style (default: 0.7, range: 0.5–0.8).
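Several numeric attributes above come with a documented range. As a rough illustration, a config could be checked against those ranges before it is passed to the template. This is a sketch only: the ranges are copied from the attribute list, while `DOCUMENTED_RANGES` and `out_of_range` are hypothetical names, and whether ZonosTTS itself rejects out-of-range values is not stated here.

```python
# Ranges copied from the attribute list; the checker itself is hypothetical.
DOCUMENTED_RANGES = {
    "cfg_scale": (1.0, 5.0),
    "dnsmos": (1.0, 5.0),
    "fmax": (0.0, 24000.0),
    "pitch_std": (0.0, 300.0),
    "speaking_rate": (5.0, 30.0),
    "vq_score": (0.5, 0.8),
}


def out_of_range(attributes):
    """Return {name: (value, low, high)} for attributes outside their documented range."""
    problems = {}
    for name, (low, high) in DOCUMENTED_RANGES.items():
        if name not in attributes:
            continue  # attribute omitted, so the template default applies
        value = attributes[name]
        if not low <= value <= high:
            problems[name] = (value, low, high)
    return problems


# An empty dict means every supplied value is within its documented range.
print(out_of_range({"cfg_scale": 2.0, "fmax": 24000, "pitch_std": 45.0}))
print(out_of_range({"speaking_rate": 40.0}))
```

Attributes without a documented range (for example language or seed) are deliberately ignored by this check.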
[!TIP] Use the CLI command sinapsis info --example-template-config TEMPLATE_NAME to produce an example Agent config for the Template specified in TEMPLATE_NAME.
For example, for ZonosTTS use sinapsis info --example-template-config ZonosTTS to produce an example config like:
Config
agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: ZonosTTS
  class_name: ZonosTTS
  template_input: InputTemplate
  attributes:
    cfg_scale: 2.0
    denoised_speaker: false
    dnsmos: 4.0
    emotions:
      happiness: 0
      sadness: 0
      disgust: 0
      fear: 0
      surprise: 0
      anger: 0
      other: 0
      neutral: 0
    fmax: 22050.0
    language: en-us
    model: Zyphra/Zonos-v0.1-transformer
    output_folder: ~/.cache/sinapsis/zonos/audios
    pitch_std: 20.0
    prefix_audio: null
    randomized_seed: true
    sampling_params:
      min_p: 0.0
      top_k: 0
      top_p: 0.0
      linear: 0.0
      conf: 0.0
      quad: 0.0
    seed: 420
    speaker_audio: null
    speaking_rate: 15.0
    unconditional_keys: !!set
      dnsmos_ovrl: null
      vqscore_8: null
    vq_score: 0.7
📚 Usage example
This example shows how to use the ZonosTTS template to convert text into speech. The generated audio is based on the input text and is saved locally as a file.
Config
agent:
  name: text_to_speech
  description: text to speech agent using Zonos
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: TextInput
  class_name: TextInput
  template_input: InputTemplate
  attributes:
    text: This is a test of Sinapsis Zonos text-to-speech template.
- template_name: ZonosTTS
  class_name: ZonosTTS
  template_input: TextInput
  attributes:
    model: Zyphra/Zonos-v0.1-transformer
    language: en-us
    emotions:
      happiness: 0.3077
      sadness: 0.0256
      disgust: 0.0256
      fear: 0.0256
      surprise: 0.0256
      anger: 0.0256
      other: 0.2564
      neutral: 0.3077
    fmax: 24000
    pitch_std: 45.0
    speaking_rate: 15.0
    cfg_scale: 2.0
    sampling_params:
      linear: 0.5
      conf: 0.4
      quad: 0
    randomized_seed: True
    denoised_speaker: False
    unconditional_keys:
    - dnsmos_ovrl
    - vqscore_8
This configuration defines an agent and a sequence of templates for speech synthesis, using Zonos.
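The emotion weights in the example config add up to roughly 1 (0.9998). One way to arrive at values like these is to start from relative weights and normalize them. The sketch below does that with a hypothetical helper, `normalize_emotions`. The raw weights (12, 1, 10, ...) are illustrative values chosen because they reproduce the numbers in the config, and whether ZonosTTS expects or enforces normalized weights is an assumption, not something stated here.

```python
EMOTION_KEYS = (
    "happiness", "sadness", "disgust", "fear",
    "surprise", "anger", "other", "neutral",
)


def normalize_emotions(raw_weights):
    """Scale non-negative weights so that they sum to about 1 (rounded to 4 decimals)."""
    weights = {key: float(raw_weights.get(key, 0.0)) for key in EMOTION_KEYS}
    if any(value < 0 for value in weights.values()):
        raise ValueError("emotion weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        return weights  # all zeros: nothing to scale
    return {key: round(value / total, 4) for key, value in weights.items()}


# Illustrative relative weights; not taken from the sinapsis-zonos package.
raw = {"happiness": 12, "sadness": 1, "disgust": 1, "fear": 1,
       "surprise": 1, "anger": 1, "other": 10, "neutral": 12}
emotions = normalize_emotions(raw)
for name, weight in emotions.items():
    print(f"{name}: {weight}")
```

The resulting dictionary can be copied into the emotions attribute of the ZonosTTS template.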
[!IMPORTANT] The TextInput template belongs to the sinapsis-data-readers package. To run this example, make sure that package is installed.
To run the config, use the CLI:
sinapsis run name_of_config.yml
🌐 Webapp
The webapps included in this project showcase the modularity of the templates, in this case for speech generation tasks.
[!IMPORTANT] To run the app you first need to clone this repository:
git clone git@github.com:Sinapsis-ai/sinapsis-speech.git
cd sinapsis-speech
[!NOTE] If you'd like to enable external app sharing in Gradio, set the following environment variable:
export GRADIO_SHARE_APP=True
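For illustration, a launcher script could turn an environment variable such as GRADIO_SHARE_APP into a boolean as sketched below. The helper `env_flag` and the set of accepted spellings are assumptions; how the sinapsis webapp actually parses the variable is not shown here.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable such as GRADIO_SHARE_APP as a boolean."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default  # variable not set: keep the default
    # Accepted truthy spellings are an assumption made for this sketch.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    print("share enabled:", env_flag("GRADIO_SHARE_APP"))
```

Passing a dictionary as `environ` makes the helper easy to test without touching the real environment.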
🐳 Build with Docker
IMPORTANT: This Docker image depends on the sinapsis-nvidia:base image. For detailed instructions, please refer to the Sinapsis README.
- Build the Docker image:
docker compose -f docker/compose.yaml build
- Start the app container:
docker compose -f docker/compose_apps.yaml up -d sinapsis-zonos
- Check the logs:
docker logs -f sinapsis-zonos
- The logs will display the URL to access the webapp, e.g.:
Running on local URL: http://127.0.0.1:7860
NOTE: The URL may differ; check the log output.
- To stop the app:
docker compose -f docker/compose_apps.yaml down
💻 UV
To run the webapp using the uv package manager, follow these steps:
- Sync the virtual environment:
uv sync --frozen
- Install the wheel:
uv pip install sinapsis-speech[all] --extra-index-url https://pypi.sinapsis.tech
- Run the webapp:
uv run webapps/generic_tts_apps/zonos_tts_app.py
- The terminal will display the URL to access the webapp, e.g.:
Running on local URL: http://127.0.0.1:7860
NOTE: The URL may vary; check the terminal output for the correct address.
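Because the address can vary, it may be convenient to extract it from captured log or terminal output. The following sketch assumes the output contains a line in the "Running on local URL: ..." form shown above; `find_local_url` is a hypothetical helper.

```python
import re

# Matches the announcement line shown in the logs and captures the URL.
URL_LINE = re.compile(r"Running on local URL:\s*(\S+)")


def find_local_url(log_text):
    """Return the first URL announced in the log output, or None if absent."""
    match = URL_LINE.search(log_text)
    return match.group(1) if match else None


sample = "Loading model...\nRunning on local URL: http://127.0.0.1:7860\n"
print(find_local_url(sample))
```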
📙 Documentation
Documentation is available on the sinapsis website.
Tutorials for different projects within sinapsis are available on the sinapsis tutorials page.
🔍 License
This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the LICENSE file.
For commercial use, please refer to our official Sinapsis website for information on obtaining a commercial license.