TTS WebUI / Harmonica

Note: This is a test upload to see if tts webui can be packaged and uploaded to PyPI.

Installation

Using the Installer (Recommended)

The base installation currently takes around 10.7 GB. Each model requires an additional 2-8 GB of space.

  • Download the latest version and extract it.
  • Run start_tts_webui.bat or start_tts_webui.sh to start the server. It will ask you to select the GPU/Chip you are using. Once everything has installed, it will start the Gradio server at http://localhost:7770 and the React UI at http://localhost:3000.
  • Output log will be available in the installer_scripts/output.log file.
  • Note: The start script sets up a conda environment and a python virtual environment. Thus you don't need to make a venv before that, and in fact, launching from another venv might break this script.
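Once the installer has finished, a quick way to confirm that both servers are running is to probe the two addresses mentioned above. The following is a minimal sketch using only the Python standard library. The helper name `is_up` is made up for illustration and is not part of TTS WebUI.

```python
import urllib.error
import urllib.request

# Default addresses quoted in the installer notes above.
GRADIO_URL = "http://localhost:7770"
REACT_URL = "http://localhost:3000"


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP request to `url` gets any response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, and so on.
        return False


if __name__ == "__main__":
    for name, url in (("Gradio", GRADIO_URL), ("React UI", REACT_URL)):
        state = "up" if is_up(url) else "not reachable"
        print(f"{name:8} {url} -> {state}")
```

If either address is reported as not reachable, installer_scripts/output.log is the first place to look.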

Manual installation

Prerequisites:

  • git
  • Python 3.10 or 3.11 (3.12 not supported yet)
  • PyTorch
  • ffmpeg (with vorbis support)
  • (Optional) NodeJS 22.9.0 for React UI
  • (Optional) PostgreSQL 16.4+ for database support

  1. Clone the repository:

    git clone https://github.com/rsxdalv/tts-webui.git
    cd tts-webui
    
  2. Install required packages:

    pip install -r requirements.txt
    
  3. Run the server:

    python server.py --no-react
    
  4. For React UI:

    cd react-ui
    npm install
    npm run build
    cd ..
    python server.py
    

For detailed manual installation instructions, please refer to the Manual Installation Guide.
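Before starting a manual installation, it can help to check some of the prerequisites listed above. The sketch below only checks the Python version and whether `git` and `ffmpeg` are on the PATH. It does not verify vorbis support in ffmpeg or the PyTorch installation, and the function names are illustrative rather than part of the project.

```python
import shutil
import sys

# From the prerequisites list: Python 3.10 or 3.11 (3.12 not supported yet).
SUPPORTED_VERSIONS = {(3, 10), (3, 11)}


def python_supported(version=None) -> bool:
    """True when the given (or current) interpreter is Python 3.10 or 3.11."""
    major, minor = tuple(version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_VERSIONS


def missing_tools(tools=("git", "ffmpeg")) -> list:
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "ok" if python_supported() else "unsupported"
    print(f"Python {current}: {verdict}")
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```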

Docker Setup

tts-webui can also be run inside a Docker container. Using CUDA inside Docker requires the NVIDIA Container Toolkit. To get started, pull the image from GitHub Container Registry:

docker pull ghcr.io/rsxdalv/tts-webui:main

Once the image has been pulled, it can be started with Docker Compose. The ports are 7770 (env: TTS_PORT) for the Gradio backend and 3000 (env: UI_PORT) for the React front end.

docker compose up -d
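Scripts that talk to the container need to agree with it on the ports. As a rough sketch, a client-side script could read the same TTS_PORT and UI_PORT variables and fall back to the defaults stated above. How the compose file itself consumes these variables is not shown here, and `resolve_ports` is a hypothetical helper.

```python
import os


def resolve_ports(env=None) -> dict:
    """Read TTS_PORT and UI_PORT, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "gradio": int(env.get("TTS_PORT", 7770)),  # Gradio backend
        "react": int(env.get("UI_PORT", 3000)),    # React front end
    }


if __name__ == "__main__":
    ports = resolve_ports()
    print(f"Gradio backend:  http://localhost:{ports['gradio']}")
    print(f"React front end: http://localhost:{ports['react']}")
```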

The container will take some time to generate the first output while models are downloaded in the background. The status of this download can be verified by checking the container logs:

docker logs tts-webui

Building the image yourself

If you wish to build your own docker container, you can use the included Dockerfile:

docker build -t tts-webui .

Please note that the Docker Compose file needs to be edited to use the image you just built.

Changelog

September:

  • OpenAI API now supports Whisper transcriptions
  • Removed PyTorch Nightly option
  • Fix Google Colab installation (Python 3.12 not supported)
  • Add Kitten TTS Mini extension
  • Add PyRNNoise extension
  • Upgrade React UI's Chatterbox interface
  • Rename Kokoro TTS extension to OpenAI TTS API extension
  • Rename all extensions to tts_webui_extension.*
  • Switch to PyPI for multiple extensions
  • Add Intel PyTorch installation option
  • Add "Custom" Choice option to installer for self-managed PyTorch installations
  • Integrate with new pip index for extensions (https://tts-webui.github.io/extensions-index/)

August:

  • Fix model downloader when no token is used, thanks Nusantara.
  • Improve Chatterbox speed
  • Add VibeVoice (Early Access) extension
  • Add docker compose volumes to persist data #529, thanks FranckKe.
  • [react-ui] Prepend voices/chatterbox to voice file selection in ap test page #542, thanks rohan-sircar.

July:

  • Add new tutorials
  • Add more robust gradio launching
  • Simplify installation instructions
  • Improve chatterbox speed.

Past Changes

See the 2025 Changelog for a detailed list of changes in 2025.

See the 2024 Changelog for a detailed list of changes in 2024.

See the 2023 Changelog for a detailed list of changes in 2023.

Extensions

Extensions are available to install from the webui itself, or using React UI. They can also be installed using the extension manager. Internally, extensions are just python packages that are installed using pip. Multiple extensions can be installed at the same time, but there might be compatibility issues between them. After installing or updating an extension, you need to restart the app to load it.

Updates need to be done manually using the mini-control panel.

Integrations

Silly Tavern

  1. Update OpenAI TTS API extension to latest version

  2. Start the API and test it with Python Requests

    (The OpenAI client might not be installed, so the "Test with Python OpenAI client" option might fail.)

  3. Once you can see that audio is generated successfully, go to Silly Tavern and add a new TTS API Default provider endpoint: http://localhost:7778/v1/audio/speech

  4. Test it out!

Text Generation WebUI (oobabooga/text-generation-webui)

  1. Install https://github.com/rsxdalv/text-to-tts-webui extension in text-generation-webui
  2. Start the API and test it with Python Requests
  3. Configure it using the extension's panel.

OpenWebUI

  1. Enable OpenAI API extension in TTS WebUI
  2. Start the API and test it with Python Requests
  3. Once you can see that audio is generated successfully, go to OpenWebUI and add a new TTS API Default provider endpoint: http://localhost:7778/v1/audio/speech
  4. Test it out!

OpenAI Compatible APIs

Using the instructions above, you can install an OpenAI compatible API, and use it with Silly Tavern or other OpenAI compatible clients.
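For clients without a built-in integration, the endpoint can also be called directly. The sketch below follows the general shape of the OpenAI speech API (a JSON body with `model`, `input` and `voice`) and uses only the standard library. The `model` and `voice` values are placeholders, not names confirmed by this project, so replace them with values your installed extension accepts.

```python
import json
import urllib.request

# Endpoint taken from the integration steps above.
SPEECH_URL = "http://localhost:7778/v1/audio/speech"


def build_speech_request(text, model="placeholder-model",
                         voice="placeholder-voice", url=SPEECH_URL):
    """Build (but do not send) a POST request in the OpenAI speech format."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech_output.bin", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

With the API running, `synthesize("Hello there")` writes the response to `speech_output.bin`. The audio format depends on the server's defaults, so rename the file once you know what it returns.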

Compatibility / Errors

Red messages in console

Messages like this one:

---- requires ----, but you have ---- which is incompatible.

are completely normal. They result from a limitation of pip combined with the fact that this Web UI bundles many different AI projects. Because those projects are not always compatible with each other, they complain about the versions installed for the other projects. This is expected, and despite the warnings/errors the projects will work together. It's not clear whether this situation will ever be fully resolvable, but that is the hope.
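When reading an installation log, it can be useful to separate these expected dependency complaints from everything else. The following sketch matches the message shape shown above with a regular expression. The sample line in the usage section is invented for illustration, and the pattern may need adjusting if pip changes its wording.

```python
import re

# Matches lines such as:
#   "<pkg> <version> requires <requirement>, but you have <dep> <version> which is incompatible."
CONFLICT_RE = re.compile(
    r"^(?P<package>\S+) (?P<version>\S+) requires (?P<requirement>.+?), "
    r"but you(?:'ll)? have (?P<found>\S+) (?P<found_version>\S+) "
    r"which is incompatible\.$"
)


def split_pip_conflicts(lines):
    """Split log lines into (dependency conflicts, everything else)."""
    conflicts, other = [], []
    for line in lines:
        match = CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
        else:
            other.append(line)
    return conflicts, other


if __name__ == "__main__":
    # Invented sample line, shaped like the message quoted above.
    sample = [
        "pkg-a 1.0 requires pkg-b>=2.0, but you have pkg-b 1.5 which is incompatible.",
        "Successfully installed pkg-a-1.0",
    ]
    conflicts, other = split_pip_conflicts(sample)
    print(f"{len(conflicts)} expected conflict line(s), {len(other)} other line(s)")
```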

Extra Voices for Bark, Prompt Samples

PromptEcho

Bark Speaker Directory

Bark Readme

README_Bark.md

Info about managing models, caches and system space for AI projects

https://github.com/rsxdalv/tts-webui/discussions/186#discussioncomment-7291274

Open Source Libraries

This project utilizes the following open source libraries:

Ethical and Responsible Use

This technology is intended for enablement and creativity, not for harm.

By engaging with this AI model, you acknowledge and agree to abide by these guidelines, employing the AI model in a responsible, ethical, and legal manner.

  • Non-Malicious Intent: Do not use this AI model for malicious, harmful, or unlawful activities. It should only be used for lawful and ethical purposes that promote positive engagement, knowledge sharing, and constructive conversations.
  • No Impersonation: Do not use this AI model to impersonate or misrepresent yourself as someone else, including individuals, organizations, or entities. It should not be used to deceive, defraud, or manipulate others.
  • No Fraudulent Activities: This AI model must not be used for fraudulent purposes, such as financial scams, phishing attempts, or any form of deceitful practices aimed at acquiring sensitive information, monetary gain, or unauthorized access to systems.
  • Legal Compliance: Ensure that your use of this AI model complies with applicable laws, regulations, and policies regarding AI usage, data protection, privacy, intellectual property, and any other relevant legal obligations in your jurisdiction.
  • Acknowledgement: By engaging with this AI model, you acknowledge and agree to abide by these guidelines, using the AI model in a responsible, ethical, and legal manner.

License

Codebase and Dependencies

The codebase is licensed under MIT. However, it's important to note that when installing the dependencies, you will also be subject to their respective licenses. Although most of these licenses are permissive, there may be some that are not. Therefore, it's essential to understand that the permissive license only applies to the codebase itself, not the entire project.

That being said, the goal is to maintain MIT compatibility throughout the project. If you come across a dependency that is not compatible with the MIT license, please feel free to open an issue and bring it to our attention.

Known non-permissive dependencies:

Library   | License      | Notes
----------|--------------|------
encodec   | CC BY-NC 4.0 | Newer versions are MIT, but need to be installed manually
diffq     | CC BY-NC 4.0 | Optional in the future, not necessary to run, can be uninstalled, should be updated with demucs
lameenc   | GPL License  | Future versions will make it LGPL, but need to be installed manually
unidecode | GPL License  | Not mission critical, can be replaced with another library, issue: https://github.com/neonbjb/tortoise-tts/issues/494
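To see which of the libraries listed above are actually present in an environment, the installed versions can be looked up with the standard library. This is a small sketch under the assumption that the distribution names match the library names in the table. It reports presence and version only, and does not determine which license a given version carries.

```python
from importlib import metadata

# Library names copied from the table of known non-permissive dependencies.
FLAGGED = ("encodec", "diffq", "lameenc", "unidecode")


def installed_flagged(names=FLAGGED) -> dict:
    """Map each flagged library that is installed to its installed version."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; nothing to report.
            continue
    return found


if __name__ == "__main__":
    present = installed_flagged()
    if not present:
        print("None of the flagged libraries are installed.")
    for name, version in present.items():
        print(f"{name} {version} is installed; check its license before redistributing.")
```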

Model Weights

Model weights have different licenses; please pay attention to the license of the model you are using.

Most notably:

  • Bark: MIT
  • Tortoise: Unknown (Apache-2.0 according to repo, but no license file in HuggingFace)
  • MusicGen: CC BY-NC 4.0
  • AudioGen: CC BY-NC 4.0
