FastAPI web UI for Qwen3-TTS: custom voices, voice design, voice cloning, and per-request model selection.

Qwen3 TTS Web App

A FastAPI + vanilla JS UI to run Qwen3-TTS locally: custom voices, voice design, voice cloning, and per-request model selection.

Documentation

Prerequisites

Python 3.10+ with a GPU-enabled PyTorch build (GPU strongly recommended).
Disk/bandwidth for model downloads (several GB on first load).
Optional: FlashAttention 2 if your GPU supports it (pip install -U flash-attn --no-build-isolation).

Setup

pip install -r requirements.txt

If your machine cannot download weights at runtime, pre-download a model (e.g. huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice) and set QWEN_TTS_MODEL to that path.

Run

uvicorn app.main:app --reload --port 8000

Open http://localhost:8000 for the UI. API endpoints live under /api/*.

Docker

Build + run (CPU)

docker build -t qwen-tts .
docker run --rm -p 8000:8000 qwen-tts

Build + run (GPU)

Requires NVIDIA Container Toolkit and a CUDA-capable host.

docker build -t qwen-tts .
docker run --rm --gpus all -e QWEN_TTS_DEVICE=cuda:0 -p 8000:8000 qwen-tts

Docker Compose

docker compose up --build

Compose defaults to GPU (QWEN_TTS_DEVICE=cuda:0). For CPU-only, set QWEN_TTS_DEVICE=cpu in docker-compose.yml.

Configuration (env vars)

QWEN_TTS_MODEL — default model id or local path (default: Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice).
QWEN_TTS_DEVICE — device map (default: cuda:0 if available, else cpu).
QWEN_TTS_USE_FLASH — set to 1 to try FlashAttention 2.
QWEN_TTS_CUSTOM_MODEL — override default for Custom Voice mode (else uses QWEN_TTS_MODEL).
QWEN_TTS_VD_MODEL — override default for Voice Design mode (default: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign).
QWEN_TTS_CLONE_MODEL — override default for Voice Clone mode (default: Qwen/Qwen3-TTS-12Hz-1.7B-Base).
QWEN_TTS_VIDEO_FONT — full path to a font file for video transcript rendering (useful for CJK/foreign text).

Each request can override model_id and device. When no override is given, the UI auto-selects the model that the upstream README recommends for the chosen mode.
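The precedence described above (per-request override first, then the mode-specific variable, then the documented default) can be sketched as follows. This is only an illustration under stated assumptions: resolve_model is a hypothetical helper, not the app's actual code, and the fallback order for Voice Design and Voice Clone is inferred from the defaults listed above.

```python
import os

# Defaults as listed in the configuration section above.
DEFAULT_MODEL = "Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice"
DEFAULT_VD_MODEL = "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign"
DEFAULT_CLONE_MODEL = "Qwen/Qwen3-TTS-12Hz-1.7B-Base"


def resolve_model(mode, request_model_id=None, env=None):
    """Hypothetical helper: choose the model id for a single request."""
    env = os.environ if env is None else env
    # A model_id sent with the request wins over every environment default.
    if request_model_id:
        return request_model_id
    base = env.get("QWEN_TTS_MODEL") or DEFAULT_MODEL
    if mode == "custom_voice":
        # QWEN_TTS_CUSTOM_MODEL overrides QWEN_TTS_MODEL for this mode only.
        return env.get("QWEN_TTS_CUSTOM_MODEL") or base
    if mode == "voice_design":
        return env.get("QWEN_TTS_VD_MODEL") or DEFAULT_VD_MODEL
    if mode == "voice_clone":
        return env.get("QWEN_TTS_CLONE_MODEL") or DEFAULT_CLONE_MODEL
    raise ValueError(f"unknown mode: {mode!r}")
```

Passing env as a plain dict keeps the helper easy to try without touching the real environment, e.g. resolve_model("custom_voice", env={"QWEN_TTS_MODEL": "./Qwen3-TTS-12Hz-1.7B-CustomVoice"}).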

Model quick reference (from upstream README)

Custom Voice: Qwen/Qwen3-TTS-12Hz-{0.6B,1.7B}-CustomVoice (speaker list included).
Voice Design: Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign (describe persona; no speaker list).
Voice Clone: Qwen/Qwen3-TTS-12Hz-{0.6B,1.7B}-Base (provide ref audio + transcript).
Tokenizer (encode/decode only): Qwen/Qwen3-TTS-Tokenizer-12Hz.

Features

Custom Voice: pick a provided speaker, language, and optional style prompt.
Voice Design: describe a persona and language; the model invents the voice.
Voice Clone: supply reference audio (URL/path/base64) plus its transcript to clone a voice.
Model selection: choose any released model id or local directory per request.
UI: shows available speakers/languages, plays inline, and offers WAV download.
Recording/Upload for cloning: record in-browser or upload; the UI converts to WAV before sending.
Saved voices: build a reusable voice profile (clone prompt) once and reuse it without re-uploading audio.
MP3 download: generation stays WAV; pick MP3 in the UI to convert the generated clip on demand (requires pydub and ffmpeg).
Video export: render a vertical/square/landscape MP4 with waveform/spectrum visuals and transcript (requires ffmpeg with drawtext).

API Examples

Custom Voice

curl -X POST http://localhost:8000/api/tts \
  -H "Content-Type: application/json" \
  -o custom.wav \
  -d '{
    "mode": "custom_voice",
    "model_id": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "language": "English",
    "speaker": "Ryan",
    "instruct": "Energetic podcast intro with a smile.",
    "text": "Welcome back to our weekend build session. Grab your coffee and let us ship!"
  }'
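The same request can be sent from Python using only the standard library. This is a minimal sketch mirroring the curl call above; build_tts_request and synthesize are hypothetical helper names, and the sketch assumes the server answers a successful request with the WAV bytes as the response body, as the -o custom.wav flag suggests.

```python
import json
import urllib.request


def build_tts_request(payload, base_url="http://localhost:8000"):
    """Build a POST request for /api/tts with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(payload, out_path, base_url="http://localhost:8000"):
    """Send the request and write the response body to out_path.

    urlopen raises urllib.error.HTTPError on a non-2xx response.
    """
    req = build_tts_request(payload, base_url)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


custom_voice_payload = {
    "mode": "custom_voice",
    "model_id": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "language": "English",
    "speaker": "Ryan",
    "instruct": "Energetic podcast intro with a smile.",
    "text": "Welcome back to our weekend build session. "
            "Grab your coffee and let us ship!",
}

# With the server running:
# synthesize(custom_voice_payload, "custom.wav")
```

Only the payload changes between modes, so the same two helpers should work for the Voice Design and Voice Clone requests as well.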

Voice Design

curl -X POST http://localhost:8000/api/tts \
  -H "Content-Type: application/json" \
  -o design.wav \
  -d '{
    "mode": "voice_design",
    "model_id": "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    "language": "English",
    "instruct": "Late-night radio host, warm baritone, unhurried pace with soft consonants.",
    "text": "You are tuned to 88.5 FM. Outside the city is sleeping, but we are still here with you."
  }'

Voice Clone

curl -X POST http://localhost:8000/api/tts \
  -H "Content-Type: application/json" \
  -o clone.wav \
  -d '{
    "mode": "voice_clone",
    "model_id": "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    "language": "English",
    "ref_audio": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone.wav",
    "ref_text": "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you.",
    "text": "This is a cloned voice reading a new paragraph. We can keep the tone calm and measured."
  }'

For quick experiments without a transcript, set "x_vector_only_mode": true and omit ref_text (quality may drop).
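One way to keep the two variants straight is a small payload builder that switches to x_vector_only_mode whenever no transcript is supplied. This is a sketch, not part of the project: build_clone_payload is a hypothetical helper, and only the field names are taken from the examples above.

```python
def build_clone_payload(text, ref_audio, ref_text=None, language="English",
                        model_id="Qwen/Qwen3-TTS-12Hz-1.7B-Base"):
    """Hypothetical helper: build a voice_clone request body."""
    payload = {
        "mode": "voice_clone",
        "model_id": model_id,
        "language": language,
        "ref_audio": ref_audio,
        "text": text,
    }
    if ref_text:
        # Transcript available: send it along with the reference audio.
        payload["ref_text"] = ref_text
    else:
        # No transcript: omit ref_text and request x-vector-only cloning.
        payload["x_vector_only_mode"] = True
    return payload
```

The resulting dict can be serialized with json.dumps and posted to /api/tts in the same way as the curl examples.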

Save a voice profile (reuse clone prompt)

curl -X POST http://localhost:8000/api/voice_profiles \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my_radio_host",
    "model_id": "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    "ref_audio": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone.wav",
    "ref_text": "Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it! And thanks to you."
  }'

Then synthesize with that cached prompt:

curl -X POST http://localhost:8000/api/tts \
  -H "Content-Type: application/json" \
  -o clone_with_profile.wav \
  -d '{
    "mode": "voice_clone",
    "voice_profile": "my_radio_host",
    "text": "We can keep reusing this voice without re-uploading audio.",
    "language": "English"
  }'

Voice Design → Clone Reuse

Use the Voice Design model to synthesize a short clip with the desired persona.
Feed that clip and its text back as ref_audio/ref_text with mode: "voice_clone" using the Base model.
This keeps a consistent designed voice for longer scripts.
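The two steps above can be sketched as a pair of payload builders. The helper names are hypothetical, and the data:audio/wav;base64, prefix is an assumption about the data URI format the API accepts; the document only states that a base64 data URI is supported.

```python
import base64


def wav_bytes_to_data_uri(wav_bytes):
    """Encode WAV bytes as a base64 data URI (MIME type is an assumption)."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return f"data:audio/wav;base64,{encoded}"


def design_payload(persona, sample_text, language="English"):
    """Step 1: ask the Voice Design model for a short sample clip."""
    return {
        "mode": "voice_design",
        "model_id": "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
        "language": language,
        "instruct": persona,
        "text": sample_text,
    }


def clone_payload_from_design(design_wav, sample_text, script_text,
                              language="English"):
    """Step 2: reuse the designed clip and its text as the clone reference."""
    return {
        "mode": "voice_clone",
        "model_id": "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
        "language": language,
        "ref_audio": wav_bytes_to_data_uri(design_wav),
        "ref_text": sample_text,  # the exact text spoken in the designed clip
        "text": script_text,
    }
```

In use, the first payload is posted to /api/tts, the returned WAV bytes are passed as design_wav, and the second payload is posted once per chunk of the longer script.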

Frontend

The UI exposes the same options: pick mode, enter model id/path, language, speaker (custom voice), style (voice design), or ref audio/transcript (voice clone). It streams back a WAV, plays inline, and offers a download link.

Notes

Running on a GPU with bfloat16/float16 greatly reduces latency and memory use; CPU runs will be slow.
Reference audio can be a public URL, local path, or base64 data URI. Keep it clean and ~3–10s for best cloning.
The page pulls a Google Font; remove the <link> in frontend/index.html if you need offline-only assets.
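To check a reference clip against the ~3–10s guideline above before sending it, the duration can be read with Python's standard wave module. This is a minimal sketch with hypothetical helper names; it assumes an uncompressed PCM WAV, which is what the wave module reads.

```python
import io
import wave


def wav_duration_seconds(wav_bytes):
    """Return the duration of a PCM WAV clip, given its raw bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_good_reference_length(wav_bytes, min_s=3.0, max_s=10.0):
    """True if the clip falls inside the suggested 3 to 10 second range."""
    return min_s <= wav_duration_seconds(wav_bytes) <= max_s
```

For a file on disk, pass open(path, "rb").read() as wav_bytes.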

Maintainers

h1ddenpr0cess20

This version: 1.0.2 (Mar 16, 2026)

Download files

Source Distribution

qwen_tts_webui-1.0.2.tar.gz (62.5 kB view details)

Uploaded Mar 16, 2026 Source

Built Distribution

qwen_tts_webui-1.0.2-py3-none-any.whl (40.2 kB view details)

Uploaded Mar 16, 2026 Python 3

File details

Details for the file qwen_tts_webui-1.0.2.tar.gz.

File metadata

Download URL: qwen_tts_webui-1.0.2.tar.gz
Upload date: Mar 16, 2026
Size: 62.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for qwen_tts_webui-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`7f9f808fe5f302edb453e5d89ba08836cda3956277b71da0eb7be99c6a641e36`
MD5	`2401a3d1b0067cb6f7aeb9189c23bdfa`
BLAKE2b-256	`e00c9a4d37dc6e75b58c28a7644538d3d44e5c48bc027a15712bd77ec52b7d04`

Provenance

The following attestation bundles were made for qwen_tts_webui-1.0.2.tar.gz:

Publisher: pypi-publish.yml on h1ddenpr0cess20/qwen-tts-webui

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: qwen_tts_webui-1.0.2.tar.gz
- Subject digest: 7f9f808fe5f302edb453e5d89ba08836cda3956277b71da0eb7be99c6a641e36
- Sigstore transparency entry: 1109248506
- Sigstore integration time: Mar 16, 2026
Source repository:
- Permalink: h1ddenpr0cess20/qwen-tts-webui@7e5e1f1dc8b8827007e5bcb877643ff43f731c0c
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/h1ddenpr0cess20
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@7e5e1f1dc8b8827007e5bcb877643ff43f731c0c
- Trigger Event: release

File details

Details for the file qwen_tts_webui-1.0.2-py3-none-any.whl.

File metadata

Download URL: qwen_tts_webui-1.0.2-py3-none-any.whl
Upload date: Mar 16, 2026
Size: 40.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for qwen_tts_webui-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9631611e64df471833fa001bdad9d7ff005ba184d18d489917358309d4dd86d9`
MD5	`85ec124bbcbddb7cae049567a44e6168`
BLAKE2b-256	`2025a7f02847d28dbb88170e4321c63b06a0e545391d07d1d26f0b68594820fb`

Provenance

The following attestation bundles were made for qwen_tts_webui-1.0.2-py3-none-any.whl:

Publisher: pypi-publish.yml on h1ddenpr0cess20/qwen-tts-webui

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: qwen_tts_webui-1.0.2-py3-none-any.whl
- Subject digest: 9631611e64df471833fa001bdad9d7ff005ba184d18d489917358309d4dd86d9
- Sigstore transparency entry: 1109248514
- Sigstore integration time: Mar 16, 2026
Source repository:
- Permalink: h1ddenpr0cess20/qwen-tts-webui@7e5e1f1dc8b8827007e5bcb877643ff43f731c0c
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/h1ddenpr0cess20
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@7e5e1f1dc8b8827007e5bcb877643ff43f731c0c
- Trigger Event: release
