🦜 VieNeu-TTS
Advanced on-device Vietnamese Text-to-Speech (TTS) with instant voice cloning
VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.
[!TIP] Voice Cloning: All model variants (including GGUF) support instant voice cloning with just 3-5 seconds of reference audio.
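Since every variant expects a 3-5 second reference clip, it can help to sanity-check a clip's length before cloning. A minimal stdlib sketch (the exact 3-5 s window comes from the tip above; the file path is a placeholder):

```python
import wave

def reference_clip_duration(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def is_usable_reference(path: str, min_s: float = 3.0, max_s: float = 5.0) -> bool:
    """Check that a reference clip falls in the 3-5 s window the model expects."""
    return min_s <= reference_clip_duration(path) <= max_s
```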
This project features two core architectures trained on the VieNeu-TTS-1000h dataset:
- VieNeu-TTS (0.5B): An enhanced model fine-tuned from the NeuTTS Air architecture for maximum stability.
- VieNeu-TTS-0.3B: A specialized model trained from scratch using the VieNeu-TTS-1000h dataset, delivering 2x faster inference and ultra-low latency.
These represent a significant upgrade from the previous VieNeu-TTS-140h with the following improvements:
- Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
- Code-switching support: Seamless transitions between Vietnamese and English
- Better voice cloning: Higher fidelity and speaker consistency
- Real-time synthesis: 24 kHz waveform generation on CPU or GPU
- Multiple model formats: Support for PyTorch, GGUF Q4/Q8 (CPU optimized), and ONNX codec
VieNeu-TTS delivers production-ready speech synthesis fully offline.
Author: Phạm Nguyễn Ngọc Bảo
📌 Table of Contents
- 🦜 Installation & Web UI
- 📦 Using the Python SDK
- 🐳 Docker & Remote Server
- 🎯 Custom Models
- 🛠️ Fine-tuning Guide
- 🔬 Model Overview
- 🐋 Deployment with Docker (Compose)
- 🤝 Support & Contact
🦜 1. Installation & Web UI
The fastest way to experience VieNeu-TTS is through the Web interface (Gradio).
System Requirements
- Python: 3.12
- eSpeak NG: Required for phonemization.
  - Windows: Download the .msi from eSpeak NG Releases.
  - macOS: brew install espeak
  - Ubuntu/Debian: sudo apt install espeak-ng
- NVIDIA GPU (Optional): For maximum speed via LMDeploy or GGUF GPU acceleration.
- Requires NVIDIA Driver 570.65 (CUDA 12.8) or newer.
- For LMDeploy, it is recommended to have the NVIDIA GPU Computing Toolkit installed.
Installation Steps
1. Clone the repo:
git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS
2. Set up the environment with uv (recommended):
Step A: Install uv (if you haven't):
# Windows:
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Linux/macOS:
curl -LsSf https://astral.sh/uv/install.sh | sh
Step B: Install dependencies:
- Option 1: Default (with GPU support):
uv sync
(Optional: see GGUF GPU Acceleration below if you want to run GGUF models on GPU.)
- Option 2: CPU-only (lightweight):
uv sync --no-default-groups
3. Start the Web UI:
uv run gradio_app.py
Access the UI at http://127.0.0.1:7860.
🚀 GGUF GPU Acceleration (Optional)
If you want to use GGUF models with GPU acceleration (llama-cpp-python), follow these steps:
Windows Users
Run the following command after uv sync:
uv pip install "https://github.com/pnnbao97/VieNeu-TTS/releases/download/llama-cpp-python-cu124/llama_cpp_python-0.3.16-cp312-cp312-win_amd64.whl"
Note: Requires NVIDIA Driver version 551.61 (CUDA 12.4) or newer.
Linux / macOS Users
Please refer to the official llama-cpp-python documentation for installation instructions specific to your hardware (CUDA, Metal, ROCm).
📦 2. Using the Python SDK (vieneu)
Integrate VieNeu-TTS into your own software projects.
Quick Install
# Windows (Avoid llama-cpp build errors)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/
# Linux / MacOS
pip install vieneu
Quick Start (main.py)
from vieneu import Vieneu
# 1. Initialize (Default: Local CPU Optimized)
tts = Vieneu()
# Or use Remote Mode for max speed (see Docker & Remote Server section below):
# tts = Vieneu(mode="remote", api_base="http://your-server-ip:23333/v1", model_name="pnnbao-ump/VieNeu-TTS")
# 2. Synthesis
text = "Xin chào, tôi là VieNeu. Tôi có thể giúp bạn đọc sách, làm chatbot thời gian thực, hoặc thậm chí clone giọng nói của bạn."
audio = tts.infer(text=text)
# 3. Save
tts.save(audio, "output.wav")
For a full guide on cloning and custom voices, see main.py and main_remote.py.
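If you need to write audio yourself (for example after post-processing outside the SDK), a stdlib sketch for a 24 kHz mono 16-bit WAV looks like this. It assumes the samples are floats in [-1, 1]; that is an assumption about `infer`'s return format, not documented behavior — normally you would just use `tts.save`:

```python
import struct
import wave

def write_wav_24k(path: str, samples, sample_rate: int = 24000) -> None:
    """Write float samples in [-1, 1] as a 16-bit mono WAV at 24 kHz."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(sample_rate)
        # Clamp each sample, scale to the int16 range, pack little-endian.
        frames = struct.pack(
            "<%dh" % len(samples),
            *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
        )
        wav.writeframes(frames)
```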
🐳 3. Docker & Remote Server
Deploy VieNeu-TTS as a high-performance API Server (powered by LMDeploy) with a single command.
1. Run with Docker (Recommended)
Requirement: NVIDIA Container Toolkit is required for GPU support.
Start the Server with a Public Tunnel (No port forwarding needed):
docker run --gpus all -p 23333:23333 pnnbao/vieneu-tts:serve
- Default: The server loads the VieNeu-TTS model for maximum quality.
- Tunneling: The Docker image includes a built-in bore tunnel. Check the container logs to find your public address (e.g., bore.pub:31631).
2. Using the SDK (Remote Mode)
Once the server is running, you can connect from anywhere (Colab, Web Apps, etc.) without loading heavy models locally:
from vieneu import Vieneu
# Connect to the server
tts = Vieneu(
mode='remote',
api_base='http://your-server-ip:23333/v1', # Or the bore tunnel URL
model_name="pnnbao-ump/VieNeu-TTS"
)
# Ultra-fast inference (low latency)
audio = tts.infer(text="Xin chào, tôi là VieNeu. Tôi có thể giúp bạn đọc sách, làm chatbot thời gian thực, hoặc thậm chí clone giọng nói của bạn.")
tts.save(audio, "output.wav")
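Before pointing the SDK at a remote server (or a bore tunnel address), you can check that the endpoint is reachable. This sketch assumes the standard OpenAI-compatible `/v1/models` route that LMDeploy exposes; the host address is a placeholder:

```python
import json
import urllib.request

def models_url(api_base: str) -> str:
    """Build the OpenAI-compatible model-listing URL from an api_base
    like 'http://your-server-ip:23333/v1'."""
    return api_base.rstrip("/") + "/models"

def server_is_up(api_base: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers the /v1/models endpoint."""
    try:
        with urllib.request.urlopen(models_url(api_base), timeout=timeout) as resp:
            json.load(resp)  # expect an OpenAI-style model list
            return True
    except OSError:
        return False
```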
3. Advanced Configuration
Customize the server to run specific versions or your own fine-tuned models.
Run the 0.3B Model (Faster):
docker run --gpus all pnnbao/vieneu-tts:serve --model pnnbao-ump/VieNeu-TTS-0.3B
Serve a Local Fine-tuned Model: If you have merged a LoRA adapter, mount your output directory to the container:
# Linux / macOS
docker run --gpus all \
-v $(pwd)/finetune/output:/workspace/models \
pnnbao/vieneu-tts:serve \
--model /workspace/models/merged_model
For full implementation details, see: main_remote.py
🎯 4. Custom Models (LoRA, GGUF, Finetune)
VieNeu-TTS allows you to load custom models directly from HuggingFace or local paths via the Web UI.
- LoRA Support: Automatically merges LoRA into the base model and accelerates with LMDeploy.
- GGUF Support: Runs smoothly on CPU using the llama.cpp backend.
- Private Repos: Supports entering an HF Token to access private models.
👉 See the detailed guide at: docs/CUSTOM_MODEL_USAGE.md
🛠️ 5. Fine-tuning Guide
Train VieNeu-TTS on your own voice or custom datasets.
- Simple Workflow: Use the train.py script with optimized LoRA configurations.
- Documentation: Follow the step-by-step guide in finetune/README.md.
- Notebook: Try it directly on Google Colab via finetune/finetune_VieNeu-TTS.ipynb.
🔬 6. Model Overview (Backbones)
| Model | Format | Device | Quality | Speed |
|---|---|---|---|---|
| VieNeu-TTS | PyTorch | GPU/CPU | ⭐⭐⭐⭐⭐ | Very Fast with lmdeploy |
| VieNeu-TTS-0.3B | PyTorch | GPU/CPU | ⭐⭐⭐⭐ | Ultra Fast (2x) |
| VieNeu-TTS-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Fast |
| VieNeu-TTS-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Very Fast |
| VieNeu-TTS-0.3B-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Ultra Fast (1.5x) |
| VieNeu-TTS-0.3B-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Extreme Speed (2x) |
🔬 Model Details
- Training Data: VieNeu-TTS-1000h — 443,641 curated Vietnamese samples (Used for all versions).
- Audio Codec: NeuCodec (Torch implementation; ONNX & quantized variants supported).
- Context Window: 2,048 tokens shared by prompt text and speech tokens.
- Output Watermark: Enabled by default.
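Because the 2,048-token window is shared between the prompt text and the generated speech tokens, very long inputs are best synthesized sentence-chunk by sentence-chunk and the audio concatenated. A simple splitter (the character budget is an illustrative guess, not a documented limit):

```python
import re

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text at sentence boundaries into chunks of at most max_chars,
    so each synthesis call stays well within the shared context window."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)  # start a new chunk
            current = sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `tts.infer` in turn.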
🐋 7. Deployment with Docker (Compose)
Deploy quickly without manual environment setup.
# Run with CPU
docker compose --profile cpu up
# Run with GPU (Requires NVIDIA Container Toolkit)
docker compose --profile gpu up
Check docs/Deploy.md for more details.
📚 References
- Dataset: VieNeu-TTS-1000h (Hugging Face)
- Model 0.5B: pnnbao-ump/VieNeu-TTS
- Model 0.3B: pnnbao-ump/VieNeu-TTS-0.3B
- LoRA Guide: docs/CUSTOM_MODEL_USAGE.md
🤝 8. Support & Contact
- Hugging Face: pnnbao-ump
- Discord: Join our community
- Facebook: Pham Nguyen Ngoc Bao
- Licensing:
- VieNeu-TTS (0.5B): Apache 2.0 (Free to use).
- VieNeu-TTS-0.3B: CC BY-NC 4.0 (Non-commercial).
- ✅ Free: For students, researchers, and non-profit purposes.
- ⚠️ Commercial/Enterprise: Contact the author for licensing (Estimated: 5,000 USD/year - negotiable).
📑 Citation
@misc{vieneutts2026,
title = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
author = {Pham Nguyen Ngoc Bao},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}
🙏 Acknowledgements
This project builds upon the NeuTTS Air and NeuCodec architectures. Specifically, the VieNeu-TTS (0.5B) model is fine-tuned from NeuTTS Air, while the VieNeu-TTS-0.3B model is a custom architecture trained from scratch using the VieNeu-TTS-1000h dataset.
Made with ❤️ for the Vietnamese TTS community