Advanced on-device Vietnamese TTS with instant voice cloning
VieNeu-TTS
VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.
[!TIP] Voice Cloning: All model variants (including GGUF) support instant voice cloning with just 3-5 seconds of reference audio.
This project features two core architectures trained on the VieNeu-TTS-1000h dataset:
- VieNeu-TTS (0.5B): An enhanced model fine-tuned from the NeuTTS Air architecture for maximum stability.
- VieNeu-TTS-0.3B: A specialized model trained from scratch, delivering 2x faster inference and ultra-low latency.
Both models are a significant upgrade over the previous VieNeu-TTS-140h, with the following improvements:
- Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
- Code-switching support: Seamless transitions between Vietnamese and English
- Better voice cloning: Higher fidelity and speaker consistency
- Real-time synthesis: 24 kHz waveform generation on CPU or GPU
- Multiple model formats: Support for PyTorch, GGUF Q4/Q8 (CPU optimized), and ONNX codec
VieNeu-TTS delivers production-ready speech synthesis fully offline.
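The 24 kHz mono output mentioned above can be handled with the Python standard library alone. A minimal sketch of writing such a waveform to disk (the sine tone below is a stand-in for real model output, not an actual synthesis call):

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # VieNeu-TTS generates 24 kHz waveforms

def write_wav(path, samples, rate=SAMPLE_RATE):
    """Write mono 16-bit PCM samples (floats in [-1, 1]) to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit
        wf.setframerate(rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wf.writeframes(frames)

# Placeholder: one second of a 440 Hz tone instead of real TTS output
tone = [0.3 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
write_wav("output.wav", tone)
```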
Author: Phạm Nguyễn Ngọc Bảo
🔬 Model Overview
- Backbone:
- VieNeu-TTS (0.5B): Qwen-0.5B fine-tuned from NeuTTS Air.
- VieNeu-TTS-0.3B: Custom 0.3B model trained from scratch, optimized for extreme speed (2x faster).
- Audio codec: NeuCodec (torch implementation; ONNX & quantized variants supported)
- Context window: 2,048 tokens shared by prompt text and speech tokens
- Output watermark: Enabled by default
- Training data: VieNeu-TTS-1000h — 443,641 curated Vietnamese samples (used for both models).
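Because the 2,048-token context window is shared, prompt text and generated speech tokens compete for the same budget. A rough feasibility check (the 50 tokens-per-second codec rate below is an illustrative assumption, not a confirmed VieNeu figure):

```python
CONTEXT_WINDOW = 2048  # tokens shared by prompt text and speech tokens

def fits_in_context(prompt_tokens, audio_seconds, tokens_per_second=50):
    """Estimate whether a prompt plus the speech tokens for `audio_seconds`
    of audio fit in the shared context window.

    `tokens_per_second` is an assumed codec rate for illustration only.
    """
    speech_tokens = int(audio_seconds * tokens_per_second)
    return prompt_tokens + speech_tokens <= CONTEXT_WINDOW
```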
Model Variants
| Model | Format | Device | Quality | Speed |
|---|---|---|---|---|
| VieNeu-TTS | PyTorch | GPU/CPU | ⭐⭐⭐⭐⭐ | Very Fast with lmdeploy |
| VieNeu-TTS-0.3B | PyTorch | GPU/CPU | ⭐⭐⭐⭐ | Ultra Fast (2x) |
| VieNeu-TTS-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Fast |
| VieNeu-TTS-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Very Fast |
| VieNeu-TTS-0.3B-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Ultra Fast (1.5x) |
| VieNeu-TTS-0.3B-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Extreme Speed (2x) |
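The table above collapses into a simple selection rule. This sketch just encodes the listed trade-offs (variant names exactly as in the table):

```python
def recommend_variant(device, prefer_speed=False):
    """Pick a model variant from the table's quality/speed trade-offs."""
    if device == "gpu":
        return "VieNeu-TTS"                # PyTorch, best quality
    if prefer_speed:
        return "VieNeu-TTS-0.3B-q4-gguf"   # fastest CPU inference
    return "VieNeu-TTS-0.3B-q8-gguf"       # best CPU quality
```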
Recommendations:
- GPU users: Use VieNeu-TTS (PyTorch) for best quality.
- CPU users: Use VieNeu-TTS-0.3B-q4-gguf for fastest inference, or VieNeu-TTS-0.3B-q8-gguf for best CPU quality.
- Streaming: Only GGUF models support streaming inference (requires llama-cpp-python >= 0.3.16).
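Since streaming hinges on the installed llama-cpp-python version, a quick stdlib check can verify the requirement at runtime:

```python
from importlib import metadata

def streaming_supported(minimum=(0, 3, 16)):
    """Return True if llama-cpp-python is installed and meets the
    streaming requirement (>= 0.3.16)."""
    try:
        version = metadata.version("llama-cpp-python")
    except metadata.PackageNotFoundError:
        return False
    parts = []
    for piece in version.split(".")[:3]:
        # Strip local/suffix tags such as "+cpu" before comparing
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits or 0))
    return tuple(parts) >= minimum
```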
✅ Todo & Status
- Publish safetensor artifacts
- Release GGUF Q4 / Q8 models
- Release datasets (1000h and 140h)
- Enable streaming on GPU
- Provide Dockerized setup
- Release fine-tuning code (LoRA)
- LoRA Adapter integration in Gradio
🌟 New Feature: LoRA Adapters
VieNeu-TTS now officially supports LoRA (Low-Rank Adaptation). This allows you to:
- Use custom fine-tuned voices from Hugging Face.
- Achieve much higher quality and similarity than zero-shot voice cloning.
- Switch between different adapters seamlessly in the Gradio UI.
For more details, see docs/LORA_USAGE.md.
🛠️ Fine-tuning
You can now train VieNeu-TTS on your own voice dataset!
- Simple Workflow: Follow the step-by-step guide in finetune/README.md.
- Notebook Support: Use finetune/finetune_VieNeu-TTS.ipynb for an interactive experience.
🏁 Getting Started
1. Clone the repository
git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS
2. Install eSpeak NG (Required)
Phonemizer requires eSpeak NG to function.
- Windows: Download the installer from eSpeak NG Releases (recommended: .msi).
- macOS: brew install espeak
- Ubuntu/Debian: sudo apt install espeak-ng
- Arch Linux: paru -S aur/espeak-ng
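To confirm the phonemizer will be able to find the binary after installation, a small stdlib check:

```python
import shutil

def find_espeak():
    """Return the path to the eSpeak NG binary, or None if it is not on PATH."""
    for name in ("espeak-ng", "espeak"):
        path = shutil.which(name)
        if path:
            return path
    return None

print(find_espeak())  # prints the binary path, or None if missing
```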
3. Environment Setup (Choose ONE method)
Method 1: Standard with uv (Recommended)
This is the fastest and most reliable way to manage dependencies.
A. Install uv (If you haven't already):
- Windows: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
- Linux/macOS: curl -LsSf https://astral.sh/uv/install.sh | sh
B. Install dependencies:
[!TIP] For NVIDIA GPU Users: To use LMDeploy (Turbo mode) and achieve maximum performance, ensure you have updated drivers and CUDA Toolkit 12.8 or newer installed.
# Default setup (Includes GPU support for Local Development)
uv sync
# If you specifically want to avoid GPU dependencies (CPU-only)
uv sync --no-default-groups
Note: GPU support (LMDeploy) is currently optimized for Linux and Windows. macOS users should use the standard uv sync.
📦 Using as a Python SDK (via pip)
If you want to integrate VieNeu-TTS into your own project:
1. Windows (Hassle-free setup)
We provide pre-built CPU wheels for llama-cpp-python (version 0.3.16) for Python 3.10 to 3.14 to avoid compilation errors.
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/
2. Linux / macOS / Others
pip install vieneu
3. GPU Support (Remote Server)
For high-performance GPU inference without local complexity, we recommend the Remote mode. You can run an lmdeploy server elsewhere and connect via:
from vieneu_tts import Vieneu
# Connect to a remote LMDeploy server
tts = Vieneu(mode="remote", api_base="http://your-server-ip:23333/v1")
Run the Application (Gradio Web UI):
uv run gradio_app.py
Then access the Web UI at http://127.0.0.1:7860.
Method 2: Automatic with Makefile (Alternative)
Best if you have make installed (standard on Linux/macOS, or via Git Bash on Windows). It handles configuration swaps automatically.
- Setup: make setup
- Run Demo: make demo
Then access the Web UI at http://127.0.0.1:7860.
🐋 Docker Deployment
For a quick start or production deployment without manually installing dependencies, use Docker.
Quick Start
Copy .env.example to .env
cp .env.example .env
Build and start container
# Run with CPU
docker compose --profile cpu up
# Run with GPU (requires NVIDIA Container Toolkit)
docker compose --profile gpu up
Access the Web UI at http://localhost:7860.
For detailed deployment instructions, including production setup, see docs/Deploy.md.
📦 Project Structure
VieNeu-TTS/
├── vieneu_tts/ # Core engine implementation (VieNeuTTS & FastVieNeuTTS)
├── finetune/ # LoRA training pipeline
│ ├── configs/ # Training & LoRA configurations
│ ├── data_scripts/ # Data filtering & VQ encoding tools
│ ├── dataset/ # Training data storage
│ ├── output/ # Saved checkpoints & LoRA adapters
│ └── train.py # Main training script
├── utils/ # Text normalization and phonemization logic
├── sample/ # Built-in reference voices (audio + transcript + codes)
├── docs/ # Detailed documentation for LoRA, Deployment, and Docker
├── examples/ # Usage examples and testing audio references
├── gradio_app.py # Modern Web UI with LoRA & Streaming support
├── config.yaml # Model, Codec, and Voice registry
├── pyproject.toml # Unified dependency management (UV/PIP)
├── Makefile # Shortcuts for setup and execution
└── docker-compose.yml # Docker orchestration for CPU/GPU modes
📚 References
- GitHub Repository
- Hugging Face Model (0.5B)
- Hugging Face Model (0.3B)
- LoRA Usage Guide
- Fine-tuning Guide
- VieNeu-TTS-1000h dataset
📄 License
- VieNeu-TTS (0.5B): Original terms (Apache 2.0).
- VieNeu-TTS-0.3B: Released under CC BY-NC 4.0 (Non-Commercial).
- This version is currently experimental.
- Commercial use is prohibited without authorization. Please contact the author for commercial licensing.
📑 Citation
@misc{vieneutts2026,
title = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
author = {Pham Nguyen Ngoc Bao},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}
🤝 Contributing
Contributions are welcome!
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit your changes: git commit -m "Add amazing feature"
- Push the branch: git push origin feature/amazing-feature
- Open a pull request
📞 Support
- GitHub Issues: github.com/pnnbao97/VieNeu-TTS/issues
- Hugging Face: huggingface.co/pnnbao-ump
- Discord: Join us
- Facebook: Phạm Nguyễn Ngọc Bảo
🙏 Acknowledgements
This project builds upon NeuTTS Air for the original 0.5B model. The 0.3B version is a custom architecture trained from scratch using the VieNeu-TTS-1000h dataset.
Made with ❤️ for the Vietnamese TTS community