Advanced on-device Vietnamese TTS with instant voice cloning

🦜 VieNeu-TTS

VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.

Tip (voice cloning): All model variants (including GGUF) support instant voice cloning from just 3-5 seconds of reference audio.

This project features two core architectures trained on the VieNeu-TTS-1000h dataset:

VieNeu-TTS (0.5B): An enhanced model fine-tuned from the NeuTTS Air architecture for maximum stability.
VieNeu-TTS-0.3B: A specialized model trained from scratch using the VieNeu-TTS-1000h dataset, delivering 2x faster inference and ultra-low latency.

These represent a significant upgrade from the previous VieNeu-TTS-140h with the following improvements:

Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
Code-switching support: Seamless transitions between Vietnamese and English
Better voice cloning: Higher fidelity and speaker consistency
Real-time synthesis: 24 kHz waveform generation on CPU or GPU
Multiple model formats: Support for PyTorch, GGUF Q4/Q8 (CPU optimized), and ONNX codec

VieNeu-TTS delivers production-ready speech synthesis fully offline.

Author: Phạm Nguyễn Ngọc Bảo

🦜 1. Installation & Web UI

The fastest way to experience VieNeu-TTS is through the Web interface (Gradio).

System Requirements

eSpeak NG: Required for phonemization.
- Windows: Download the .msi from eSpeak NG Releases.
- macOS: brew install espeak
- Ubuntu/Debian: sudo apt install espeak-ng
NVIDIA GPU (Optional): For maximum speed via LMDeploy or GGUF GPU acceleration.
- Requires NVIDIA Driver 570.65 or newer (CUDA 12.8+).
- For LMDeploy, it is recommended to have the NVIDIA GPU Computing Toolkit installed.
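
Before installing, it can help to confirm that eSpeak NG is visible to Python. The following is a minimal sketch, not part of the vieneu package: the helper name `find_espeak` is hypothetical, and a PATH lookup is only a heuristic (on Windows the installer may not add the executable to PATH even though the library is installed).

```python
import shutil


def find_espeak(candidates=("espeak-ng", "espeak")):
    """Return the path of the first eSpeak executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    path = find_espeak()
    if path:
        print(f"eSpeak found at: {path}")
    else:
        print("eSpeak NG not found on PATH; install it before running VieNeu-TTS.")
```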

Installation Steps

Clone the Repo:

git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS

Environment Setup with uv (Recommended):

Step A: Install uv (if you haven't)

# Windows:
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Linux/macOS:
curl -LsSf https://astral.sh/uv/install.sh | sh

Step B: Install dependencies

Option 1: GPU Support (Default)

uv sync

(Optional: See GGUF GPU Acceleration if you want to use GGUF models on GPU)

Option 2: CPU-ONLY (Lightweight, no CUDA)

# Linux/macOS:
cp pyproject.toml pyproject.toml.gpu
cp pyproject.toml.cpu pyproject.toml
uv sync

# Windows (PowerShell/CMD):
copy pyproject.toml pyproject.toml.gpu
copy pyproject.toml.cpu pyproject.toml
uv sync

Start the Web UI:
```
uv run gradio_app.py
```
Access the UI at http://127.0.0.1:7860.

⚡ Real-time Streaming (CPU Optimized)

VieNeu-TTS supports ultra-low latency streaming, allowing audio playback to start before the entire sentence is finished. This is specifically optimized for CPU-only devices using the GGUF backend.

Latency: <300ms for the first chunk on modern i3/i5 CPUs.
Efficiency: Uses Q4/Q8 quantization and ONNX-based lightweight codecs.
Usage: Perfect for real-time interactive AI assistants.
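
The idea behind streaming can be sketched without the model: text is split into small chunks, each chunk is synthesized and yielded immediately, and the caller measures how long the first chunk takes. This is an illustration only. `fake_synthesize`, `stream_tts`, and `time_to_first_chunk` are hypothetical names, the tone generator stands in for the real backend, and nothing here reflects how VieNeu-TTS chunks text internally.

```python
import math
import time
from typing import Iterator, List

SAMPLE_RATE = 24_000  # matches the 24 kHz output stated in this README


def fake_synthesize(chunk: str) -> List[float]:
    """Stand-in for a real TTS call: 240 samples (10 ms) of tone per character."""
    n = 240 * len(chunk)
    return [math.sin(2 * math.pi * 220 * i / SAMPLE_RATE) for i in range(n)]


def stream_tts(text: str, max_chars: int = 40) -> Iterator[List[float]]:
    """Yield audio chunk by chunk so playback can begin before synthesis ends."""
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            yield fake_synthesize(current)
            current = word
        else:
            current = candidate
    if current:
        yield fake_synthesize(current)


def time_to_first_chunk(text: str, max_chars: int = 40):
    """Return (seconds until the first chunk is ready, total number of chunks)."""
    start = time.perf_counter()
    stream = stream_tts(text, max_chars)
    first = next(stream, None)
    latency = time.perf_counter() - start
    if first is None:
        return latency, 0
    return latency, 1 + sum(1 for _ in stream)


if __name__ == "__main__":
    latency, count = time_to_first_chunk("Xin chào, tôi là VieNeu. " * 5)
    print(f"first chunk after {latency * 1000:.1f} ms, {count} chunks in total")
```

Because the generator yields as soon as one chunk is ready, first-chunk latency depends on the chunk size rather than on the length of the whole input.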

Start the dedicated CPU streaming demo:

uv run web_stream_gguf.py

Then open http://localhost:8001 in your browser.

🚀 GGUF GPU Acceleration (Optional)

If you want to use GGUF models with GPU acceleration (llama-cpp-python), follow these steps:

Windows Users

Run the following command after uv sync:

uv pip install "https://github.com/pnnbao97/VieNeu-TTS/releases/download/llama-cpp-python-cu124/llama_cpp_python-0.3.16-cp312-cp312-win_amd64.whl"

Note: Requires NVIDIA Driver version 551.61 (CUDA 12.4) or newer.
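
A driver requirement like this can be checked from Python by asking `nvidia-smi` for the driver version and comparing it numerically. This is a sketch under the assumption that `nvidia-smi` is on PATH; the helper names are hypothetical and not part of VieNeu-TTS.

```python
import subprocess


def parse_version(text):
    """Turn '551.61' or '570.65.01' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(installed, minimum="551.61"):
    """Compare dotted driver versions numerically, not as strings."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("No NVIDIA driver detected (nvidia-smi unavailable).")
    else:
        ok = driver_meets_minimum(version)
        print(f"Driver {version}: {'OK' if ok else 'too old'} for the CUDA 12.4 wheel")
```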

Linux / macOS Users

Please refer to the official llama-cpp-python documentation for installation instructions specific to your hardware (CUDA, Metal, ROCm).

📦 2. Using the Python SDK (vieneu)

Integrate VieNeu-TTS into your own software projects.

Quick Install

# Windows (Avoid llama-cpp build errors)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/

# Linux / macOS
pip install vieneu

Quick Start (main.py)

from vieneu import Vieneu
import os

# Initialization
tts = Vieneu()

# Standard synthesis (uses default voice)
text = "Xin chào, tôi là VieNeu. Tôi có thể giúp bạn đọc sách, làm chatbot thời gian thực, hoặc thậm chí clone giọng nói của bạn."
audio = tts.infer(text=text)
tts.save(audio, "standard_output.wav")
print("💾 Saved synthesis to: standard_output.wav")

For full implementation details, see main.py.
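
To sanity-check a saved file, the standard-library `wave` module can report its sample rate and duration. The sketch below assumes the output is a 16-bit PCM WAV, which this README does not state; a float-encoded WAV would need a different reader. `write_silence` exists only so the example has a file to inspect, and both helper names are hypothetical.

```python
import wave


def write_silence(path, seconds=0.5, sample_rate=24_000):
    """Write mono 16-bit PCM silence; only used here to have a file to inspect."""
    frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * frames)


def describe_wav(path):
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


if __name__ == "__main__":
    write_silence("silence_demo.wav")
    print(describe_wav("silence_demo.wav"))
```

For a file produced by the quick start, a `sample_rate` of 24000 would be consistent with the 24 kHz output described above.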

🐳 3. Docker & Remote Server

Deploy VieNeu-TTS as a high-performance API Server (powered by LMDeploy) with a single command.

1. Run with Docker (Recommended)

Requirement: NVIDIA Container Toolkit is required for GPU support.

Start the Server with a Public Tunnel (No port forwarding needed):

docker run --gpus all -p 23333:23333 pnnbao/vieneu-tts:serve --tunnel

Default: The server loads the VieNeu-TTS model for maximum quality.
Tunneling: The Docker image includes a built-in bore tunnel. Check the container logs to find your public address (e.g., bore.pub:31631).

2. Using the SDK (Remote Mode)

Once the server is running, you can connect from anywhere (Colab, Web Apps, etc.) without loading heavy models locally:

from vieneu import Vieneu
import os

# Configuration
REMOTE_API_BASE = 'http://your-server-ip:23333/v1'  # Or bore tunnel URL
REMOTE_MODEL_ID = "pnnbao-ump/VieNeu-TTS"

# Initialization (LIGHTWEIGHT - only loads small codec locally)
tts = Vieneu(mode='remote', api_base=REMOTE_API_BASE, model_name=REMOTE_MODEL_ID)
os.makedirs("outputs", exist_ok=True)

# List remote voices
available_voices = tts.list_preset_voices()
for desc, name in available_voices:
    print(f"   - {desc} (ID: {name})")

# Use specific voice (dynamically select second voice)
if len(available_voices) > 1:
    _, my_voice_id = available_voices[1]
    voice_data = tts.get_preset_voice(my_voice_id)
    audio_spec = tts.infer(text="Chào bạn, tôi đang nói bằng giọng của bác sĩ Tuyên.", voice=voice_data)
    tts.save(audio_spec, f"outputs/remote_{my_voice_id}.wav")
    print(f"💾 Saved synthesis to: outputs/remote_{my_voice_id}.wav")

# Standard synthesis (uses default voice)
text_input = "Chế độ remote giúp tích hợp VieNeu vào ứng dụng Web hoặc App cực nhanh mà không cần GPU tại máy khách."
audio = tts.infer(text=text_input)
tts.save(audio, "outputs/remote_output.wav")
print("💾 Saved remote synthesis to: outputs/remote_output.wav")

# Zero-shot voice cloning (encodes audio locally, sends codes to server)
if os.path.exists("examples/audio_ref/example_ngoc_huyen.wav"):
    cloned_audio = tts.infer(
        text="Đây là giọng nói được clone và xử lý thông qua VieNeu Server.",
        ref_audio="examples/audio_ref/example_ngoc_huyen.wav",
        ref_text="Tác phẩm dự thi bảo đảm tính khoa học, tính đảng, tính chiến đấu, tính định hướng."
    )
    tts.save(cloned_audio, "outputs/remote_cloned_output.wav")
    print("💾 Saved remote cloned voice to: outputs/remote_cloned_output.wav")

For full implementation details, see: main_remote.py
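
When the server is reached through the bore tunnel, the address printed in the container logs (for example `bore.pub:31631`) has to be turned into a value for `REMOTE_API_BASE`. The helper below is a small sketch of that conversion. The name `api_base_from_address` is hypothetical, and the `http` scheme and `/v1` suffix are taken from the example above rather than from any documented tunnel behavior.

```python
from urllib.parse import urlsplit


def api_base_from_address(address, scheme="http", path="/v1"):
    """Build a base URL such as 'http://bore.pub:31631/v1' from 'host:port'."""
    address = address.strip()
    if "://" not in address:
        address = f"{scheme}://{address}"
    parts = urlsplit(address)
    if not parts.netloc:
        raise ValueError(f"No host found in {address!r}")
    # Any path already present on the input is dropped in favor of `path`.
    return f"{parts.scheme}://{parts.netloc}{path}"


if __name__ == "__main__":
    print(api_base_from_address("bore.pub:31631"))
    print(api_base_from_address("http://your-server-ip:23333/"))
```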

Voice Preset Specification (v1.0)

VieNeu-TTS uses the official vieneu.voice.presets specification to define reusable voice assets. Only voices.json files following this spec are guaranteed to be compatible with VieNeu-TTS SDK ≥ v1.x.

3. Advanced Configuration

Customize the server to run specific versions or your own fine-tuned models.

Run the 0.3B Model (Faster):

docker run --gpus all pnnbao/vieneu-tts:serve --model pnnbao-ump/VieNeu-TTS-0.3B --tunnel

Serve a Local Fine-tuned Model: If you have merged a LoRA adapter, mount your output directory to the container:

# Linux / macOS
docker run --gpus all \
  -v $(pwd)/finetune/output:/workspace/models \
  pnnbao/vieneu-tts:serve \
  --model /workspace/models/merged_model --tunnel

🎯 4. Custom Models (LoRA, GGUF, Finetune)

VieNeu-TTS allows you to load custom models directly from HuggingFace or local paths via the Web UI.

LoRA Support: Automatically merges LoRA into the base model and accelerates with LMDeploy.
GGUF Support: Runs smoothly on CPU using the llama.cpp backend.
Private Repos: Supports entering an HF Token to access private models.

👉 See the detailed guide at: docs/CUSTOM_MODEL_USAGE.md

🛠️ 5. Fine-tuning Guide

Train VieNeu-TTS on your own voice or custom datasets.

Simple Workflow: Use the train.py script with optimized LoRA configurations.
Documentation: Follow the step-by-step guide in finetune/README.md.
Notebook: Experience it directly on Google Colab via finetune/finetune_VieNeu-TTS.ipynb.

🔬 6. Model Overview (Backbones)

| Model | Format | Device | Quality | Speed |
| --- | --- | --- | --- | --- |
| VieNeu-TTS | PyTorch | GPU/CPU | ⭐⭐⭐⭐⭐ | Very Fast with lmdeploy |
| VieNeu-TTS-0.3B | PyTorch | GPU/CPU | ⭐⭐⭐⭐ | Ultra Fast (2x) |
| VieNeu-TTS-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Fast |
| VieNeu-TTS-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Very Fast |
| VieNeu-TTS-0.3B-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Ultra Fast (1.5x) |
| VieNeu-TTS-0.3B-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Extreme Speed (2x) |

🔬 Model Details

Training Data: VieNeu-TTS-1000h — 443,641 curated Vietnamese samples (Used for all versions).
Audio Codec: NeuCodec (Torch implementation; ONNX & quantized variants supported).
Context Window: 2,048 tokens shared by prompt text and speech tokens.
Output Watermark: Enabled by default.
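
Because the 2,048-token window is shared, a longer prompt leaves less room for generated speech. The arithmetic can be sketched as below. The rate of 50 speech tokens per second and the idea that reference audio codes also occupy the window are assumptions made for illustration, not figures stated in this README; `max_speech_seconds` is a hypothetical helper.

```python
CONTEXT_WINDOW = 2048  # tokens, from the model details above


def max_speech_seconds(
    prompt_text_tokens,
    ref_speech_seconds=0.0,
    speech_tokens_per_second=50.0,  # ASSUMPTION: illustrative codec token rate
    context_window=CONTEXT_WINDOW,
):
    """Estimate how many seconds of speech still fit in the context window."""
    # ASSUMPTION: reference audio for cloning is encoded at the same token rate
    # and counts against the same window as the text prompt.
    ref_tokens = ref_speech_seconds * speech_tokens_per_second
    remaining = context_window - prompt_text_tokens - ref_tokens
    return max(0.0, remaining / speech_tokens_per_second)


if __name__ == "__main__":
    # 200 text tokens plus a 5 s reference clip leaves 2048 - 200 - 250 = 1598 tokens.
    print(f"{max_speech_seconds(200, ref_speech_seconds=5.0):.2f} s of speech")
```

Under these assumptions, long inputs are better split into sentences and synthesized one at a time than sent as a single prompt.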

🐋 7. Deployment with Docker (Compose)

Deploy quickly without manual environment setup.

Note: Docker deployment currently supports GPU only. For CPU usage, please follow the Installation & Web UI section to install from source.

# Run with GPU (Requires NVIDIA Container Toolkit)
docker compose --profile gpu up

Check docs/Deploy.md for more details.

📚 References

Dataset: VieNeu-TTS-1000h (Hugging Face)
Model 0.5B: pnnbao-ump/VieNeu-TTS
Model 0.3B: pnnbao-ump/VieNeu-TTS-0.3B
LoRA Guide: docs/CUSTOM_MODEL_USAGE.md

🤝 8. Support & Contact

Hugging Face: pnnbao-ump
Discord: Join our community
Facebook: Pham Nguyen Ngoc Bao
Licensing:
- VieNeu-TTS (0.5B): Apache 2.0 (Free to use).
- VieNeu-TTS-0.3B: CC BY-NC 4.0 (Non-commercial).
  - ✅ Free: For students, researchers, and non-profit purposes.
  - ⚠️ Commercial/Enterprise: Contact the author for licensing.

📑 Citation

@misc{vieneutts2026,
  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author       = {Pham Nguyen Ngoc Bao},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}

🙏 Acknowledgements

This project builds upon the NeuTTS Air and NeuCodec architectures. Specifically, the VieNeu-TTS (0.5B) model is fine-tuned from NeuTTS Air, while the VieNeu-TTS-0.3B model is a custom architecture trained from scratch using the VieNeu-TTS-1000h dataset.

Made with ❤️ for the Vietnamese TTS community

Release history

2.6.1

May 7, 2026

2.6.0

May 7, 2026

2.5.0

May 2, 2026

2.4.3

Apr 2, 2026

2.4.2

Apr 2, 2026

2.4.1

Apr 2, 2026

2.4.0

Apr 2, 2026

2.3.0

Apr 1, 2026

2.2.0

Apr 1, 2026

2.1.3

Apr 1, 2026

2.1.2

Apr 1, 2026

2.1.1

Mar 31, 2026

2.1.0

Mar 31, 2026

2.0.2

Mar 31, 2026

2.0.1

Mar 31, 2026

2.0.0

Mar 31, 2026

1.3.0

Mar 13, 2026

1.2.9

Mar 12, 2026

1.2.8

Mar 12, 2026

1.2.7

Mar 10, 2026

1.2.6

Mar 5, 2026

1.2.5

Mar 4, 2026

1.2.4

Mar 3, 2026

1.2.3

Feb 24, 2026

1.2.2

Feb 23, 2026

1.2.1

Feb 23, 2026

1.2.0

Feb 21, 2026

1.1.9

Feb 21, 2026

1.1.8

Feb 21, 2026

1.1.7

Jan 25, 2026

This version

1.1.6

Jan 16, 2026

1.1.5

Jan 15, 2026

1.1.4

Jan 12, 2026

1.1.3

Jan 12, 2026

1.1.2

Jan 12, 2026

1.1.1

Jan 12, 2026

1.1.0

Jan 12, 2026

1.0.7

Jan 12, 2026

1.0.6

Jan 12, 2026

1.0.4

Jan 12, 2026

1.0.3

Jan 12, 2026

1.0.2

Jan 11, 2026

1.0.1

Jan 11, 2026

Download files

Source Distribution

vieneu-1.1.6.tar.gz (4.7 MB view details)

Uploaded Jan 16, 2026 Source

Built Distribution

vieneu-1.1.6-py3-none-any.whl (4.8 MB view details)

Uploaded Jan 16, 2026 Python 3

File details

Details for the file vieneu-1.1.6.tar.gz.

File metadata

Download URL: vieneu-1.1.6.tar.gz
Upload date: Jan 16, 2026
Size: 4.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for vieneu-1.1.6.tar.gz
Algorithm	Hash digest
SHA256	`6c1b69d7ee0b0157d3e4c1bd367f3159b510b6ca679661112f869aa88cc328ce`
MD5	`53786d43be9a71b79bf4e9ac56cd5ca0`
BLAKE2b-256	`667af25603bab6761738e446624eb40f87a29da89116221863831d02f5aa6399`
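
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: the helper names are hypothetical, and the file path in the demo assumes the archive sits in the current directory.

```python
import hashlib
import hmac
import sys

# SHA256 digest for vieneu-1.1.6.tar.gz, copied from the table above.
EXPECTED_SHA256 = "6c1b69d7ee0b0157d3e4c1bd367f3159b510b6ca679661112f869aa88cc328ce"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "vieneu-1.1.6.tar.gz"
    try:
        ok = matches_published_hash(target, EXPECTED_SHA256)
        print(f"{target}: {'hash matches' if ok else 'HASH MISMATCH'}")
    except FileNotFoundError:
        print(f"{target} not found; download it first.")
```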

File details

Details for the file vieneu-1.1.6-py3-none-any.whl.

File metadata

Download URL: vieneu-1.1.6-py3-none-any.whl
Upload date: Jan 16, 2026
Size: 4.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for vieneu-1.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f991ec379d49310a074dd81da8728e101289b412aafdfd555ceb86a9bc46d626`
MD5	`4ee2a8e483dc4b5b78a3425579ee5efb`
BLAKE2b-256	`a767d54c0e91b21854073467f264acba67c236bf3c922e064b08800d8429673d`

vieneu 1.1.6
