
Advanced on-device Vietnamese TTS with instant voice cloning

Project description

🦜 VieNeu-TTS


[Screenshot: VieNeu-TTS UI]

VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.

Tip: All model variants (including GGUF) support instant voice cloning with just 3-5 seconds of reference audio.

This project features two core architectures trained on the VieNeu-TTS-1000h dataset:

  • VieNeu-TTS (0.5B): An enhanced model fine-tuned from the NeuTTS Air architecture for maximum stability.
  • VieNeu-TTS-0.3B: A specialized model trained from scratch using the VieNeu-TTS-1000h dataset, delivering 2x faster inference and ultra-low latency.

These represent a significant upgrade from the previous VieNeu-TTS-140h with the following improvements:

  • Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
  • Code-switching support: Seamless transitions between Vietnamese and English
  • Better voice cloning: Higher fidelity and speaker consistency
  • Real-time synthesis: 24 kHz waveform generation on CPU or GPU
  • Multiple model formats: Support for PyTorch, GGUF Q4/Q8 (CPU optimized), and ONNX codec

VieNeu-TTS delivers production-ready speech synthesis fully offline.

Author: Phạm Nguyễn Ngọc Bảo


📌 Table of Contents

  1. 📦 Using the Python SDK
  2. 🐳 Docker & Remote Server
  3. 🎯 Custom Models
  4. 🛠️ Fine-tuning Guide
  5. 🔬 Model Overview
  6. 🤝 Support & Contact

📦 1. Using the Python SDK (vieneu)

Integrate VieNeu-TTS into your own software projects.

Quick Install

# Option 1: Recommended (Automatically handles hardware acceleration)
uv pip install vieneu

# Option 2: Standard pip (extra index needed for optimized builds)
# Windows (CPU optimized)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/

# macOS (Metal GPU accelerated)
pip install vieneu --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal/

# Linux / Generic
pip install vieneu

Quick Start (main.py)

from vieneu import Vieneu

# Initialization
tts = Vieneu()

# Standard synthesis (uses default voice)
text = "Xin chào, tôi là VieNeu. Tôi có thể giúp bạn đọc sách, làm chatbot thời gian thực, hoặc thậm chí clone giọng nói của bạn."
audio = tts.infer(text=text)
tts.save(audio, "standard_output.wav")
print("💾 Saved synthesis to: standard_output.wav")

For full implementation details, see examples/main.py.
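For long inputs (articles, book chapters), a common pattern is to split the text into sentence-sized chunks and synthesize each chunk separately. The helper below is a hypothetical sketch using only the standard library; `split_into_chunks` is not part of the vieneu SDK.

```python
import re

def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of at most max_chars characters.

    Hypothetical helper for feeding long documents to tts.infer();
    a single sentence longer than max_chars becomes its own chunk.
    """
    # Split on sentence-ending punctuation, keeping each sentence whole.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Each chunk can then be passed to tts.infer(text=chunk) and the
# resulting audio segments concatenated before saving.
```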


🐳 2. Docker & Remote Server

Deploy VieNeu-TTS as a high-performance API Server (powered by LMDeploy) with a single command.

1. Run with Docker (Recommended)

Requirement: NVIDIA Container Toolkit is required for GPU support.

Start the Server with a Public Tunnel (No port forwarding needed):

docker run --gpus all -p 23333:23333 pnnbao/vieneu-tts:serve --tunnel
  • Default: The server loads the VieNeu-TTS model for maximum quality.
  • Tunneling: The Docker image includes a built-in bore tunnel. Check the container logs to find your public address (e.g., bore.pub:31631).

2. Using the SDK (Remote Mode)

Once the server is running, you can connect from anywhere (Colab, Web Apps, etc.) without loading heavy models locally:

from vieneu import Vieneu
import os

# Configuration
REMOTE_API_BASE = 'http://your-server-ip:23333/v1'  # Or bore tunnel URL
REMOTE_MODEL_ID = "pnnbao-ump/VieNeu-TTS"

# Initialization (LIGHTWEIGHT - only loads small codec locally)
tts = Vieneu(mode='remote', api_base=REMOTE_API_BASE, model_name=REMOTE_MODEL_ID)
os.makedirs("outputs", exist_ok=True)

# List remote voices
available_voices = tts.list_preset_voices()
for desc, name in available_voices:
    print(f"   - {desc} (ID: {name})")

# Use a specific voice (select the second preset if available)
if len(available_voices) > 1:
    _, my_voice_id = available_voices[1]
    voice_data = tts.get_preset_voice(my_voice_id)
    audio_spec = tts.infer(text="Chào bạn, tôi đang nói bằng giọng của bác sĩ Tuyên.", voice=voice_data)
    tts.save(audio_spec, f"outputs/remote_{my_voice_id}.wav")
    print(f"💾 Saved synthesis to: outputs/remote_{my_voice_id}.wav")

# Standard synthesis (uses default voice)
text_input = "Chế độ remote giúp tích hợp VieNeu vào ứng dụng Web hoặc App cực nhanh mà không cần GPU tại máy khách."
audio = tts.infer(text=text_input)
tts.save(audio, "outputs/remote_output.wav")
print("💾 Saved remote synthesis to: outputs/remote_output.wav")

# Zero-shot voice cloning (encodes audio locally, sends codes to server)
if os.path.exists("examples/audio_ref/example_ngoc_huyen.wav"):
    cloned_audio = tts.infer(
        text="Đây là giọng nói được clone và xử lý thông qua VieNeu Server.",
        ref_audio="examples/audio_ref/example_ngoc_huyen.wav",
        ref_text="Tác phẩm dự thi bảo đảm tính khoa học, tính đảng, tính chiến đấu, tính định hướng."
    )
    tts.save(cloned_audio, "outputs/remote_cloned_output.wav")
    print("💾 Saved remote cloned voice to: outputs/remote_cloned_output.wav")

For full implementation details, see: examples/main_remote.py

Voice Preset Specification (v1.0)

VieNeu-TTS uses the official vieneu.voice.presets specification to define reusable voice assets. Only voices.json files following this spec are guaranteed to be compatible with the VieNeu-TTS SDK (v1.0 and later).
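As a rough illustration, a preset file can be sanity-checked before loading. The field names below (`voices`, `id`, `ref_audio`, `ref_text`) are assumptions for illustration only; consult the vieneu.voice.presets specification for the actual schema.

```python
import json

# Hypothetical minimal shape for a voices.json preset file.
# These field names are illustrative assumptions, not the official spec.
REQUIRED_VOICE_FIELDS = {"id", "ref_audio", "ref_text"}

def check_voices_json(raw: str) -> bool:
    """Return True if the document has the minimal assumed structure."""
    doc = json.loads(raw)
    if "voices" not in doc or not isinstance(doc["voices"], list):
        return False
    return all(REQUIRED_VOICE_FIELDS <= set(voice) for voice in doc["voices"])
```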

3. Advanced Configuration

Customize the server to run specific versions or your own fine-tuned models.

Run the 0.3B Model (Faster):

docker run --gpus all pnnbao/vieneu-tts:serve --model pnnbao-ump/VieNeu-TTS-0.3B --tunnel

Serve a Local Fine-tuned Model: If you have merged a LoRA adapter, mount your output directory to the container:

# Linux / macOS
docker run --gpus all \
  -v $(pwd)/finetune/output:/workspace/models \
  pnnbao/vieneu-tts:serve \
  --model /workspace/models/merged_model --tunnel



🎯 3. Custom Models (LoRA, GGUF, Finetune)

VieNeu-TTS allows you to load custom models directly from HuggingFace or local paths via the Web UI.

👉 See the detailed guide at: docs/CUSTOM_MODEL_USAGE.md


🛠️ 4. Fine-tuning Guide

Train VieNeu-TTS on your own voice or custom datasets.

  • Simple Workflow: Use the train.py script with optimized LoRA configurations.
  • Documentation: Follow the step-by-step guide in finetune/README.md.
  • Notebook: Experience it directly on Google Colab via finetune/finetune_VieNeu-TTS.ipynb.

🔬 5. Model Overview (Backbones)

| Model                   | Format  | Device  | Quality | Speed                     |
|-------------------------|---------|---------|---------|---------------------------|
| VieNeu-TTS              | PyTorch | GPU/CPU | ⭐⭐⭐⭐⭐   | Very fast (with LMDeploy) |
| VieNeu-TTS-0.3B         | PyTorch | GPU/CPU | ⭐⭐⭐⭐    | Ultra fast (2x)           |
| VieNeu-TTS-q8-gguf      | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐    | Fast                      |
| VieNeu-TTS-q4-gguf      | GGUF Q4 | CPU/GPU | ⭐⭐⭐     | Very fast                 |
| VieNeu-TTS-0.3B-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐    | Ultra fast (1.5x)         |
| VieNeu-TTS-0.3B-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐     | Extreme speed (2x)        |
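The table above can be turned into a simple selection helper. The mapping below is an illustrative assumption, not an official recommendation; the full-precision identifiers (pnnbao-ump/VieNeu-TTS, pnnbao-ump/VieNeu-TTS-0.3B) appear elsewhere in this document, while the GGUF names are the table labels and may not be exact repository IDs.

```python
# Illustrative mapping from deployment constraints to a model in the
# table above. The choices here are assumptions for illustration.
MODELS = {
    ("gpu", "quality"): "pnnbao-ump/VieNeu-TTS",
    ("gpu", "speed"): "pnnbao-ump/VieNeu-TTS-0.3B",
    ("cpu", "quality"): "VieNeu-TTS-q8-gguf",
    ("cpu", "speed"): "VieNeu-TTS-0.3B-q4-gguf",
}

def pick_model(device: str, priority: str) -> str:
    """device: 'gpu' or 'cpu'; priority: 'quality' or 'speed'."""
    return MODELS[(device, priority)]
```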

🔬 Model Details

  • Training Data: VieNeu-TTS-1000h — 443,641 curated Vietnamese samples (Used for all versions).
  • Audio Codec: NeuCodec (Torch implementation; ONNX & quantized variants supported).
  • Context Window: 2,048 tokens shared by prompt text and speech tokens.
  • Output Watermark: Enabled by default.

🤝 6. Support & Contact

  • Hugging Face: pnnbao-ump
  • Discord: Join our community
  • Facebook: Pham Nguyen Ngoc Bao
  • Licensing:
    • VieNeu-TTS (0.5B): Apache 2.0 (Free to use).
    • VieNeu-TTS-0.3B: CC BY-NC 4.0 (Non-commercial).
      • Free: For students, researchers, and non-profit purposes.
      • ⚠️ Commercial/Enterprise: Contact the author for licensing.

📑 Citation

@misc{vieneutts2026,
  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author       = {Pham Nguyen Ngoc Bao},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}

🙏 Acknowledgements

This project builds upon the NeuTTS Air and NeuCodec architectures. Specifically, the VieNeu-TTS (0.5B) model is fine-tuned from NeuTTS Air, while the VieNeu-TTS-0.3B model is a custom architecture trained from scratch using the VieNeu-TTS-1000h dataset.


Made with ❤️ for the Vietnamese TTS community


Download files

Source Distribution

  • vieneu-1.2.4.tar.gz (13.0 MB)

Built Distribution

  • vieneu-1.2.4-py3-none-any.whl (13.0 MB)

File details

Details for the file vieneu-1.2.4.tar.gz.

File metadata

  • Download URL: vieneu-1.2.4.tar.gz
  • Upload date:
  • Size: 13.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.5

File hashes

Hashes for vieneu-1.2.4.tar.gz

  • SHA256: 458ce8c3238fbcf5e5d79c41621c5321bf97f50ea39a9c697638647cc83c1ebd
  • MD5: 9ad4282501a0da099ad4f708e0fc55e7
  • BLAKE2b-256: bc7ba4905b33fc46f7ffb93ab148874ea2d2171b5ed071bd146f422c0254b4ae

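To verify a downloaded archive against the published digest, you can compute its SHA-256 locally with the standard library:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the hex SHA-256 digest of a file, streaming in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()

# Compare the result of sha256_of("vieneu-1.2.4.tar.gz") against the
# SHA256 digest published above; a mismatch means a corrupted or
# tampered download.
```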

File details

Details for the file vieneu-1.2.4-py3-none-any.whl.

File metadata

  • Download URL: vieneu-1.2.4-py3-none-any.whl
  • Upload date:
  • Size: 13.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.5

File hashes

Hashes for vieneu-1.2.4-py3-none-any.whl

  • SHA256: 65d9db29252af7a3f146dddcfadbdea5dfd91036ae7b87e9a316063cc3615708
  • MD5: 5293866acf1e188b9a4b714e63bdbbaf
  • BLAKE2b-256: 21aebbad021f574a55cce8128ab173b5152309bb6d5f983fe5ad8f6182269579

