Minimal CosyVoice2 European inference CLI (bundles runtime + Matcha)
CosyVoice2-EU
Minimal, plug-and-play CosyVoice2 European inference CLI that downloads our model from Hugging Face and runs cross-lingual zero-shot voice cloning TTS. It bundles the required cosyvoice runtime and matcha module so you don't need the full upstream repo.
Currently supports Chinese, English, Japanese, Korean, Chinese dialects (Cantonese, Sichuanese, Shanghainese, Tianjinese, Wuhanese, etc.) from the original CosyVoice2, plus our newly added French and German support!
Important Notes
- Limited Training Data: This model was fine-tuned on 1,000 hours of French and 1,000 hours of German data. Support and capabilities for these languages may still be limited compared to the original CosyVoice2 languages. Here, we ship a bilingual model for simplicity, but performance may vary between languages and when compared to monolingual model versions.
- Prompt Support: You can use prompts by putting your prompt text followed by <|endofprompt|> at the beginning of your text (e.g., "Speak sadly. <|endofprompt|> Your actual text here"). However, prompt support is currently limited and experimental.
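The prompt convention above can be wrapped in a small helper. This is only a minimal sketch: `with_style_prompt` and `END_OF_PROMPT` are hypothetical names that are not part of the package, and the function does nothing more than build a string in the format shown in the example.

```python
END_OF_PROMPT = "<|endofprompt|>"  # marker taken from the note above


def with_style_prompt(instruction: str, text: str) -> str:
    """Prefix `text` with a style instruction and the end-of-prompt marker.

    Hypothetical helper: it only formats the string; whether the model
    follows the instruction is experimental, as noted above.
    """
    instruction = instruction.strip()
    if not instruction:
        # No instruction given: pass the text through unchanged.
        return text
    return f"{instruction} {END_OF_PROMPT} {text}"


print(with_style_prompt("Speak sadly.", "Your actual text here"))
```

The resulting string can be passed as the `--text` value on the command line.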
Quick Start (CLI)
- Install the package:
  pip install cosyvoice2-eu
- Run French voice cloning:
  cosy2-eu \
    --text "Salut ! Je vous présente CosyVoice 2, un système de synthèse vocale très avancé. Cette technologie permet de reproduire des voix de manière impressionnante." \
    --prompt french_speaker.wav \
    --out output_french.wav
- Run German voice cloning:
  cosy2-eu \
    --text "Hallo! Ich stelle Ihnen CosyVoice 2 vor, ein sehr fortschrittliches Sprachsynthese-System. Diese Technologie kann Stimmen auf beeindruckende Weise reproduzieren." \
    --prompt german_speaker.wav \
    --out output_german.wav
- Use prompts for style control (experimental):
  cosy2-eu \
    --text "Speak cheerfully. <|endofprompt|> Hallo! Wie geht es Ihnen heute? Ich hoffe, Sie haben einen wunderbaren Tag!" \
    --prompt german_speaker.wav \
    --out output_cheerful_german.wav
That's it! The first run will automatically download the model from Hugging Face. The model stays in memory between calls for faster subsequent inference.
🎯 Features
- Easy Installation: Simple pip install cosyvoice2-eu command
- Cross-lingual Voice Cloning: Clone voices across different languages
- Multi-language Support:
- Original CosyVoice2: Chinese, English, Japanese, Korean, Chinese dialects (Cantonese, Sichuanese, Shanghainese, Tianjinese, Wuhanese, etc.)
- European Extension: French and German (fine-tuned on 1,500h each)
- Model Caching: Model stays in memory between calls for faster inference
- Audio Concatenation: Multiple audio segments are automatically concatenated into a single output file
- Experimental Prompt Support: Style control using <|endofprompt|> syntax (limited)
- Bundled Runtime: No need to install the full upstream CosyVoice2 repository
- Hugging Face Integration: Automatic model downloading from Hugging Face
- Multiple LLM Backbones: Support for different language model backbones (see below)
- Text Frontend Disabled: Text normalization is disabled by default for better multilingual support
📓 Interactive Usage (Python/Notebook)
You can keep the model in memory and call it multiple times without reloads:
import torch
import torchaudio
from cosyvoice2_eu import load

# Load once (downloads on first use) and reuse
cosy = load()  # or: load(model_dir="~/.cache/cosyvoice2-eu", repo_id="Luka512/CosyVoice2-0.5B-EU")

# Full synthesis (returns a single waveform)
wav, sr = cosy.tts(
    text="Salut ! Ceci est une démonstration.",
    prompt="/path/to/french_ref.wav",
)
torchaudio.save("out_fr.wav", wav, sr)

# Streaming synthesis (yields chunks)
chunks = []
for chunk in cosy.stream(
    text="Hallo! Dies ist eine Streaming-Demonstration.",
    prompt="/path/to/german_ref.wav",
):
    chunks.append(chunk)

# Join the chunks along the time axis and save them as one file
if chunks:
    torchaudio.save("out_de_streamed.wav", torch.cat(chunks, dim=1), cosy.sample_rate)
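To illustrate reusing one loaded model across several calls, here is a minimal sketch. `synthesize_all` is a hypothetical helper, not part of the package; it assumes only that the loaded object exposes `tts(text=..., prompt=...)` returning a `(waveform, sample_rate)` pair, as in the example above.

```python
from typing import Any, Iterable, List, Tuple


def synthesize_all(cosy: Any, texts: Iterable[str], prompt: str) -> List[Tuple[Any, int]]:
    """Call `cosy.tts` once per text, reusing the same already-loaded model.

    Hypothetical helper: `cosy` is whatever `load()` returned; each result
    is the (waveform, sample_rate) pair produced by `cosy.tts`.
    """
    return [cosy.tts(text=text, prompt=prompt) for text in texts]


# Usage, assuming the package is installed and a reference clip exists:
#   from cosyvoice2_eu import load
#   cosy = load()
#   results = synthesize_all(cosy, ["Bonjour !", "Au revoir !"], "/path/to/french_ref.wav")
```

Because the model is loaded once outside the helper, only the first call pays the loading cost.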
🚀 Upcoming Features
Multiple LLM Backbone Support - Code is ready, models are currently training:
- Qwen3 0.6B: Lightweight model for efficient inference
- EuroLLM 1.7B Instruct: Specialized European language model
- Mistral 7B v0.3: Powerful multilingual capabilities
The package currently ships with the original CosyVoice2 "blankEN" backbone and our fine-tuned LM and flow models. New backbones will be available as separate model downloads once training is complete.
📖 Model & Credits
This package uses our CosyVoice2-0.5B-EU model available at: 🤗 Luka512/CosyVoice2-0.5B-EU
Built on CosyVoice2: This project builds upon the excellent CosyVoice2 by FunAudioLLM (Apache 2.0), adapted for European language support with cross‑lingual voice cloning capabilities.
📜 License & Attribution
- Package license: Apache License 2.0 (see LICENSE).
- Bundled upstream components (licenses included in distribution):
  - CosyVoice2 (FunAudioLLM), Apache 2.0: see NOTICE and THIRD_PARTY_LICENSES/COSYVOICE_LICENSE.
  - Matcha‑TTS (Shivam Mehta), MIT: see THIRD_PARTY_LICENSES/MATCHA_TTS_LICENSE.
  - HiFi‑GAN (via Matcha‑TTS), MIT: see THIRD_PARTY_LICENSES/MATCHA_HIFIGAN_LICENSE.
Original licenses and attributions are preserved. This package is not affiliated with or endorsed by FunAudioLLM/Alibaba; trademarks and names belong to their owners.
Installation
From PyPI (Recommended)
pip install cosyvoice2-eu
For enhanced English phonemization (optional):
pip install cosyvoice2-eu[piper]
Note: The piper optional dependency requires compilation tools and may fail in some environments (like Google Colab). The package will work without it, using the standard phonemizer as fallback.
If you are on Linux with a GPU, install torch/torchaudio builds that match your CUDA version and make sure onnxruntime-gpu is available. For CPU-only use, onnxruntime is sufficient.
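A quick way to see what the environment provides is sketched below. `describe_runtime` is a hypothetical helper, not part of the package; it relies only on `torch.cuda.is_available()` and `onnxruntime.get_available_providers()`, and it reports `None` for any library that is not installed.

```python
import importlib.util


def describe_runtime() -> dict:
    """Report the torch version, CUDA availability, and ONNX Runtime providers.

    Hypothetical helper for checking an environment; values stay None
    when the corresponding library is not installed.
    """
    info = {"torch": None, "cuda": None, "onnxruntime_providers": None}

    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.cuda.is_available()

    # onnxruntime-gpu is imported under the same module name, `onnxruntime`.
    if importlib.util.find_spec("onnxruntime") is not None:
        import onnxruntime

        info["onnxruntime_providers"] = onnxruntime.get_available_providers()

    return info


print(describe_runtime())
```

With a working GPU setup, the provider list would normally include `CUDAExecutionProvider`; a CPU-only install typically lists `CPUExecutionProvider` only.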
Development Installation
cd standalone_infer
pip install -e .
Usage
French Example:
cosy2-eu \
--text "Salut ! Je vous présente CosyVoice 2, un système de synthèse vocale très avancé. Cette technologie permet de reproduire des voix de manière impressionnante." \
--prompt french_speaker.wav \
--out output_french.wav
German Example:
cosy2-eu \
--text "Hallo! Ich stelle Ihnen CosyVoice 2 vor, ein sehr fortschrittliches Sprachsynthese-System. Diese Technologie kann Stimmen auf beeindruckende Weise reproduzieren." \
--prompt german_speaker.wav \
--out output_german.wav
Prompt-based Style Control (Experimental):
cosy2-eu \
--text "Speak cheerfully. <|endofprompt|> Hallo! Wie geht es Ihnen heute? Ich hoffe, Sie haben einen wunderbaren Tag!" \
--prompt german_speaker.wav \
--out output_cheerful_german.wav
English/Chinese/Japanese/Korean (Original CosyVoice2 languages):
cosy2-eu \
--text "Hello! This is CosyVoice 2, demonstrating cross-lingual voice cloning capabilities." \
--prompt any_speaker.wav \
--out output_english.wav
First run will download the model assets to ~/.cache/cosyvoice2-eu (configurable via --model-dir). The model stays in memory between calls for faster subsequent inference.
Advanced CLI options: --setting, --stream, --speed, --text-frontend (enable text normalization), --clear-cache (reload model).
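When calling the CLI from a script, the argument list can be assembled programmatically. The following is a minimal sketch: `build_cosy2_eu_command` is a hypothetical helper, and it uses only the `--text`, `--prompt`, `--out`, and `--model-dir` flags shown above. The advanced options are passed through verbatim via `extra_args`, because their exact value formats are not documented here.

```python
import shlex
from typing import List, Optional, Sequence


def build_cosy2_eu_command(
    text: str,
    prompt: str,
    out: str,
    model_dir: Optional[str] = None,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Build an argv list for the cosy2-eu CLI (hypothetical helper)."""
    cmd = ["cosy2-eu", "--text", text, "--prompt", prompt, "--out", out]
    if model_dir is not None:
        cmd += ["--model-dir", model_dir]
    # Advanced options (e.g. --stream, --text-frontend) are appended unchanged.
    cmd += list(extra_args)
    return cmd


cmd = build_cosy2_eu_command(
    text="Hello! This is CosyVoice 2.",
    prompt="any_speaker.wav",
    out="output_english.wav",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with punctuation and non-ASCII text in `--text`.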
File details
Details for the file cosyvoice2_eu-0.2.8.tar.gz.
File metadata
- Download URL: cosyvoice2_eu-0.2.8.tar.gz
- Upload date:
- Size: 703.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b672c5e7f437d5ad6e28303e124b0606577fa76d632a188991d2b9fc76d61780 |
| MD5 | 300357780f2c6d893ee11477ff4cd050 |
| BLAKE2b-256 | e3187f3f8a37d3d9314080ff256dad38ab279cb707adad8d9dd6658bcaf569ff |
File details
Details for the file cosyvoice2_eu-0.2.8-py3-none-any.whl.
File metadata
- Download URL: cosyvoice2_eu-0.2.8-py3-none-any.whl
- Upload date:
- Size: 752.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 155792a31b6f17e45582411a6cd0f5057da7ff9b3aa7508a16d54c5b63ecf511 |
| MD5 | c747812189642b0f9b00fb3188bdcdf5 |
| BLAKE2b-256 | dc987c30bf90952d6818aea9cbde27daa0bfae2b45d78db0225a3eb8b5012c1b |
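A manually downloaded file can be checked against the SHA256 digests listed above. This is a minimal sketch using only the standard library; `sha256_of`, `verify_download`, and `EXPECTED_SHA256` are hypothetical names, and the digests are copied from the file details for version 0.2.8.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above (version 0.2.8).
EXPECTED_SHA256 = {
    "cosyvoice2_eu-0.2.8.tar.gz":
        "b672c5e7f437d5ad6e28303e124b0606577fa76d632a188991d2b9fc76d61780",
    "cosyvoice2_eu-0.2.8-py3-none-any.whl":
        "155792a31b6f17e45582411a6cd0f5057da7ff9b3aa7508a16d54c5b63ecf511",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify_download("cosyvoice2_eu-0.2.8.tar.gz")` returns `False` for an unknown file name or a mismatched digest.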