Minimal CosyVoice2 European inference CLI (bundles runtime + Matcha)

CosyVoice2-EU

Minimal, plug-and-play CosyVoice2 European inference CLI that downloads our model from Hugging Face and runs cross-lingual zero-shot voice cloning TTS. It bundles the required cosyvoice runtime and matcha module so you don't need the full upstream repo.

Currently supports Chinese, English, Japanese, Korean, Chinese dialects (Cantonese, Sichuanese, Shanghainese, Tianjinese, Wuhanese, etc.) from the original CosyVoice2, plus our newly added French and German support!

Important Notes

Limited Training Data: This model was fine-tuned on 1,500 hours of French and 1,500 hours of German data. Support and capabilities for these languages may still be limited compared to the original CosyVoice2 languages. We ship a single bilingual model for simplicity, so quality may differ between the two languages and from what monolingual model versions would achieve.
Prompt Support: You can use prompts by putting your prompt text followed by <|endofprompt|> at the beginning of your text (e.g., "Speak sadly. <|endofprompt|> Your actual text here"). However, prompt support is currently limited and experimental.
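
The prompt format in the note above can also be assembled programmatically. The following is a minimal sketch that assumes only the <|endofprompt|> marker quoted there; the helper name with_style_prompt is hypothetical and not part of the package.

```python
END_OF_PROMPT = "<|endofprompt|>"  # marker quoted in the note above


def with_style_prompt(instruction: str, text: str) -> str:
    """Prefix `text` with a style instruction, separated by the marker.

    Hypothetical helper for illustration; not part of cosyvoice2-eu.
    """
    instruction = instruction.strip()
    if not instruction:
        return text  # no instruction given: pass the text through unchanged
    return f"{instruction} {END_OF_PROMPT} {text.strip()}"


styled = with_style_prompt("Speak sadly.", "Your actual text here")
print(styled)
```

The resulting string is what would be passed as the text to synthesize. Since prompt support is experimental, the style instruction may have little or no audible effect.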

Quick Start (CLI)

Install the package:
```
pip install cosyvoice2-eu
```

Run French voice cloning:

cosy2-eu \
  --text "Salut ! Je vous présente CosyVoice 2, un système de synthèse vocale très avancé. Cette technologie permet de reproduire des voix de manière impressionnante." \
  --prompt french_speaker.wav \
  --out output_french.wav

Run German voice cloning:

cosy2-eu \
  --text "Hallo! Ich stelle Ihnen CosyVoice 2 vor, ein sehr fortschrittliches Sprachsynthese-System. Diese Technologie kann Stimmen auf beeindruckende Weise reproduzieren." \
  --prompt german_speaker.wav \
  --out output_german.wav

Use prompts for style control (experimental):

cosy2-eu \
  --text "Speak cheerfully. <|endofprompt|> Hallo! Wie geht es Ihnen heute? Ich hoffe, Sie haben einen wunderbaren Tag!" \
  --prompt german_speaker.wav \
  --out output_cheerful_german.wav

That's it! The first run will automatically download the model from Hugging Face. The model stays in memory between calls for faster subsequent inference.
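
If you prefer to drive the CLI from a script, the commands above can be built as argument lists. This is a sketch under assumptions: it uses only the --text, --prompt and --out flags shown in the examples, and the job list and file names are hypothetical placeholders.

```python
import shutil
import subprocess


def build_command(text: str, prompt_wav: str, out_wav: str) -> list[str]:
    """Build the argv list for one cosy2-eu call (flags taken from the examples above)."""
    return ["cosy2-eu", "--text", text, "--prompt", prompt_wav, "--out", out_wav]


# Hypothetical jobs: (text, reference audio, output file).
jobs = [
    ("Salut ! Ceci est un test.", "french_speaker.wav", "out_fr.wav"),
    ("Hallo! Dies ist ein Test.", "german_speaker.wav", "out_de.wav"),
]

for text, prompt_wav, out_wav in jobs:
    cmd = build_command(text, prompt_wav, out_wav)
    if shutil.which("cosy2-eu") is None:
        # CLI not installed in this environment: show the command instead of running it.
        print("cosy2-eu not found on PATH; would run:", cmd)
        continue
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Passing the arguments as a list bypasses the shell, so characters such as the < | > in the <|endofprompt|> marker need no extra quoting.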

🎯 Features

- Easy Installation: Simple pip install cosyvoice2-eu command
- Cross-lingual Voice Cloning: Clone voices across different languages
- Multi-language Support:
  - Original CosyVoice2: Chinese, English, Japanese, Korean, Chinese dialects (Cantonese, Sichuanese, Shanghainese, Tianjinese, Wuhanese, etc.)
  - European Extension: French and German (fine-tuned on 1,500h each)
- Model Caching: Model stays in memory between calls for faster inference
- Audio Concatenation: Multiple audio segments are automatically concatenated into a single output file
- Experimental Prompt Support: Style control using <|endofprompt|> syntax (limited)
- Bundled Runtime: No need to install the full upstream CosyVoice2 repository
- Hugging Face Integration: Automatic model downloading from Hugging Face
- Multiple LLM Backbones: Support for different language model backbones (see below)
- Text Frontend Disabled: Text normalization is disabled by default for better multilingual support

📓 Interactive Usage (Python/Notebook)

You can keep the model in memory and call it multiple times without reloads:

import torch
import torchaudio
from cosyvoice2_eu import load

# Load once (downloads on first use) and reuse
cosy = load()  # or: load(model_dir="~/.cache/cosyvoice2-eu", repo_id="Luka512/CosyVoice2-0.5B-EU")

# Full synthesis (returns a single waveform and its sample rate)
wav, sr = cosy.tts(
    text="Salut ! Ceci est une démonstration.",
    prompt="/path/to/french_ref.wav",
)
torchaudio.save("out_fr.wav", wav, sr)

# Streaming synthesis (yields chunks as they are generated)
chunks = []
for chunk in cosy.stream(
    text="Hallo! Dies ist eine Streaming-Demonstration.",
    prompt="/path/to/german_ref.wav",
):
    chunks.append(chunk)
if chunks:
    # Join the chunks along the time axis and write one file
    torchaudio.save("out_de_streamed.wav", torch.cat(chunks, dim=1), cosy.sample_rate)
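
Reusing one loaded model for several texts can be wrapped in a small loop. The sketch below assumes only the keyword call shape tts(text=..., prompt=...) returning a (waveform, sample_rate) pair, as in the example above. The helper synthesize_all and the stand-in fake_tts are hypothetical, and fake_tts exists only so the sketch runs without the model installed.

```python
from typing import Callable, Iterable


def synthesize_all(tts: Callable, jobs: Iterable[tuple[str, str, str]]) -> dict:
    """Call `tts(text=..., prompt=...)` once per job and collect the results by name.

    `tts` is expected to behave like `cosy.tts` in the example above.
    """
    results = {}
    for name, text, prompt in jobs:
        results[name] = tts(text=text, prompt=prompt)
    return results


def fake_tts(text: str, prompt: str):
    """Stand-in for cosy.tts; returns a placeholder waveform and an arbitrary sample rate."""
    return [0.0] * len(text), 24000


outputs = synthesize_all(fake_tts, [("fr", "Salut !", "french_ref.wav")])
print({name: (len(wav), sr) for name, (wav, sr) in outputs.items()})
```

With the real package, you would pass cosy.tts in place of fake_tts, so the model is loaded once and shared across all jobs.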

🚀 Upcoming Features

Multiple LLM Backbone Support - Code is ready, models are currently training:

Qwen3 0.6B: Lightweight model for efficient inference
EuroLLM 1.7B Instruct: Specialized European language model
Mistral 7B v0.3: Powerful multilingual capabilities

Currently ships with the original CosyVoice2 "blankEN" backbone and our fine-tuned LM and flow models. New backbones will be available as separate model downloads once training is complete.

📖 Model & Credits

This package uses our CosyVoice2-0.5B-EU model available at: 🤗 Luka512/CosyVoice2-0.5B-EU

Built on CosyVoice2: This project builds upon the excellent CosyVoice2 by FunAudioLLM (Apache 2.0), adapted for European language support with cross‑lingual voice cloning capabilities.

📜 License & Attribution

Package license: Apache License 2.0 (see LICENSE).
Bundled upstream components (licenses included in distribution):
- CosyVoice2 (FunAudioLLM) — Apache 2.0 • see NOTICE and THIRD_PARTY_LICENSES/COSYVOICE_LICENSE.
- Matcha‑TTS (Shivam Mehta) — MIT • see THIRD_PARTY_LICENSES/MATCHA_TTS_LICENSE.
- HiFi‑GAN (via Matcha‑TTS) — MIT • see THIRD_PARTY_LICENSES/MATCHA_HIFIGAN_LICENSE.

Original licenses and attributions are preserved. This package is not affiliated with or endorsed by FunAudioLLM/Alibaba; trademarks and names belong to their owners.

Installation

From PyPI (Recommended)

pip install cosyvoice2-eu

For enhanced English phonemization (optional):

pip install "cosyvoice2-eu[piper]"

Note: The piper optional dependency requires compilation tools and may fail in some environments (like Google Colab). The package will work without it, using the standard phonemizer as fallback.

On Linux with a GPU, install torch and torchaudio builds that match your CUDA version and make sure onnxruntime-gpu is available. On CPU-only machines, onnxruntime is sufficient.
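
To see which of these runtimes your environment actually provides, a quick check along the following lines may help. It is a sketch, not part of the package: describe_runtime is a hypothetical helper, and it relies only on torch.cuda.is_available() and onnxruntime.get_available_providers().

```python
import importlib.util


def describe_runtime() -> dict:
    """Report whether torch / onnxruntime are importable and whether a GPU path is visible."""
    info = {"torch": False, "cuda": False, "onnxruntime": False, "ort_providers": []}
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = True
        info["cuda"] = torch.cuda.is_available()
    # onnxruntime and onnxruntime-gpu both install the `onnxruntime` module.
    if importlib.util.find_spec("onnxruntime") is not None:
        import onnxruntime

        info["onnxruntime"] = True
        info["ort_providers"] = onnxruntime.get_available_providers()
    return info


print(describe_runtime())
```

On a working GPU setup you would typically expect cuda to be True and CUDAExecutionProvider to appear among the ONNX Runtime providers.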

Development Installation

cd standalone_infer
pip install -e .

Usage

English/Chinese/Japanese/Korean (Original CosyVoice2 languages):

cosy2-eu \
  --text "Hello! This is CosyVoice 2, demonstrating cross-lingual voice cloning capabilities." \
  --prompt any_speaker.wav \
  --out output_english.wav

First run will download the model assets to ~/.cache/cosyvoice2-eu (configurable via --model-dir). The model stays in memory between calls for faster subsequent inference.
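
To check whether the model assets are already cached, and how much disk space they occupy, you can inspect the directory directly. This is a sketch that assumes the default location quoted above; cache_summary is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def cache_summary(model_dir: str = "~/.cache/cosyvoice2-eu") -> tuple[bool, int]:
    """Return (exists, total_bytes) for the model directory."""
    root = Path(model_dir).expanduser()
    if not root.is_dir():
        return False, 0
    # Sum the sizes of all regular files below the directory.
    total = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return True, total


exists, size = cache_summary()
print(f"cache present: {exists}, size: {size / 1e6:.1f} MB")
```

If you pass a custom --model-dir to the CLI, pass the same path to the helper.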

Advanced CLI options: --setting, --stream, --speed, --text-frontend (enable text normalization), --clear-cache (reload model).

Release history

- 0.2.8: Sep 12, 2025 (this version)
- 0.2.7: Sep 3, 2025
- 0.2.6: Sep 3, 2025
- 0.2.5: Sep 3, 2025
- 0.1.5: Aug 8, 2025

Download files

Source Distribution

- cosyvoice2_eu-0.2.8.tar.gz (703.8 kB), uploaded Sep 12, 2025, Source

Built Distribution

- cosyvoice2_eu-0.2.8-py3-none-any.whl (752.6 kB), uploaded Sep 12, 2025, Python 3

File details

Details for the file cosyvoice2_eu-0.2.8.tar.gz.

File metadata

Download URL: cosyvoice2_eu-0.2.8.tar.gz
Upload date: Sep 12, 2025
Size: 703.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for cosyvoice2_eu-0.2.8.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `b672c5e7f437d5ad6e28303e124b0606577fa76d632a188991d2b9fc76d61780` |
| MD5 | `300357780f2c6d893ee11477ff4cd050` |
| BLAKE2b-256 | `e3187f3f8a37d3d9314080ff256dad38ab279cb707adad8d9dd6658bcaf569ff` |

File details

Details for the file cosyvoice2_eu-0.2.8-py3-none-any.whl.

File metadata

Download URL: cosyvoice2_eu-0.2.8-py3-none-any.whl
Upload date: Sep 12, 2025
Size: 752.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for cosyvoice2_eu-0.2.8-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `155792a31b6f17e45582411a6cd0f5057da7ff9b3aa7508a16d54c5b63ecf511` |
| MD5 | `c747812189642b0f9b00fb3188bdcdf5` |
| BLAKE2b-256 | `dc987c30bf90952d6818aea9cbde27daa0bfae2b45d78db0225a3eb8b5012c1b` |
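
A downloaded file can be checked against the SHA256 digests listed above with the standard library. The sketch below copies the two digests from the file listings; the helper names sha256_of and verify are hypothetical, and the files are assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "cosyvoice2_eu-0.2.8.tar.gz": "b672c5e7f437d5ad6e28303e124b0606577fa76d632a188991d2b9fc76d61780",
    "cosyvoice2_eu-0.2.8-py3-none-any.whl": "155792a31b6f17e45582411a6cd0f5057da7ff9b3aa7508a16d54c5b63ecf511",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches the listing."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


for name in EXPECTED_SHA256:
    candidate = Path(name)
    if candidate.exists():
        print(name, "OK" if verify(candidate) else "MISMATCH")
```

A mismatch means the file differs from the one published for version 0.2.8 and should not be installed.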
