Cross-modal data conversion driven asynchronous multi-voice translation system
Project description
CopyTalker
CopyTalker is a cross-modal data conversion driven asynchronous multi-voice translation system. It enables real-time speech-to-speech translation with support for multiple languages and voices, utilizing state-of-the-art machine learning models for speech recognition, translation, and synthesis.
Features
- Real-time Speech Translation: Instantly translate spoken language to another language with voice output
- Multi-language Support: Supports translation between 9 languages including English, Chinese, Japanese, Korean, French, German, Spanish, Russian, and Arabic
- Multiple TTS Engines: Kokoro (high-quality neural TTS), Edge TTS (cloud-based), pyttsx3 (offline)
- Cross-platform: Full support for macOS (Apple Silicon/Intel), Linux, and Windows
- Cross-modal Conversion: Seamless conversion from speech to text to translated speech
- Asynchronous Processing: Efficient parallel processing with minimal latency
- Simple GUI: Easy-to-use Tkinter graphical interface
- Offline Capabilities: Download models for offline usage
Platform Compatibility
| Component | macOS (Apple Silicon) | macOS (Intel) | Linux | Windows |
|---|---|---|---|---|
| STT (faster-whisper) | CPU (float32) | CPU (float32) | CPU / CUDA | CPU / CUDA |
| Translation (transformers) | MPS accelerated | CPU | CPU / CUDA | CPU / CUDA |
| TTS - Edge TTS | Supported | Supported | Supported | Supported |
| TTS - pyttsx3 | Supported (NSSpeech) | Supported (NSSpeech) | Supported (espeak) | Supported (SAPI) |
| TTS - Kokoro | MPS accelerated | CPU | CPU / CUDA | CPU / CUDA |
| Audio I/O | sounddevice | sounddevice | sounddevice | sounddevice |
Note: faster-whisper uses ctranslate2 which does not support Apple MPS. STT automatically uses CPU on macOS. Translation models and Kokoro TTS can leverage Apple Silicon MPS acceleration.
Supported Languages
| Code | Language |
|---|---|
| en | English |
| zh | Chinese (Simplified) |
| ja | Japanese |
| ko | Korean |
| fr | French |
| de | German |
| es | Spanish |
| ru | Russian |
| ar | Arabic |
Installation
From PyPI (Recommended)
pip install copytalker
This installs CopyTalker with all TTS engines (Kokoro, Edge TTS, pyttsx3, Fish-Speech), PySide6 GUI, and core dependencies.
Python 3.13 users:
audioop-ltsis automatically installed for pydub compatibility.
With CJK Language Support
For Chinese, Japanese, and Korean language support:
pip install copytalker[cjk]
Or for complete installation with everything:
pip install copytalker[complete]
From Source
git clone https://github.com/cycleuser/CopyTalker.git
cd CopyTalker
pip install -e .
System Dependencies
CopyTalker requires FFmpeg and PortAudio for audio processing:
Ubuntu/Debian:
sudo apt update
sudo apt install -y ffmpeg portaudio19-dev libsndfile1 python3-dev
Fedora:
sudo dnf install -y ffmpeg portaudio-devel python3-devel libsndfile
macOS (with Homebrew):
brew install ffmpeg portaudio libsndfile
Windows:
- Download FFmpeg from https://ffmpeg.org/download.html and add to PATH
TTS Engines
| Engine | Install | Features |
|---|---|---|
| Edge TTS | Default | Microsoft Azure voices, requires internet |
| pyttsx3 | Default | System voices, works offline |
| Fish-Speech | Default | Voice cloning, 50+ emotion tags, cloud API |
| Kokoro | pip install copytalker[kokoro] |
High-quality neural TTS, needs model download |
Model Downloads (via GUI Settings)
Whisper (Speech-to-Text):
| Model | Size | Speed |
|---|---|---|
| tiny | ~75 MB | Fastest |
| base | ~145 MB | Fast |
| small | ~465 MB | Balanced |
| medium | ~1.5 GB | Slow |
| large | ~3 GB | Slowest |
Translation:
| Model | Size | Supports |
|---|---|---|
| Helsinki-NLP | ~300 MB each | Specific language pairs (faster) |
| NLLB-200-distilled-600M | ~1.2 GB | All 200 languages (fastest) |
| NLLB-200-distilled-1.3B | ~2.6 GB | All 200 languages (balanced) |
| NLLB-200-1.3B | ~2.6 GB | All 200 languages (high quality) |
| NLLB-200-3.3B | ~6.5 GB | All 200 languages (best quality) |
TTS Models:
| Model | Size | Languages |
|---|---|---|
| Kokoro-82M | ~330 MB | English, Chinese, Japanese |
Optional: CJK Language Processing
For Chinese, Japanese, Korean text processing:
Linux (Ubuntu/Debian):
sudo apt install -y libmecab-dev mecab mecab-ipadic-utf8
pip install copytalker[cjk]
macOS:
brew install mecab
pip install copytalker[cjk]
Troubleshooting
Issue 1: GUI shows "No TTS engine available"
Solution:
pip install --upgrade copytalker
Issue 2: Kokoro TTS connection timeout / model download failed
Kokoro requires downloading ~82MB model from HuggingFace. If connection fails:
# Option 1: Use proxy
export https_proxy=http://127.0.0.1:7897
export http_proxy=http://127.0.0.1:7897
# Option 2: Use HuggingFace mirror (for users in China)
export HF_ENDPOINT=https://hf-mirror.com
# Then run CopyTalker
copytalker --gui
Or use edge-tts which works without model downloads:
copytalker translate --target zh --tts-engine edge-tts
Issue 3: Kokoro TTS cannot generate Chinese/Japanese speech
Solution:
pip install copytalker[cjk]
Issue 4: PyAudio installation fails on macOS
Solution:
# CopyTalker uses sounddevice by default (pre-built binaries), PyAudio not required
brew install portaudio
pip install pyaudio # only if you need PyAudio backend
Supported Languages
| Code | Language | TTS Support |
|---|---|---|
| en | English | Kokoro, Edge, pyttsx3, Fish-Speech |
| zh | Chinese | Kokoro, Edge, Fish-Speech |
| ja | Japanese | Kokoro, Edge, Fish-Speech |
| ko | Korean | Edge, Fish-Speech |
| fr | French | Edge, Fish-Speech |
| de | German | Edge, Fish-Speech |
| es | Spanish | Edge, Fish-Speech |
| ru | Russian | Edge, Fish-Speech |
| it | Italian | Edge, Fish-Speech |
| pt | Portuguese | Edge, Fish-Speech |
| ar | Arabic | Edge, Fish-Speech |
Quick Start
Command Line Interface
# Start real-time translation (English to Chinese)
copytalker translate --target zh
# With auto-detection of source language
copytalker translate --source auto --target ja
# Specify TTS voice
copytalker translate --target zh --voice zf_xiaobei
# Use specific TTS engine
copytalker translate --target en --tts-engine edge-tts
# List available voices
copytalker list-voices --language zh
# List supported languages
copytalker list-languages
GUI Mode
# Launch graphical interface
copytalker --gui
# Or use dedicated command
copytalker-gui
Screenshots
Main Interface
The main window provides access to all settings, real-time transcription and translation displays, and control buttons including Start Translation, Stop, and Download Models.
Source Language Selection
Select the source language or choose Auto-detect to let Whisper identify the spoken language automatically.
Target Language Selection
Choose the target language for translation output.
Voice Selection
Pick a TTS voice for the target language. Voices change dynamically based on the selected target language and TTS engine.
TTS Engine Selection
Choose between Kokoro (high-quality neural), Edge TTS (cloud-based), pyttsx3 (offline), or auto (automatic best choice).
Translation Model Selection
Select between Helsinki-NLP (language-pair specific) or NLLB (multilingual, supports all language pairs including ja-zh).
Translation Device Selection
Assign the translation model to CPU or CUDA GPU to balance resources.
TTS Device Selection
Assign the TTS engine to CPU or CUDA GPU independently from the translation model to avoid GPU resource contention.
Python API
from copytalker import AppConfig, TranslationPipeline
# Configure
config = AppConfig()
config.stt.language = "auto" # Auto-detect source language
config.translation.target_lang = "zh" # Translate to Chinese
config.tts.engine = "kokoro" # Use Kokoro TTS
config.tts.voice = "zf_xiaobei" # Chinese female voice
# Create and start pipeline
pipeline = TranslationPipeline(config)
# Register callbacks for events
def on_transcription(event):
print(f"Heard: {event.data.text}")
def on_translation(event):
print(f"Translated: {event.data.translated_text}")
pipeline.register_callback("transcription", on_transcription)
pipeline.register_callback("translation", on_translation)
# Start translation
pipeline.start()
# ... (pipeline runs until stopped)
# Stop
pipeline.stop()
Using Context Manager
from copytalker import AppConfig, TranslationPipeline
config = AppConfig()
config.translation.target_lang = "ja"
with TranslationPipeline(config) as pipeline:
# Pipeline is running
input("Press Enter to stop...")
# Pipeline automatically stopped
Model Management
Pre-download Models
# Download Whisper model
copytalker download-models --whisper small
# Download Kokoro TTS model
copytalker download-models --kokoro
# Download all recommended models
copytalker download-models --all
Cache Management
# Show cache info
copytalker cache --info
# Clear all cached models
copytalker cache --clear
# Clear specific model type
copytalker cache --clear whisper
Configuration
CopyTalker can be configured via:
- Command-line arguments
- Environment variables
- Configuration file (
~/.config/copytalker/config.yaml)
Environment Variables
| Variable | Description | Default |
|---|---|---|
COPYTALKER_CACHE_DIR |
Model cache directory | ~/.cache/copytalker |
COPYTALKER_DEVICE |
Compute device (cpu/cuda/auto) | auto |
COPYTALKER_CONFIG |
Config file path | ~/.config/copytalker/config.yaml |
Configuration File Example
audio:
sample_rate: 16000
vad_aggressiveness: 3
stt:
model_size: small
device: auto
translation:
target_lang: zh
tts:
engine: kokoro
voice: zf_xiaobei
speed: 1.0
debug: false
Architecture
CopyTalker follows a modular pipeline architecture:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Audio Capture │────▶│ Speech-to-Text │────▶│ Translation │────▶│ Text-to-Speech │
│ (VAD) │ │ (Whisper) │ │ (Helsinki/NLLB) │ │ (Kokoro) │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘
- Audio Capture: Records audio with Voice Activity Detection (WebRTC VAD)
- Speech Recognition: Transcribes using Faster-Whisper
- Translation: Translates using Helsinki-NLP or NLLB models
- Text-to-Speech: Synthesizes using Kokoro, Edge TTS, or pyttsx3
Development
Setup Development Environment
git clone https://github.com/cycleuser/CopyTalker.git
cd CopyTalker
pip install -e .[dev]
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=copytalker
# Run only unit tests
pytest tests/unit/
# Run fast tests only (skip slow)
pytest -m "not slow"
Code Quality
# Format code
black src/copytalker tests
isort src/copytalker tests
# Lint
ruff check src/copytalker
# Type checking
mypy src/copytalker
Requirements
- Python 3.9 or higher
- FFmpeg
- PortAudio (for PyAudio)
- Audio input/output capabilities
- PyTorch 2.0+ (on macOS: CPU or MPS; on Linux/Windows: CPU or CUDA)
See pyproject.toml for detailed Python package dependencies.
macOS Installation Notes
CopyTalker works on macOS (both Intel and Apple Silicon). On macOS, CUDA is not available, so PyTorch uses CPU or MPS (Apple Silicon) for inference.
If you encounter torch/numpy conflicts on macOS, install PyTorch first:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install copytalker
If PyAudio fails to install on macOS, set the compiler flags:
LDFLAGS="-L$(brew --prefix portaudio)/lib" CFLAGS="-I$(brew --prefix portaudio)/include" pip install pyaudio
Linux Installation Notes
On Linux, PyAudio is compiled from source and requires the PortAudio development headers and a C compiler. Install them before running pip install:
# Ubuntu/Debian
sudo apt install ffmpeg portaudio19-dev python3-dev build-essential
# Fedora
sudo dnf install ffmpeg portaudio-devel python3-devel gcc
Agent Integration (OpenAI Function Calling)
CopyTalker exposes OpenAI-compatible tools for LLM agents:
from copytalker.tools import TOOLS, dispatch
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=TOOLS,
)
result = dispatch(
tool_call.function.name,
tool_call.function.arguments,
)
CLI Help
License
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Acknowledgments
- faster-whisper for speech recognition
- Helsinki-NLP for translation models
- Facebook NLLB for multilingual translation
- Kokoro TTS for high-quality neural TTS
- Various TTS libraries for voice synthesis
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file copytalker-0.0.3.tar.gz.
File metadata
- Download URL: copytalker-0.0.3.tar.gz
- Upload date:
- Size: 312.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9aed1976443e2cd7cbfb533b26b7d27818637879f1b252cac5861ad58e69dec5
|
|
| MD5 |
58d1b85f66324c68566430f85900f5a8
|
|
| BLAKE2b-256 |
cd0ca78ed94421eaf9404162697308c7a58d4182f172c68bf969b9ea8326573c
|
File details
Details for the file copytalker-0.0.3-py3-none-any.whl.
File metadata
- Download URL: copytalker-0.0.3-py3-none-any.whl
- Upload date:
- Size: 147.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5ca7642abbac451e7a6b57f17929032476fbc975357d28f9de26983c84d62596
|
|
| MD5 |
7e97683c3ddd0c927dd3695c842d94a5
|
|
| BLAKE2b-256 |
05cd9016001129df491fdee3e4f893ddc1aa3526b84d6806e416b378e9594637
|