Cross-modal data conversion driven asynchronous multi-voice translation system

These details have not been verified by PyPI

Project links

Project description

CopyTalker

CopyTalker is a cross-modal data conversion driven asynchronous multi-voice translation system. It enables real-time speech-to-speech translation with support for multiple languages and voices, utilizing state-of-the-art machine learning models for speech recognition, translation, and synthesis.

Features

Real-time Speech Translation: Instantly translate spoken language to another language with voice output
Multi-language Support: Supports translation between 9 languages including English, Chinese, Japanese, Korean, French, German, Spanish, Russian, and Arabic
Multiple TTS Engines: Kokoro (high-quality neural TTS), Edge TTS (cloud-based), pyttsx3 (offline)
Cross-platform: Full support for macOS (Apple Silicon/Intel), Linux, and Windows
Cross-modal Conversion: Seamless conversion from speech to text to translated speech
Asynchronous Processing: Efficient parallel processing with minimal latency
Simple GUI: Easy-to-use Tkinter graphical interface
Offline Capabilities: Download models for offline usage

Platform Compatibility

Component	macOS (Apple Silicon)	macOS (Intel)	Linux	Windows
STT (faster-whisper)	CPU (float32)	CPU (float32)	CPU / CUDA	CPU / CUDA
Translation (transformers)	MPS accelerated	CPU	CPU / CUDA	CPU / CUDA
TTS - Edge TTS	Supported	Supported	Supported	Supported
TTS - pyttsx3	Supported (NSSpeech)	Supported (NSSpeech)	Supported (espeak)	Supported (SAPI)
TTS - Kokoro	MPS accelerated	CPU	CPU / CUDA	CPU / CUDA
Audio I/O	sounddevice	sounddevice	sounddevice	sounddevice

Note: faster-whisper uses ctranslate2 which does not support Apple MPS. STT automatically uses CPU on macOS. Translation models and Kokoro TTS can leverage Apple Silicon MPS acceleration.

Supported Languages

Code	Language
en	English
zh	Chinese (Simplified)
ja	Japanese
ko	Korean
fr	French
de	German
es	Spanish
ru	Russian
ar	Arabic

Installation

From PyPI (Recommended)

pip install copytalker

This installs CopyTalker with all TTS engines (Kokoro, Edge TTS, pyttsx3, Fish-Speech), PySide6 GUI, and core dependencies.

Python 3.13 users: audioop-lts is automatically installed for pydub compatibility.

With CJK Language Support

For Chinese, Japanese, and Korean language support:

pip install copytalker[cjk]

Or for complete installation with everything:

pip install copytalker[complete]

From Source

git clone https://github.com/cycleuser/CopyTalker.git
cd CopyTalker
pip install -e .

System Dependencies

CopyTalker requires FFmpeg and PortAudio for audio processing:

Ubuntu/Debian:

sudo apt update
sudo apt install -y ffmpeg portaudio19-dev libsndfile1 python3-dev

Fedora:

sudo dnf install -y ffmpeg portaudio-devel python3-devel libsndfile

macOS (with Homebrew):

brew install ffmpeg portaudio libsndfile

Windows:

Download FFmpeg from https://ffmpeg.org/download.html and add to PATH

TTS Engines

Engine	Install	Features
Edge TTS	Default	Microsoft Azure voices, requires internet
pyttsx3	Default	System voices, works offline
Fish-Speech	Default	Voice cloning, 50+ emotion tags, cloud API
Kokoro	`pip install copytalker[kokoro]`	High-quality neural TTS, needs model download

Model Downloads (via GUI Settings)

Whisper (Speech-to-Text):

Model	Size	Speed
tiny	~75 MB	Fastest
base	~145 MB	Fast
small	~465 MB	Balanced
medium	~1.5 GB	Slow
large	~3 GB	Slowest

Translation:

Model	Size	Supports
Helsinki-NLP	~300 MB each	Specific language pairs (faster)
NLLB-200-distilled-600M	~1.2 GB	All 200 languages (fastest)
NLLB-200-distilled-1.3B	~2.6 GB	All 200 languages (balanced)
NLLB-200-1.3B	~2.6 GB	All 200 languages (high quality)
NLLB-200-3.3B	~6.5 GB	All 200 languages (best quality)

TTS Models:

Model	Size	Languages
Kokoro-82M	~330 MB	English, Chinese, Japanese

Optional: CJK Language Processing

For Chinese, Japanese, Korean text processing:

Linux (Ubuntu/Debian):

sudo apt install -y libmecab-dev mecab mecab-ipadic-utf8
pip install copytalker[cjk]

macOS:

brew install mecab
pip install copytalker[cjk]

Troubleshooting

Issue 1: GUI shows "No TTS engine available"

Solution:

pip install --upgrade copytalker

Issue 2: Kokoro TTS connection timeout / model download failed

Kokoro requires downloading ~82MB model from HuggingFace. If connection fails:

# Option 1: Use proxy
export https_proxy=http://127.0.0.1:7897
export http_proxy=http://127.0.0.1:7897

# Option 2: Use HuggingFace mirror (for users in China)
export HF_ENDPOINT=https://hf-mirror.com

# Then run CopyTalker
copytalker --gui

Or use edge-tts which works without model downloads:

copytalker translate --target zh --tts-engine edge-tts

Issue 3: Kokoro TTS cannot generate Chinese/Japanese speech

Solution:

pip install copytalker[cjk]

Issue 4: PyAudio installation fails on macOS

Solution:

# CopyTalker uses sounddevice by default (pre-built binaries), PyAudio not required
brew install portaudio
pip install pyaudio  # only if you need PyAudio backend

Supported Languages

Code	Language	TTS Support
en	English	Kokoro, Edge, pyttsx3, Fish-Speech
zh	Chinese	Kokoro, Edge, Fish-Speech
ja	Japanese	Kokoro, Edge, Fish-Speech
ko	Korean	Edge, Fish-Speech
fr	French	Edge, Fish-Speech
de	German	Edge, Fish-Speech
es	Spanish	Edge, Fish-Speech
ru	Russian	Edge, Fish-Speech
it	Italian	Edge, Fish-Speech
pt	Portuguese	Edge, Fish-Speech
ar	Arabic	Edge, Fish-Speech

Quick Start

Command Line Interface

# Start real-time translation (English to Chinese)
copytalker translate --target zh

# With auto-detection of source language
copytalker translate --source auto --target ja

# Specify TTS voice
copytalker translate --target zh --voice zf_xiaobei

# Use specific TTS engine
copytalker translate --target en --tts-engine edge-tts

# List available voices
copytalker list-voices --language zh

# List supported languages
copytalker list-languages

GUI Mode

# Launch graphical interface
copytalker --gui

# Or use dedicated command
copytalker-gui

Screenshots

Main Interface

Main Interface

The main window provides access to all settings, real-time transcription and translation displays, and control buttons including Start Translation, Stop, and Download Models.

Source Language Selection

Source Language Selection

Select the source language or choose Auto-detect to let Whisper identify the spoken language automatically.

Target Language Selection

Target Language Selection

Choose the target language for translation output.

Voice Selection

Voice Selection

Pick a TTS voice for the target language. Voices change dynamically based on the selected target language and TTS engine.

TTS Engine Selection

TTS Engine Selection

Choose between Kokoro (high-quality neural), Edge TTS (cloud-based), pyttsx3 (offline), or auto (automatic best choice).

Translation Model Selection

Translation Model Selection

Select between Helsinki-NLP (language-pair specific) or NLLB (multilingual, supports all language pairs including ja-zh).

Translation Device Selection

Translation Device Selection

Assign the translation model to CPU or CUDA GPU to balance resources.

TTS Device Selection

TTS Device Selection

Assign the TTS engine to CPU or CUDA GPU independently from the translation model to avoid GPU resource contention.

Python API

from copytalker import AppConfig, TranslationPipeline

# Configure
config = AppConfig()
config.stt.language = "auto"  # Auto-detect source language
config.translation.target_lang = "zh"  # Translate to Chinese
config.tts.engine = "kokoro"  # Use Kokoro TTS
config.tts.voice = "zf_xiaobei"  # Chinese female voice

# Create and start pipeline
pipeline = TranslationPipeline(config)

# Register callbacks for events
def on_transcription(event):
    print(f"Heard: {event.data.text}")

def on_translation(event):
    print(f"Translated: {event.data.translated_text}")

pipeline.register_callback("transcription", on_transcription)
pipeline.register_callback("translation", on_translation)

# Start translation
pipeline.start()

# ... (pipeline runs until stopped)

# Stop
pipeline.stop()

Using Context Manager

from copytalker import AppConfig, TranslationPipeline

config = AppConfig()
config.translation.target_lang = "ja"

with TranslationPipeline(config) as pipeline:
    # Pipeline is running
    input("Press Enter to stop...")
# Pipeline automatically stopped

Model Management

Pre-download Models

# Download Whisper model
copytalker download-models --whisper small

# Download Kokoro TTS model
copytalker download-models --kokoro

# Download all recommended models
copytalker download-models --all

Cache Management

# Show cache info
copytalker cache --info

# Clear all cached models
copytalker cache --clear

# Clear specific model type
copytalker cache --clear whisper

Configuration

CopyTalker can be configured via:

Command-line arguments
Environment variables
Configuration file (~/.config/copytalker/config.yaml)

Environment Variables

Variable	Description	Default
`COPYTALKER_CACHE_DIR`	Model cache directory	`~/.cache/copytalker`
`COPYTALKER_DEVICE`	Compute device (cpu/cuda/auto)	`auto`
`COPYTALKER_CONFIG`	Config file path	`~/.config/copytalker/config.yaml`

Configuration File Example

audio:
  sample_rate: 16000
  vad_aggressiveness: 3

stt:
  model_size: small
  device: auto

translation:
  target_lang: zh

tts:
  engine: kokoro
  voice: zf_xiaobei
  speed: 1.0

debug: false

Architecture

CopyTalker follows a modular pipeline architecture:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Audio Capture  │────▶│  Speech-to-Text │────▶│   Translation   │────▶│  Text-to-Speech │
│    (VAD)        │     │   (Whisper)     │     │ (Helsinki/NLLB) │     │    (Kokoro)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘     └─────────────────┘

Audio Capture: Records audio with Voice Activity Detection (WebRTC VAD)
Speech Recognition: Transcribes using Faster-Whisper
Translation: Translates using Helsinki-NLP or NLLB models
Text-to-Speech: Synthesizes using Kokoro, Edge TTS, or pyttsx3

Development

Setup Development Environment

git clone https://github.com/cycleuser/CopyTalker.git
cd CopyTalker
pip install -e .[dev]

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=copytalker

# Run only unit tests
pytest tests/unit/

# Run fast tests only (skip slow)
pytest -m "not slow"

Code Quality

# Format code
black src/copytalker tests
isort src/copytalker tests

# Lint
ruff check src/copytalker

# Type checking
mypy src/copytalker

Requirements

Python 3.9 or higher
FFmpeg
PortAudio (for PyAudio)
Audio input/output capabilities
PyTorch 2.0+ (on macOS: CPU or MPS; on Linux/Windows: CPU or CUDA)

See pyproject.toml for detailed Python package dependencies.

macOS Installation Notes

CopyTalker works on macOS (both Intel and Apple Silicon). On macOS, CUDA is not available, so PyTorch uses CPU or MPS (Apple Silicon) for inference.

If you encounter torch/numpy conflicts on macOS, install PyTorch first:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install copytalker

If PyAudio fails to install on macOS, set the compiler flags:

LDFLAGS="-L$(brew --prefix portaudio)/lib" CFLAGS="-I$(brew --prefix portaudio)/include" pip install pyaudio

Linux Installation Notes

On Linux, PyAudio is compiled from source and requires the PortAudio development headers and a C compiler. Install them before running pip install:

# Ubuntu/Debian
sudo apt install ffmpeg portaudio19-dev python3-dev build-essential

# Fedora
sudo dnf install ffmpeg portaudio-devel python3-devel gcc

Agent Integration (OpenAI Function Calling)

CopyTalker exposes OpenAI-compatible tools for LLM agents:

from copytalker.tools import TOOLS, dispatch

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=TOOLS,
)

result = dispatch(
    tool_call.function.name,
    tool_call.function.arguments,
)

CLI Help

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Acknowledgments

faster-whisper for speech recognition
Helsinki-NLP for translation models
Facebook NLLB for multilingual translation
Kokoro TTS for high-quality neural TTS
Various TTS libraries for voice synthesis

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.0.3

Mar 11, 2026

0.0.2

Mar 10, 2026

0.0.1

Mar 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

copytalker-0.0.3.tar.gz (312.6 kB view details)

Uploaded Mar 11, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

copytalker-0.0.3-py3-none-any.whl (147.3 kB view details)

Uploaded Mar 11, 2026 Python 3

File details

Details for the file copytalker-0.0.3.tar.gz.

File metadata

Download URL: copytalker-0.0.3.tar.gz
Upload date: Mar 11, 2026
Size: 312.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for copytalker-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`9aed1976443e2cd7cbfb533b26b7d27818637879f1b252cac5861ad58e69dec5`
MD5	`58d1b85f66324c68566430f85900f5a8`
BLAKE2b-256	`cd0ca78ed94421eaf9404162697308c7a58d4182f172c68bf969b9ea8326573c`

See more details on using hashes here.

File details

Details for the file copytalker-0.0.3-py3-none-any.whl.

File metadata

Download URL: copytalker-0.0.3-py3-none-any.whl
Upload date: Mar 11, 2026
Size: 147.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for copytalker-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5ca7642abbac451e7a6b57f17929032476fbc975357d28f9de26983c84d62596`
MD5	`7e97683c3ddd0c927dd3695c842d94a5`
BLAKE2b-256	`05cd9016001129df491fdee3e4f893ddc1aa3526b84d6806e416b378e9594637`

See more details on using hashes here.

copytalker 0.0.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

CopyTalker

Features

Platform Compatibility

Supported Languages

Installation

From PyPI (Recommended)

With CJK Language Support

From Source

System Dependencies

TTS Engines

Model Downloads (via GUI Settings)

Optional: CJK Language Processing

Troubleshooting

Supported Languages

Quick Start

Command Line Interface

GUI Mode

Screenshots

Python API

Using Context Manager

Model Management

Pre-download Models

Cache Management

Configuration

Environment Variables

Configuration File Example

Architecture

Development

Setup Development Environment

Running Tests

Code Quality

Requirements

macOS Installation Notes

Linux Installation Notes

Agent Integration (OpenAI Function Calling)

CLI Help

License

Contributing

Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes