
🎻 Violin

Open-source video dubbing: translate any video into 33 languages with natural-sounding voice-over and synced subtitles.

๐ŸŒ Live demo ยท ๐Ÿ“œ MIT License

Upload a video. Violin transcribes the speech, translates it, synthesizes a native-sounding voice-over in the target language, and remuxes it back into the video, fully aligned, with optional SRT subtitles.

Available as a CLI, a FastAPI web app, and a Claude Code skill.


✨ Features

  • 33 target languages, with handpicked native-speaker voices for the 16 most-used ones (Cartesia Sonic 3 + ElevenLabs)
  • In-video Q&A: ask questions about any moment in the dubbed video; answers use nearby subtitles plus sampled frames
  • Natural-language voice picker: describe the voice you want and an LLM picks from the catalog
  • 6 style profiles (experimental): standard / kids / academic / casual / storyteller / news
  • Pluggable stack: Together / OpenAI / ElevenLabs are interchangeable for every stage, configured in one YAML file

🚀 Quick start

Try it without installing anything

The live demo runs at https://violin-ai.com: drop a short clip in, get a dubbed video out in a few minutes.

Run locally

Requires Python 3.13+ and ffmpeg on PATH.

git clone https://github.com/shang-zhu/violin.git
cd violin
uv sync
cp .env.example .env          # then fill in TOGETHER_API_KEY (get one at https://api.together.ai)

Three ways to use it:

1. CLI (translate one file):

uv run main.py assets/demo_en.mp4 assets/demo_en_zh.mp4 --language Chinese

2. Web app (full REST API + browser UI):

uv run run_api.py
# → http://127.0.0.1:8000           (browser UI)
# → http://127.0.0.1:8000/docs      (interactive API docs)

3. Claude Code skill (invoke from any Claude Code session):

cp -r .claude/skills/video-translator ~/.claude/skills/
claude
> please use the violin skill to translate assets/demo_en.mp4 into Chinese

🎬 How Violin works

Video
  │
  ├─ ffmpeg ──────────────────────► Extract audio (16 kHz WAV)
  │
  ├─ Whisper Large v3 ────────────► Word-level timestamps → sentence segments
  │
  ├─ LLM (DeepSeek V4 Pro by default) ──► Translate each segment, respecting style profile
  │
  ├─ TTS (Cartesia Sonic 3 by default) ─► Synthesize dubbed audio per segment
  │
  └─ ffmpeg ──────────────────────► Speed-align video to dubbed audio,
                                    concat with freeze-frame fallback,
                                    single-pass AAC encode of the audio track,
                                    write output mp4 + optional SRT
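The speed-alignment step in the final stage can be sketched as a small planning function. This is a hypothetical illustration of the idea only (the real logic lives in pipeline/merger.py; the clamp thresholds here are assumptions, not Violin's actual values):

```python
def plan_segment(orig_dur: float, dubbed_dur: float,
                 min_speed: float = 0.5, max_speed: float = 2.0):
    """Decide how to align one video segment with its dubbed audio.

    Returns ("speed", factor) where factor is the playback-speed multiplier
    (orig_dur / dubbed_dur), or ("freeze", extra_seconds) when slowing the
    video that far would look unnatural and a freeze-frame is used instead.
    """
    if dubbed_dur <= 0:
        return ("speed", 1.0)
    speed = orig_dur / dubbed_dur
    if speed < min_speed:
        # Dub is much longer than the original clip: play the clip at the
        # minimum speed, then hold the last frame for the remaining time.
        return ("freeze", dubbed_dur - orig_dur / min_speed)
    return ("speed", min(speed, max_speed))
```

The factor would then feed an ffmpeg setpts/atempo filter per segment before concatenation.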

Key engineering decisions worth a look if you're forking:

  • pipeline/transcriber.py: uses Whisper's word-level timestamps to split the transcript at precise sentence boundaries. Has a hallucination filter that reuses Whisper's own no_speech_prob segment metadata (no hand-tuned heuristics).
  • pipeline/merger.py: concatenates speed-adjusted video chunks, but builds the audio track once from concatenated PCM and encodes AAC at the end. This is the difference between "subtitles drift 1–2 s by the end of an 8 min video" and "perfectly synced throughout."
  • pipeline/tts_*.py: Cartesia and ElevenLabs backends share an interface. The ElevenLabs side ships 21 premade voices (multilingual via eleven_v3) plus 15 hand-picked native-speaker voices from the Voice Library.
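The hallucination filter amounts to a one-line pass over Whisper's segment metadata. A minimal sketch, assuming Whisper-style segment dicts (the 0.6 threshold is an illustrative assumption, not necessarily the value in pipeline/transcriber.py):

```python
def drop_hallucinations(segments, no_speech_threshold=0.6):
    """Keep only segments Whisper itself believes contain speech.

    `segments` are Whisper-style dicts carrying a `no_speech_prob` field:
    high values mark stretches (music, silence) where the decoder tends
    to hallucinate text.
    """
    return [s for s in segments
            if s.get("no_speech_prob", 0.0) < no_speech_threshold]
```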

⚙️ Configuration

All defaults live in config/default.yaml. Override with --config my.yaml; only the keys you want to change need to appear in the override file (values deep-merge).
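The deep-merge behavior can be pictured in a few lines of Python (a sketch of the semantics, not Violin's actual config loader):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override applied recursively; override wins on leaves."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```

So an override file that only sets, say, the TTS provider leaves every other default intact.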

Switch providers

# config/default.yaml: pick the stack you want
models:
  transcription:
    provider: together                  # together | openai
    model: openai/whisper-large-v3      # together → openai/whisper-large-v3 | openai → whisper-1
  translation:
    provider: together                  # together | openai
    model: deepseek-ai/DeepSeek-V4-Pro  # together → deepseek-ai/DeepSeek-V4-Pro | openai → gpt-5.5
  tts:
    provider: together                  # together | elevenlabs | openai
    model: cartesia/sonic-3             # together → cartesia/sonic-3 | elevenlabs → eleven_v3 | openai → tts-1-hd

Production overrides

A starter config/prod.yaml is included for public deployments. It adds upload limits, serializes jobs, and caps ffmpeg concurrency. The included Dockerfile + docker-compose.yml + Caddyfile are how the live demo is hosted; docker compose up -d --build after filling .env is enough to put a copy of Violin behind auto-HTTPS on any Docker host.

Environment variables

| Variable | When required | Description |
|---|---|---|
| TOGETHER_API_KEY | Recommended; covers every stage with the default config | Together AI API key |
| OPENAI_API_KEY | Any stage uses provider: openai | Covers whisper-1, GPT models, and tts-1 |
| ELEVENLABS_API_KEY | TTS uses provider: elevenlabs | ElevenLabs API key |
| CORS_ORIGINS | Optional | Comma-separated allowed origins (default: *) |

You only need keys for the providers you actually pick. Pure-OpenAI deployments (all stages on openai) work too; OPENAI_API_KEY alone is enough. The same goes for ElevenLabs.
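The key-per-provider rule boils down to a simple mapping. An illustrative sketch (Violin's own startup validation may look different):

```python
PROVIDER_ENV_KEYS = {
    "together": "TOGETHER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}

def required_env_keys(models_cfg: dict) -> set:
    """Env vars needed for the providers a config actually selects."""
    return {PROVIDER_ENV_KEYS[stage["provider"]]
            for stage in models_cfg.values()
            if stage.get("provider") in PROVIDER_ENV_KEYS}
```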


🎭 Style profiles

Six built-in profiles tune both the translation LLM prompt and the TTS delivery. Use --style <name> on the CLI or pass style in API requests.

| Style | Tone | TTS speed | Emotion |
|---|---|---|---|
| standard | Faithful translation, natural voice | 1.0× | – |
| kids | Rewritten for a 7-year-old, plain language | 1.0× | excited |
| academic | Formal register, preserves jargon and honorifics | 0.95× | calm |
| casual | Spoken slang, contractions, friendly | 1.1× | content |
| storyteller | Vivid, dramatic narration | 0.9× | enthusiastic |
| news | Concise, declarative, broadcast-style | 1.0× | neutral |

Add your own by editing prompts/styles.yaml.
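A new profile might look like this (hypothetical entry; the field names mirror the table above, but the exact schema in prompts/styles.yaml may differ):

```yaml
# prompts/styles.yaml: hypothetical "podcast" profile
podcast:
  tone: relaxed two-host banter, short sentences, light humor
  tts_speed: 1.05
  emotion: content
```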

See all available styles: uv run main.py --style list.


💻 CLI usage

# Basic
uv run main.py lecture.mp4 lecture_es.mp4 --language Spanish

# Pick a style
uv run main.py talk.mp4 talk_zh.mp4 --language Chinese --style kids

# Pick a specific voice
uv run main.py lecture.mp4 lecture_fr.mp4 --language French --voice "french narrator man"

# Skip SRT
uv run main.py lecture.mp4 lecture_ja.mp4 --language Japanese --no-subtitles

# Full replacement (no original audio underneath)
uv run main.py lecture.mp4 lecture_ko.mp4 --language Korean --no-voiceover

# Custom config (e.g. switch to OpenAI/ElevenLabs)
uv run main.py lecture.mp4 lecture_it.mp4 --language Italian --config config/other_api.yaml

CLI flags

| Flag | Default | Description |
|---|---|---|
| --language / -l | (required) | Target language name (e.g. Spanish, Japanese) |
| --voice / -v | auto | TTS voice. Defaults to the primary native voice for the target language |
| --source-language | auto-detect | Source language hint for translation |
| --no-subtitles | off | Skip SRT generation |
| --voiceover / --no-voiceover | voiceover on | Keep original audio underneath the dub, or replace it entirely |
| --style / -s | standard | Style profile name. Use --style list to see all |
| --config / -c | config/default.yaml | Path to a YAML override file |
| --timings-out | off | Write per-step wall-clock timings + cost as JSON |

🛰️ Web app & REST API

uv run run_api.py                              # default dev mode
uv run run_api.py --host 0.0.0.0 --port 8080   # bind everywhere
uv run run_api.py --config config/prod.yaml    # production overrides

Core flow: POST /jobs to start, GET /jobs/{id} to poll, GET /jobs/{id}/video and /srt to download, POST /jobs/{id}/chat for in-video Q&A. Full list with request/response schemas at /docs.

Example

# Submit
JOB=$(curl -s -X POST http://localhost:8000/jobs \
  -F "file=@lecture.mp4" \
  -F "language=Spanish" \
  -F "style=academic" | jq -r .id)

# Poll
curl -s http://localhost:8000/jobs/$JOB | jq '{status, progress}'

# Download
curl -OJ http://localhost:8000/jobs/$JOB/video
curl -OJ http://localhost:8000/jobs/$JOB/srt

Job data lives under jobs/{id}/. Set api.job_ttl_hours to auto-delete jobs older than N hours (default 0 = disabled; config/prod.yaml uses 24h for the public demo).


๐ŸŒ Supported languages

Violin supports 33 target languages. The 16 below ship with handpicked native-speaker voices for each provider; the rest fall back to the English voice catalog (which is multilingual under both Cartesia Sonic 3 and ElevenLabs eleven_v3).

Ordered by native-speaker population.

| Language | Cartesia native voice (M / F) | ElevenLabs native voice (M / F) |
|---|---|---|
| Chinese | chinese commercial man / chinese female conversational | Lin / Lingyue |
| Spanish | spanish narrator man / spanish narrator lady | Carlos / Valeria |
| English | tutorial man / helpful woman | Adam / Sarah |
| Hindi | hindi narrator man / hindi narrator woman | Yatin / Madhusmita |
| Arabic | middle eastern woman | Faris / Haneen |
| Portuguese | friendly brazilian man / pleasant brazilian lady | Medeiros / Luna |
| Russian | russian narrator man 1 / russian narrator woman | Ivo / Xenia |
| Japanese | japanese male conversational / japanese woman conversational | Shohei / Maiko |
| Turkish | turkish narrator man / turkish calm man | Sinan / Aura |
| German | german reporter man / german conversational woman | Daniel / Sina |
| Korean | korean narrator man / korean calm woman | Joon-ho / Soo |
| French | french narrator man / french narrator lady | Lior / Virginie |
| Italian | italian narrator man / italian narrator woman | Raffaele / Chiara |
| Polish | polish confident man / polish narrator woman | Gregor / Jola |
| Dutch | dutch confident man / dutch man | Ronald / Jolanda |
| Swedish | swedish narrator man / swedish calm lady | Andreas / Louise |

The 17 fallback languages (using the English voice catalog), also ordered by native speakers: Vietnamese, Tamil, Indonesian, Malay, Ukrainian, Romanian, Thai, Greek, Hungarian, Catalan, Czech, Bulgarian, Danish, Slovak, Croatian, Finnish, Norwegian.
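The fallback rule reduces to a dictionary lookup with a default. An illustrative sketch with two entries copied from the table above (the selection code itself is an assumption, not Violin's):

```python
# Cartesia male voices for two of the 16 natively-voiced languages; the full
# catalog carries per-gender entries for each provider.
NATIVE_VOICES = {
    "Chinese": "chinese commercial man",
    "Spanish": "spanish narrator man",
}
ENGLISH_FALLBACK = "tutorial man"  # multilingual under Sonic 3 / eleven_v3

def pick_voice(language: str) -> str:
    """Native voice when the language ships one, else the English catalog."""
    return NATIVE_VOICES.get(language, ENGLISH_FALLBACK)
```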


🤝 Contributing

PRs welcome. Got questions or hit a bug? Email heyviolinai@gmail.com or open an issue.


📜 License

MIT. Use it freely, including commercially.


🙏 Acknowledgements

Built on top of Together AI, Whisper, Cartesia Sonic 3, ElevenLabs, FastAPI, and ffmpeg.

Download files

Source distribution: violin-0.1.0a1.tar.gz (23.8 MB)
  SHA256 6b79207717bc11a90e7f7b6e71e820a857b119bc7a60511c44647e75de018eee

Built distribution: violin-0.1.0a1-py3-none-any.whl (97.2 kB, Python 3)
  SHA256 100c2d77124edc3173771c37206a40ed903f69a53d2508b5a4d1f8bc20598b88

Both files were uploaded via uv 0.9.29 (not using Trusted Publishing).
