🎻 Violin

Open-source video dubbing: translate any video into 33 languages with natural-sounding voice-over and synced subtitles.

🌐 Live demo · 📜 MIT License

Upload a video. Violin transcribes the speech, translates it, synthesizes a native-sounding voice-over in the target language, and remuxes it back into the video, fully aligned, with optional SRT subtitles.

Available as a CLI, a FastAPI web app, and a Claude Code skill.


✨ Features

  • 33 target languages with handpicked native-speaker voices for the 16 most-used ones (Cartesia Sonic 3 + ElevenLabs)
  • In-video Q&A: ask questions about any moment in the dubbed video; answers use nearby subtitles plus sampled frames
  • Natural-language voice picker: describe the voice you want, and an LLM picks from the catalog
  • 6 style profiles (experimental): standard / kids / academic / casual / storyteller / news
  • Pluggable stack: Together / OpenAI / ElevenLabs are interchangeable for every stage, configured in one YAML file
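
The in-video Q&A feature answers from subtitles near the moment you ask about. A minimal sketch of how such a context window could be selected, assuming subtitle cues are simple (start, end, text) records and a hypothetical 30-second window on each side. The Cue class, nearby_cues function, and window size are illustrative and not taken from Violin's code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def nearby_cues(cues: list[Cue], t: float, window: float = 30.0) -> list[Cue]:
    """Return cues overlapping [t - window, t + window], in time order."""
    lo, hi = t - window, t + window
    hits = [c for c in cues if c.end >= lo and c.start <= hi]
    return sorted(hits, key=lambda c: c.start)


cues = [
    Cue(0.0, 4.0, "Welcome to the lecture."),
    Cue(58.0, 62.0, "Here is the main theorem."),
    Cue(200.0, 204.0, "Thanks for watching."),
]
# Text that would accompany a question asked at the 60-second mark.
context = " ".join(c.text for c in nearby_cues(cues, t=60.0))
```

The selected text would then be sent to the model together with frames sampled around the same timestamp.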

🚀 Quick start

Try it without installing anything

The live demo runs at https://violin-ai.com. Drop a short clip in and get a dubbed video out in a few minutes.

Run locally

Requires Python 3.13+ and ffmpeg on PATH.

uv tool install --pre violin     # --pre needed while v0.1 is in alpha
export TOGETHER_API_KEY=...      # get one at https://api.together.ai (add to ~/.zshrc to persist)

Three ways to use it:

1. CLI (translate one file):

violin lecture.mp4 lecture_zh.mp4 --language Chinese

2. Web app (full REST API + browser UI):

violin-api
# → http://127.0.0.1:8000           (browser UI)
# → http://127.0.0.1:8000/docs      (interactive API docs)

3. Claude Code skill (invoke from any Claude Code session):

violin --install-skill          # one-time: copies the skill into ~/.claude/skills/
claude
> please use the violin skill to translate path/to/video.mp4 into Chinese

Run from source (for hacking on the pipeline)

git clone https://github.com/shang-zhu/violin.git
cd violin
uv sync
cp .env.example .env             # then fill in TOGETHER_API_KEY
uv run main.py lecture.mp4 lecture_zh.mp4 --language Chinese

🎬 How Violin works

Video
  │
  ├─ ffmpeg ─────────────────────────────► Extract audio (16 kHz WAV)
  │
  ├─ Whisper Large v3 ───────────────────► Word-level timestamps → sentence segments
  │
  ├─ LLM (DeepSeek V4 Pro by default) ───► Translate each segment, respecting style profile
  │
  ├─ TTS (Cartesia Sonic 3 by default) ──► Synthesize dubbed audio per segment
  │
  └─ ffmpeg ─────────────────────────────► Speed-align video to dubbed audio,
                                           concat with freeze-frame fallback,
                                           single-pass AAC encode the audio track,
                                           write output mp4 + optional SRT
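
The last stage, speed-aligning each video chunk to its dubbed audio with a freeze-frame fallback, can be sketched as follows. This is an illustration under stated assumptions, not Violin's merger code: the plan_alignment function, the AlignPlan record, and the 0.5x to 2.0x speed limits are hypothetical. The setpts and tpad filter names are standard ffmpeg video filters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignPlan:
    speed: float        # video playback speed (1.0 = unchanged)
    freeze_pad: float   # seconds of freeze-frame appended to the chunk
    video_filter: str   # ffmpeg -vf expression for this chunk


def plan_alignment(video_dur: float, audio_dur: float,
                   min_speed: float = 0.5, max_speed: float = 2.0) -> AlignPlan:
    """Choose a playback speed so the chunk lasts as long as its dubbed audio."""
    if video_dur <= 0 or audio_dur <= 0:
        raise ValueError("durations must be positive")
    ideal = video_dur / audio_dur                 # < 1 slows the video down
    speed = min(max(ideal, min_speed), max_speed)  # assumed clamp range
    stretched = video_dur / speed
    # If the clamped video is still shorter than the audio, hold the last frame.
    freeze_pad = max(0.0, audio_dur - stretched)
    if freeze_pad < 1e-3:                         # ignore sub-millisecond residue
        freeze_pad = 0.0
    vf = f"setpts=PTS/{speed:.4f}"
    if freeze_pad > 0:
        vf += f",tpad=stop_mode=clone:stop_duration={freeze_pad:.3f}"
    return AlignPlan(speed, freeze_pad, vf)


# A 2 s clip whose dubbed line takes 10 s: slow to the floor, then freeze.
plan = plan_alignment(video_dur=2.0, audio_dur=10.0)
```

The sketch does not handle the opposite case, where the speed ceiling leaves the video longer than the audio; a real implementation would need to pad the audio with silence there.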

Key engineering decisions worth a look if you're forking:

  • pipeline/transcriber.py: uses Whisper's word-level timestamps to split the transcript at precise sentence boundaries. Its hallucination filter reuses Whisper's own no_speech_prob segment metadata (no hand-tuned heuristics).
  • pipeline/merger.py: concatenates speed-adjusted video chunks, but builds the audio track once from concatenated PCM and encodes AAC at the end. This is the difference between "subtitles drift 1–2 s by the end of an 8 min video" and "perfectly synced throughout."
  • pipeline/tts_*.py: the Cartesia and ElevenLabs backends share an interface. The ElevenLabs side ships 21 premade voices (multilingual via eleven_v3) plus 15 handpicked native-speaker voices from the Voice Library.
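
To make the transcriber bullet concrete, here is a rough sketch of both ideas: dropping segments that Whisper itself flags as probable silence, and grouping word-level timestamps into sentences. It assumes the segment and word dictionaries follow the shape produced by the open-source whisper package (no_speech_prob per segment; word, start, end per word). The function names are hypothetical, and the 0.6 cutoff mirrors that package's default no_speech_threshold rather than any value confirmed for Violin.

```python
import re

# Sentence-final punctuation, including common CJK marks.
SENTENCE_END = re.compile(r"[.!?。！？]$")


def drop_silent_segments(segments: list[dict], threshold: float = 0.6) -> list[dict]:
    """Keep segments whose no_speech_prob is at or below the threshold."""
    return [s for s in segments if s.get("no_speech_prob", 0.0) <= threshold]


def _close(words: list[dict]) -> dict:
    """Collapse a run of words into one timed sentence."""
    text = " ".join(w["word"].strip() for w in words)
    return {"text": text, "start": words[0]["start"], "end": words[-1]["end"]}


def words_to_sentences(words: list[dict]) -> list[dict]:
    """Group word dicts into sentences, splitting after terminal punctuation."""
    sentences, current = [], []
    for w in words:
        current.append(w)
        if SENTENCE_END.search(w["word"].strip()):
            sentences.append(_close(current))
            current = []
    if current:  # trailing words with no terminal punctuation
        sentences.append(_close(current))
    return sentences


words = [
    {"word": " Hello", "start": 0.0, "end": 0.4},
    {"word": " world.", "start": 0.4, "end": 0.9},
    {"word": " Next", "start": 1.2, "end": 1.5},
]
sentences = words_to_sentences(words)
```

Joining words with a space suits languages written with spaces; text in languages such as Chinese or Japanese would need to be joined without it.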

โš™๏ธ Configuration

All defaults live in config/default.yaml. Override with --config my.yaml (only the keys you want to change need to appear in the override file โ€” values deep-merge).
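
For readers unfamiliar with deep-merging, the behavior described above can be sketched in a few lines. This is a generic recursive merge, assuming nested dictionaries as loaded from YAML; deep_merge is an illustrative name and Violin's own merge logic may differ in details such as list handling.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied key by key, recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # descend into sub-maps
        else:
            merged[key] = deepcopy(value)                 # scalars and lists replace
    return merged


defaults = {
    "models": {
        "translation": {"provider": "together"},
        "tts": {"provider": "together", "model": "cartesia/sonic-3"},
    }
}
# An override file that only mentions the TTS stage.
override = {"models": {"tts": {"provider": "elevenlabs", "model": "eleven_v3"}}}
config = deep_merge(defaults, override)
```

Here the translation stage keeps its default provider because the override never mentions it.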

Switch providers

# config/default.yaml - pick the stack you want
models:
  transcription:
    provider: together                  # together | openai
    model: openai/whisper-large-v3      # together → openai/whisper-large-v3 | openai → whisper-1
  translation:
    provider: together                  # together | openai
    model: deepseek-ai/DeepSeek-V4-Pro  # together → deepseek-ai/DeepSeek-V4-Pro | openai → gpt-5.5
  tts:
    provider: together                  # together | elevenlabs | openai
    model: cartesia/sonic-3             # together → cartesia/sonic-3 | elevenlabs → eleven_v3 | openai → tts-1-hd

Production overrides

A starter config/prod.yaml is included for public deployments. It adds upload limits, serializes jobs, and caps ffmpeg concurrency. The included Dockerfile, docker-compose.yml, and Caddyfile are how the live demo is hosted. After filling in .env, running docker compose up -d --build is enough to put a copy of Violin behind auto-HTTPS on any Docker host.

Environment variables

| Variable | When required | Description |
|---|---|---|
| TOGETHER_API_KEY | Recommended; covers every stage with the default config | Together AI API key |
| OPENAI_API_KEY | Any stage uses provider: openai | Covers whisper-1, GPT models, and tts-1 |
| ELEVENLABS_API_KEY | TTS uses provider: elevenlabs | ElevenLabs API key |
| CORS_ORIGINS | Optional | Comma-separated allowed origins (default: *) |

You only need keys for the providers you actually pick. Pure-OpenAI deployments (all stages on openai) work too; OPENAI_API_KEY alone is enough. Likewise, ELEVENLABS_API_KEY is needed only when the TTS stage uses ElevenLabs.
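
The rule "one key per provider you pick" can be checked mechanically before starting a job. The sketch below assumes a config dictionary shaped like the models block shown earlier; required_env_vars and missing_env_vars are hypothetical helpers, not part of Violin.

```python
PROVIDER_ENV = {
    "together": "TOGETHER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}


def required_env_vars(config: dict) -> set[str]:
    """Collect the API key names needed by the providers named in config."""
    stages = config.get("models", {})
    return {PROVIDER_ENV[stage["provider"]] for stage in stages.values()}


def missing_env_vars(config: dict, environ: dict[str, str]) -> set[str]:
    """Return required key names that are unset or empty in environ."""
    return {name for name in required_env_vars(config) if not environ.get(name)}


config = {
    "models": {
        "transcription": {"provider": "openai", "model": "whisper-1"},
        "translation": {"provider": "openai", "model": "gpt-5.5"},
        "tts": {"provider": "elevenlabs", "model": "eleven_v3"},
    }
}
missing = missing_env_vars(config, {"OPENAI_API_KEY": "sk-example"})
```

In real use you would pass os.environ as the second argument and stop with a clear message if the result is non-empty.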


🎭 Style profiles

Six built-in profiles tune both the translation LLM prompt and the TTS delivery. Use --style <name> on the CLI or pass style in API requests.

| Style | Tone | TTS speed | Emotion |
|---|---|---|---|
| standard | Faithful translation, natural voice | 1.0× | - |
| kids | Rewritten for a 7-year-old, plain language | 1.0× | excited |
| academic | Formal register, preserves jargon and honorifics | 0.95× | calm |
| casual | Spoken slang, contractions, friendly | 1.1× | content |
| storyteller | Vivid, dramatic narration | 0.9× | enthusiastic |
| news | Concise, declarative, broadcast-style | 1.0× | neutral |

Add your own by editing prompts/styles.yaml.

See all available styles: violin --style list.
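
As a way to picture what a style profile carries, the table above can be modeled as a small lookup. This is only a sketch: the StyleProfile class and resolve_style function are hypothetical, and the actual schema of prompts/styles.yaml is not shown here and may be organized differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StyleProfile:
    tone: str                # guidance for the translation prompt
    tts_speed: float         # speaking-rate multiplier for TTS
    emotion: Optional[str]   # None means no emotion hint is sent


# Values copied from the style table above.
STYLES = {
    "standard": StyleProfile("Faithful translation, natural voice", 1.0, None),
    "kids": StyleProfile("Rewritten for a 7-year-old, plain language", 1.0, "excited"),
    "academic": StyleProfile("Formal register, preserves jargon and honorifics", 0.95, "calm"),
    "casual": StyleProfile("Spoken slang, contractions, friendly", 1.1, "content"),
    "storyteller": StyleProfile("Vivid, dramatic narration", 0.9, "enthusiastic"),
    "news": StyleProfile("Concise, declarative, broadcast-style", 1.0, "neutral"),
}


def resolve_style(name: str = "standard") -> StyleProfile:
    """Look up a profile by name, failing with the list of valid names."""
    try:
        return STYLES[name]
    except KeyError:
        raise ValueError(f"unknown style {name!r}; choose from {sorted(STYLES)}") from None


profile = resolve_style("academic")
```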


💻 CLI usage

Examples use the PyPI-installed violin command. If you're running from a git checkout, substitute uv run main.py for violin (and uv run run_api.py for violin-api).

# Basic
violin lecture.mp4 lecture_es.mp4 --language Spanish

# Pick a style
violin talk.mp4 talk_zh.mp4 --language Chinese --style kids

# Pick a specific voice
violin lecture.mp4 lecture_fr.mp4 --language French --voice "french narrator man"

# Skip SRT
violin lecture.mp4 lecture_ja.mp4 --language Japanese --no-subtitles

# Full replacement (no original audio underneath)
violin lecture.mp4 lecture_ko.mp4 --language Korean --no-voiceover

# Custom config (e.g. switch to OpenAI/ElevenLabs)
violin lecture.mp4 lecture_it.mp4 --language Italian --config config/other_api.yaml

CLI flags

| Flag | Default | Description |
|---|---|---|
| --language / -l | (required) | Target language name (e.g. Spanish, Japanese) |
| --voice / -v | auto | TTS voice. Defaults to the primary native voice for the target language |
| --source-language | auto-detect | Source language hint for translation |
| --no-subtitles | off | Skip SRT generation |
| --voiceover / --no-voiceover | voiceover on | Keep original audio underneath the dub, or full replacement |
| --style / -s | standard | Style profile name. Use --style list to see all |
| --config / -c | config/default.yaml | Path to a YAML override file |
| --timings-out | off | Write per-step wall-clock timings + cost as JSON |

๐Ÿ›ฐ๏ธ Web app & REST API

violin-api                              # default dev mode
violin-api --host 0.0.0.0 --port 8080   # bind everywhere
violin-api --config config/prod.yaml    # production overrides (requires a git checkout for config/prod.yaml)

Core flow: POST /jobs to start, GET /jobs/{id} to poll, GET /jobs/{id}/video and /srt to download, POST /jobs/{id}/chat for in-video Q&A. Full list with request/response schemas at /docs.

Example

# Submit
JOB=$(curl -s -X POST http://localhost:8000/jobs \
  -F "file=@lecture.mp4" \
  -F "language=Spanish" \
  -F "style=academic" | jq -r .id)

# Poll
curl -s http://localhost:8000/jobs/$JOB | jq '{status, progress}'

# Download
curl -OJ http://localhost:8000/jobs/$JOB/video
curl -OJ http://localhost:8000/jobs/$JOB/srt
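
The same submit-then-poll flow can be driven from Python. The sketch below covers only the polling half and uses the standard library. The status and progress fields appear in the jq filter above, but the terminal status values "done" and "failed", the helper names, and the timing defaults are assumptions; check /docs for the real schema.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_job(base_url: str, job_id: str) -> dict:
    """GET /jobs/{id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(get_status: Callable[[], dict],
                 interval: float = 5.0,
                 timeout: float = 1800.0,
                 terminal: frozenset = frozenset({"done", "failed"}),  # assumed values
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call get_status until the job reaches a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status()
        if job.get("status") in terminal:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout} s")
        sleep(interval)
```

Typical use would be wait_for_job(lambda: fetch_job("http://localhost:8000", job_id)). Passing the status getter as a callable keeps the loop easy to test without a running server.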

Job data lives under jobs/{id}/. Set api.job_ttl_hours to auto-delete jobs older than N hours (default 0 = disabled; config/prod.yaml uses 24h for the public demo).
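
A TTL sweep of this kind is simple to picture. The sketch below is not Violin's implementation: prune_jobs is a hypothetical helper, and it assumes a job's age is judged by the modification time of its directory, which the text above does not specify.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def prune_jobs(jobs_dir: Path, ttl_hours: float, now: Optional[float] = None) -> list[str]:
    """Delete job directories older than ttl_hours; 0 or less disables pruning."""
    if ttl_hours <= 0:
        return []
    now = time.time() if now is None else now
    cutoff = now - ttl_hours * 3600
    removed = []
    for job in sorted(jobs_dir.iterdir()):
        # Assumption: directory mtime stands in for job age.
        if job.is_dir() and job.stat().st_mtime < cutoff:
            shutil.rmtree(job)
            removed.append(job.name)
    return removed
```

Calling prune_jobs(Path("jobs"), 24) would mirror the 24-hour setting mentioned for the public demo.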


๐ŸŒ Supported languages

Violin supports 33 target languages. The 16 below ship with handpicked native-speaker voices for each provider; the rest fall back to the English voice catalog (which is multilingual under both Cartesia Sonic 3 and ElevenLabs eleven_v3).

Ordered by native-speaker population.

| Language | Cartesia native voice (M / F) | ElevenLabs native voice (M / F) |
|---|---|---|
| Chinese | chinese commercial man / chinese female conversational | Lin / Lingyue |
| Spanish | spanish narrator man / spanish narrator lady | Carlos / Valeria |
| English | tutorial man / helpful woman | Adam / Sarah |
| Hindi | hindi narrator man / hindi narrator woman | Yatin / Madhusmita |
| Arabic | middle eastern woman | Faris / Haneen |
| Portuguese | friendly brazilian man / pleasant brazilian lady | Medeiros / Luna |
| Russian | russian narrator man 1 / russian narrator woman | Ivo / Xenia |
| Japanese | japanese male conversational / japanese woman conversational | Shohei / Maiko |
| Turkish | turkish narrator man / turkish calm man | Sinan / Aura |
| German | german reporter man / german conversational woman | Daniel / Sina |
| Korean | korean narrator man / korean calm woman | Joon-ho / Soo |
| French | french narrator man / french narrator lady | Lior / Virginie |
| Italian | italian narrator man / italian narrator woman | Raffaele / Chiara |
| Polish | polish confident man / polish narrator woman | Gregor / Jola |
| Dutch | dutch confident man / dutch man | Ronald / Jolanda |
| Swedish | swedish narrator man / swedish calm lady | Andreas / Louise |

The 17 fallback languages (using the English voice catalog), also ordered by native speakers: Vietnamese, Tamil, Indonesian, Malay, Ukrainian, Romanian, Thai, Greek, Hungarian, Catalan, Czech, Bulgarian, Danish, Slovak, Croatian, Finnish, Norwegian.
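
The fallback rule reads naturally as a dictionary lookup with a default. The sketch below uses a few Cartesia voice names from the table; pick_voice is a hypothetical helper, and treating the first (male) voice as the default is an assumption, since the text does not say which voice is the primary one.

```python
# (male, female) Cartesia voice names, copied from the table above.
NATIVE_VOICES = {
    "Chinese": ("chinese commercial man", "chinese female conversational"),
    "Spanish": ("spanish narrator man", "spanish narrator lady"),
    "English": ("tutorial man", "helpful woman"),
    "French": ("french narrator man", "french narrator lady"),
}
FALLBACK_LANGUAGE = "English"  # multilingual catalog used for the other languages


def pick_voice(language: str, gender: str = "male") -> str:
    """Return the native voice for a language, or the English voice as fallback."""
    if gender not in ("male", "female"):
        raise ValueError("gender must be 'male' or 'female'")
    male, female = NATIVE_VOICES.get(language, NATIVE_VOICES[FALLBACK_LANGUAGE])
    return male if gender == "male" else female


voice = pick_voice("Vietnamese")  # no native entry, so the English voice is used
```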


๐Ÿค Contributing

PRs welcome. Got questions or hit a bug? Email heyviolinai@gmail.com or open an issue.


๐Ÿ“œ License

MIT โ€” use it freely, including commercially.


๐Ÿ™ Acknowledgements

Built on top of Together AI, Whisper, Cartesia Sonic 3, ElevenLabs, FastAPI, and ffmpeg.
