
Generate audio from SRT files using Microsoft Edge's text-to-speech service


cakesrt2audio

🎵 Generate synchronized audio from SRT subtitle files using Microsoft Edge's text-to-speech service.

This tool converts SRT subtitle files into audio files in which each spoken line is aligned to its subtitle timestamp. It can also overlay the generated audio onto an existing video file.

✨ Features

  • 🎯 Perfect Timing: Synchronizes audio with SRT timestamps
  • 🎭 Multiple Voices: Supports 100+ voices in various languages
  • ⚡ Concurrent Processing: Fast generation with configurable concurrency
  • 🎬 Video Support: Can overlay audio onto existing videos
  • 📊 Rich Progress Display: Beautiful progress bars and voice listings
  • 🔧 Flexible Output: Supports both audio (MP3) and video (MP4) output

🚀 Installation

pip install cakesrt2audio

📖 Usage

🎵 Generate Audio from SRT

Basic usage (generates output.mp3):

cakesrt2audio your_subtitles.srt

Custom voice and output file:

cakesrt2audio your_subtitles.srt --voice zh-CN-XiaoxiaoNeural --output my_audio.mp3

🎬 Generate Video with Audio Overlay

Add audio to existing video:

cakesrt2audio your_subtitles.srt --video your_video.mp4 --output final_video.mp4

⚡ Advanced Options

cakesrt2audio your_subtitles.srt \
  --voice en-US-AvaMultilingualNeural \
  --output output.mp3 \
  --concurrency 20

📋 Command Line Parameters

Parameter       Description                          Default
srt_file        Path to the SRT subtitle file        Required
--voice         Voice ID for speech synthesis        en-US-AvaMultilingualNeural
--output        Output file path                     output.mp3 or output.mp4
--video         Source video file (optional)         None
--concurrency   Number of concurrent TTS requests    10
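
When the CLI is driven from a script, the parameters above can be assembled into an argument list. The following is a minimal sketch: build_command and run_cakesrt2audio are hypothetical helper names, not part of the package, and only the flags listed in the table are used. Whether the CLI returns a non-zero exit code on failure is an assumption.

```python
import subprocess
from typing import List, Optional


def build_command(
    srt_file: str,
    voice: Optional[str] = None,
    output: Optional[str] = None,
    video: Optional[str] = None,
    concurrency: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list; options left as None use the CLI defaults."""
    cmd = ["cakesrt2audio", srt_file]
    if voice is not None:
        cmd += ["--voice", voice]
    if output is not None:
        cmd += ["--output", output]
    if video is not None:
        cmd += ["--video", video]
    if concurrency is not None:
        cmd += ["--concurrency", str(concurrency)]
    return cmd


def run_cakesrt2audio(srt_file: str, **options) -> None:
    """Run the CLI (hypothetical wrapper); check=True raises on a non-zero exit."""
    subprocess.run(build_command(srt_file, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.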

🎭 Available Voices

To see all available Chinese and English voices with descriptions:

cakesrt2audio --help

Popular voice options:

  • en-US-AvaMultilingualNeural - English (US), Female
  • en-US-BrianMultilingualNeural - English (US), Male
  • zh-CN-XiaoxiaoNeural - Chinese (Mainland), Female
  • zh-CN-YunyangNeural - Chinese (Mainland), Male
  • en-GB-SoniaNeural - English (UK), Female

🐍 Python API Usage

Basic Audio Generation

import asyncio
from cakesrt2audio import create_audio_from_srt

# Generate audio file
asyncio.run(create_audio_from_srt(
    srt_file="subtitles.srt",
    voice="en-US-AvaMultilingualNeural",
    output_file="output.mp3"
))

Generate Video with Audio

import asyncio
from cakesrt2audio import create_audio_from_srt

# Generate video with audio overlay
asyncio.run(create_audio_from_srt(
    srt_file="subtitles.srt",
    voice="zh-CN-XiaoxiaoNeural",
    output_file="final_video.mp4",
    video_path="source_video.mp4",
    concurrency=15
))
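
Batch Conversion (sketch)

The same function can be called in a loop to convert a folder of SRT files. This is a sketch, not part of the package: convert_folder and output_path_for are hypothetical helpers, and it assumes create_audio_from_srt is a coroutine that accepts the keyword arguments shown in the examples above.

```python
from pathlib import Path
from typing import List


def output_path_for(srt_path: Path) -> Path:
    """Place the MP3 next to the SRT file, e.g. talk.srt -> talk.mp3."""
    return srt_path.with_suffix(".mp3")


async def convert_folder(
    folder: str, voice: str = "en-US-AvaMultilingualNeural"
) -> List[Path]:
    """Convert every *.srt file in `folder`, one file at a time."""
    # Imported lazily so output_path_for works without the package installed.
    from cakesrt2audio import create_audio_from_srt

    outputs = []
    for srt_path in sorted(Path(folder).glob("*.srt")):
        out = output_path_for(srt_path)
        # Files are processed sequentially; concurrency applies within each call.
        await create_audio_from_srt(
            srt_file=str(srt_path),
            voice=voice,
            output_file=str(out),
        )
        outputs.append(out)
    return outputs
```

Run it with asyncio.run(convert_folder("subtitles")), where "subtitles" is a placeholder folder name. Processing files sequentially keeps the total number of simultaneous TTS requests at the per-call concurrency setting.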

📄 SRT File Format

Your SRT file should follow the standard format:

1
00:00:01,000 --> 00:00:03,500
Welcome to our presentation

2
00:00:04,000 --> 00:00:07,200
This is the second subtitle

3
00:00:08,000 --> 00:00:10,500
And this is the third one
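
To inspect or sanity-check a file before conversion, the format above can be parsed with the standard library alone. The sketch below is illustrative and is not the parser cakesrt2audio uses internally; Cue and parse_srt are hypothetical names. It assumes each block is an index line, a timing line, and one or more text lines.

```python
import re
from typing import List, NamedTuple

TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


class Cue(NamedTuple):
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert HH, MM, SS, mmm fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> List[Cue]:
    """Split SRT text into cues; blocks without a valid timing line are skipped."""
    cues = []
    # Drop a leading byte-order mark and normalize Windows line endings.
    normalized = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().split("\n")
        if len(lines) < 3:
            continue
        match = TIMING.match(lines[1].strip())
        if not lines[0].strip().isdigit() or match is None:
            continue
        parts = match.groups()
        cues.append(
            Cue(
                index=int(lines[0]),
                start_ms=_to_ms(*parts[:4]),
                end_ms=_to_ms(*parts[4:]),
                # Multi-line subtitle text is joined into one line.
                text=" ".join(line.strip() for line in lines[2:]),
            )
        )
    return cues
```

For the first block in the example above, parse_srt yields a cue with start_ms=1000 and end_ms=3500.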

🔧 Requirements

  • Python 3.8+
  • FFmpeg (for video processing)
  • Internet connection (for Microsoft Edge TTS)

Installing FFmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download a build from the official FFmpeg website and add its bin folder to your PATH.

🎯 Use Cases

  • 📚 Educational Content: Convert lecture notes to audio
  • 🎬 Video Production: Add voiceovers to silent videos
  • 🌐 Accessibility: Create audio versions of text content
  • 🎧 Podcast Creation: Generate spoken content from scripts
  • 🎮 Game Development: Create character dialogue audio

⚠️ Notes

  • Requires active internet connection for TTS generation
  • Large SRT files may take time to process
  • Adjust --concurrency based on your internet speed
  • Output timing matches SRT timestamps precisely
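
To make the --concurrency note concrete: a concurrency limit caps how many TTS requests are in flight at once. The sketch below shows one common way to do that with asyncio.Semaphore. It is an illustration of the general technique, not the package's actual implementation; synthesize_all and the synthesize callback are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List


async def synthesize_all(
    texts: List[str],
    synthesize: Callable[[str], Awaitable[bytes]],
    concurrency: int = 10,
) -> List[bytes]:
    """Run `synthesize` for every text with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(text: str) -> bytes:
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await synthesize(text)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(bounded(t) for t in texts))
```

A higher limit finishes sooner on a fast connection, while a lower limit reduces the chance of failed or throttled requests on a slow one.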

🐛 Troubleshooting

Common issues:

  1. FFmpeg not found: Install FFmpeg and ensure it's in your PATH
  2. TTS fails: Check internet connection and try reducing concurrency
  3. Audio sync issues: Verify your SRT file format is correct
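
The first and third checks can be automated before a long run. The sketch below is a hypothetical preflight helper, not part of cakesrt2audio. It assumes the SRT file is UTF-8 encoded and treats any line containing "-->" as a timing line, so subtitle text that contains an arrow would be reported as well.

```python
import re
import shutil
from typing import List

TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def find_srt_problems(content: str) -> List[str]:
    """Report timing lines that do not match HH:MM:SS,mmm --> HH:MM:SS,mmm."""
    problems = []
    for number, line in enumerate(content.splitlines(), start=1):
        if "-->" in line and not TIMING_LINE.match(line.strip()):
            problems.append(f"line {number}: malformed timing line {line.strip()!r}")
    return problems


def preflight(srt_path: str) -> List[str]:
    """Collect problems worth fixing before running cakesrt2audio."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # utf-8-sig also accepts files that start with a byte-order mark.
    with open(srt_path, encoding="utf-8-sig") as handle:
        problems.extend(find_srt_problems(handle.read()))
    return problems
```

An empty list means no problems were found. A frequent cause of a malformed timing line is a period instead of a comma before the milliseconds, for example 00:00:01.000.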

📄 License

MIT License - see LICENSE file for details.
