Generate synchronized SRT subtitles using ElevenLabs Force Alignment API with AI-powered semantic segmentation
ElevenLabs Force Alignment SRT Generator
🎬 A Python tool that generates synchronized SRT subtitles using the ElevenLabs Force Alignment API, with optional AI-powered semantic segmentation.
✨ Features
- High-Precision Alignment: Uses ElevenLabs Force Alignment API for accurate word-level timing
- AI Semantic Segmentation: Leverages Google Gemini for intelligent subtitle breaking
- Bilingual Support: Automatically generates bilingual subtitles (original + translation)
- Multi-Language: Supports 99+ languages including Chinese, English, Japanese, Korean, etc.
- Smart Formatting: Removes punctuation and optimizes line breaks for readability
- Flexible Output: Configurable character limits and segmentation strategies
🚀 Quick Start
Prerequisites
- Python 3.7+
- ElevenLabs API key (see Getting API Keys)
- Google Gemini API key (see Getting API Keys)
Installation
Option 1: Install from PyPI (Recommended)
pip install elevenlabs-srt-generator
Option 2: Install from Source
git clone https://github.com/preangelleo/script-force-alignment.git
cd script-force-alignment
pip install -r requirements.txt
- Set up environment variables:
cp .env.example .env
# Edit .env and add your API keys
- Run setup validation:
python setup.py
📖 Usage
Command Line Interface
After installing from PyPI, you can use the CLI directly:
# Basic usage
elevenlabs-srt audio.mp3 "Your transcript text" -o output.srt
# With options
elevenlabs-srt audio.mp3 transcript.txt \
--output subtitles.srt \
--max-chars 30 \
--language chinese \
--no-semantic # Disable AI segmentation
Python API
from main import elevenlabs_force_alignment_to_srt
# Generate subtitles
success, result = elevenlabs_force_alignment_to_srt(
audio_file="path/to/audio.mp3",
input_text="Your transcript text here",
output_filepath="output/subtitles.srt",
max_chars_per_line=20,
language='chinese',
use_semantic_segmentation=True, # Enable AI segmentation
model='gemini-2.0-flash' # Optional: specify Gemini model
)
if success:
print(f"Subtitles saved to: {result}")
Using the Example Script
Edit example_usage.py with your parameters:
# Configuration
AUDIO_FILE_PATH = "./samples/your_audio.mp3"
TEXT_CONTENT = "Your transcript here..."
OUTPUT_FILE_PATH = "./output/subtitles.srt"
LANGUAGE = 'chinese'
MAX_CHARS_PER_LINE = 20
USE_SEMANTIC_SEGMENTATION = True
Then run:
python example_usage.py
Running Tests
The test script allows you to compare semantic vs simple segmentation:
python test.py
🔧 API Configuration
Required Environment Variables
Create a .env file with:
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
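A quick pre-flight check can catch a missing key or stray whitespace before any API call is made. The sketch below is not part of the package: `check_api_keys` is a hypothetical helper, and it assumes the two variables above have already been loaded into the process environment.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "GEMINI_API_KEY")


def check_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_KEYS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is missing or empty")
        elif value != value.strip():
            # Copy-paste often leaves a trailing space or newline.
            problems.append(f"{name} has leading or trailing whitespace")
    return problems


if __name__ == "__main__":
    for problem in check_api_keys():
        print("Config problem:", problem)
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to test.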
Getting API Keys
- ElevenLabs API Key:
  - Sign up at ElevenLabs
  - Go to your profile settings
  - Copy your API key
  - Important: Enable the Force Alignment feature in your API settings (it's disabled by default)
- Google Gemini API Key:
  - Visit Google AI Studio
  - Create a new API key
  - Enable the Gemini API
📝 API Reference
Main Function
elevenlabs_force_alignment_to_srt(
audio_file: str, # Path to audio file
input_text: str, # Transcript text
output_filepath: str, # Output SRT path
api_key: str = None, # Optional API key override
max_chars_per_line: int = 20, # Max characters per line
language: str = 'chinese', # Language name, e.g. 'chinese' or 'english'
use_semantic_segmentation: bool = True, # Enable AI segmentation
model: str = None # Gemini model (default: gemini-2.0-flash)
) -> Tuple[bool, str]
Parameters
- audio_file: Path to audio file (MP3, WAV, M4A, OGG, FLAC, etc.)
- input_text: Exact transcript of the audio content
- output_filepath: Where to save the SRT file
- api_key: Optional ElevenLabs API key (overrides .env)
- max_chars_per_line: Maximum characters per subtitle line
- language: Language of the content (e.g., 'chinese', 'english')
- use_semantic_segmentation: Enable AI-powered semantic breaking
- model: Gemini model to use (default: 'gemini-2.0-flash'). Options:
  - 'gemini-2.0-flash': Fast and efficient (default)
  - 'gemini-2.0-flash-exp': Experimental features
  - 'gemini-1.5-pro': Higher quality output
  - 'gemini-2.0-flash-thinking': Complex reasoning
Returns
- Tuple[bool, str]: (Success status, Output path or error message)
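Because the second element of the tuple is either a path or an error message, callers may prefer to turn a failure into an exception. The helper below is a minimal sketch; `unwrap_result` is a hypothetical name and is not provided by the package.

```python
from typing import Tuple


def unwrap_result(outcome: Tuple[bool, str]) -> str:
    """Return the output path on success, raise RuntimeError on failure."""
    success, payload = outcome
    if not success:
        # On failure the second element carries the error message.
        raise RuntimeError(f"SRT generation failed: {payload}")
    return payload
```

It could wrap the main function directly, for example `path = unwrap_result(elevenlabs_force_alignment_to_srt(...))`.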
🎯 Features Comparison
| Feature | Semantic Segmentation | Simple Segmentation |
|---|---|---|
| Natural breaks | ✅ Yes | ❌ No |
| Bilingual support | ✅ Yes | ❌ No |
| AI-powered | ✅ Yes | ❌ No |
| Processing time | ~3-5s | ~1-2s |
| Quality | High | Basic |
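For intuition, simple segmentation can be pictured as a greedy pass over word timings that starts a new subtitle whenever the character limit would be exceeded. The sketch below illustrates that general technique under stated assumptions; it is not the package's actual implementation, and the `Word`, `Segment`, and `greedy_segments` names are hypothetical.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float    # seconds


class Segment(NamedTuple):
    text: str
    start: float
    end: float


def greedy_segments(words: List[Word], max_chars: int = 20, joiner: str = " ") -> List[Segment]:
    """Pack consecutive words into segments of at most max_chars characters.

    A single word longer than max_chars still gets its own segment.
    Use joiner="" for languages written without spaces, such as Chinese.
    """
    segments: List[Segment] = []
    current: List[Word] = []
    for word in words:
        candidate = joiner.join(w.text for w in current + [word])
        if current and len(candidate) > max_chars:
            # Adding this word would overflow the line, so close the segment.
            segments.append(_close(current, joiner))
            current = []
        current.append(word)
    if current:
        segments.append(_close(current, joiner))
    return segments


def _close(words: List[Word], joiner: str) -> Segment:
    # A segment spans from its first word's start to its last word's end.
    return Segment(joiner.join(w.text for w in words), words[0].start, words[-1].end)
```

A greedy pass like this ignores sentence and phrase boundaries, which is the gap semantic segmentation is meant to close.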
🌍 Supported Languages
The tool supports 99+ languages including:
- Chinese (Simplified & Traditional)
- English
- Japanese
- Korean
- Spanish
- French
- German
- Russian
- Arabic
- Hindi
- And many more...
📊 Output Format
The tool generates standard SRT format:
1
00:00:00,123 --> 00:00:02,456
这是第一行字幕
This is the first subtitle
2
00:00:02,456 --> 00:00:05,789
这是第二行字幕
This is the second subtitle
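The layout above (index, time range, one or more text lines, blank line) is simple to produce by hand. The following sketch shows one way to render it; `format_timestamp` and `build_srt` are illustrative names, not functions exported by the package.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # SRT separates blocks with a blank line.
    return "\n".join(blocks)
```

For bilingual output, the text of a cue would simply contain both lines separated by a newline, as in the sample above.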
🔍 Troubleshooting
Common Issues
- API Key Errors:
  - Ensure your API keys are valid
  - Check that the .env file is in the correct location
  - Verify keys don't have extra spaces
- Audio File Issues:
  - Maximum file size: 1 GB
  - Supported formats: MP3, WAV, M4A, OGG, FLAC, AAC, OPUS, MP4
  - Ensure the file path is correct
- Text Alignment Issues:
  - Text must match audio content exactly
  - Remove extra spaces or formatting
  - Check that the language setting matches the audio
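The audio file checks listed above can be run locally before uploading anything. This is a minimal sketch, assuming the 1 GB limit means 1024³ bytes and that the format can be judged from the file extension alone; `audio_file_problem` is a hypothetical helper.

```python
import os
from typing import Optional

MAX_BYTES = 1024 ** 3  # 1 GB limit, interpreted here as 1 GiB (an assumption)
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".opus", ".mp4"}


def audio_file_problem(path: str) -> Optional[str]:
    """Return a description of the first problem found, or None if the file looks usable."""
    if not os.path.isfile(path):
        return f"file not found: {path}"
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return f"unsupported extension: {extension or '(none)'}"
    if os.path.getsize(path) > MAX_BYTES:
        return "file is larger than 1 GB"
    return None
```

An extension check cannot detect a mislabeled or corrupt file, so the API may still reject a file that passes.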
Debug Mode
Enable detailed logging by setting the DEBUG environment variable:
export DEBUG=true
python example_usage.py
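If you write your own wrapper script, you may want it to follow the same switch. How the tool itself parses DEBUG is not documented here, so the accepted values below ("1", "true", "yes") are an assumption and `configure_logging` is a hypothetical helper.

```python
import logging
import os
from typing import Mapping, Optional


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Set the root log level from the DEBUG variable and return the chosen level."""
    env = os.environ if env is None else env
    # Accepted truthy spellings are an assumption, not the tool's documented behavior.
    debug = env.get("DEBUG", "").strip().lower() in {"1", "true", "yes"}
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(message)s")
    return level
```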
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- ElevenLabs for the Force Alignment API
- Google Gemini for AI semantic analysis
- Community contributors
📧 Support
For issues, questions, or suggestions:
- Open an issue on GitHub
- Contact: your-email@example.com
Made with ❤️ for the subtitle generation community