Media processing toolkit for presentation localization
Montaigne
A Python toolkit for presentation animation. Extract slides, translate visuals, generate voiceovers, and create videos—powered by Google Gemini AI, ElevenLabs, and local TTS.
Features
- PDF Extraction: Convert PDF presentations to high-quality images. Configurable DPI settings (150-300+) with PNG or JPG output formats.
- Script Generation: Generate professional voiceover scripts using a two-pass AI approach. Holistic context analysis, narrative arc awareness, and production notes with pronunciation guides.
- Image Translation: Translate text in images to any target language. Context-aware translations powered by Gemini.
- Voice Synthesis: Generate natural voiceover audio from scripts. Three providers: Gemini TTS (cloud), ElevenLabs (premium voices), or Coqui XTTS-v2 (local, no API key required).
- Video Generation: Combine translated slides and voiceover audio into polished videos. Configurable resolution up to 1920x1080.
- PowerPoint Export: Create PPTX presentations from PDF or images. Optionally add voiceover scripts as speaker notes for each slide.
- Cloud Deployment: Offload video generation to Google Cloud Run. Upload PDFs, process in the cloud, and download results with secure signed URLs.
- Model Configuration: Customize Gemini models for each operation. Use --model flags to switch between flash and pro models based on your needs.
- Video Annotation: Frame-accurate video and audio annotation tool. Add timestamps, export to WebVTT/SRT formats for captions. Waveform visualization with click-to-seek.
- Web Editor: Streamlit-based slide editor for managing presentations.
Installation
Using pip
pip install montaigne
With optional dependencies
# Install with web editor support
pip install "montaigne[edit]"
# Install with annotation tool support
pip install "montaigne[annotate]"
# Install all optional dependencies
pip install "montaigne[all]"
Using uv
uv pip install montaigne
Using uvx (no installation required)
uvx --from montaigne essai setup
uvx --from montaigne essai script --input presentation.pdf
Setup
- Get a Gemini API key from Google AI Studio
- Create a .env file:
GEMINI_API_KEY=your-api-key
- Verify setup:
essai setup
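To confirm that the key is where you expect it before running the verification step, the .env file can be read back directly. The following is a minimal sketch using only the standard library; the helper name read_env_key is hypothetical and not part of montaigne, and it handles only plain KEY=value lines.

```python
from pathlib import Path


def read_env_key(env_path, name="GEMINI_API_KEY"):
    """Return the value of `name` from a simple KEY=value .env file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("\"'") or None
    return None


if __name__ == "__main__":
    key = read_env_key(".env")
    print("GEMINI_API_KEY found" if key else "GEMINI_API_KEY missing from .env")
```

This only checks that the variable is present; it does not validate the key against the Gemini API.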
Usage
Extract PDF to Images
essai pdf presentation.pdf
essai pdf presentation.pdf --dpi 200 --format jpg
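When choosing a --dpi value, it can help to estimate the resulting image size. PDF page dimensions are measured in points (72 per inch), so the pixel size grows linearly with DPI. The sketch below is illustrative arithmetic only, not montaigne's own code; the function name and the 960 x 540 point page size are assumptions chosen for the example, and the exact rounding of a real renderer may differ by a pixel.

```python
def pixel_size(width_pt, height_pt, dpi):
    """Convert a page size in PDF points (1/72 inch) to approximate pixels."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# A 16:9 page of 960 x 540 points (assumed example size).
for dpi in (150, 200, 300):
    width_px, height_px = pixel_size(960, 540, dpi)
    print(f"{dpi} DPI: {width_px} x {height_px} px")
```

For this assumed page size, 150 DPI already exceeds 1920x1080, so higher settings mainly matter when slides contain fine detail.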
Generate Voiceover Script from Slides
essai script --input presentation.pdf
essai script --input slides_images/ --context "AI workshop"
essai script --input presentation.pdf --output custom_script.md
essai script --input presentation.pdf --model gemini-2.5-flash
Options:
- --input, -i: PDF file or folder of slide images
- --output, -o: Output markdown file path
- --context, -c: Additional context to guide script generation
- --model, -m: Gemini model to use (default: gemini-3-pro-preview)
Generate Audio from Script
essai audio --script voiceover.md
essai audio --script voiceover.md --voice Kore
essai audio --script voiceover.md --model gemini-2.5-flash-preview-tts
TTS Providers:
| Provider | Description | Installation |
|---|---|---|
| gemini | Google Gemini TTS API (default) | Included |
| elevenlabs | ElevenLabs TTS API | Included |
| coqui | Local Coqui XTTS-v2 (no API key) | pip install "montaigne[coqui]" |
Gemini voices: Puck, Charon, Kore, Fenrir, Aoede, Orus
Local TTS with Coqui:
# Install Coqui dependencies
pip install "montaigne[coqui]"
# Generate audio locally (no API key required)
essai audio --script voiceover.md --provider coqui
essai audio --script voiceover.md --provider coqui --voice male
essai audio --list-voices --provider coqui
Coqui voices: female, male, neutral
Note: First run downloads the XTTS-v2 model (~1.5GB). Requires accepting the CPML license.
Options:
- --script, -s: Path to voiceover markdown script
- --provider, -p: TTS provider (gemini, elevenlabs, coqui)
- --voice, -v: TTS voice to use (default: Orus for Gemini, female for Coqui)
- --model, -m: Gemini TTS model (default: gemini-2.5-pro-preview-tts)
Translate Images
essai translate --input slides/
essai translate --input image.png --lang Spanish
essai translate --input slides/ --model gemini-2.0-flash-exp
Options:
- --input, -i: Image file or folder of images
- --lang, -l: Target language (default: French)
- --model, -m: Gemini model (default: gemini-3-pro-image-preview)
Create PowerPoint from PDF or Images
essai ppt --input presentation.pdf
essai ppt --input slides/ --script voiceover.md
essai ppt --input presentation.pdf --keep-images
This will create a .pptx file with each PDF page or image as a slide. If a voiceover script is provided, it will be added as speaker notes.
Generate Video from Slides
essai video --pdf presentation.pdf
essai video --images slides/ --audio audio/
Full Localization Pipeline
essai localize --pdf presentation.pdf --script voiceover.md --lang French
This will:
- Extract PDF pages to images
- Translate all images to the target language
- Generate audio for all slides
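The three steps roughly correspond to the individual subcommands documented earlier. The sketch below only builds the equivalent command lines without running them. It is an assumption that running the steps separately matches what essai localize does internally, and the images_dir argument is hypothetical: where essai pdf writes its images is not stated here, so the folder must be supplied by the caller.

```python
import shlex


def localize_commands(pdf, script, lang, images_dir):
    """Build the per-step command lines as argument lists (nothing is executed)."""
    return [
        ["essai", "pdf", pdf],
        # images_dir is an assumed location for the extracted slide images.
        ["essai", "translate", "--input", images_dir, "--lang", lang],
        ["essai", "audio", "--script", script],
    ]


for cmd in localize_commands("presentation.pdf", "voiceover.md", "French", "slides/"):
    print(shlex.join(cmd))
```

Keeping the commands as argument lists makes them safe to pass to subprocess.run later without shell quoting issues.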
Video/Audio Annotation Tool
Launch an interactive web UI for annotating videos or audio files with frame-accurate timestamps:
# Install annotation dependencies first
pip install "montaigne[annotate]"
# Launch annotation UI
essai annotate video.mp4
essai annotate audio.wav
essai annotate # Auto-detect media in current dir
essai annotate video.mp4 --network # Make accessible on local network
# Export annotations
essai annotate video.mp4 --export srt # Export to SRT (Premiere, DaVinci)
essai annotate video.mp4 --export vtt # Export to WebVTT (browsers)
essai annotate video.mp4 --export json # Export to JSON
Keyboard shortcuts:
| Key | Action |
|---|---|
| Space | Play/Pause |
| I | Set In point for range |
| O | Set Out point for range |
| [ ] | Step frame backward/forward |
| Ctrl+Enter | Submit annotation |
| Escape | Clear range / exit input |
Features:
- Frame-accurate timing using the requestVideoFrameCallback API
- Waveform visualization with click-to-seek
- Light/dark theme toggle
- Local-first SQLite storage (zero-latency)
- Export to WebVTT, SRT, JSON formats
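The SRT and WebVTT exports differ mainly in their timestamp syntax: SRT writes HH:MM:SS,mmm while WebVTT writes HH:MM:SS.mmm. The following sketch shows how timed annotations could be rendered as SRT cues. It is not the tool's actual exporter; the (start, end, text) tuple layout and both function names are assumptions made for illustration.

```python
def format_timestamp(seconds, fmt="srt"):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{millis:03d}"


def to_srt(annotations):
    """Render (start_seconds, end_seconds, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(annotations, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


notes = [(0.0, 2.5, "Title slide"), (2.5, 61.04, "Introduction")]
print(to_srt(notes))
```

A WebVTT writer would additionally start the file with a WEBVTT header line and call format_timestamp with fmt="vtt".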
Web Editor
Launch a Streamlit-based web interface for managing slides and scripts:
# Install editor dependencies first
pip install "montaigne[edit]"
# Launch the editor
essai edit
essai edit --pdf presentation.pdf --script voiceover.md
Model Configuration
Each AI command supports a --model / -m flag to override the default Gemini model:
| Command | Default Model | Purpose |
|---|---|---|
| essai script | gemini-3-pro-preview | Script generation |
| essai audio | gemini-2.5-pro-preview-tts | Text-to-speech |
| essai translate | gemini-3-pro-image-preview | Image translation |
List available models:
essai models
Voiceover Script Format
Scripts should follow this markdown format:
## SLIDE 1: Title
**[Duration: ~45 seconds]**
Your narration text for slide 1 goes here.
---
## SLIDE 2: Next Topic
**[Duration: ~60 seconds]**
Narration for slide 2.
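To check that a script follows this layout before generating audio, it can be parsed into per-slide records. The sketch below is one possible reading of the format, not montaigne's own parser: the regular expressions, the dictionary keys, and the function name parse_script are assumptions based only on the sample above.

```python
import re

SLIDE_HEADING = re.compile(r"^##\s*SLIDE\s+(\d+):\s*(.*)$")
DURATION = re.compile(r"^\*\*\[Duration:\s*~?(\d+)\s*seconds?\]\*\*$")


def parse_script(markdown):
    """Split a voiceover script into dicts with number, title, duration, narration."""
    slides = []
    current = None
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        heading = SLIDE_HEADING.match(line)
        if heading:
            current = {
                "number": int(heading.group(1)),
                "title": heading.group(2).strip(),
                "duration_s": None,
                "narration": [],
            }
            slides.append(current)
            continue
        # Ignore text before the first slide, blank lines, and separators.
        if current is None or not line or line == "---":
            continue
        duration = DURATION.match(line)
        if duration:
            current["duration_s"] = int(duration.group(1))
        else:
            current["narration"].append(line)
    for slide in slides:
        slide["narration"] = " ".join(slide["narration"])
    return slides


SAMPLE = """## SLIDE 1: Title
**[Duration: ~45 seconds]**
Your narration text for slide 1 goes here.
---
## SLIDE 2: Next Topic
**[Duration: ~60 seconds]**
Narration for slide 2.
"""

for slide in parse_script(SAMPLE):
    print(slide["number"], slide["title"], slide["duration_s"], slide["narration"])
```

A slide whose duration is None or whose narration is empty is a quick signal that the heading or duration line deviates from the expected pattern.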
Demo
See the demo/hamlet/ folder for a complete example with:
- Sample PDF presentation
- Voiceover script
- Image asset
cd demo/hamlet
essai localize --lang French
Requirements
- Python 3.10+
- Google Gemini API key
- ffmpeg (for video generation)
- Dependencies: google-genai, python-dotenv, pymupdf, python-pptx, Pillow
Optional Dependencies
- edit: streamlit - Web editor interface
- annotate: flask - Video/audio annotation tool
- coqui: TTS, torch, torchaudio - Local TTS with Coqui XTTS-v2 (no API key required)
- cloud: fastapi, uvicorn, google-cloud-storage - Cloud API deployment
File details
Details for the file montaigne-1.4.0.tar.gz.
File metadata
- Download URL: montaigne-1.4.0.tar.gz
- Upload date:
- Size: 99.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5034b01e4822aa00a1a476c12ccde4cd197dd8f65f51cc30f03cb236847bbde |
| MD5 | e8b978bce1f1bf30b6dc0cff9fb0f727 |
| BLAKE2b-256 | 44aa8cf54fdfd890a508ee291de500e10f689ac63148b5c6f966f968e0132bc5 |
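A downloaded archive can be checked against the published SHA256 digest with the standard library. This is a generic sketch rather than anything specific to montaigne or PyPI tooling; the function names are hypothetical, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 with an expected digest, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the source archive was downloaded to the current directory):
# matches(
#     "montaigne-1.4.0.tar.gz",
#     "e5034b01e4822aa00a1a476c12ccde4cd197dd8f65f51cc30f03cb236847bbde",
# )
```

Reading in chunks keeps memory use flat regardless of file size.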
Provenance
The following attestation bundles were made for montaigne-1.4.0.tar.gz:
Publisher: python-publish.yml on yanndebray/montaigne
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: montaigne-1.4.0.tar.gz
- Subject digest: e5034b01e4822aa00a1a476c12ccde4cd197dd8f65f51cc30f03cb236847bbde
- Sigstore transparency entry: 1011014908
- Sigstore integration time:
- Permalink: yanndebray/montaigne@00ff1bde90a0c15d5ca325f6d436b4d138f748ee
- Branch / Tag: refs/tags/v1.4.0
- Owner: https://github.com/yanndebray
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@00ff1bde90a0c15d5ca325f6d436b4d138f748ee
- Trigger Event: release
File details
Details for the file montaigne-1.4.0-py3-none-any.whl.
File metadata
- Download URL: montaigne-1.4.0-py3-none-any.whl
- Upload date:
- Size: 93.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 139af908ff77cef4de496a02c115bd3e4ea2400a1542f9c3cb106c82b6001bd0 |
| MD5 | 37615bcee19837080cf95a3aa32a8313 |
| BLAKE2b-256 | b7683c84336e4d9ff4bc17cf01418ca7f109b76eb85bc880ca3452c5a4f17355 |
Provenance
The following attestation bundles were made for montaigne-1.4.0-py3-none-any.whl:
Publisher: python-publish.yml on yanndebray/montaigne
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: montaigne-1.4.0-py3-none-any.whl
- Subject digest: 139af908ff77cef4de496a02c115bd3e4ea2400a1542f9c3cb106c82b6001bd0
- Sigstore transparency entry: 1011014965
- Sigstore integration time:
- Permalink: yanndebray/montaigne@00ff1bde90a0c15d5ca325f6d436b4d138f748ee
- Branch / Tag: refs/tags/v1.4.0
- Owner: https://github.com/yanndebray
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@00ff1bde90a0c15d5ca325f6d436b4d138f748ee
- Trigger Event: release