Media processing toolkit for presentation localization

Montaigne

A Python toolkit for presentation animation. Extract slides, translate visuals, generate voiceovers, and create videos—powered by Google Gemini AI, ElevenLabs, and local TTS.

Features

  • PDF Extraction: Convert PDF presentations to high-quality images. Configurable DPI settings (150-300+) with PNG or JPG output formats.
  • Script Generation: Generate professional voiceover scripts using a two-pass AI approach. Holistic context analysis, narrative arc awareness, and production notes with pronunciation guides.
  • Image Translation: Translate text in images to any target language. Context-aware translations powered by Gemini.
  • Voice Synthesis: Generate natural voiceover audio from scripts. Three providers: Gemini TTS (cloud), ElevenLabs (premium voices), or Coqui XTTS-v2 (local, no API key required).
  • Video Generation: Combine translated slides and voiceover audio into polished videos. Configurable resolution up to 1920x1080.
  • PowerPoint Export: Create PPTX presentations from PDF or images. Optionally add voiceover scripts as speaker notes for each slide.
  • Cloud Deployment: Offload video generation to Google Cloud Run. Upload PDFs, process in the cloud, and download results with secure signed URLs.
  • Model Configuration: Customize Gemini models for each operation. Use --model flags to switch between flash and pro models based on your needs.
  • Video Annotation: Frame-accurate video and audio annotation tool. Add timestamps, export to WebVTT/SRT formats for captions. Waveform visualization with click-to-seek.
  • Web Editor: Streamlit-based slide editor for managing presentations.

Installation

Using pip

pip install montaigne

With optional dependencies

# Install with web editor support
pip install "montaigne[edit]"

# Install with annotation tool support
pip install "montaigne[annotate]"

# Install all optional dependencies
pip install "montaigne[all]"

Using uv

uv pip install montaigne

Using uvx (no installation required)

uvx --from montaigne essai setup
uvx --from montaigne essai script --input presentation.pdf

Setup

  1. Get a Gemini API key from Google AI Studio
  2. Create a .env file:
    GEMINI_API_KEY=your-api-key
    
  3. Verify setup:
    essai setup
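
As a minimal sketch of what such a check involves, the snippet below reads a .env file with the standard library and reports whether GEMINI_API_KEY is available. The read_env_file and has_gemini_key helpers are hypothetical and are not part of montaigne; essai setup remains the supported way to verify a configuration.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_gemini_key(env_path=".env"):
    """True if the key is set in the process environment or in the .env file."""
    if os.environ.get("GEMINI_API_KEY"):
        return True
    path = Path(env_path)
    return path.is_file() and bool(read_env_file(path).get("GEMINI_API_KEY"))
```

Calling has_gemini_key() from the project folder gives a quick yes/no answer before any API call is attempted.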
    

Usage

Extract PDF to Images

essai pdf presentation.pdf
essai pdf presentation.pdf --dpi 200 --format jpg
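
To get a feel for what a given --dpi value produces, the arithmetic below converts a page size in PDF points (1/72 inch) to pixels. This is a sketch of the general relationship only; the 960 x 540 pt page size is an assumed example, and the rasterizer montaigne uses may round differently by a pixel.

```python
POINTS_PER_INCH = 72  # PDF page sizes are measured in points (1/72 inch)


def rendered_size(width_pt, height_pt, dpi):
    """Return the (width, height) in pixels of a page rasterized at `dpi`."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# Assumed example: a 16:9 slide exported as a 960 x 540 pt page
for dpi in (150, 200, 300):
    print(dpi, rendered_size(960, 540, dpi))
```

Under these assumptions, 150 DPI already exceeds 1920x1080, so higher values mainly help when slides contain fine text or will be cropped.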

Generate Voiceover Script from Slides

essai script --input presentation.pdf
essai script --input slides_images/ --context "AI workshop"
essai script --input presentation.pdf --output custom_script.md
essai script --input presentation.pdf --model gemini-2.5-flash

Options:

  • --input, -i: PDF file or folder of slide images
  • --output, -o: Output markdown file path
  • --context, -c: Additional context to guide script generation
  • --model, -m: Gemini model to use (default: gemini-3-pro-preview)

Generate Audio from Script

essai audio --script voiceover.md
essai audio --script voiceover.md --voice Kore
essai audio --script voiceover.md --model gemini-2.5-flash-preview-tts

TTS Providers:

Provider   | Description                      | Installation
gemini     | Google Gemini TTS API (default)  | Included
elevenlabs | ElevenLabs TTS API               | Included
coqui      | Local Coqui XTTS-v2 (no API key) | pip install "montaigne[coqui]"

Gemini voices: Puck, Charon, Kore, Fenrir, Aoede, Orus

Local TTS with Coqui:

# Install Coqui dependencies
pip install "montaigne[coqui]"

# Generate audio locally (no API key required)
essai audio --script voiceover.md --provider coqui
essai audio --script voiceover.md --provider coqui --voice male
essai audio --list-voices --provider coqui

Coqui voices: female, male, neutral

Note: First run downloads the XTTS-v2 model (~1.5GB). Requires accepting the CPML license.

Options:

  • --script, -s: Path to voiceover markdown script
  • --provider, -p: TTS provider (gemini, elevenlabs, coqui)
  • --voice, -v: TTS voice to use (default: Orus for Gemini, female for Coqui)
  • --model, -m: Gemini TTS model (default: gemini-2.5-pro-preview-tts)
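
The provider and voice combinations documented above can be checked before launching a long synthesis run. The sketch below encodes only the voices and defaults listed in this section; resolve_voice is a hypothetical helper, exact-match voice names are an assumption, and ElevenLabs voices are passed through unchecked because they are not listed here.

```python
PROVIDERS = ("gemini", "elevenlabs", "coqui")

# Voices and defaults as listed in this README
VOICES = {
    "gemini": ("Puck", "Charon", "Kore", "Fenrir", "Aoede", "Orus"),
    "coqui": ("female", "male", "neutral"),
}
DEFAULT_VOICE = {"gemini": "Orus", "coqui": "female"}


def resolve_voice(provider, voice=None):
    """Return the voice to use, or raise ValueError for a documented mismatch."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if voice is None:
        # No default is documented for elevenlabs, so this returns None there
        return DEFAULT_VOICE.get(provider)
    known = VOICES.get(provider)
    if known is not None and voice not in known:
        raise ValueError(f"{voice!r} is not a listed {provider} voice")
    return voice
```

For example, resolve_voice("coqui", "Kore") raises an error, since Kore is a Gemini voice.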

Translate Images

essai translate --input slides/
essai translate --input image.png --lang Spanish
essai translate --input slides/ --model gemini-2.0-flash-exp

Options:

  • --input, -i: Image file or folder of images
  • --lang, -l: Target language (default: French)
  • --model, -m: Gemini model (default: gemini-3-pro-image-preview)

Create PowerPoint from PDF or Images

essai ppt --input presentation.pdf
essai ppt --input slides/ --script voiceover.md
essai ppt --input presentation.pdf --keep-images

This will create a .pptx file with each PDF page or image as a slide. If a voiceover script is provided, it will be added as speaker notes.

Generate Video from Slides

essai video --pdf presentation.pdf
essai video --images slides/ --audio audio/

Full Localization Pipeline

essai localize --pdf presentation.pdf --script voiceover.md --lang French

This will:

  1. Extract PDF pages to images
  2. Translate all images to the target language
  3. Generate audio for all slides
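
The three steps above roughly correspond to running the individual subcommands in order. The sketch below only builds and prints those command lines, using flags documented in this README. The localize_steps helper is hypothetical, and images_dir is a placeholder: where essai pdf writes its images is not stated here, and essai localize may handle intermediate folders differently.

```python
import shlex


def localize_steps(pdf, script, lang, images_dir):
    """Return the per-step commands as argument lists, in pipeline order."""
    return [
        ["essai", "pdf", pdf],                                      # 1. extract pages
        ["essai", "translate", "--input", images_dir, "--lang", lang],  # 2. translate
        ["essai", "audio", "--script", script],                     # 3. voiceover audio
    ]


# Print the commands instead of running them; images_dir is a placeholder
for step in localize_steps("presentation.pdf", "voiceover.md", "French", "slides/"):
    print(shlex.join(step))
```

Each list can be passed to subprocess.run(step, check=True) if running the steps one at a time is preferred, for instance to inspect the translated images before generating audio.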

Video/Audio Annotation Tool

Launch an interactive web UI for annotating videos or audio files with frame-accurate timestamps:

# Install annotation dependencies first
pip install "montaigne[annotate]"

# Launch annotation UI
essai annotate video.mp4
essai annotate audio.wav
essai annotate                        # Auto-detect media in current dir
essai annotate video.mp4 --network    # Make accessible on local network

# Export annotations
essai annotate video.mp4 --export srt   # Export to SRT (Premiere, DaVinci)
essai annotate video.mp4 --export vtt   # Export to WebVTT (browsers)
essai annotate video.mp4 --export json  # Export to JSON

Keyboard shortcuts:

Key        | Action
Space      | Play/Pause
I          | Set In point for range
O          | Set Out point for range
[ ]        | Step frame backward/forward
Ctrl+Enter | Submit annotation
Escape     | Clear range / exit input

Features:

  • Frame-accurate timing using requestVideoFrameCallback API
  • Waveform visualization with click-to-seek
  • Light/dark theme toggle
  • Local-first SQLite storage (zero-latency)
  • Export to WebVTT, SRT, JSON formats
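
The SRT and WebVTT exports differ mainly in how timestamps are written: SRT uses a comma before the milliseconds and numbers each cue, while WebVTT uses a period and starts the file with a WEBVTT header line. The sketch below illustrates that difference for a frame-based In/Out range. It is not montaigne's exporter; all function names are hypothetical.

```python
def frame_to_ms(frame, fps):
    """Convert a frame index to milliseconds at a constant frame rate."""
    return round(frame * 1000 / fps)


def format_timestamp(ms, sep):
    """Format milliseconds as HH:MM:SS<sep>mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"


def srt_cue(index, start_ms, end_ms, text):
    """One numbered SRT cue (comma before milliseconds)."""
    start = format_timestamp(start_ms, ",")
    end = format_timestamp(end_ms, ",")
    return f"{index}\n{start} --> {end}\n{text}\n"


def vtt_cue(start_ms, end_ms, text):
    """One WebVTT cue (period before milliseconds)."""
    start = format_timestamp(start_ms, ".")
    end = format_timestamp(end_ms, ".")
    return f"{start} --> {end}\n{text}\n"


# An annotation spanning frames 0 to 45 of a 24 fps clip
start, end = frame_to_ms(0, 24), frame_to_ms(45, 24)
print(srt_cue(1, start, end, "Opening title"))
print("WEBVTT\n\n" + vtt_cue(start, end, "Opening title"))
```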

Web Editor

Launch a Streamlit-based web interface for managing slides and scripts:

# Install editor dependencies first
pip install "montaigne[edit]"

# Launch the editor
essai edit
essai edit --pdf presentation.pdf --script voiceover.md

Model Configuration

Each AI command supports a --model / -m flag to override the default Gemini model:

Command         | Default Model              | Purpose
essai script    | gemini-3-pro-preview       | Script generation
essai audio     | gemini-2.5-pro-preview-tts | Text-to-speech
essai translate | gemini-3-pro-image-preview | Image translation

List available models:

essai models
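
When scripting around the CLI, the defaults from the table above can be kept in one place so that an override is applied only when requested. This is a small sketch with a hypothetical resolve_model helper; the model names are copied from this README and may change between releases.

```python
# Default models per subcommand, as listed in this README
DEFAULT_MODELS = {
    "script": "gemini-3-pro-preview",
    "audio": "gemini-2.5-pro-preview-tts",
    "translate": "gemini-3-pro-image-preview",
}


def resolve_model(command, override=None):
    """Return the override if given, otherwise the documented default."""
    if command not in DEFAULT_MODELS:
        raise ValueError(f"{command!r} does not take a --model flag here")
    return override or DEFAULT_MODELS[command]


# Use a faster model for drafts, the default for the final script
print(resolve_model("script", "gemini-2.5-flash"))
print(resolve_model("script"))
```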

Voiceover Script Format

Scripts should follow this markdown format:

## SLIDE 1: Title
**[Duration: ~45 seconds]**

Your narration text for slide 1 goes here.

---

## SLIDE 2: Next Topic
**[Duration: ~60 seconds]**

Narration for slide 2.
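
To check a script against this format before generating audio, a small parser is enough. The sketch below splits on the --- separators and pulls out the slide number, title, duration hint, and narration. It is an illustration of the format, not the parser montaigne uses; parse_script and its field names are hypothetical.

```python
import re

SLIDE_RE = re.compile(r"^##\s+SLIDE\s+(\d+):\s*(.*)$", re.MULTILINE)
DURATION_RE = re.compile(r"\*\*\[Duration:\s*~?(\d+)\s*seconds?\]\*\*")
SEPARATOR_RE = re.compile(r"^---[ \t]*$", re.MULTILINE)


def parse_script(markdown):
    """Return one dict per slide found in a voiceover script."""
    slides = []
    for block in SEPARATOR_RE.split(markdown):
        header = SLIDE_RE.search(block)
        if header is None:
            continue  # skip text that is not a slide section
        duration = DURATION_RE.search(block)
        body = block[header.end():]
        if duration:
            # Remove the duration marker so only narration remains
            body = body.replace(duration.group(0), "", 1)
        slides.append({
            "number": int(header.group(1)),
            "title": header.group(2).strip(),
            "duration_s": int(duration.group(1)) if duration else None,
            "narration": body.strip(),
        })
    return slides


def total_duration(slides):
    """Sum the duration hints, ignoring slides without one."""
    return sum(s["duration_s"] or 0 for s in slides)
```

Running parse_script on the example above yields two slides, and total_duration gives a rough runtime estimate from the duration hints.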

Demo

See the demo/hamlet/ folder for a complete example with:

  • Sample PDF presentation
  • Voiceover script
  • Image asset

cd demo/hamlet
essai localize --lang French

Requirements

  • Python 3.10+
  • Google Gemini API key
  • ffmpeg (for video generation)
  • Dependencies: google-genai, python-dotenv, pymupdf, python-pptx, Pillow

Optional Dependencies

  • edit: streamlit - Web editor interface
  • annotate: flask - Video/audio annotation tool
  • coqui: TTS, torch, torchaudio - Local TTS with Coqui XTTS-v2 (no API key required)
  • cloud: fastapi, uvicorn, google-cloud-storage - Cloud API deployment

Download files

Source Distribution

montaigne-1.4.0.tar.gz (99.0 kB)

Uploaded Source

Built Distribution

montaigne-1.4.0-py3-none-any.whl (93.2 kB)

Uploaded Python 3

File details

Details for the file montaigne-1.4.0.tar.gz.

File metadata

  • Download URL: montaigne-1.4.0.tar.gz
  • Size: 99.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for montaigne-1.4.0.tar.gz
Algorithm   | Hash digest
SHA256      | e5034b01e4822aa00a1a476c12ccde4cd197dd8f65f51cc30f03cb236847bbde
MD5         | e8b978bce1f1bf30b6dc0cff9fb0f727
BLAKE2b-256 | 44aa8cf54fdfd890a508ee291de500e10f689ac63148b5c6f966f968e0132bc5

Provenance

The following attestation bundles were made for montaigne-1.4.0.tar.gz:

Publisher: python-publish.yml on yanndebray/montaigne

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file montaigne-1.4.0-py3-none-any.whl.

File metadata

  • Download URL: montaigne-1.4.0-py3-none-any.whl
  • Size: 93.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for montaigne-1.4.0-py3-none-any.whl
Algorithm   | Hash digest
SHA256      | 139af908ff77cef4de496a02c115bd3e4ea2400a1542f9c3cb106c82b6001bd0
MD5         | 37615bcee19837080cf95a3aa32a8313
BLAKE2b-256 | b7683c84336e4d9ff4bc17cf01418ca7f109b76eb85bc880ca3452c5a4f17355

Provenance

The following attestation bundles were made for montaigne-1.4.0-py3-none-any.whl:

Publisher: python-publish.yml on yanndebray/montaigne

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
