Reflect. Narrate.
Present.
A Python toolkit for presentation animation. Extract slides, translate visuals, generate voiceovers, and create videos, all powered by Google Gemini AI, ElevenLabs, and local TTS.
Features
Everything You Need
A complete toolkit for animating presentations and generating multimedia content with AI.
PDF & PPTX Input
Accept PDF or PPTX presentations as input. Extract high-quality slide images with configurable DPI. PPTX slide notes are auto-used as voiceover scripts.
Script Generation
Generate professional voiceover scripts using a two-pass AI approach. Scripts draw on holistic context analysis and narrative arc awareness, and include production notes with pronunciation guides.
Image Translation
Translate text in images to any target language. Powered by Nano Banana Pro 🍌 for accurate, context-aware translations.
Voice Synthesis
Generate natural voiceover audio from scripts. Three providers: Gemini TTS (cloud), ElevenLabs (premium voices), or Coqui XTTS-v2 (local, no API key required).
Video Generation
Combine translated slides and voiceover audio into polished videos. Configurable resolution up to 1920x1080.
PowerPoint I/O
Import PPTX as input (extract slides and notes) or export to PPTX from PDF/images. Voiceover scripts can be added as speaker notes.
Cloud Deployment
Offload video generation to Google Cloud Run. Upload PDFs, process in the cloud, and download results with secure signed URLs.
Model Configuration
Customize Gemini models for each operation. Use --model flags to switch between flash and pro models based on your needs.
Video Annotation
Frame-accurate video and audio annotation tool. Add timestamps and export them to WebVTT or SRT for captions. Includes waveform visualization with click-to-seek. Requires pip install montaigne[annotate].
Quick Start
Simple by Design
Get started in seconds. Set up your API key and start localizing presentations.
# Install the package
$ pip install montaigne

# Create .env file with your API keys
$ echo "GEMINI_API_KEY=your-gemini-key" > .env
$ echo "ELEVENLABS_API_KEY=your-elevenlabs-key" >> .env  # Optional

# Verify everything is set up correctly
$ essai setup
API key configured successfully
## Slide 1: Introduction
**Duration:** 45-60 seconds
**Tone:** Inviting, setting the stage

### Voice-Over:
Welcome to this presentation on *artificial intelligence* and its transformative applications in modern business.

---

## Slide 2: Key Benefits
**Duration:** 60-75 seconds
**Tone:** Energetic, solution-oriented

### Voice-Over:
Now that we've set the stage, let's explore the three main benefits: automation, insights, and scalability.

---

## Production Notes

### Pronunciation Guide
- `AI`: "A-I"
- `API`: "A-P-I"
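The sample script above follows a regular layout, so it can be post-processed with a few lines of Python. The sketch below is not part of montaigne: `SlideScript` and `parse_voiceover_script` are hypothetical names, and the parser assumes every script uses exactly the headings and bold labels shown in the sample.

```python
import re
from dataclasses import dataclass


@dataclass
class SlideScript:
    """One slide's entry from a voiceover script (hypothetical type)."""
    number: int
    title: str
    duration: str
    tone: str
    voice_over: str


# Matches headings such as "## Slide 2: Key Benefits".
SLIDE_HEADING = re.compile(r"^## Slide (\d+): (.+)$", re.MULTILINE)


def _field(block: str, label: str) -> str:
    """Return the value of a '**Label:** value' line, or '' if absent."""
    match = re.search(rf"\*\*{re.escape(label)}:\*\*\s*(.+)", block)
    return match.group(1).strip() if match else ""


def parse_voiceover_script(text: str) -> list[SlideScript]:
    """Split a script into per-slide records (assumes the sample layout)."""
    headings = list(SLIDE_HEADING.finditer(text))
    slides = []
    for i, heading in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        block = text[heading.end():end]
        # Cut at any other level-2 heading, e.g. "## Production Notes".
        block = re.split(r"^## ", block, maxsplit=1, flags=re.MULTILINE)[0]
        parts = block.split("### Voice-Over:", 1)
        spoken = parts[1] if len(parts) > 1 else ""
        # Drop the '---' separator lines and collapse whitespace.
        kept = [ln for ln in spoken.splitlines() if ln.strip() != "---"]
        slides.append(
            SlideScript(
                number=int(heading.group(1)),
                title=heading.group(2).strip(),
                duration=_field(block, "Duration"),
                tone=_field(block, "Tone"),
                voice_over=" ".join(" ".join(kept).split()),
            )
        )
    return slides
```

Calling `parse_voiceover_script(Path("presentation_voiceover.md").read_text())` would give one record per slide, which is handy for checking that every slide has narration before generating audio.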
CLI Reference
The essai Command
A powerful command-line interface for all your localization needs.
--dpi 200 for resolution, --format jpg for output format.
--input, --output, --context, --model (default: gemini-3-pro-preview).
--script, --voice, --provider (gemini, elevenlabs, or coqui), --model (default: gemini-2.5-pro-preview-tts), --list-voices. Coqui runs locally without an API key: pip install montaigne[coqui].
--input, --lang (default: French), --model (default: gemini-3-pro-image-preview).
--pdf (accepts PDF or PPTX), --script, --lang, --provider.
--pdf (accepts PDF or PPTX), --context, --provider, --script-model, --audio-model.
--input for PDF, PPTX, or folder, --script to add speaker notes, --keep-images.
--filter / -f to filter by keyword (e.g., -f tts for TTS models, -f flash for flash models).
Requires pip install montaigne[edit].
health (check API status), video (generate video in cloud), status (check job), download (get output), jobs (list jobs). Requires pip install montaigne[cloud].
--export srt or --export vtt to export annotations to SRT or WebVTT subtitle formats. Keyboard shortcuts: I/O for in/out points, [/] for frame stepping. Requires pip install montaigne[annotate].
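To make the SRT export concrete, here is a minimal sketch of how timed annotations map onto the SubRip cue layout. It illustrates the file format only and is not montaigne's implementation: `srt_timestamp`, `to_srt`, and the `(start, end, text)` tuple shape are assumptions made for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT 'HH:MM:SS,mmm' timestamp."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(annotations) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT cues.

    Each cue is a 1-based index, a time range, and the caption text;
    cues are separated by a blank line.
    """
    cues = []
    for index, (start, end, text) in enumerate(annotations, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)
```

WebVTT is close to this layout: the file starts with a `WEBVTT` header line and timestamps use a period instead of a comma before the milliseconds.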
The world is but a school of inquiry. The matter is not who shall hit the ring, but who shall make the best courses at it.
Examples
Common Workflows
See montaigne in action with these real-world examples.
# Extract slides from a PDF
$ essai pdf presentation.pdf --dpi 200
Extracted 15 pages to ./presentation_pages/

# Generate a voiceover script from PDF or PPTX
$ essai script --input presentation.pptx
PPTX contains slide notes — using them as voiceover script
Generated voiceover script from notes: presentation_voiceover.md

# Generate audio with Gemini TTS
$ essai audio --script voiceover.md --voice Kore
Generated 15 audio files in ./audio/

# Or use ElevenLabs for premium voices
$ essai audio --script voiceover.md --provider elevenlabs --voice adam
Generated 15 audio files in ./audio/

# Or use Coqui for local TTS (no API key needed)
$ pip install "montaigne[coqui]"
$ essai audio --script voiceover.md --provider coqui --voice female
Generated 15 audio files in ./audio/

# Generate video from PPTX (notes become voiceover automatically)
$ essai video --pdf presentation.pptx
Step 1: Extracting PPTX slides...
Step 2: Using PPTX slide notes as voiceover script...
Step 3: Generating audio...
Step 4: Creating video...

# Or from PDF with custom models
$ essai video --pdf presentation.pdf --audio-model gemini-2.5-flash-preview-tts
Step 1: Extracting PDF pages...
Step 2: Generating voiceover script...
Step 3: Generating audio...
Step 4: Creating video...

# List available TTS models
$ essai models -f tts
gemini-2.5-flash-preview-tts
gemini-2.5-pro-preview-tts

# Full localization in one command
$ essai localize --pdf presentation.pdf --lang French
Extracting PDF pages...
Translating images to French...
Generating audio...
Localization complete!

# Annotate a video with timestamps
$ essai annotate recording.mp4
Starting annotation server at http://localhost:5000

# Export annotations to subtitles
$ essai annotate recording.mp4 --export srt
Exported 24 annotations to recording.srt
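For batch jobs, the same commands can be driven from a Python script. The sketch below shells out to `essai video` using only the `--pdf` and `--audio-model` flags shown in the examples above; `video_command` and `render_all` are hypothetical helper names, and the code assumes `essai` is on the PATH. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def video_command(deck, audio_model=None):
    """Build the argv for one `essai video` call (hypothetical helper)."""
    argv = ["essai", "video", "--pdf", str(deck)]
    if audio_model is not None:
        argv += ["--audio-model", audio_model]
    return argv


def render_all(decks, audio_model=None, dry_run=True):
    """Print, and optionally run, one `essai video` command per deck."""
    commands = [video_command(Path(deck), audio_model) for deck in decks]
    for argv in commands:
        print("$ " + shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if essai exits non-zero.
            subprocess.run(argv, check=True)
    return commands
```

For example, `render_all(["intro.pptx", "roadmap.pdf"], dry_run=False)` would process both decks in sequence and stop at the first failure.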