v1.4.0 · MIT Licensed

Reflect. Narrate. Present.

A Python toolkit for presentation animation. Extract slides, translate visuals, generate voiceovers, and create videos—powered by Google Gemini AI, ElevenLabs, and local TTS.


Features

Everything You Need

A complete toolkit for animating presentations and generating multimedia content with AI.

PDF & PPTX Input

Accept PDF or PPTX presentations as input. Extract high-quality slide images with configurable DPI. PPTX slide notes are used automatically as voiceover scripts.
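Configurable DPI sets how many pixels each page becomes. PDF page sizes are given in points (72 per inch), so a page renders at roughly `points / 72 × DPI` pixels per side. The helper below illustrates only that arithmetic. It is not part of montaigne, and the 720 × 405 pt page is an assumed 16:9 slide (10 × 5.625 in).

```python
# Illustrative helper only; not part of montaigne's API.
def rendered_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Approximate pixel size of a PDF page rendered at `dpi`.

    PDF user-space units are points, and 72 points make one inch.
    """
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


# An assumed 16:9 slide of 10 x 5.625 inches (720 x 405 points)
for dpi in (72, 150, 200, 300):
    print(dpi, rendered_size(720, 405, dpi))
```

At the `--dpi 200` used in the examples, that assumed slide comes out at 2000 × 1125 pixels. This is just above the 1920x1080 video maximum. Individual renderers may round the last pixel differently.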

Script Generation

Generate professional voiceover scripts using a two-pass AI approach. Scripts benefit from holistic context analysis and narrative arc awareness, and come with production notes that include pronunciation guides.
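One common way to structure a two-pass approach works as follows. A first pass reads every slide to build a deck-wide summary. A second pass then writes each slide's script with that summary, plus the previous slide's script, in the prompt. This is what lets the narration keep an arc. The sketch assumes a generic `generate(prompt) -> str` callable standing in for a Gemini call. The function name and prompt wording are hypothetical and do not describe montaigne's internals.

```python
from typing import Callable


def two_pass_scripts(slides: list[str], generate: Callable[[str], str]) -> list[str]:
    """Hypothetical two-pass voiceover generation (illustration only).

    slides   -- extracted text or a description of each slide, in order
    generate -- any prompt -> text function, e.g. a thin wrapper around an LLM call
    """
    # Pass 1: holistic context. The model sees the whole deck at once and
    # summarizes its audience, goal, and narrative arc.
    deck = "\n\n".join(f"Slide {i}: {text}" for i, text in enumerate(slides, start=1))
    context = generate("Summarize the audience, goal, and narrative arc of this deck:\n\n" + deck)

    # Pass 2: one script per slide, written with the deck summary and the
    # previous slide's script in view so transitions stay coherent.
    scripts: list[str] = []
    for i, text in enumerate(slides, start=1):
        previous = scripts[-1] if scripts else "(none, this is the opening slide)"
        prompt = (
            f"Deck context:\n{context}\n\n"
            f"Previous slide's voice-over:\n{previous}\n\n"
            f"Write the voice-over for slide {i} of {len(slides)}:\n{text}"
        )
        scripts.append(generate(prompt))
    return scripts
```

The function makes one context call plus one call per slide. Passing a stub such as `lambda prompt: prompt[:40]` is an easy way to inspect the prompts before wiring in a real model.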

Image Translation

Translate text in images to any target language. Powered by Nano Banana Pro 🍌 for accurate, context-aware translations.

Voice Synthesis

Generate natural voiceover audio from scripts. Three providers: Gemini TTS (cloud), ElevenLabs (premium voices), or Coqui XTTS-v2 (local, no API key required).

Video Generation

Combine translated slides and voiceover audio into polished videos. Configurable resolution up to 1920x1080.
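A common way to turn one still slide and one narration clip into a video segment is to loop the image for the length of the audio. The sketch below only builds an `ffmpeg` argument list. The options are real ffmpeg options, but montaigne may use a different encoder or library entirely. The 1920×1080 default mirrors the maximum resolution above, and the scale/pad filter letterboxes slides whose aspect ratio differs. The file names are placeholders.

```python
import shlex


# Illustrative sketch; montaigne's actual video pipeline may differ.
def segment_command(image: str, audio: str, output: str,
                    width: int = 1920, height: int = 1080) -> list[str]:
    """Build an ffmpeg command that shows `image` for the duration of `audio`."""
    # Fit the slide inside width x height without distortion, then pad to size.
    vf = (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2"
    )
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image,   # repeat the still image indefinitely
        "-i", audio,                 # narration for this slide
        "-vf", vf,
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",                 # stop when the shorter input (the audio) ends
        output,
    ]


print(shlex.join(segment_command("slide_01.png", "slide_01.mp3", "segment_01.mp4")))
```

You would produce the full video by running each command with `subprocess.run(cmd, check=True)` and then joining the segments, for example with ffmpeg's concat demuxer.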

PowerPoint I/O

Import PPTX as input (extract slides and notes) or export to PPTX from PDF/images. Voiceover scripts can be added as speaker notes.

Cloud Deployment

Offload video generation to Google Cloud Run. Upload PDFs, process in the cloud, and download results with secure signed URLs.

Model Configuration

Customize Gemini models for each operation. Use --model flags to switch between flash and pro models based on your needs.

Video Annotation

Frame-accurate video and audio annotation tool. Add timestamps, export to WebVTT/SRT formats for captions. Waveform visualization with click-to-seek. Requires pip install montaigne[annotate].

Quick Start

Simple by Design

Get started in seconds. Set up your API key and start localizing presentations.

```shell
# Install the package
$ pip install montaigne

# Create .env file with your API keys
$ echo "GEMINI_API_KEY=your-gemini-key" > .env
$ echo "ELEVENLABS_API_KEY=your-elevenlabs-key" >> .env  # Optional

# Verify everything is set up correctly
$ essai setup
API key configured successfully
```
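The `.env` file consists of plain `KEY=value` lines. The stdlib-only sketch below illustrates the kind of check `essai setup` performs: it reads such a file and reports which voice providers are usable. The parsing rules are simplified; montaigne likely relies on a library such as python-dotenv instead.

The provider-to-key mapping is an assumption based on the keys listed above:

- Gemini needs `GEMINI_API_KEY`.
- ElevenLabs needs `ELEVENLABS_API_KEY`.
- Coqui runs locally with no key.

```python
import os
from pathlib import Path

# Assumed mapping (based on the Quick Start keys): provider -> required variable.
# None means the provider runs locally and needs no key.
REQUIRED_KEYS = {
    "gemini": "GEMINI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "coqui": None,
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return values


def load_env(path: str = ".env") -> dict[str, str]:
    """Read a .env file if it exists; otherwise return an empty mapping."""
    file = Path(path)
    return parse_env(file.read_text()) if file.is_file() else {}


def available_providers(path: str = ".env") -> list[str]:
    """List providers whose key is set, checking the real environment first."""
    env = {**load_env(path), **os.environ}  # variables already exported take precedence
    return [name for name, key in REQUIRED_KEYS.items() if key is None or env.get(key)]


print("Usable TTS providers:", ", ".join(available_providers()))
```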

A generated voiceover script looks like this:

```markdown
## Slide 1: Introduction
**Duration:** 45-60 seconds
**Tone:** Inviting, setting the stage

### Voice-Over:

Welcome to this presentation on *artificial intelligence*
and its transformative applications in modern business.

---

## Slide 2: Key Benefits
**Duration:** 60-75 seconds
**Tone:** Energetic, solution-oriented

### Voice-Over:

Now that we've set the stage, let's explore the three
main benefits: automation, insights, and scalability.

---

## Production Notes

### Pronunciation Guide
- `AI`: "A-I"
- `API`: "A-P-I"
```

CLI Reference

The essai Command

A powerful command-line interface for all your localization needs.

essai setup: Verify your environment and API key configuration. Run this first to make sure everything is set up properly.

essai pdf: Extract PDF pages to images. Options: --dpi 200 for resolution, --format jpg for output format.

essai script: Generate voiceover scripts with two-pass AI analysis. Accepts a PDF, a PPTX, or an image folder; PPTX slide notes are used directly if present. Options: --input, --output, --context, --model (default: gemini-3-pro-preview).

essai audio: Generate voiceover audio from scripts. Options: --script, --voice, --provider (gemini, elevenlabs, or coqui), --model (default: gemini-2.5-pro-preview-tts), --list-voices. Coqui runs locally without an API key: pip install montaigne[coqui].

essai images: Translate text in images to a target language. Options: --input, --lang (default: French), --model (default: gemini-3-pro-image-preview).

essai localize: Run the full localization pipeline (extract a PDF or PPTX, translate images, generate audio). Options: --pdf (accepts PDF or PPTX), --script, --lang, --provider.

essai video: Generate a video from slides and audio. Accepts PDF or PPTX. Options: --pdf (accepts PDF or PPTX), --context, --provider, --script-model, --audio-model.

essai ppt: Create a PowerPoint deck from a PDF, a PPTX, or a folder of images. PPTX files can also be used as input to the other commands. Options: --input (PDF, PPTX, or folder), --script to add speaker notes, --keep-images.

essai models: List available Gemini models. Options: --filter / -f to filter by keyword (e.g., -f tts for TTS models, -f flash for flash models).

essai edit: Launch an interactive Streamlit web editor for voiceover scripts. Requires pip install montaigne[edit].

essai cloud: Cloud deployment commands. Subcommands: health (check API status), video (generate a video in the cloud), status (check a job), download (get output), jobs (list jobs). Requires pip install montaigne[cloud].

essai annotate: Frame-accurate video/audio annotation tool with waveform visualization. Auto-detects media files in the current directory when no input is specified. Options: --export srt or --export vtt to export annotations to SRT or WebVTT subtitle format. Keyboard shortcuts: I and O for in/out points, [ and ] for frame stepping. Requires pip install montaigne[annotate].
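Frame-accurate stepping with the `[` and `]` keys comes down to snapping the playhead to frame boundaries. You convert the time to a frame index, move by one, and convert back. The sketch below assumes a constant frame rate; variable-frame-rate files would need per-frame timestamps instead. The function names are illustrative and are not montaigne's.

```python
from fractions import Fraction


# Illustrative sketch; assumes a constant frame rate.
def frame_of(seconds: float, fps: Fraction) -> int:
    """Index of the frame nearest to `seconds`."""
    return round(Fraction(seconds) * fps)


def step_frame(current_seconds: float, fps: Fraction, direction: int) -> float:
    """Time of the neighbouring frame (direction = +1 for ']', -1 for '[')."""
    frame = max(0, frame_of(current_seconds, fps) + direction)  # never before frame 0
    return float(frame / fps)


NTSC = Fraction(30000, 1001)  # 29.97 fps, a common rate where float math drifts
t = 0.0
for _ in range(3):
    t = step_frame(t, NTSC, +1)
print(t, frame_of(t, NTSC))
```

Keeping the frame rate as an exact fraction and re-deriving the time from an integer frame index avoids the drift that builds up when a float such as 1/29.97 is added repeatedly.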

The world is but a school of inquiry. The matter is not who shall hit the ring, but who shall make the best courses at it.
— Michel de Montaigne, Essays

Examples

Common Workflows

See montaigne in action with these real-world examples.

```shell
# Extract slides from a PDF
$ essai pdf presentation.pdf --dpi 200
Extracted 15 pages to ./presentation_pages/

# Generate a voiceover script from PDF or PPTX
$ essai script --input presentation.pptx
PPTX contains slide notes — using them as voiceover script
Generated voiceover script from notes: presentation_voiceover.md

# Generate audio with Gemini TTS
$ essai audio --script voiceover.md --voice Kore
Generated 15 audio files in ./audio/

# Or use ElevenLabs for premium voices
$ essai audio --script voiceover.md --provider elevenlabs --voice adam
Generated 15 audio files in ./audio/

# Or use Coqui for local TTS (no API key needed)
$ pip install "montaigne[coqui]"
$ essai audio --script voiceover.md --provider coqui --voice female
Generated 15 audio files in ./audio/

# Generate video from PPTX (notes become voiceover automatically)
$ essai video --pdf presentation.pptx
Step 1: Extracting PPTX slides...
Step 2: Using PPTX slide notes as voiceover script...
Step 3: Generating audio...
Step 4: Creating video...

# Or from PDF with custom models
$ essai video --pdf presentation.pdf --audio-model gemini-2.5-flash-preview-tts
Step 1: Extracting PDF pages...
Step 2: Generating voiceover script...
Step 3: Generating audio...
Step 4: Creating video...

# List available TTS models
$ essai models -f tts
gemini-2.5-flash-preview-tts
gemini-2.5-pro-preview-tts

# Full localization in one command
$ essai localize --pdf presentation.pdf --lang French
Extracting PDF pages...
Translating images to French...
Generating audio...
Localization complete!

# Annotate a video with timestamps
$ essai annotate recording.mp4
Starting annotation server at http://localhost:5000

# Export annotations to subtitles
$ essai annotate recording.mp4 --export srt
Exported 24 annotations to recording.srt
```
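These commands are easy to script. For example, to localize one deck into several languages, a small Python wrapper can call `essai localize` once per language. It uses only the `--pdf` and `--lang` options shown above, with language names written like the `--lang French` example.

This is a hypothetical helper, not part of montaigne, and it defaults to a dry run that only prints the commands. Output locations per language are not documented here, so confirm that successive runs do not overwrite each other before running it for real.

```python
import shlex
import subprocess


# Hypothetical wrapper around the documented CLI; not part of montaigne itself.
def localize_all(deck: str, languages: list[str], dry_run: bool = True) -> list[list[str]]:
    """Print (and optionally run) one `essai localize` call per target language."""
    commands = [["essai", "localize", "--pdf", deck, "--lang", lang] for lang in languages]
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing language
    return commands


localize_all("presentation.pdf", ["French", "German", "Spanish"])
```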
