scholium

Generate educational videos from markdown slides with AI voice synthesis

Maintainers

ccaprani

Scholium

Automated instructional video generation from markdown.

Scholium (Greek: σχόλιον) — An explanatory note or commentary. Your digital scholium for the modern classroom.

Convert markdown slides with embedded narration into professional videos. Perfect for flipped classroom content, lecture recordings, and maintaining course libraries.

Quick Start

# 1. Install (requires Python 3.11+, pandoc, a LaTeX distribution, and ffmpeg)
pip install "scholium[piper]"

# 2. Create a markdown file with embedded narration
cat > lecture.md << 'EOF'
---
title: "Newton's Laws"
author: "Physics 101"
title_notes: |
  Welcome to today's lecture on Newton's Laws of Motion.
---

# What Are Newton's Laws?

Three fundamental principles governing motion.

::: notes
Newton's three laws form the foundation of classical mechanics.
Every object in the universe obeys these rules.
:::

# The First Law

An object in motion stays in motion unless acted upon by a force.

::: notes
This is the law of inertia.
Objects resist any change to their state of motion.
:::
EOF

# 3. Generate video
scholium generate lecture.md lecture.mp4

# 4. That's it! Your video is ready.

Key Features

📝 Unified Markdown Format: Slides and narration in one file with ::: notes ::: blocks
🎯 Pandoc Integration: Full Beamer support, including slide-level for section-based lectures
🎤 Multiple TTS Providers: Piper (local), ElevenLabs (cloud), Coqui, F5-TTS, StyleTTS2, Tortoise (local voice cloning), OpenAI, Bark
⏱️ Flexible Timing: Control pauses, slide duration, and pacing with simple directives
🔧 Production Ready: Batch processing, validation, verbose output
🎨 Professional Output: 1080p video with synchronized audio and slides

Installation

System Requirements

# Ubuntu/Debian
sudo apt-get install pandoc texlive-latex-base texlive-latex-extra ffmpeg

# macOS
brew install pandoc mactex ffmpeg

# Windows
choco install pandoc miktex ffmpeg

Install Scholium

# Recommended: Piper (fast, local, no API key needed)
# Quoting the extra keeps shells such as zsh from treating [ ] as a glob pattern
pip install "scholium[piper]"

# Or other providers:
pip install "scholium[elevenlabs]"  # High quality cloud API
pip install "scholium[coqui]"       # Local voice cloning
pip install "scholium[openai]"      # OpenAI TTS
pip install "scholium[bark]"        # Highest quality, slowest
pip install "scholium[f5tts]"       # Fast local voice cloning (zero-shot)
pip install "scholium[styletts2]"   # Expressive local voice cloning
pip install "scholium[tortoise]"    # Very high quality local voice cloning

# All providers without dependency conflicts (Piper, ElevenLabs, OpenAI, F5-TTS):
pip install "scholium[all]"

Usage

Basic Command

scholium generate slides.md output.mp4 [options]

Common Options

--voice NAME: Voice ID to use (e.g., en_US-lessac-medium for Piper, an ElevenLabs voice ID, or a registered local voice name)
--provider NAME: TTS provider (piper, elevenlabs, coqui, openai, bark, f5tts, styletts2, tortoise)
--section-duration SECONDS: Duration for silent section/TOC slides (default: 3.0)
--verbose: Show detailed progress
--keep-temp: Keep temporary files for debugging

Example

# With Piper (local)
scholium generate lecture.md lecture.mp4 \
    --provider piper \
    --voice en_US-lessac-medium \
    --section-duration 2.0 \
    --verbose

# With ElevenLabs (cloud)
export ELEVENLABS_API_KEY="your_key"
scholium generate lecture.md lecture.mp4 \
    --provider elevenlabs \
    --voice Xb7hH8MSUJpSbSDYk0k2  # Alice - Clear, Engaging Educator
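
When Scholium is driven from a script, the same command line can be assembled programmatically. The following is a minimal sketch that uses only the flags listed under Common Options; the helper name `build_command` is hypothetical and not part of Scholium.

```python
from pathlib import Path
from typing import Optional


def build_command(
    slides: Path,
    output: Path,
    provider: Optional[str] = None,
    voice: Optional[str] = None,
    section_duration: Optional[float] = None,
    verbose: bool = False,
    keep_temp: bool = False,
) -> list[str]:
    """Assemble an argv list for `scholium generate` (hypothetical helper)."""
    cmd = ["scholium", "generate", str(slides), str(output)]
    # Only add a flag when the caller asked for it, so CLI defaults still apply
    if provider is not None:
        cmd += ["--provider", provider]
    if voice is not None:
        cmd += ["--voice", voice]
    if section_duration is not None:
        cmd += ["--section-duration", str(section_duration)]
    if verbose:
        cmd.append("--verbose")
    if keep_temp:
        cmd.append("--keep-temp")
    return cmd


cmd = build_command(
    Path("lecture.md"),
    Path("lecture.mp4"),
    provider="piper",
    voice="en_US-lessac-medium",
    section_duration=2.0,
    verbose=True,
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting issues with file names that contain spaces.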

Markdown Format

Structure

Scholium uses standard Pandoc markdown with embedded ::: notes ::: blocks for narration:

---
title: "My Lecture"
author: "Your Name"
slide-level: 2  # Use ## for slides, # for sections
---

# Section Title

<!-- This creates a table-of-contents slide (no narration needed) -->

## First Slide

Your slide content here.

::: notes
This narration will be spoken over the slide.
You can use multiple paragraphs.
:::

## Second Slide

More content.

::: notes
:: Reference: See textbook page 47
:: Author note: Double-check this calculation

This narration will be spoken.
Lines starting with :: are metadata - not narrated.
<!-- HTML comments are also ignored -->

More spoken narration here.
:::

# Another Section

## Third Slide

Content continues.

::: notes
And so does the narration.
:::

Notes blocks can contain:

Narration text: Regular text is converted to speech
Metadata (:: prefix): Author notes, references, reminders - not narrated
HTML comments (<!-- ... -->): Also ignored during narration
Timing directives: [MIN 10s], [PRE 2s], etc. - control timing, not spoken
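
To make the four kinds of content concrete, here is a rough sketch of how the body of a notes block could be separated into spoken text, metadata, and timing directives. This is not Scholium's actual parser: the function name `split_notes` and the regular expressions are assumptions, directives are assumed to always take the `[NAME <number>s]` form, and the position of a mid-narration `[PAUSE]` is not preserved.

```python
import re

# Timing directives such as [MIN 10s], [PRE 2s], [POST 3s], [PAUSE 2s], [DUR 5s]
DIRECTIVE = re.compile(r"\[(MIN|PRE|POST|PAUSE|DUR)\s+(\d+(?:\.\d+)?)s\]")
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def split_notes(notes: str) -> tuple[str, list[str], list[tuple[str, float]]]:
    """Return (spoken_text, metadata_lines, directives) for one notes block."""
    text = HTML_COMMENT.sub("", notes)  # HTML comments are never spoken
    directives = [(m.group(1), float(m.group(2))) for m in DIRECTIVE.finditer(text)]
    text = DIRECTIVE.sub("", text)  # directives control timing, not speech
    metadata, spoken = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(":::"):
            continue  # opening or closing fence of the notes block
        if stripped.startswith("::"):
            metadata.append(stripped[2:].strip())  # author note, not narrated
        elif stripped:
            spoken.append(stripped)
    # Collapse any double spaces left behind by removed inline directives
    return " ".join(" ".join(spoken).split()), metadata, directives


notes = """\
:: Reference: See textbook page 47
[PRE 2s]
This narration will be spoken.
<!-- HTML comments are also ignored -->
More spoken narration here.
"""
spoken, metadata, directives = split_notes(notes)
print(spoken)
print(metadata)
print(directives)
```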

Slide Levels (Pandoc Integration)

Use the slide-level key in the YAML frontmatter to control slide structure:

slide-level: 1 (default): Each # heading creates a slide

---
slide-level: 1
---

# Slide One
Content

::: notes
Narration
:::

# Slide Two
Content

## Just a subheading within Slide Two

::: notes
More narration
:::

slide-level: 2 (for section-based lectures): # creates sections with TOC slides, ## creates content slides

---
slide-level: 2
---

# Section Title
<!-- Auto-generates TOC slide, no narration needed -->

## Actual Slide One
Content

::: notes
Narration for slide one
:::

## Actual Slide Two
Content

::: notes
Narration for slide two
:::
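
The effect of slide-level on the two examples above can be illustrated with a small outline function. This is only a sketch of the rule as described here (headings above the slide level become sections, headings at the slide level become slides, deeper headings stay inside the current slide); it is not how Pandoc or Scholium is implemented, and the name `outline` is hypothetical. The title slide generated from the YAML title is not counted.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def outline(markdown: str, slide_level: int) -> list[tuple[str, str]]:
    """List (kind, title) pairs for headings, given a slide level."""
    lines = markdown.splitlines()
    # Skip YAML frontmatter delimited by '---' lines, if present
    if lines and lines[0].strip() == "---":
        end = next(
            (i for i, text in enumerate(lines[1:], 1) if text.strip() == "---"), 0
        )
        lines = lines[end + 1:]
    pages = []
    for line in lines:
        match = HEADING.match(line)
        if not match:
            continue
        level, title = len(match.group(1)), match.group(2)
        if level < slide_level:
            pages.append(("section", title))
        elif level == slide_level:
            pages.append(("slide", title))
        # Deeper headings are treated as subheadings inside the current slide
    return pages


doc = """\
---
slide-level: 2
---

# Section Title

## Actual Slide One
Content

## Actual Slide Two
Content
"""
print(outline(doc, 2))
print(outline(doc, 1))
```

With slide level 2 the sketch reports one section followed by two slides; with slide level 1 the same text yields a single slide, because the `##` headings are then only subheadings.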

Timing Control

Add timing directives inside ::: notes ::: blocks:

## Complex Diagram

[Large diagram image]

::: notes
:: Reference: Figure adapted from Smith et al. (2023)
:: TODO: Update with latest data next semester

[MIN 15s] [PRE 2s] [POST 3s]

Take a moment to examine this diagram.
[PAUSE 2s]
Notice the three main components...
:::

Available directives:

[MIN 10s] - Minimum slide duration (even if narration is shorter)
[PRE 2s] - Pause 2 seconds before speaking
[POST 3s] - Pause 3 seconds after speaking
[PAUSE 2s] - 2-second mid-narration pause
[DUR 5s] - Fixed duration (overrides everything)
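
As a worked example of how these directives might combine, the sketch below computes a slide duration from a narration length. The exact rule is an assumption, not taken from Scholium's source: pauses are added to the narration, `MIN` acts as a lower bound on the total, and `DUR` replaces everything. The function name `slide_duration` is hypothetical, and config defaults such as default_pre_delay are ignored.

```python
def slide_duration(narration_s: float, directives: list[tuple[str, float]]) -> float:
    """Estimate slide length in seconds under the assumptions stated above."""
    values: dict[str, float] = {}
    pauses = 0.0
    for name, seconds in directives:
        if name == "PAUSE":
            pauses += seconds  # several mid-narration pauses can add up
        else:
            values[name] = seconds
    if "DUR" in values:
        return values["DUR"]  # fixed duration overrides everything else
    total = values.get("PRE", 0.0) + narration_s + pauses + values.get("POST", 0.0)
    return max(total, values.get("MIN", 0.0))


# Directives from the Complex Diagram example: [MIN 15s] [PRE 2s] [POST 3s] [PAUSE 2s]
diagram = [("MIN", 15.0), ("PRE", 2.0), ("POST", 3.0), ("PAUSE", 2.0)]
print(slide_duration(6.0, diagram))   # 2 + 6 + 2 + 3 = 13, raised to the 15 s minimum
print(slide_duration(12.0, diagram))  # 2 + 12 + 2 + 3 = 19, already above the minimum
```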

Metadata in notes (prefixed with ::):

Not converted to speech
Useful for references, author notes, TODOs
Helps maintain context when editing lectures

Incremental Bullets

Use >- for incremental bullet reveals (Pandoc/Beamer syntax):

## Key Points

>- First point appears
>- Then second point
>- Finally third point

::: notes
Let's look at three key points.

First, we have the foundation concept.

Second, the application of that concept.

And third, the implications for our work.
:::

Each bullet creates a new slide page. Split your narration into paragraphs (separated by blank lines) to match.

TTS Providers

Provider	Type	Quality	Speed	Voice Cloning	API Key	Cost	Included in `[all]`
Piper	Local	⭐⭐⭐⭐	Fast	❌	❌	Free	✅
ElevenLabs	Cloud	⭐⭐⭐⭐⭐	Fast	✅	✅	Paid	✅
Coqui	Local	⭐⭐⭐⭐	Medium	✅	❌	Free	❌
OpenAI	Cloud	⭐⭐⭐⭐	Fast	❌	✅	Paid	✅
Bark	Local	⭐⭐⭐⭐⭐	Slow	⚠️	❌	Free	❌
F5-TTS	Local	⭐⭐⭐⭐⭐	Fast	✅	❌	Free	✅
StyleTTS2	Local	⭐⭐⭐⭐⭐	Medium	✅	❌	Free	❌
Tortoise	Local	⭐⭐⭐⭐⭐	Slow	✅	❌	Free	❌

pip install scholium[all] installs only the four ✅ providers (Piper, ElevenLabs, OpenAI, F5-TTS). Coqui, Bark, StyleTTS2, and Tortoise have transitive dependency conflicts on Python 3.11+ — install individually.
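
The table can also be read as data when choosing a provider. The sketch below encodes the columns that matter for that choice and filters on them; the `Provider` class and `choose` helper are illustrative names only, and Bark's ⚠️ voice-cloning entry is treated as "no" here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str            # value passed to --provider
    local: bool          # True for Local, False for Cloud
    voice_cloning: bool
    needs_api_key: bool
    in_all_extra: bool   # installed by the [all] extra


PROVIDERS = [
    Provider("piper", True, False, False, True),
    Provider("elevenlabs", False, True, True, True),
    Provider("coqui", True, True, False, False),
    Provider("openai", False, False, True, True),
    Provider("bark", True, False, False, False),  # marked ⚠️ in the table
    Provider("f5tts", True, True, False, True),
    Provider("styletts2", True, True, False, False),
    Provider("tortoise", True, True, False, False),
]


def choose(**wanted: bool) -> list[str]:
    """Return provider names whose attributes match every keyword given."""
    return [
        p.name
        for p in PROVIDERS
        if all(getattr(p, key) == value for key, value in wanted.items())
    ]


# Local voice cloning that comes with the [all] extra
print(choose(local=True, voice_cloning=True, in_all_extra=True))
# Providers that must be installed individually
print(choose(in_all_extra=False))
```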

Piper (Recommended)

pip install "scholium[piper]"
scholium generate lecture.md output.mp4 --provider piper

Available voices: en_US-lessac-medium, en_US-amy-medium, en_GB-alan-medium, etc.

ElevenLabs (Highest Quality)

ElevenLabs voices are identified by a Voice ID, not their display name. Use list-voices to find the ID for the voice you want:

pip install "scholium[elevenlabs]"
export ELEVENLABS_API_KEY="your_key"

# List voices — shows Name and Voice ID side by side
scholium list-voices --provider elevenlabs

# Use the Voice ID with --voice (not the display name)
scholium generate lecture.md output.mp4 --provider elevenlabs --voice Xb7hH8MSUJpSbSDYk0k2

Coqui (Local Voice Cloning)

pip install "scholium[coqui]"
scholium train-voice --name my_voice --provider coqui --sample recording.wav
scholium generate lecture.md output.mp4 --provider coqui --voice my_voice

F5-TTS (Fast Local Voice Cloning)

Zero-shot cloning from a 5-15 second reference clip — no training step required.

pip install "scholium[f5tts]"

# Option A: register a voice in the library
scholium train-voice --name my_voice --provider f5tts --sample recording.wav
scholium generate lecture.md output.mp4 --provider f5tts --voice my_voice

# Option B: point directly to a reference file in config.yaml
# f5tts:
#   model_path: "f5tts/my_voice/sample.wav"   # relative to voices_dir
#   ref_text: "Words spoken in the recording."

StyleTTS2 (Expressive Local Voice Cloning)

pip install "scholium[styletts2]"
scholium train-voice --name my_voice --provider styletts2 --sample recording.wav
scholium generate lecture.md output.mp4 --provider styletts2 --voice my_voice

Or set styletts2.model_path in config.yaml to skip voice registration.

Tortoise TTS (Highest-Quality Local Cloning)

pip install "scholium[tortoise]"
scholium train-voice --name my_voice --provider tortoise --sample recording.wav
# Add extra clips for better quality:
cp clip2.wav ~/.local/share/scholium/voices/tortoise/my_voice/sample_2.wav
scholium generate lecture.md output.mp4 --provider tortoise --voice my_voice

Or set tortoise.model_path in config.yaml to skip voice registration.

Configuration

Create config.yaml in your project:

# Slide settings
pandoc_template: beamer

# TTS settings
tts_provider: piper
voice: en_US-lessac-medium

# Timing defaults
timing:
  default_pre_delay: 0.5      # Pause before speaking
  default_post_delay: 1.0     # Pause after speaking
  min_slide_duration: 3.0     # Minimum for any slide
  silent_slide_duration: 2.0  # Duration for TOC/section slides

# Video settings
resolution: [1920, 1080]
fps: 30

# Paths
voices_dir: ~/.local/share/scholium/voices
temp_dir: ./temp
keep_temp_files: false
verbose: true

# Provider-specific settings
piper:
  quality: medium

elevenlabs:
  model: eleven_multilingual_v2

coqui:
  model: tts_models/multilingual/multi-dataset/xtts_v2

# Zero-shot local providers: set model_path to use a reference audio file
# directly without registering a voice via scholium train-voice.
# Paths are relative to voices_dir (or absolute).
f5tts:
  model: "F5-TTS"
  # model_path: "f5tts/my_voice/sample.wav"
  # ref_text: "Exact words spoken in the reference clip."

styletts2:
  alpha: 0.3
  beta: 0.7
  diffusion_steps: 5
  # model_path: "styletts2/my_voice/sample.wav"

tortoise:
  preset: "fast"
  # model_path: "tortoise/my_voice/sample.wav"

Voice Management

List Voices

# Local voice library (Coqui, F5-TTS, StyleTTS2, Tortoise)
scholium list-voices

# ElevenLabs cloud voices — shows Name and Voice ID
scholium list-voices --provider elevenlabs

Register a Voice

All zero-shot local providers (Coqui, F5-TTS, StyleTTS2, Tortoise) use the same command:

# --provider accepts f5tts, coqui, styletts2, or tortoise
scholium train-voice \
    --name my_lecture_voice \
    --provider f5tts \
    --sample my_recording.wav \
    --description "My natural teaching voice"

Skip Registration with `model_path`

For F5-TTS, StyleTTS2, and Tortoise, you can point directly to a reference file in config.yaml without registering a voice:

f5tts:
  model_path: "f5tts/my_voice/sample.wav"   # relative to voices_dir, or absolute
  ref_text: "The words spoken in the clip."  # optional but improves accuracy
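
The "relative to voices_dir, or absolute" rule can be sketched with pathlib. This is an illustration of the lookup as described, not Scholium's own code; `resolve_model_path` is a hypothetical name.

```python
from pathlib import Path


def resolve_model_path(model_path: str, voices_dir: str) -> Path:
    """Resolve a model_path setting against voices_dir unless it is absolute."""
    path = Path(model_path).expanduser()
    if path.is_absolute():
        return path  # absolute paths are used as given
    return Path(voices_dir).expanduser() / path


# Relative setting: looked up inside the voice library
print(resolve_model_path("f5tts/my_voice/sample.wav", "/data/voices"))
# Absolute setting: voices_dir is ignored
print(resolve_model_path("/tmp/reference.wav", "/data/voices"))
```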

Regenerate Embeddings (Coqui)

# Pre-compute speaker embeddings to speed up Coqui generation
scholium regenerate-embeddings --voice my_lecture_voice

Batch Processing

Process multiple lectures with a simple script:

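#!/bin/bash
# Skip the loop entirely when no .md files match
shopt -s nullglob
for lecture in lectures/*.md; do
    # lectures/foo.md -> lectures/foo.mp4
    output="${lecture%.md}.mp4"
    scholium generate "$lecture" "$output" --verbose
done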

Or use Python:

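from pathlib import Path
import subprocess

# Render every markdown lecture to an .mp4 next to it
for lecture in sorted(Path("lectures").glob("*.md")):
    output = lecture.with_suffix(".mp4")
    # check=True raises CalledProcessError if scholium exits with an error
    subprocess.run(
        ["scholium", "generate", str(lecture), str(output), "--verbose"],
        check=True,
    )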

Examples

See the examples/ directory for:

Basic lecture with sections (example_level2.md)
Incremental bullets and timing
Voice cloning workflow
Batch processing scripts

Performance

Generation time (per 10-minute lecture):

NVIDIA GPU: 5-10 minutes
Apple Silicon: 10-15 minutes
Modern CPU: 30-60 minutes

First run: Models download automatically (~500MB-1.5GB), cached for future use.
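
For planning a batch run, the figures above can be turned into a rough estimate. The sketch assumes generation time scales linearly with lecture length, which is an assumption rather than a measured property, and it excludes the one-time model download; the names `RATES` and `estimate_minutes` are hypothetical.

```python
# Generation minutes per 10 minutes of lecture, taken from the list above
RATES = {
    "nvidia_gpu": (5, 10),
    "apple_silicon": (10, 15),
    "modern_cpu": (30, 60),
}


def estimate_minutes(lecture_minutes: float, hardware: str) -> tuple[float, float]:
    """Return a (low, high) estimate in minutes, assuming linear scaling."""
    low, high = RATES[hardware]
    scale = lecture_minutes / 10
    return low * scale, high * scale


# A 50-minute lecture on a CPU-only machine
print(estimate_minutes(50, "modern_cpu"))
```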

Troubleshooting

"Pandoc not found": Install pandoc and LaTeX (see Installation)

"Narration bleeding over section slides": Make sure you have slide-level: 2 in your YAML frontmatter

"Slide count mismatch": Don't add ::: notes ::: after # section headings when using slide-level: 2

"Voice not found":

Piper: Use voice name like en_US-lessac-medium
ElevenLabs: Use the voice ID, not the display name (run scholium list-voices --provider elevenlabs to find it)
Coqui / F5-TTS / StyleTTS2 / Tortoise: Use a registered voice name from scholium list-voices, or set model_path under the provider section in config.yaml

"Out of memory":

Close other applications
Use export CUDA_VISIBLE_DEVICES="" to force CPU
Process one lecture at a time
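
When Scholium is launched from Python, the CPU-only setting can be applied to that one child process instead of the whole shell session. A minimal sketch, with `cpu_only_env` and `generate_on_cpu` as hypothetical helper names:

```python
import os
import subprocess


def cpu_only_env() -> dict[str, str]:
    """Copy the current environment and hide all CUDA devices."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""  # same effect as the export shown above
    return env


def generate_on_cpu(slides: str, output: str) -> None:
    """Run one `scholium generate` job with GPUs hidden from the process."""
    subprocess.run(
        ["scholium", "generate", slides, output],
        env=cpu_only_env(),
        check=True,  # raise if scholium exits with an error
    )
```

Calling `generate_on_cpu("lecture.md", "lecture.mp4")` leaves the parent environment untouched, so later jobs can still use the GPU.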

Documentation

Full docs: https://ccaprani.github.io/scholium
Examples: examples/ directory in this repo
Issues: GitHub Issues
CLI reference: scholium --help

Project Philosophy

Simple tool, not a framework. Scholium does one thing well: it converts markdown with embedded narration into video. It integrates with your existing workflow rather than replacing it.

Text-first. Everything is plain text (markdown + YAML), so it's:

Version controllable (Git)
Searchable and editable
Reproducible across systems
Easy to maintain

Pandoc-native. Uses standard Beamer slide syntax, so your slides work in LaTeX/Beamer too.

License

MIT License - see LICENSE file

Contributing

Contributions welcome! Focus areas:

New TTS provider integrations
Performance improvements
Documentation and examples
Bug fixes

Scholium: Your digital scholium for the modern classroom. 📖

Release history

2026.2: Mar 2, 2026
2026.1: Feb 28, 2026 (this version)

Download files

Source Distribution

scholium-2026.1.tar.gz (65.8 kB)

Uploaded Feb 28, 2026 Source

Built Distribution

scholium-2026.1-py3-none-any.whl (57.0 kB)

Uploaded Feb 28, 2026 Python 3

File details

Details for the file scholium-2026.1.tar.gz.

File metadata

Download URL: scholium-2026.1.tar.gz
Upload date: Feb 28, 2026
Size: 65.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scholium-2026.1.tar.gz
Algorithm	Hash digest
SHA256	`0a49128cfb586265647c32461ab328a4b3c1c0e787f88e62d5bb4c13bb7e1b5d`
MD5	`27a5816825c996314a5a9d7453d77d66`
BLAKE2b-256	`d53776edb5e4f62cb9c9054604d2bb71908de8cfe313e1b6bdce8accb4e70bb7`
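
A downloaded archive can be checked against the SHA256 digest above with the standard library. This is a generic sketch, not a Scholium feature; the function names are hypothetical and the file is assumed to be in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for scholium-2026.1.tar.gz
EXPECTED_SHA256 = "0a49128cfb586265647c32461ab328a4b3c1c0e787f88e62d5bb4c13bb7e1b5d"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


archive = Path("scholium-2026.1.tar.gz")
if archive.exists():
    print("hash OK" if verify(archive) else "hash MISMATCH")
```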

Provenance

The following attestation bundles were made for scholium-2026.1.tar.gz:

Publisher: publish.yml on ccaprani/scholium

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scholium-2026.1.tar.gz
- Subject digest: 0a49128cfb586265647c32461ab328a4b3c1c0e787f88e62d5bb4c13bb7e1b5d
- Sigstore transparency entry: 1004791873
- Sigstore integration time: Feb 28, 2026
Source repository:
- Permalink: ccaprani/scholium@4279d53e5037496acfa2bf5fe9aa19ab7eec0fd4
- Branch / Tag: refs/tags/v2026.1
- Owner: https://github.com/ccaprani
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4279d53e5037496acfa2bf5fe9aa19ab7eec0fd4
- Trigger Event: release

File details

Details for the file scholium-2026.1-py3-none-any.whl.

File metadata

Download URL: scholium-2026.1-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 57.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scholium-2026.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`18e9db6242a8d82c8b8e2b938161526bcc4dafbc3f957a5518a17df971230bf7`
MD5	`5d963bb308fc1d7bae162594a289f2be`
BLAKE2b-256	`894ab98dc62aa82d4c2f6c9ff64d2af91a3e7e3ca880fb2b3cd8683d96b149a0`

Provenance

The following attestation bundles were made for scholium-2026.1-py3-none-any.whl:

Publisher: publish.yml on ccaprani/scholium

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scholium-2026.1-py3-none-any.whl
- Subject digest: 18e9db6242a8d82c8b8e2b938161526bcc4dafbc3f957a5518a17df971230bf7
- Sigstore transparency entry: 1004791875
- Sigstore integration time: Feb 28, 2026
Source repository:
- Permalink: ccaprani/scholium@4279d53e5037496acfa2bf5fe9aa19ab7eec0fd4
- Branch / Tag: refs/tags/v2026.1
- Owner: https://github.com/ccaprani
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4279d53e5037496acfa2bf5fe9aa19ab7eec0fd4
- Trigger Event: release
