
A comprehensive toolkit for downloading, merging, and transcribing lecture videos


Lecture Downloader

A Python toolkit for downloading, merging, transcribing, and embedding subtitles from lecture videos hosted on platforms like Canvas and Brightspace.

Use at your own risk. This tool is designed for educational purposes and should not be used to violate any terms of service or copyright laws.

Quick Start

30-Second Setup

# Install FFmpeg (required for video processing)
brew install ffmpeg  # macOS
# sudo apt install ffmpeg  # Ubuntu/Debian
# Windows: https://www.wikihow.com/Install-FFmpeg-on-Windows

# Install lecture downloader
pip install lecture-downloader
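
Because FFmpeg is required for video processing, it can help to confirm it is on your PATH before running anything else. The following is a minimal sketch using only the Python standard library; `ffmpeg_available` is a hypothetical helper and not part of lecture-downloader.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None


# Report whether FFmpeg can be located before starting any processing.
if ffmpeg_available():
    print("FFmpeg found at:", shutil.which("ffmpeg"))
else:
    print("FFmpeg not found; install it before processing videos.")
```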

Obtaining Video URLs from Canvas/Brightspace

The implementation is based on this Reddit post.

Using Video DownloadHelper Extension

  1. Install Extension: Download Video DownloadHelper
  2. Navigate to Video: Go to your lecture video in Canvas/Brightspace
  3. Start Playback: Click play to begin streaming
  4. Extract URL: Click the extension icon (should be colored, not grey)
  5. Copy URL: Click the three dots → "Copy URL"

For example, visit the Public Lecture sample, click play on a video, and copy the URL from the extension. To bulk download, paste the URLs into a text file named lecture_links.txt, one URL per line.
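
If you collect the copied URLs in Python first, a short script can write them to the links file, one per line. This is only a sketch: `write_links_file` is a hypothetical helper, and the duplicate and scheme checks are assumptions about what is useful, not documented behaviour of lecture-downloader.

```python
from pathlib import Path
from urllib.parse import urlparse


def write_links_file(urls, path="lecture_links.txt"):
    """Write URLs to a text file, one per line, skipping blanks and duplicates."""
    cleaned = []
    for url in urls:
        url = url.strip()
        if not url or url in cleaned:
            continue  # ignore empty entries and repeated URLs
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an http(s) URL: {url!r}")
        cleaned.append(url)
    # One URL per line, with a trailing newline after each entry.
    Path(path).write_text("".join(u + "\n" for u in cleaned), encoding="utf-8")
    return cleaned
```

For example, `write_links_file(["https://example.com/lecture1.mp4", "https://example.com/lecture2.mp4"])` creates lecture_links.txt with two lines.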


Basic Usage

One-Command Pipeline

# Complete pipeline: download → merge → transcribe
import lecture_downloader as ld

pipeline_results = ld.process_pipeline(
    links="lecture_links.txt",  # Can also use: single URL string, ["url1", "url2"]
    titles="lecture_titles.json",  # Can also use: ["Title 1", "Title 2"], {"Module 1": ["Lecture 1"]}
    output_dir="lecture_processing",
    inject_subtitles=True,          # False to skip subtitle injection
    transcription_method="whisper", # "auto", "gcloud", "whisper"
    language="en",                  # Language code for Whisper
)

Step-by-Step Commands

import lecture_downloader as ld

# Complete workflow in 3 commands
base_dir = "Lecture-Downloads"

# 1. Download lectures
results = ld.download_lectures(
    links="lecture_links.txt",  # Can also use: single URL string, ["url1", "url2"]
    titles="lecture_titles.json",  # Can also use: ["Title 1", "Title 2"], {"Module 1": ["Lecture 1"]}
    base_dir=base_dir,  # Creates Lecture-Downloads/lecture-downloads/
)

# 2. Merge videos by module with chapters
merged = ld.merge_videos(
    base_dir=base_dir,  # Auto-detects input from lecture-downloads/
)

# 3. Transcribe with Whisper (local, no setup required)
transcripts = ld.transcribe_videos(
    base_dir=base_dir,  # Auto-detects input from merged-lectures/
    method="whisper",  # "auto" detects best available method
    language="en",  # Language code for Whisper
    inject_subtitles=True,  # False to skip subtitle injection
)

Installation

# Basic installation
pip install lecture-downloader

Required Dependencies:

  • ffmpeg - Install via package manager (brew, apt, etc.)
  • Python 3.8+

Configuration Options

Download Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| links | str/list | Required | File path, single URL, or list of URLs |
| titles | str/list/dict | None | File path, list, or dict mapping |
| base_dir | str | "." | Base directory (creates subdirectories) |
| max_workers | int | 5 | Concurrent downloads (1-10) |
| verbose | bool | False | Detailed progress output |

Merge Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| base_dir | str | "." | Base directory (auto-detects input) |
| verbose | bool | False | Detailed progress output |
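
The merge step combines videos by module and adds chapters. This README does not say how the chapters are written; one common approach with FFmpeg is a metadata file in the FFMETADATA1 format. The sketch below builds such a file from lecture titles and durations. `build_chapter_metadata` and its input format are hypothetical and only illustrate the idea.

```python
def _escape(value: str) -> str:
    """Backslash-escape characters that are special in FFmpeg metadata files."""
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value


def build_chapter_metadata(lectures):
    """Build FFMETADATA1 text from (title, duration_in_seconds) pairs.

    Chapters are laid end to end, in the order the lectures are given.
    """
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, duration_s in lectures:
        end_ms = start_ms + int(round(duration_s * 1000))
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={_escape(title)}",
        ]
        start_ms = end_ms
    return "\n".join(lines) + "\n"
```

A file produced this way can be applied to an already merged video with a command such as `ffmpeg -i merged.mp4 -i chapters.txt -map_metadata 1 -codec copy merged_with_chapters.mp4`, where the file names are placeholders.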

Transcribe Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| base_dir | str | "." | Base directory (auto-detects input) |
| method | str | "auto" | "auto", "whisper" |
| language | str | "en" | Language code for Whisper |
| max_workers | int | 3 | Concurrent transcriptions (1-5) |
| inject_subtitles | bool | True | Inject SRT into video files |
| verbose | bool | False | Detailed progress output |
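
To give a sense of what injecting an SRT file into a video involves, here is a sketch that builds an FFmpeg command for adding a soft subtitle track to an MP4 file. It assumes MP4 output (hence the `mov_text` subtitle codec) and is not necessarily the command lecture-downloader runs; `build_subtitle_mux_command` is a hypothetical helper.

```python
def build_subtitle_mux_command(video, srt, output, language="eng"):
    """Return an FFmpeg argument list that muxes an SRT file into an MP4."""
    return [
        "ffmpeg",
        "-i", str(video),          # input 0: the video
        "-i", str(srt),            # input 1: the subtitle file
        "-map", "0",               # keep all streams from the video
        "-map", "1",               # add the subtitle stream
        "-c", "copy",              # do not re-encode audio or video
        "-c:s", "mov_text",        # MP4-compatible subtitle codec
        "-metadata:s:s:0", f"language={language}",
        str(output),
    ]


cmd = build_subtitle_mux_command("module1.mp4", "module1.srt", "module1_subs.mp4")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the input files exist.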

Input Formats

Links Input

# File with URLs (one per line)
links = "lecture_links.txt"

# Single URL
links = "https://example.com/lecture.mp4"

# List of URLs
links = ["https://url1.mp4", "https://url2.mp4"]
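
The three accepted forms can all be reduced to a plain list of URLs. The sketch below shows one way that could be done; it is an assumption about how such input might be handled, not the library's actual code, and `normalize_links` is a hypothetical name.

```python
from pathlib import Path


def normalize_links(links):
    """Turn a file path, a single URL, or a list of URLs into a list of URLs."""
    if isinstance(links, (list, tuple)):
        return [str(u).strip() for u in links if str(u).strip()]
    text = str(links).strip()
    if text.startswith(("http://", "https://")):
        return [text]  # a single URL string
    # Otherwise treat the value as a path to a text file with one URL per line.
    lines = Path(text).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```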

Titles Input

# JSON file with module structure
titles = "lecture_titles.json"

# List of titles (matches link order)
titles = ["Lecture 1", "Lecture 2", "Lecture 3"]

# Dictionary mapping modules to lectures
titles = {
    "Module 1: Introduction": ["Lecture 1", "Lecture 2"],
    "Module 2: Advanced": ["Lecture 3", "Lecture 4"],
}

lecture_links.txt:

https://example.com/lecture1.mp4
https://example.com/lecture2.mp4

lecture_titles.json:

{
  "Module 1: Introduction": ["Lecture 1: Overview", "Lecture 2: Fundamentals"],
  "Module 2: Advanced Topics": ["Lecture 3: Advanced Concepts"]
}

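Since titles are matched to links by order, a quick check that both have the same length can catch mistakes before downloading. The sketch below assumes that, for the dictionary form, lectures are consumed module by module in the order written; that ordering is an assumption, and both helper names are hypothetical.

```python
def flatten_titles(titles):
    """Return a flat list of lecture titles from a list or a module-to-lectures dict."""
    if isinstance(titles, dict):
        # Assumed ordering: modules in insertion order, lectures in list order.
        return [title for lectures in titles.values() for title in lectures]
    return list(titles)


def pair_titles_with_links(titles, links):
    """Pair each title with the link at the same position, or raise on a mismatch."""
    flat = flatten_titles(titles)
    if len(flat) != len(links):
        raise ValueError(f"{len(flat)} titles but {len(links)} links")
    return list(zip(flat, links))
```

For example, a dictionary with two modules of two lectures each should be paired with exactly four links.
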
CLI Usage

Quick Commands

# Complete workflow
BASE_DIR="Lecture-Downloads"
lecture-downloader download -l links.txt -t titles.json -b "$BASE_DIR"
lecture-downloader merge -b "$BASE_DIR"
lecture-downloader transcribe -b "$BASE_DIR"

# One-command pipeline
lecture-downloader pipeline -l links.txt -t titles.json -o output

CLI Options

# Download with options
lecture-downloader download \
  -l links.txt \
  -t titles.json \
  -b Lecture-Downloads \
  --max-workers 8 \
  --verbose

# Transcribe with options
lecture-downloader transcribe \
  -b Lecture-Downloads \
  --method whisper \
  --language en \
  --max-workers 4 \
  --no-inject

FFmpeg not found:

# Install FFmpeg
brew install ffmpeg  # macOS
sudo apt install ffmpeg  # Ubuntu/Debian

Debug Mode

# Enable verbose output for troubleshooting
ld.download_lectures(links, titles, verbose=True)
ld.merge_videos(base_dir="course", verbose=True)
ld.transcribe_videos(base_dir="course", verbose=True)

License

MIT License - see LICENSE file for details.
