A comprehensive toolkit for downloading, merging, and transcribing lecture videos
Lecture Downloader
A Python toolkit for downloading, merging, transcribing, and embedding subtitles from lecture videos hosted on platforms like Canvas and Brightspace.
Use at your own risk. This tool is designed for educational purposes and should not be used to violate any terms of service or copyright laws.
Quick Start
30-Second Setup
# Install FFmpeg (required for video processing)
brew install ffmpeg # macOS
# sudo apt install ffmpeg # Ubuntu/Debian
# Windows: https://www.wikihow.com/Install-FFmpeg-on-Windows
# Install lecture downloader
pip install lecture-downloader
Obtaining Video URLs from Canvas/Brightspace
Implementation based on this Reddit post.
Using Video DownloadHelper Extension
- Install Extension: Download Video DownloadHelper
- Navigate to Video: Go to your lecture video in Canvas/Brightspace
- Start Playback: Click play to begin streaming
- Extract URL: Click the extension icon (should be colored, not grey)
- Copy URL: Click the three dots → "Copy URL"
For example, visit the Public Lecture sample, click play on a video, and copy the URL from the extension. To download in bulk, paste each URL into a text file named lecture_links.txt, one URL per line.
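If you collect the copied URLs in a script rather than by hand, the links file can be written programmatically. The following is a minimal sketch, not part of lecture-downloader: `write_links_file` is a hypothetical helper, the URLs are placeholders, and the file is written to a temporary directory so the example has no side effects.

```python
import tempfile
from pathlib import Path


def write_links_file(urls, path):
    """Write URLs to a text file, one per line, skipping blanks and duplicates."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Placeholder URLs; replace them with the ones copied from the extension.
copied = [
    "https://example.com/lecture1.mp4",
    "https://example.com/lecture2.mp4 ",  # trailing space is stripped
    "https://example.com/lecture1.mp4",   # duplicate is dropped
]

target = Path(tempfile.mkdtemp()) / "lecture_links.txt"
saved = write_links_file(copied, target)
print(target.read_text(encoding="utf-8"))
```

In real use, point `path` at the lecture_links.txt file you pass as `links`.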
Basic Usage
One-Command Pipeline
import lecture_downloader as ld
# Complete pipeline: download → merge → transcribe
pipeline_results = ld.process_pipeline(
    links="lecture_links.txt", # Can also use: single URL string, ["url1", "url2"]
    titles="lecture_titles.json", # Can also use: ["Title 1", "Title 2"], {"Module 1": ["Lecture 1"]}
    output_dir="lecture_processing",
    inject_subtitles=True, # False to skip subtitle injection
    transcription_method="whisper", # "auto", "gcloud", "whisper"
    language="en",
)
Step-by-Step Commands
import lecture_downloader as ld
# Complete workflow in 3 commands
base_dir = "Lecture-Downloads"
# 1. Download lectures
results = ld.download_lectures(
    links="lecture_links.txt", # Can also use: single URL string, ["url1", "url2"]
    titles="lecture_titles.json", # Can also use: ["Title 1", "Title 2"], {"Module 1": ["Lecture 1"]}
    base_dir=base_dir, # Creates Lecture-Downloads/lecture-downloads/
)
# 2. Merge videos by module with chapters
merged = ld.merge_videos(
    base_dir=base_dir, # Auto-detects input from lecture-downloads/
)
# 3. Transcribe with Whisper (local, no setup required)
transcripts = ld.transcribe_videos(
    base_dir=base_dir, # Auto-detects input from merged-lectures/
    method="whisper", # "auto" detects best available method
    language="en", # Language code for Whisper
    inject_subtitles=True, # False to skip subtitle injection
)
Installation
# Basic installation
pip install lecture-downloader
Required Dependencies:
- ffmpeg: install via a package manager (brew, apt, etc.)
- Python 3.8+
Configuration Options
Download Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `links` | str/list | Required | File path, single URL, or list of URLs |
| `titles` | str/list/dict | None | File path, list, or dict mapping |
| `base_dir` | str | "." | Base directory (creates subdirectories) |
| `max_workers` | int | 5 | Concurrent downloads (1-10) |
| `verbose` | bool | False | Detailed progress output |
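To illustrate what a bounded `max_workers` setting means in practice, here is a minimal sketch using the standard library's thread pool. It is not the library's implementation: `clamp_workers`, `run_concurrently`, and `fake_download` are hypothetical names, and whether lecture-downloader clamps or rejects out-of-range values is an assumption made only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def clamp_workers(requested, low=1, high=10):
    """Keep the worker count inside the 1-10 range listed in the table (assumed behavior)."""
    return max(low, min(high, requested))


def run_concurrently(items, worker, max_workers=5):
    """Apply `worker` to every item with a bounded thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=clamp_workers(max_workers)) as pool:
        return list(pool.map(worker, items))


def fake_download(url):
    """Stand-in for a real download: returns the file name instead of fetching anything."""
    return url.rsplit("/", 1)[-1]


urls = ["https://example.com/a.mp4", "https://example.com/b.mp4"]
names = run_concurrently(urls, fake_download, max_workers=50)
print(names)
```

More workers help only while the network or server is the bottleneck, which is one plausible reason for capping the value.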
Merge Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_dir` | str | "." | Base directory (auto-detects input) |
| `verbose` | bool | False | Detailed progress output |
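Merging produces one video per module with chapters. As a rough illustration of how chapter markers can be described for FFmpeg, the sketch below builds a chapter file in FFmpeg's FFMETADATA1 text format from lecture titles and durations. This is an assumption about one workable approach, not a description of how `merge_videos` works internally; `escape_meta` and `build_chapter_metadata` are hypothetical helpers.

```python
def escape_meta(value):
    """Backslash-escape characters that are special in FFmpeg metadata files."""
    for ch in ("\\", "=", ";", "#"):
        value = value.replace(ch, "\\" + ch)
    return value.replace("\n", "\\\n")


def build_chapter_metadata(lectures):
    """Build FFMETADATA1 text from (title, duration_in_seconds) pairs, in playback order."""
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, duration in lectures:
        end_ms = start_ms + int(round(duration * 1000))
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are expressed in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={escape_meta(title)}",
        ]
        start_ms = end_ms  # the next chapter begins where this one ends
    return "\n".join(lines) + "\n"


metadata = build_chapter_metadata([("Lecture 1", 60.0), ("Lecture 2: A=B", 30.5)])
print(metadata)
```

A file like this can be applied to an already merged video with a command along the lines of `ffmpeg -i merged.mp4 -i chapters.txt -map_metadata 1 -codec copy merged_with_chapters.mp4`, where the file names are placeholders.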
Transcribe Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_dir` | str | "." | Base directory (auto-detects input) |
| `method` | str | "auto" | "auto", "whisper" |
| `language` | str | "en" | Language code for Whisper |
| `max_workers` | int | 3 | Concurrent transcriptions (1-5) |
| `inject_subtitles` | bool | True | Inject SRT into video files |
| `verbose` | bool | False | Detailed progress output |
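For readers unfamiliar with the SRT files that `inject_subtitles` refers to, the sketch below shows the general shape of the format: a running index, a `start --> end` line with `HH:MM:SS,mmm` timestamps, then the caption text. It assumes transcription segments arrive as `(start_seconds, end_seconds, text)` tuples, which is an illustrative choice and not the library's confirmed data model; `srt_timestamp` and `to_srt` are hypothetical helpers.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) segments as SRT text, with blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0, 2.5, "Welcome to the course."), (2.5, 6.0, "Today we cover the basics.")]))
```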
Input Formats
Links Input
# File with URLs (one per line)
links = "lecture_links.txt"
# Single URL
links = "https://example.com/lecture.mp4"
# List of URLs
links = ["https://url1.mp4", "https://url2.mp4"]
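The three accepted shapes can be reduced to a single list of URLs before any downloading starts. The sketch below shows one way to do that; `normalize_links` is a hypothetical helper, and the rule "a string starting with http:// or https:// is a URL, anything else is a file path" is an assumption for illustration rather than the library's documented behavior.

```python
import tempfile
from pathlib import Path


def normalize_links(links):
    """Return a list of URLs from a list, a single URL string, or a path to a links file."""
    if isinstance(links, (list, tuple)):
        candidates = list(links)
    elif links.startswith(("http://", "https://")):
        candidates = [links]
    else:
        # Anything else is treated as a text file with one URL per line.
        candidates = Path(links).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in candidates if line.strip()]


# Demonstrate the file form with a temporary links file.
links_file = Path(tempfile.mkdtemp()) / "lecture_links.txt"
links_file.write_text("https://url1.mp4\n\nhttps://url2.mp4\n", encoding="utf-8")

print(normalize_links("https://example.com/lecture.mp4"))
print(normalize_links(["https://url1.mp4", "https://url2.mp4"]))
print(normalize_links(str(links_file)))
```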
Titles Input
# JSON file with module structure
titles = "lecture_titles.json"
# List of titles (matches link order)
titles = ["Lecture 1", "Lecture 2", "Lecture 3"]
# Dictionary mapping modules to lectures
titles = {
    "Module 1: Introduction": ["Lecture 1", "Lecture 2"],
    "Module 2: Advanced": ["Lecture 3", "Lecture 4"],
}
lecture_links.txt:
https://example.com/lecture1.mp4
https://example.com/lecture2.mp4
lecture_titles.json:
{
  "Module 1: Introduction": ["Lecture 1: Overview", "Lecture 2: Fundamentals"],
  "Module 2: Advanced Topics": ["Lecture 3: Advanced Concepts"]
}
CLI Usage
Quick Commands
# Complete workflow
BASE_DIR="Lecture-Downloads"
lecture-downloader download -l links.txt -t titles.json -b $BASE_DIR
lecture-downloader merge -b $BASE_DIR
lecture-downloader transcribe -b $BASE_DIR
# One-command pipeline
lecture-downloader pipeline -l links.txt -t titles.json -o output
CLI Options
# Download with options
lecture-downloader download \
-l links.txt \
-t titles.json \
-b Lecture-Downloads \
--max-workers 8 \
--verbose
# Transcribe with options
lecture-downloader transcribe \
-b Lecture-Downloads \
--method whisper \
--language en \
--max-workers 4 \
--no-inject
FFmpeg not found:
# Install FFmpeg
brew install ffmpeg # macOS
sudo apt install ffmpeg # Ubuntu/Debian
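To confirm from Python whether FFmpeg is visible on your PATH, a quick check with the standard library is usually enough. This is a minimal sketch and not part of lecture-downloader; `find_ffmpeg` and `ffmpeg_version_line` are hypothetical helpers.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(executable):
    """Return the first line printed by `ffmpeg -version`, or an empty string."""
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


path = find_ffmpeg()
if path is None:
    print("ffmpeg not found on PATH; install it with one of the commands above")
else:
    print(f"ffmpeg found at {path}")
    print(ffmpeg_version_line(path))
```

If the check fails right after installing, opening a new terminal session often helps, since PATH changes are not picked up by shells that were already running.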
Debug Mode
# Enable verbose output for troubleshooting
ld.download_lectures(links, titles, verbose=True)
ld.merge_videos(base_dir="course", verbose=True)
ld.transcribe_videos(base_dir="course", verbose=True)
License
MIT License - see LICENSE file for details.