Lecture Mind
Event-aware lecture summarizer using V-JEPA visual encoder for real-time, context-aware summaries and retrieval.
Features
- Visual Encoding: DINOv2 (ViT-B/14) for 768-dim frame embeddings
- Text Encoding: sentence-transformers (all-MiniLM-L6-v2) for query embeddings
- Audio Transcription: Whisper integration for lecture transcription
- Multimodal Search: Combined visual + transcript ranking with configurable weights
- Event Detection: Automatic slide transition and scene change detection
- FAISS Index: Fast similarity search with IVF optimization for large collections
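Multimodal search combines visual and transcript similarity with configurable weights. The exact fusion scheme inside `MultimodalIndex` isn't documented here; as a minimal sketch of one plausible late-fusion approach (the `visual_weight` parameter and `fuse_scores` helper are illustrative, not the library's API):

```python
import numpy as np

def fuse_scores(visual_scores: np.ndarray,
                transcript_scores: np.ndarray,
                visual_weight: float = 0.5) -> np.ndarray:
    """Late fusion: weighted sum of per-segment similarity scores.

    Both score arrays are min-max normalized to [0, 1] first, so the
    weight actually controls the balance between modalities.
    """
    def norm(x: np.ndarray) -> np.ndarray:
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    return visual_weight * norm(visual_scores) + (1.0 - visual_weight) * norm(transcript_scores)

# Rank three lecture segments by fused score
visual = np.array([0.9, 0.2, 0.5])
transcript = np.array([0.1, 0.8, 0.6])
fused = fuse_scores(visual, transcript, visual_weight=0.7)
ranking = np.argsort(-fused)  # best segment first → [0, 2, 1]
```

With `visual_weight=0.7` the visually strongest segment wins even though its transcript match is weak; lowering the weight would flip the ranking toward the transcript.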
Installation
Basic Installation (CPU)
pip install lecture-mind
With ML Dependencies (GPU recommended)
pip install lecture-mind[ml]
With Audio Transcription
pip install lecture-mind[audio]
Full Installation
pip install lecture-mind[all]
Development Installation
git clone https://github.com/matte1782/lecture-mind.git
cd lecture-mind
pip install -e ".[dev,ml,audio]"
Quick Start
CLI Usage
# Process a lecture video
lecture-mind process lecture.mp4 --output data/
# Query the processed lecture
lecture-mind query data/ "What is gradient descent?"
# List detected events
lecture-mind events data/
# Get help
lecture-mind --help
Python API
from vl_jepa import (
    VideoInput,
    FrameSampler,
    VisualEncoder,
    TextEncoder,
    MultimodalIndex,
    EventDetector,
)

# Load and sample video frames
with VideoInput.from_file("lecture.mp4") as video:
    sampler = FrameSampler(fps=1.0)
    frames = sampler.sample(video)

# Encode frames (uses placeholder encoder by default)
encoder = VisualEncoder.load()
embeddings = encoder.encode_batch(frames)

# Build searchable index
index = MultimodalIndex()
index.add_visual(embeddings, timestamps=[f.timestamp for f in frames])

# Query the lecture
text_encoder = TextEncoder.load()
query_emb = text_encoder.encode("machine learning basics")
results = index.search(query_emb, k=5)

for result in results:
    print(f"Timestamp: {result.timestamp:.1f}s, Score: {result.score:.3f}")
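The `EventDetector` imported in the example above detects slide transitions and scene changes; its internal algorithm isn't shown in this README. To illustrate the idea (not the library's actual implementation), adjacent frame embeddings can be compared by cosine distance, flagging a transition wherever the distance spikes:

```python
import numpy as np

def detect_transitions(embeddings: np.ndarray, threshold: float = 0.3) -> list[int]:
    """Flag frame indices where the embedding jumps away from the previous frame.

    embeddings: (n_frames, dim) array, one row per sampled frame.
    Returns indices i where cosine_distance(frame i-1, frame i) > threshold.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cos_sim = np.sum(unit[:-1] * unit[1:], axis=1)  # similarity of adjacent frames
    distance = 1.0 - cos_sim
    return [int(i) + 1 for i in np.nonzero(distance > threshold)[0]]

# Two stable "slides" with a jump between frames 2 and 3
rng = np.random.default_rng(0)
slide_a = rng.normal(size=768)
slide_b = rng.normal(size=768)
frames = np.stack([slide_a, slide_a, slide_a, slide_b, slide_b])
events = detect_transitions(frames)  # → [3]
```

Within a slide, consecutive embeddings are nearly identical (distance ≈ 0); across a slide change, two independent 768-dim vectors are nearly orthogonal (distance ≈ 1), so a modest threshold separates the two cases cleanly.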
Architecture
  lecture.mp4
       │
       ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ VideoInput  │────▶│FrameSampler │────▶│   Frames    │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                     ┌─────────────────────────┼─────────────────────────┐
                     ▼                         ▼                         ▼
              ┌─────────────┐           ┌─────────────┐           ┌─────────────┐
              │VisualEncoder│           │EventDetector│           │AudioExtract │
              │  (DINOv2)   │           │             │           │  (FFmpeg)   │
              └─────────────┘           └─────────────┘           └─────────────┘
                     │                         │                         │
                     ▼                         ▼                         ▼
              ┌─────────────┐           ┌─────────────┐           ┌─────────────┐
              │ Embeddings  │           │   Events    │           │ Transcriber │
              │  (768-dim)  │           │             │           │  (Whisper)  │
              └─────────────┘           └─────────────┘           └─────────────┘
                     │                         │                         │
                     └─────────────────────────┼─────────────────────────┘
                                               ▼
                                      ┌─────────────────┐
                                      │ MultimodalIndex │
                                      │     (FAISS)     │
                                      └─────────────────┘
                                               │
                                               ▼
                                      ┌─────────────────┐
                                      │  Search/Query   │
                                      └─────────────────┘
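The FAISS-backed `MultimodalIndex` at the bottom of the diagram is, conceptually, a nearest-neighbor lookup over stored embeddings. A minimal NumPy stand-in for a flat inner-product index (roughly what FAISS's `IndexFlatIP` does, without any of its optimizations; the `FlatIndex` class below is a toy for illustration, not the library's code):

```python
import numpy as np

class FlatIndex:
    """Toy stand-in for a FAISS flat inner-product index."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.timestamps: list[float] = []

    def add(self, embeddings: np.ndarray, timestamps: list[float]) -> None:
        # L2-normalize so inner product equals cosine similarity
        normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        self.vectors = np.vstack([self.vectors, normed.astype(np.float32)])
        self.timestamps.extend(timestamps)

    def search(self, query: np.ndarray, k: int = 5) -> list[tuple[float, float]]:
        q = query / np.linalg.norm(query)
        scores = self.vectors @ q            # one dot product per stored vector
        top = np.argsort(-scores)[:k]        # indices of the k highest scores
        return [(self.timestamps[i], float(scores[i])) for i in top]

index = FlatIndex(dim=768)
rng = np.random.default_rng(1)
embs = rng.normal(size=(10, 768))
index.add(embs, timestamps=[float(t) for t in range(10)])
hits = index.search(embs[4], k=3)  # querying with a stored vector returns it first
```

A flat index scans every stored vector per query, which is why the benchmarks below distinguish small (1k) from large (100k) collections; IVF partitioning trades a little recall for scanning only a fraction of the vectors.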
Performance
| Operation | Target | Actual |
|---|---|---|
| Query latency (1k vectors) | <100ms | 30.6µs |
| Search latency (100k vectors) | <100ms | 106.4µs |
| Frame embedding (placeholder) | <50ms | 0.36ms |
| Event detection | <10ms | 0.24ms |
See BENCHMARKS.md for detailed performance analysis.
Requirements
- Python 3.10+
- NumPy >= 1.24.0
- OpenCV >= 4.8.0
Optional Dependencies
- ML: PyTorch >= 2.0, transformers, sentence-transformers, FAISS
- Audio: faster-whisper >= 1.0.0
- UI (v0.3.0): Gradio >= 4.0.0
Development
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=src/vl_jepa --cov-report=term
# Lint and format
ruff check src/ && ruff format src/
# Type check
mypy src/ --strict
# Run benchmarks
pytest tests/benchmarks/ -v --benchmark-only
Roadmap
- v0.1.0: Foundation (placeholder encoders, basic pipeline)
- v0.2.0: Real Models + Audio (DINOv2, Whisper, multimodal search)
- v0.3.0: User Experience (Gradio web UI, Docker)
- v1.0.0: Production (optimization, real decoder, deployment)
License
MIT License - see LICENSE for details.
Citation
If you use Lecture Mind in your research, please cite:
@software{lecture_mind,
  title  = {Lecture Mind: Event-aware Lecture Summarizer},
  author = {Matteo Panzeri},
  year   = {2026},
  url    = {https://github.com/matte1782/lecture-mind}
}
Download files
Source Distribution
Built Distribution
File details
Details for the file lecture_mind-0.2.0.tar.gz.
File metadata
- Download URL: lecture_mind-0.2.0.tar.gz
- Upload date:
- Size: 201.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 451cd7def01cd83d70d35ba8d239aac03f3621403d24ae5eda5f6e49b68e46b3 |
| MD5 | 991d023f4ed8197baa2f2cb9071126df |
| BLAKE2b-256 | 71de619e0e8dd1831ff236e745c313403bf41ebf69ae80d820659e54601ad17b |
File details
Details for the file lecture_mind-0.2.0-py3-none-any.whl.
File metadata
- Download URL: lecture_mind-0.2.0-py3-none-any.whl
- Upload date:
- Size: 45.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 405dc340634481b15f9d13a300b010e685e44832ff29fb5263ee8ccf8082e2aa |
| MD5 | 4d39d5e166a8b8e5b5df0ac5f7f7f0f2 |
| BLAKE2b-256 | 7a4c99f41305e2b673aabfcbc941dca2d83702720f9dc0969defc6bc2c40fdb4 |