
Event-aware lecture summarizer using V-JEPA


Lecture Mind


Event-aware lecture summarizer using a V-JEPA visual encoder for real-time, context-aware summaries and retrieval.

Features

  • Visual Encoding: DINOv2 ViT-L/16 for 768-dim frame embeddings
  • Text Encoding: sentence-transformers (all-MiniLM-L6-v2) for query embeddings
  • Audio Transcription: Whisper integration for lecture transcription
  • Multimodal Search: Combined visual + transcript ranking with configurable weights
  • Event Detection: Automatic slide transition and scene change detection
  • FAISS Index: Fast similarity search with IVF optimization for large collections
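The Multimodal Search bullet says visual and transcript rankings are combined with configurable weights, but the combination rule is not documented on this page. The following is a minimal sketch assuming a simple linear blend of aligned per-segment similarity scores. `fuse_scores`, `rank` and `visual_weight` are hypothetical names for illustration, not part of the `vl_jepa` API.

```python
import numpy as np

def fuse_scores(visual_scores, transcript_scores, visual_weight=0.6):
    """Hypothetical linear fusion of aligned per-segment similarity scores.

    The transcript weight is 1 - visual_weight, so the two weights sum to 1.
    """
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must be in [0, 1]")
    v = np.asarray(visual_scores, dtype=np.float64)
    t = np.asarray(transcript_scores, dtype=np.float64)
    if v.shape != t.shape:
        raise ValueError("visual and transcript scores must be aligned")
    return visual_weight * v + (1.0 - visual_weight) * t

def rank(fused, k=3):
    """Indices of the k best-scoring segments, best first."""
    return np.argsort(-fused, kind="stable")[:k]

# Segment 0 matches visually, segment 1 matches in the transcript.
visual = [0.9, 0.2, 0.5]
transcript = [0.1, 0.8, 0.6]
print(rank(fuse_scores(visual, transcript, visual_weight=0.7)))  # [0 2 1]
print(rank(fuse_scores(visual, transcript, visual_weight=0.2)))  # [1 2 0]
```

Shifting the weight toward the transcript moves segment 1 from last place to first. That ordering change is what a configurable weight controls.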

Installation

Basic Installation (CPU)

pip install lecture-mind

With ML Dependencies (GPU recommended)

pip install "lecture-mind[ml]"

With Audio Transcription

pip install "lecture-mind[audio]"

Full Installation

pip install "lecture-mind[all]"

Development Installation

git clone https://github.com/matte1782/lecture-mind.git
cd lecture-mind
pip install -e ".[dev,ml,audio]"

Quick Start

CLI Usage

# Process a lecture video
lecture-mind process lecture.mp4 --output data/

# Query the processed lecture
lecture-mind query data/ "What is gradient descent?"

# List detected events
lecture-mind events data/

# Get help
lecture-mind --help

Python API

from vl_jepa import (
    VideoInput,
    FrameSampler,
    VisualEncoder,
    TextEncoder,
    MultimodalIndex,
    EventDetector,
)

# Load and sample video frames
with VideoInput.from_file("lecture.mp4") as video:
    sampler = FrameSampler(fps=1.0)
    frames = sampler.sample(video)

# Encode frames (uses placeholder encoder by default)
encoder = VisualEncoder.load()
embeddings = encoder.encode_batch(frames)

# Build searchable index
index = MultimodalIndex()
index.add_visual(embeddings, timestamps=[f.timestamp for f in frames])

# Query the lecture
text_encoder = TextEncoder.load()
query_emb = text_encoder.encode("machine learning basics")
results = index.search(query_emb, k=5)

for result in results:
    print(f"Timestamp: {result.timestamp:.1f}s, Score: {result.score:.3f}")
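To make the `index.search(query_emb, k=5)` step more concrete, here is a minimal brute-force sketch of a timestamped cosine-similarity index in NumPy. `TinyIndex` and `SearchResult` are hypothetical names. This is not `MultimodalIndex`, which the features list describes as FAISS-backed. The sketch assumes the query and the stored vectors share one dimension. This page does not say how the library reconciles the 384-dim output of all-MiniLM-L6-v2 with 768-dim visual embeddings.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class SearchResult:
    timestamp: float
    score: float

class TinyIndex:
    """Hypothetical brute-force cosine index; not lecture-mind's MultimodalIndex."""

    def __init__(self, dim):
        self.dim = dim
        self._vectors = np.empty((0, dim), dtype=np.float32)
        self._timestamps = []

    @staticmethod
    def _normalize(x):
        # Scale rows to unit length so a dot product equals cosine similarity.
        norms = np.linalg.norm(x, axis=-1, keepdims=True)
        return x / np.maximum(norms, 1e-12)

    def add(self, embeddings, timestamps):
        emb = np.asarray(embeddings, dtype=np.float32).reshape(-1, self.dim)
        if emb.shape[0] != len(timestamps):
            raise ValueError("one timestamp per embedding is required")
        self._vectors = np.vstack([self._vectors, self._normalize(emb)])
        self._timestamps.extend(float(t) for t in timestamps)

    def search(self, query, k=5):
        q = self._normalize(np.asarray(query, dtype=np.float32).reshape(self.dim))
        scores = self._vectors @ q  # cosine similarity against every stored vector
        top = np.argsort(-scores, kind="stable")[:k]
        return [SearchResult(self._timestamps[i], float(scores[i])) for i in top]

index = TinyIndex(dim=3)
index.add([[1, 0, 0], [0, 1, 0], [1, 1, 0]], timestamps=[0.0, 1.0, 2.0])
for result in index.search([1, 0, 0], k=2):
    print(f"Timestamp: {result.timestamp:.1f}s, Score: {result.score:.3f}")
# Timestamp: 0.0s, Score: 1.000
# Timestamp: 2.0s, Score: 0.707
```

A linear scan like this costs O(n·d) per query. That is why large collections benefit from an approximate index such as FAISS IVF.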

Architecture

lecture.mp4
    │
    ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ VideoInput  │────▶│FrameSampler │────▶│   Frames    │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                    ┌──────────────────────────┼──────────────────────────┐
                    ▼                          ▼                          ▼
            ┌─────────────┐            ┌─────────────┐            ┌─────────────┐
            │VisualEncoder│            │EventDetector│            │AudioExtract │
            │  (DINOv2)   │            │             │            │  (FFmpeg)   │
            └─────────────┘            └─────────────┘            └─────────────┘
                    │                          │                          │
                    ▼                          ▼                          ▼
            ┌─────────────┐            ┌─────────────┐            ┌─────────────┐
            │  Embeddings │            │   Events    │            │ Transcriber │
            │  (768-dim)  │            │             │            │  (Whisper)  │
            └─────────────┘            └─────────────┘            └─────────────┘
                    │                          │                          │
                    └──────────────────────────┼──────────────────────────┘
                                               ▼
                                    ┌─────────────────┐
                                    │ MultimodalIndex │
                                    │     (FAISS)     │
                                    └─────────────────┘
                                               │
                                               ▼
                                    ┌─────────────────┐
                                    │  Search/Query   │
                                    └─────────────────┘
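The EventDetector box in the diagram turns sampled frames into slide-transition and scene-change events. The detection rule is not described here. One common approach, shown as a sketch under that assumption, flags a frame when the cosine distance between it and the previous frame's embedding exceeds a threshold. A minimum gap suppresses bursts of events. `detect_transitions`, `threshold` and `min_gap` are hypothetical.

```python
import numpy as np

def detect_transitions(embeddings, timestamps, threshold=0.3, min_gap=2.0):
    """Hypothetical slide-transition detector over consecutive frame embeddings.

    Frame i becomes an event when its cosine distance to frame i-1 exceeds
    `threshold`, unless the previous event is less than `min_gap` seconds earlier.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    if emb.ndim != 2 or emb.shape[0] != len(timestamps):
        raise ValueError("expected one embedding row per timestamp")
    if emb.shape[0] < 2:
        return []
    unit = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    # Cosine distance between each frame and its predecessor.
    distances = 1.0 - np.sum(unit[1:] * unit[:-1], axis=1)
    events, last = [], -np.inf
    for i, d in enumerate(distances, start=1):
        t = float(timestamps[i])
        if d > threshold and t - last >= min_gap:
            events.append(t)
            last = t
    return events

# Slide A for t = 0..2 s, slide B for t = 3..4 s, back to slide A at t = 5 s.
frames = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [1, 0]]
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(detect_transitions(frames, times))               # [3.0, 5.0]
print(detect_transitions(frames, times, min_gap=3.0))  # [3.0]
```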

Performance

Operation                          Target     Actual
Query latency (1k vectors)         < 100 ms   30.6 µs
Search latency (100k vectors)      < 100 ms   106.4 µs
Frame embedding (placeholder)      < 50 ms    0.36 ms
Event detection                    < 10 ms    0.24 ms

See BENCHMARKS.md for detailed performance analysis.
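The features list credits the FAISS index with IVF (inverted file) optimization for large collections. The sketch below illustrates the IVF idea in plain NumPy. It is not FAISS's implementation, and it is not necessarily how lecture-mind configures FAISS. The idea: cluster the vectors around `nlist` centroids, keep one list of vector ids per centroid, and at query time scan only the `nprobe` closest lists. The parameter names mirror FAISS's IVF parameters. The farthest-point initialization followed by spherical k-means is an illustrative choice. `TinyIVF` is hypothetical.

```python
import numpy as np

class TinyIVF:
    """Hypothetical NumPy sketch of an IVF index using cosine similarity."""

    def __init__(self, nlist=4, nprobe=1, iters=10, seed=0):
        self.nlist, self.nprobe, self.iters = nlist, nprobe, iters
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _unit(x):
        return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12)

    def build(self, vectors):
        x = self._unit(np.asarray(vectors, dtype=np.float64))
        if len(x) < self.nlist:
            raise ValueError("need at least nlist vectors to train the quantizer")
        # Farthest-point initialization, then spherical k-means (Lloyd) refinement.
        chosen = [int(self.rng.integers(len(x)))]
        for _ in range(1, self.nlist):
            chosen.append(int(np.argmin(np.max(x @ x[chosen].T, axis=1))))
        self.centroids = x[chosen].copy()
        for _ in range(self.iters):
            assign = np.argmax(x @ self.centroids.T, axis=1)
            for c in range(self.nlist):
                members = x[assign == c]
                if len(members):
                    self.centroids[c] = self._unit(members.mean(axis=0))
        assign = np.argmax(x @ self.centroids.T, axis=1)
        # Inverted lists: for each centroid, the ids of the vectors assigned to it.
        self.lists = [np.flatnonzero(assign == c) for c in range(self.nlist)]
        self.vectors = x
        return self

    def search(self, query, k=5):
        q = self._unit(np.asarray(query, dtype=np.float64))
        # Scan only the nprobe lists whose centroids are closest to the query.
        probe = np.argsort(-(self.centroids @ q))[: self.nprobe]
        candidates = np.concatenate([self.lists[c] for c in probe])
        scores = self.vectors[candidates] @ q
        order = np.argsort(-scores)[:k]
        return candidates[order], scores[order]
```

With `nprobe` equal to `nlist`, every list is scanned and the results match brute force. A smaller `nprobe` scans fewer vectors but can miss neighbours that fell into an unprobed list.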

Requirements

  • Python 3.10+
  • NumPy >= 1.24.0
  • OpenCV >= 4.8.0

Optional Dependencies

  • ML: PyTorch >= 2.0, transformers, sentence-transformers, FAISS
  • Audio: faster-whisper >= 1.0.0
  • UI (planned for v0.3.0): Gradio >= 4.0.0
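Because these dependencies are optional extras, code that must also run on the basic CPU install can probe for them before importing. The sketch below shows a generic pattern based on the standard library's `importlib.util.find_spec`, which locates a module without importing it. It assumes the usual top-level import names of the listed packages (`torch`, `transformers`, `sentence_transformers`, `faiss`, `faster_whisper`, `gradio`). Whether lecture-mind detects its extras this way is not stated here. `OPTIONAL_MODULES`, `is_installed` and `available_extras` are hypothetical.

```python
import importlib.util

# Top-level import names of the optional packages, grouped by extra (assumed).
OPTIONAL_MODULES = {
    "ml": ["torch", "transformers", "sentence_transformers", "faiss"],
    "audio": ["faster_whisper"],
    "ui": ["gradio"],
}

def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None

def available_extras(groups=OPTIONAL_MODULES):
    """Map each extra to True only when every module in it is importable."""
    return {extra: all(is_installed(m) for m in modules) for extra, modules in groups.items()}

print(available_extras())
```

A caller could, for example, use a placeholder encoder when the "ml" entry is False, which matches the README's note that a placeholder encoder is the default.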

Development

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src/vl_jepa --cov-report=term

# Lint and format
ruff check src/ && ruff format src/

# Type check
mypy src/ --strict

# Run benchmarks
pytest tests/benchmarks/ -v --benchmark-only

Roadmap

  • v0.1.0: Foundation (placeholder encoders, basic pipeline)
  • v0.2.0: Real Models + Audio (DINOv2, Whisper, multimodal search)
  • v0.3.0: User Experience (Gradio web UI, Docker)
  • v1.0.0: Production (optimization, real decoder, deployment)

License

MIT License - see LICENSE for details.

Citation

If you use Lecture Mind in your research, please cite:

@software{lecture_mind,
  title = {Lecture Mind: Event-aware Lecture Summarizer},
  author = {Matteo Panzeri},
  year = {2026},
  url = {https://github.com/matte1782/lecture-mind}
}



Download files

Download the file for your platform.

Source Distribution

lecture_mind-0.2.0.tar.gz (201.9 kB)

Built Distribution


lecture_mind-0.2.0-py3-none-any.whl (45.1 kB)

File details

Details for the file lecture_mind-0.2.0.tar.gz.

File metadata

  • Download URL: lecture_mind-0.2.0.tar.gz
  • Size: 201.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for lecture_mind-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       451cd7def01cd83d70d35ba8d239aac03f3621403d24ae5eda5f6e49b68e46b3
MD5          991d023f4ed8197baa2f2cb9071126df
BLAKE2b-256  71de619e0e8dd1831ff236e745c313403bf41ebf69ae80d820659e54601ad17b


File details

Details for the file lecture_mind-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: lecture_mind-0.2.0-py3-none-any.whl
  • Size: 45.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for lecture_mind-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       405dc340634481b15f9d13a300b010e685e44832ff29fb5263ee8ccf8082e2aa
MD5          4d39d5e166a8b8e5b5df0ac5f7f7f0f2
BLAKE2b-256  7a4c99f41305e2b673aabfcbc941dca2d83702720f9dc0969defc6bc2c40fdb4

