AI-powered video tutorial assistant with intelligent frame extraction and multimodal RAG

🎬 FrameWise

Your AI guide that knows exactly which frame matters

Python 3.9+ | License: MIT

What is FrameWise?

FrameWise is an intelligent virtual assistant that transforms tutorial videos into instant, visual support. Instead of watching entire videos, users get the exact screenshot and explanation they need, right when they need it.

The Problem: You have 50+ tutorial videos (5 min each) showing users how to use your products. Users don't want to watch full videos to find one answer.

The Solution: FrameWise combines AI-powered video analysis, multimodal embeddings, and LLM integration to provide instant, accurate answers with visual proof.

✨ Features

  • 🎙️ Transcript Extraction - Automatic speech-to-text with Whisper
  • 🖼️ Smart Frame Extraction - Intelligent keyframe selection at important moments
  • 🧠 Multimodal Embeddings - CLIP for images + sentence-transformers for text
  • 🔍 Semantic Search - Find relevant content by meaning, not just keywords
  • 🤖 LLM Q&A - Natural language answers with Claude (via LangChain)
  • 🔧 Transcript Correction - Fix common speech recognition errors
  • ⚡ GPU Acceleration - Fast processing with CUDA support
  • 📦 Batch Processing - Handle multiple videos efficiently

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/yourusername/framewise.git
cd framewise

# Install dependencies
pip install -e .

# Or with Poetry
poetry install
poetry shell

Setup

  1. Install ffmpeg (required for video processing):
brew install ffmpeg  # macOS
# or
apt-get install ffmpeg  # Linux
  2. Configure API keys (for LLM features):
cp .env.example .env
# Edit .env and add your Claude API key
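
Before running the pipeline, it can help to confirm that both prerequisites are in place. The sketch below is not part of FrameWise: `check_prerequisites` is a hypothetical helper, and the variable name `ANTHROPIC_API_KEY` is an assumption (check `.env.example` for the name the project actually reads).

```python
import os
import shutil


def check_prerequisites(env=None, key_name="ANTHROPIC_API_KEY"):
    """Return a list of human-readable problems; an empty list means ready.

    `key_name` is an assumed variable name, not confirmed by FrameWise.
    """
    env = os.environ if env is None else env
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Treat a blank value the same as a missing one.
    if not env.get(key_name, "").strip():
        problems.append(f"{key_name} is not set")
    return problems


for problem in check_prerequisites():
    print(f"Missing: {problem}")
```

Note that this only inspects the process environment; values in `.env` are visible here only if something has already loaded that file.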

Basic Usage

from framewise import (
    TranscriptExtractor,
    FrameExtractor,
    FrameWiseEmbedder,
    FrameWiseVectorStore,
    FrameWiseQA,
)

# 1. Extract transcript
transcript_ext = TranscriptExtractor()
transcript = transcript_ext.extract("tutorial.mp4")

# 2. Extract keyframes
frame_ext = FrameExtractor(strategy="hybrid")
frames = frame_ext.extract("tutorial.mp4", transcript=transcript)

# 3. Generate embeddings
embedder = FrameWiseEmbedder()
embeddings = embedder.embed_frames_batch(frames)

# 4. Store in vector database
store = FrameWiseVectorStore()
store.create_table(embeddings)

# 5. Ask questions!
qa = FrameWiseQA(vector_store=store, embedder=embedder)
response = qa.ask("How do I export my data?")

print(response['answer'])
# "To export your data, click the Export button in the top-right corner..."

print(f"See frame at {response['relevant_frames'][0]['timestamp']}s")
# Timestamp (in seconds) of the most relevant frame in the video
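
The timestamp is a plain number of seconds. A small helper can turn it into a readable label and, if the video is served over HTTP, into a link that starts playback at that moment using the W3C Media Fragments `#t=` syntax. Both functions are hypothetical and not part of FrameWise, and the URL is a placeholder.

```python
def format_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS for videos longer than an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def link_to_moment(video_url, seconds):
    """Append a media fragment so supporting players seek to `seconds`."""
    return f"{video_url}#t={int(seconds)}"


print(format_timestamp(83.4))  # 1:23
print(link_to_moment("https://example.com/tutorial.mp4", 83.4))
```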

📖 How It Works

The Pipeline

Video Input
    ↓
1. Transcript Extraction (Whisper)
    ↓
2. Frame Extraction (OpenCV + Smart Alignment)
    ↓
3. Multimodal Embeddings (CLIP + Sentence Transformers)
    ↓
4. Vector Database (LanceDB)
    ↓
5. Semantic Search + LLM Q&A (Claude)
    ↓
Natural Language Answer + Visual Proof

Intelligent Frame Extraction

FrameWise uses three strategies to extract the most relevant frames:

  • Scene Detection: Captures visual transitions and UI changes
  • Transcript Alignment: Extracts frames when action keywords are mentioned ("click", "select", "open")
  • Hybrid (Recommended): Combines both for optimal coverage
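
To make the hybrid idea concrete, here is a minimal sketch of how candidate timestamps from the two strategies could be combined. It is an illustration under stated assumptions, not FrameWise's implementation: the function names, the `min_gap` parameter, and the segment format (dictionaries with `start` and `text`, as Whisper segments provide) are all hypothetical here.

```python
ACTION_KEYWORDS = ("click", "select", "open")  # the examples named in the text


def transcript_timestamps(segments, keywords=ACTION_KEYWORDS):
    """Start times of segments whose text mentions an action keyword."""
    return [
        seg["start"]
        for seg in segments
        if any(kw in seg["text"].lower() for kw in keywords)
    ]


def merge_timestamps(scene_times, transcript_times, min_gap=1.0, max_frames=20):
    """Combine both candidate lists, dropping near-duplicates.

    A candidate is kept only if it is at least `min_gap` seconds after the
    previously kept one; the result is capped at `max_frames`.
    """
    kept = []
    for t in sorted(set(scene_times) | set(transcript_times)):
        if not kept or t - kept[-1] >= min_gap:
            kept.append(t)
    return kept[:max_frames]


segments = [
    {"start": 2.0, "text": "Welcome to the tutorial."},
    {"start": 5.5, "text": "Click the Export button."},
    {"start": 12.0, "text": "Then select a format."},
]
scene_times = [5.2, 9.0]  # pretend output of a scene detector

print(merge_timestamps(scene_times, transcript_timestamps(segments)))
# [5.2, 9.0, 12.0]
```

The cap here simply keeps the earliest candidates; a real extractor would more likely rank candidates by some quality score before truncating.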

Multimodal Search

Unlike traditional search that only matches keywords, FrameWise understands meaning:

  • Query: "export data" → Finds: "Click the export button" ✅
  • Query: "save file" → Finds: "Click save icon" ✅
  • Query: "settings" → Finds frames showing settings UI ✅
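
Matching by meaning usually comes down to comparing embedding vectors, most often with cosine similarity. The sketch below shows that ranking step in isolation. The three-dimensional vectors are invented for illustration; real embeddings from CLIP or sentence-transformers have hundreds of dimensions, and the vector store performs this search for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, index, limit=3):
    """Return the `limit` best (score, text) pairs, highest score first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy 3-dimensional "embeddings", invented for illustration only.
index = [
    ("Click the export button", [0.9, 0.1, 0.0]),
    ("Click save icon", [0.1, 0.9, 0.0]),
    ("Open the settings panel", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.1]  # stands in for the embedding of "export data"

for score, text in search(query, index, limit=2):
    print(f"{score:.2f}  {text}")
```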

🎯 Use Cases

For Product Teams

  • Build an AI assistant for your tutorial library
  • Reduce support tickets with instant visual answers
  • Scale documentation without writing more docs

For EdTech Platforms

  • Make educational videos searchable
  • Provide instant answers to student questions
  • Improve learning outcomes with visual guidance

For Documentation Teams

  • Augment written docs with video content
  • Create interactive video knowledge bases
  • Enable natural language search across videos

📊 Performance

For 50 tutorial videos (5 min each):

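| Task                  | CPU     | GPU (Single) | GPU (Multi) |
|-----------------------|---------|--------------|-------------|
| Transcript Extraction | ~50 min | ~15 min      | ~8 min      |
| Frame Extraction      | ~25 min | ~10 min      | ~5 min      |
| Embedding Generation  | ~15 min | ~3 min       | ~1.5 min    |
| Total Processing      | ~90 min | ~28 min      | ~15 min     |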

Search Speed: <50ms per query
Answer Generation: ~2-3 seconds (with Claude)
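
The totals and the implied speedups can be checked with a few lines of arithmetic. The figures below are copied from the table above; the multi-GPU stages sum to 14.5 minutes, which the table rounds to ~15.

```python
# Approximate minutes per stage, copied from the performance table.
stages = {
    "Transcript Extraction": {"CPU": 50, "GPU (Single)": 15, "GPU (Multi)": 8},
    "Frame Extraction": {"CPU": 25, "GPU (Single)": 10, "GPU (Multi)": 5},
    "Embedding Generation": {"CPU": 15, "GPU (Single)": 3, "GPU (Multi)": 1.5},
}
NUM_VIDEOS = 50

# Sum each hardware column across the three stages.
totals = {}
for timings in stages.values():
    for hardware, minutes in timings.items():
        totals[hardware] = totals.get(hardware, 0) + minutes

for hardware, total in totals.items():
    speedup = totals["CPU"] / total
    per_video = total / NUM_VIDEOS
    print(f"{hardware}: {total} min total, "
          f"{per_video:.2f} min/video, {speedup:.1f}x vs CPU")
```

On these numbers a single GPU is roughly 3x faster than CPU and the multi-GPU setup roughly 6x faster.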

🧪 Examples

Example 1: Simple Search

from framewise import FrameWiseVectorStore, FrameWiseEmbedder

store = FrameWiseVectorStore(db_path="tutorials.db")
embedder = FrameWiseEmbedder()

results = store.search_by_text(
    "How do I export?",
    embedder=embedder,
    limit=3
)

for result in results:
    print(f"{result['timestamp']}s: {result['text']}")

Example 2: Q&A with Claude

from framewise import FrameWiseQA

# Reuses `store` and `embedder` from Example 1
qa = FrameWiseQA(vector_store=store, embedder=embedder)
response = qa.ask("How do I get started?")

print(response['answer'])
# Natural language answer from Claude

for frame in response['relevant_frames']:
    print(f"See: {frame['frame_path']} at {frame['timestamp']}s")
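
This README does not show the prompt that FrameWise sends to Claude, so the following is only a sketch of how retrieved frames could be turned into grounded context for an LLM. `build_prompt` is a hypothetical helper; the frame dictionaries reuse the `timestamp`, `text`, and `frame_path` keys seen in the examples, and the file path is made up.

```python
def build_prompt(question, frames):
    """Assemble a prompt that asks the model to answer from retrieved frames."""
    # One numbered line per frame so the model can cite its sources.
    context_lines = [
        f"[{i}] t={frame['timestamp']}s ({frame['frame_path']}): {frame['text']}"
        for i, frame in enumerate(frames, start=1)
    ]
    return (
        "Answer the question using only the tutorial excerpts below. "
        "Cite excerpts by their number.\n\n"
        + "\n".join(context_lines)
        + f"\n\nQuestion: {question}"
    )


frames = [
    {
        "timestamp": 42.0,
        "frame_path": "frames/frame_0003.jpg",  # illustrative path
        "text": "Click the Export button in the top-right corner.",
    },
]
print(build_prompt("How do I export my data?", frames))
```

Numbering the excerpts makes it possible to map the model's citations back to specific frames when presenting the answer.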

Example 3: Batch Processing

from framewise import TranscriptExtractor, FrameExtractor

# Process multiple videos
videos = ["tutorial1.mp4", "tutorial2.mp4", "tutorial3.mp4"]

transcript_ext = TranscriptExtractor()
transcripts = transcript_ext.extract_batch(videos, output_dir="transcripts/")

# Extract frames from each video, keyed by filename so no result is overwritten
frame_ext = FrameExtractor(strategy="hybrid")
frames_by_video = {}
for video, transcript in zip(videos, transcripts):
    frames_by_video[video] = frame_ext.extract(video, transcript=transcript)

Example 4: Transcript Correction

from framewise.utils.transcript_corrections import TranscriptCorrector

# Fix common transcription errors
corrector = TranscriptCorrector({
    "Defali": "Definely",
    "expot": "export",
})

# `transcript` comes from TranscriptExtractor.extract(), as in Basic Usage
corrected_transcript = corrector.correct_transcript(transcript)
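
How `TranscriptCorrector` applies its dictionary internally is not documented here. As a rough mental model, dictionary-based correction can be done with a single whole-word regular expression, as in this sketch (`apply_corrections` is a hypothetical function, and matching is case-sensitive by assumption):

```python
import re


def apply_corrections(text, corrections):
    """Replace whole-word occurrences of each key with its value."""
    if not corrections:
        return text
    # Longest keys first so overlapping entries prefer the longer match.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: corrections[m.group(1)], text)


fixed = apply_corrections(
    "Click expot, then open Defali.",
    {"Defali": "Definely", "expot": "export"},
)
print(fixed)  # Click export, then open Definely.
```

The `\b` word boundaries keep a short entry from rewriting part of a longer word.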

🛠️ Configuration

Frame Extraction Settings

frame_extractor = FrameExtractor(
    strategy="hybrid",           # "scene", "transcript", or "hybrid"
    max_frames_per_video=20,     # Limit number of frames
    scene_threshold=0.3,         # Scene change sensitivity (0-1)
    quality_threshold=0.5        # Minimum frame quality (0-1)
)
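
The metric behind `scene_threshold` is not specified in this README. One common reading is a normalized difference between consecutive frames, where a lower threshold produces more keyframes. The sketch below illustrates that interpretation with mean absolute pixel difference on tiny grayscale frames; both functions are hypothetical and the library may compute the score differently.

```python
def scene_change_score(prev_frame, next_frame):
    """Mean absolute pixel difference, scaled to the range 0-1.

    Frames are flat sequences of 8-bit grayscale values of equal length.
    """
    if len(prev_frame) != len(next_frame) or not prev_frame:
        raise ValueError("frames must be non-empty and the same size")
    total = sum(abs(a - b) for a, b in zip(prev_frame, next_frame))
    return total / (255 * len(prev_frame))


def detect_scene_changes(frames, scene_threshold=0.3):
    """Indices of frames whose score against the previous frame exceeds the threshold."""
    return [
        i
        for i in range(1, len(frames))
        if scene_change_score(frames[i - 1], frames[i]) > scene_threshold
    ]


dark = [10] * 16      # tiny 4x4 "frames" for illustration
dark_b = [12] * 16    # nearly identical to `dark`
bright = [240] * 16   # a hard cut

print(detect_scene_changes([dark, dark_b, bright, bright]))  # [2]
```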

Embedding Models

embedder = FrameWiseEmbedder(
    text_model="all-MiniLM-L6-v2",              # Fast, good quality
    vision_model="openai/clip-vit-base-patch32", # Balanced
    device="cuda"                                # Use GPU
)

LLM Configuration

qa = FrameWiseQA(
    vector_store=store,
    embedder=embedder,
    model="claude-3-5-sonnet-20241022",  # Claude model
    max_tokens=1024,                      # Response length
    temperature=0.7                       # Creativity (0-1)
)

📁 Project Structure

framewise/
├── framewise/              # Main package
│   ├── core/              # Video processing
│   │   ├── transcript_extractor.py
│   │   └── frame_extractor.py
│   ├── embeddings/        # Embedding generation
│   │   └── embedder.py
│   ├── retrieval/         # Vector search & Q&A
│   │   ├── vector_store.py
│   │   └── qa_system.py
│   └── utils/             # Utilities
│       └── transcript_corrections.py
├── examples/              # Usage examples
├── tests/                 # Test suite
└── docs/                  # Documentation

🧪 Testing

# Run unit tests
pytest tests/ -v

# Test transcript extraction
python test_it_yourself.py

# Test vector search
python test_search.py

# Test Q&A with Claude
python test_qa.py

# Test with Tableau video
python test_tableau.py

🤝 Contributing

Contributions are welcome! This is an open-source project.

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Built with:

🎯 Why "FrameWise"?

Because wisdom isn't about watching everything; it's about seeing the right frame at the right time.


Built for product teams who want their users to succeed, one frame at a time. 🎬✨
