AI-powered video tutorial assistant with intelligent frame extraction and multimodal RAG
FrameWise
Your AI guide that knows exactly which frame matters
What is FrameWise?
FrameWise is an intelligent virtual assistant that transforms tutorial videos into instant, visual support. Instead of watching entire videos, users get the exact screenshot and explanation they need, right when they need it.
The Problem: You have 50+ tutorial videos (5 min each) showing users how to use your products. Users don't want to watch full videos to find one answer.
The Solution: FrameWise combines AI-powered video analysis, multimodal embeddings, and LLM integration to provide instant, accurate answers with visual proof.
Features
- Transcript Extraction - Automatic speech-to-text with Whisper
- Smart Frame Extraction - Intelligent keyframe selection at important moments
- Multimodal Embeddings - CLIP for images + sentence-transformers for text
- Semantic Search - Find relevant content by meaning, not just keywords
- LLM Q&A - Natural language answers with Claude (via LangChain)
- Transcript Correction - Fix common speech recognition errors
- GPU Acceleration - Fast processing with CUDA support
- Batch Processing - Handle multiple videos efficiently
Quick Start
Installation
# Clone the repository
git clone https://github.com/yourusername/framewise.git
cd framewise
# Install dependencies
pip install -e .
# Or with Poetry
poetry install
poetry shell
Setup
- Install ffmpeg (required for video processing):
brew install ffmpeg # macOS
# or
apt-get install ffmpeg # Linux
- Configure API keys (for LLM features):
cp .env.example .env
# Edit .env and add your Claude API key
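Which variable name FrameWise reads from `.env` is not stated here. As a quick sanity check before running the LLM features, the sketch below assumes the key is stored as `ANTHROPIC_API_KEY` (the name the Anthropic SDK reads by default) in plain `KEY=value` lines. `read_env_file` and `has_api_key` are hypothetical helpers, not part of FrameWise.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=value lines from a .env file (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' sign
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(path=".env", name="ANTHROPIC_API_KEY"):
    """Return True if the assumed key name is present and non-empty."""
    return bool(read_env_file(path).get(name))


print("API key configured:", has_api_key())
```

If this prints `False`, check the variable name against `.env.example`, which is the authoritative list of what the project expects.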
Basic Usage
from framewise import (
    TranscriptExtractor,
    FrameExtractor,
    FrameWiseEmbedder,
    FrameWiseVectorStore,
    FrameWiseQA,
)
# 1. Extract transcript
transcript_ext = TranscriptExtractor()
transcript = transcript_ext.extract("tutorial.mp4")
# 2. Extract keyframes
frame_ext = FrameExtractor(strategy="hybrid")
frames = frame_ext.extract("tutorial.mp4", transcript=transcript)
# 3. Generate embeddings
embedder = FrameWiseEmbedder()
embeddings = embedder.embed_frames_batch(frames)
# 4. Store in vector database
store = FrameWiseVectorStore()
store.create_table(embeddings)
# 5. Ask questions!
qa = FrameWiseQA(vector_store=store, embedder=embedder)
response = qa.ask("How do I export my data?")
print(response['answer'])
# "To export your data, click the Export button in the top-right corner..."
print(f"See frame at {response['relevant_frames'][0]['timestamp']}s")
# Direct link to the relevant moment
How It Works
The Pipeline
Video Input
↓
1. Transcript Extraction (Whisper)
↓
2. Frame Extraction (OpenCV + Smart Alignment)
↓
3. Multimodal Embeddings (CLIP + Sentence Transformers)
↓
4. Vector Database (LanceDB)
↓
5. Semantic Search + LLM Q&A (Claude)
↓
Natural Language Answer + Visual Proof
Intelligent Frame Extraction
FrameWise uses three strategies to extract the most relevant frames:
- Scene Detection: Captures visual transitions and UI changes
- Transcript Alignment: Extracts frames when action keywords are mentioned ("click", "select", "open")
- Hybrid (Recommended): Combines both for optimal coverage
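To make the transcript-alignment and hybrid strategies concrete, here is a minimal sketch. It is not FrameWise's actual implementation: it assumes transcript segments are dicts with `start`, `end`, and `text` keys (the shape Whisper produces), and `keyword_timestamps`, `merge_candidates`, `min_gap`, and `max_frames` are illustrative names.

```python
ACTION_KEYWORDS = ("click", "select", "open")  # keywords quoted in the list above


def keyword_timestamps(segments, keywords=ACTION_KEYWORDS):
    """Return start times of transcript segments that mention an action keyword."""
    hits = []
    for seg in segments:
        text = seg["text"].lower()
        # Plain substring match; a real system might match whole words only
        if any(word in text for word in keywords):
            hits.append(seg["start"])
    return hits


def merge_candidates(scene_times, keyword_times, min_gap=1.0, max_frames=20):
    """Hybrid sketch: union both lists, drop near-duplicates, cap the total."""
    merged = []
    for t in sorted(set(scene_times) | set(keyword_times)):
        # Keep a timestamp only if it is at least min_gap seconds after the last kept one
        if not merged or t - merged[-1] >= min_gap:
            merged.append(t)
    return merged[:max_frames]


segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome to the tutorial."},
    {"start": 4.0, "end": 9.5, "text": "Click the Export button in the top-right corner."},
    {"start": 9.5, "end": 14.0, "text": "Then select the CSV format."},
]
scene_times = [4.3, 12.0]  # stand-in for the output of a scene-change detector

keyword_times = keyword_timestamps(segments)
print(keyword_times)                                 # [4.0, 9.5]
print(merge_candidates(scene_times, keyword_times))  # [4.0, 9.5, 12.0]
```

The scene change at 4.3s is dropped because it falls within one second of the keyword hit at 4.0s, which is the kind of de-duplication a hybrid strategy needs to avoid near-identical frames.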
Multimodal Search
Unlike traditional search that only matches keywords, FrameWise understands meaning:
- Query: "export data" → Finds: "Click the export button"
- Query: "save file" → Finds: "Click save icon"
- Query: "settings" → Finds frames showing settings UI
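Matching by meaning generally comes down to comparing embedding vectors rather than strings. The sketch below ranks a few captions by cosine similarity to a query. The 3-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` and `rank` are illustrative helpers rather than FrameWise functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, indexed):
    """Return labels ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in indexed.items()]
    return [label for _, label in sorted(scored, reverse=True)]


# Toy vectors standing in for real embeddings (values are invented)
indexed = {
    "Click the export button": [0.9, 0.1, 0.0],
    "Click save icon": [0.1, 0.9, 0.1],
    "Open the settings page": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "export data"

print(rank(query, indexed)[0])  # Click the export button
```

The query shares no exact wording requirement with the top result; it wins because its vector points in nearly the same direction.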
Use Cases
For Product Teams
- Build an AI assistant for your tutorial library
- Reduce support tickets with instant visual answers
- Scale documentation without writing more docs
For EdTech Platforms
- Make educational videos searchable
- Provide instant answers to student questions
- Improve learning outcomes with visual guidance
For Documentation Teams
- Augment written docs with video content
- Create interactive video knowledge bases
- Enable natural language search across videos
Performance
For 50 tutorial videos (5 min each):
| Task | CPU | GPU (Single) | GPU (Multi) |
|---|---|---|---|
| Transcript Extraction | ~50 min | ~15 min | ~8 min |
| Frame Extraction | ~25 min | ~10 min | ~5 min |
| Embedding Generation | ~15 min | ~3 min | ~1.5 min |
| Total Processing | ~90 min | ~28 min | ~15 min |
Search Speed: <50ms per query
Answer Generation: ~2-3 seconds (with Claude)
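The totals above are easier to compare as throughput. This short calculation uses only the figures from the table (50 videos of 5 minutes each, and the three total processing times); the function names are illustrative.

```python
NUM_VIDEOS = 50
MINUTES_PER_VIDEO = 5
TOTAL_VIDEO_MINUTES = NUM_VIDEOS * MINUTES_PER_VIDEO  # 250 minutes of footage

# Total processing times from the table above, in minutes
PROCESSING_MINUTES = {"CPU": 90, "GPU (Single)": 28, "GPU (Multi)": 15}


def realtime_factor(processing_minutes):
    """Minutes of video processed per minute of compute."""
    return TOTAL_VIDEO_MINUTES / processing_minutes


def seconds_per_video(processing_minutes):
    """Average processing time for one 5-minute video, in seconds."""
    return processing_minutes * 60 / NUM_VIDEOS


for setup, minutes in PROCESSING_MINUTES.items():
    print(f"{setup}: {realtime_factor(minutes):.1f}x real time, "
          f"~{seconds_per_video(minutes):.0f} s per video")
# CPU: 2.8x real time, ~108 s per video
# GPU (Single): 8.9x real time, ~34 s per video
# GPU (Multi): 16.7x real time, ~18 s per video
```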
Examples
Example 1: Simple Search
from framewise import FrameWiseVectorStore, FrameWiseEmbedder
store = FrameWiseVectorStore(db_path="tutorials.db")
embedder = FrameWiseEmbedder()
results = store.search_by_text(
    "How do I export?",
    embedder=embedder,
    limit=3
)
for result in results:
    print(f"{result['timestamp']}s: {result['text']}")
Example 2: Q&A with Claude
from framewise import FrameWiseQA
# Reuses `store` and `embedder` from Example 1
qa = FrameWiseQA(vector_store=store, embedder=embedder)
response = qa.ask("How do I get started?")
print(response['answer'])
# Natural language answer from Claude
for frame in response['relevant_frames']:
    print(f"See: {frame['frame_path']} at {frame['timestamp']}s")
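How FrameWise assembles the prompt it sends to Claude is not documented here. As a rough illustration of the retrieval-augmented step, the sketch below formats retrieved frames into numbered context for an LLM. `build_prompt` is a hypothetical helper, and it assumes each frame dict carries `timestamp`, `frame_path`, and `text` keys like the results shown in the examples.

```python
def build_prompt(question, frames):
    """Format retrieved frames as numbered context for an LLM prompt (illustrative only)."""
    context_lines = [
        f"[{i}] {frame['timestamp']:.1f}s ({frame['frame_path']}): {frame['text']}"
        for i, frame in enumerate(frames, start=1)
    ]
    return (
        "Answer the question using only the tutorial excerpts below. "
        "Cite excerpts by number.\n\n"
        + "\n".join(context_lines)
        + f"\n\nQuestion: {question}"
    )


# Made-up retrieval results for demonstration
sample_frames = [
    {"timestamp": 42.0, "frame_path": "frames/f1.jpg", "text": "Click the Export button."},
    {"timestamp": 57.5, "frame_path": "frames/f2.jpg", "text": "Select the CSV format."},
]
prompt = build_prompt("How do I export my data?", sample_frames)
print(prompt)
```

Numbering the excerpts lets the model point back to a specific frame, which is what makes it possible to return a timestamp alongside the answer.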
Example 3: Batch Processing
from framewise import TranscriptExtractor, FrameExtractor
# Process multiple videos
videos = ["tutorial1.mp4", "tutorial2.mp4", "tutorial3.mp4"]
transcript_ext = TranscriptExtractor()
transcripts = transcript_ext.extract_batch(videos, output_dir="transcripts/")
# Extract frames from every video, keyed by video path
frame_ext = FrameExtractor(strategy="hybrid")
frames_by_video = {}
for video, transcript in zip(videos, transcripts):
    frames_by_video[video] = frame_ext.extract(video, transcript=transcript)
Example 4: Transcript Correction
from framewise.utils.transcript_corrections import TranscriptCorrector
# Fix common transcription errors
corrector = TranscriptCorrector({
    "Defali": "Definely",
    "expot": "export",
})
corrected_transcript = corrector.correct_transcript(transcript)
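The internals of `TranscriptCorrector` are not shown here. A dictionary-based correction pass can be sketched with whole-word, case-insensitive replacement, as below. `apply_corrections` is a hypothetical stand-in that works on a plain string, not the library's method.

```python
import re


def apply_corrections(text, corrections):
    """Replace whole-word matches of each key with its value, ignoring case."""
    for wrong, right in corrections.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement avoids backslash interpretation in `right`
        text = re.sub(pattern, lambda match, r=right: r, text, flags=re.IGNORECASE)
    return text


print(apply_corrections("Click expot, then Expot again.", {"expot": "export"}))
# Click export, then export again.
```

The word-boundary anchors keep the rule from rewriting longer words that merely contain the misspelling.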
Configuration
Frame Extraction Settings
frame_extractor = FrameExtractor(
    strategy="hybrid",          # "scene", "transcript", or "hybrid"
    max_frames_per_video=20,    # Limit number of frames
    scene_threshold=0.3,        # Scene change sensitivity (0-1)
    quality_threshold=0.5       # Minimum frame quality (0-1)
)
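How `scene_threshold` is applied internally is not specified. One common reading of a 0-1 scene sensitivity is a cutoff on the normalized difference between consecutive frames. The sketch below shows that idea on tiny grayscale frames stored as nested lists; `frame_difference` and `scene_changes` are illustrative names, and a real pipeline would operate on OpenCV image arrays instead.

```python
def frame_difference(frame_a, frame_b):
    """Mean absolute pixel difference between two same-sized grayscale frames, scaled to 0-1."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / (count * 255)


def scene_changes(frames, threshold=0.3):
    """Return indices of frames that differ from the previous frame by more than threshold."""
    return [
        i for i in range(1, len(frames))
        if frame_difference(frames[i - 1], frames[i]) > threshold
    ]


dark = [[10, 10], [10, 10]]
bright = [[240, 240], [240, 240]]
print(scene_changes([dark, dark, bright, bright]))  # [2]
```

Under this interpretation, lowering the threshold flags more frames as scene changes, and raising it keeps only large visual transitions.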
Embedding Models
embedder = FrameWiseEmbedder(
    text_model="all-MiniLM-L6-v2",                  # Fast, good quality
    vision_model="openai/clip-vit-base-patch32",    # Balanced
    device="cuda"                                   # Use GPU
)
LLM Configuration
qa = FrameWiseQA(
    vector_store=store,
    embedder=embedder,
    model="claude-3-5-sonnet-20241022",  # Claude model
    max_tokens=1024,                     # Response length
    temperature=0.7                      # Creativity (0-1)
)
Project Structure
framewise/
├── framewise/              # Main package
│   ├── core/               # Video processing
│   │   ├── transcript_extractor.py
│   │   └── frame_extractor.py
│   ├── embeddings/         # Embedding generation
│   │   └── embedder.py
│   ├── retrieval/          # Vector search & Q&A
│   │   ├── vector_store.py
│   │   └── qa_system.py
│   └── utils/              # Utilities
│       └── transcript_corrections.py
├── examples/               # Usage examples
├── tests/                  # Test suite
└── docs/                   # Documentation
Testing
# Run unit tests
pytest tests/ -v
# Test transcript extraction
python test_it_yourself.py
# Test vector search
python test_search.py
# Test Q&A with Claude
python test_qa.py
# Test with Tableau video
python test_tableau.py
Contributing
Contributions are welcome! This is an open-source project.
License
MIT License - see LICENSE file for details
Acknowledgments
Built with:
- OpenAI Whisper - Speech recognition
- CLIP - Vision-language model
- Sentence Transformers - Text embeddings
- LanceDB - Vector database
- LangChain - LLM orchestration
- Anthropic Claude - Language model
Why "FrameWise"?
Because wisdom isn't about watching everything. It's about seeing the right frame at the right time.
Built for product teams who want their users to succeed, one frame at a time.
File details
Details for the file framewise-0.1.2.tar.gz.
File metadata
- Download URL: framewise-0.1.2.tar.gz
- Upload date:
- Size: 21.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 68eb7fc80d4ad74c2f2ce6b36a153e4d3afc17e3b997bfefed4f65d63a08c39f |
| MD5 | 00a4569a8e292e80be8261e1e5ea7819 |
| BLAKE2b-256 | 47f1c0e7f9f4e521b0b203307a38e4790fb3d96bd7891749869d254c1a75ff12 |
File details
Details for the file framewise-0.1.2-py3-none-any.whl.
File metadata
- Download URL: framewise-0.1.2-py3-none-any.whl
- Upload date:
- Size: 22.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e2e93886cb7880f1bd25d6137e39e327fe95c2c474aa7ae34fc5109a86f5bb7 |
| MD5 | b9b099deb11e714827d805fdaef91ebe |
| BLAKE2b-256 | f853e382969a4ae7583806b6baa54332bf2f3dc34d33fb685c5ff7bd77aae5c7 |