AI-powered video tutorial assistant with intelligent frame extraction and multimodal RAG
🎬 FrameWise
AI-powered video search and Q&A for tutorial content
Transform tutorial videos into searchable, interactive content. Find exact moments by meaning, not just keywords.
Quick Start
pip install framewise
# Or install with LLM features
pip install "framewise[llm]"
Basic Usage
from framewise import TranscriptExtractor, FrameExtractor, FrameWiseEmbedder, FrameWiseVectorStore
# 1. Process video
transcript = TranscriptExtractor().extract("tutorial.mp4")
frames = FrameExtractor().extract("tutorial.mp4", transcript)
# 2. Create searchable index
embedder = FrameWiseEmbedder()
embeddings = embedder.embed_frames_batch(frames)
store = FrameWiseVectorStore()
store.create_table(embeddings)
# 3. Search
results = store.search_by_text("How do I export?", embedder, limit=3)
for r in results:
    print(f"{r['timestamp']}s: {r['text']}")
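The loop above prints raw seconds, which can be awkward to match against a video player's clock. A minimal sketch of a formatter, assuming each result is a dict with the `timestamp` and `text` keys used above; the `format_timestamp` helper and the sample results are hypothetical and not part of FrameWise:

```python
def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as M:SS, or H:MM:SS past one hour."""
    total = int(seconds)  # drop the fractional part
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


# Hypothetical results, shaped like the dicts printed in the loop above.
sample_results = [
    {"timestamp": 75.4, "text": "Open the File menu and choose Export"},
    {"timestamp": 3725.0, "text": "Pick an output format"},
]

for r in sample_results:
    print(f"[{format_timestamp(r['timestamp'])}] {r['text']}")
```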
With LLM Q&A (Optional)
from framewise import FrameWiseQA
# Requires: pip install framewise[llm]
# Set ANTHROPIC_API_KEY environment variable
qa = FrameWiseQA(vector_store=store, embedder=embedder)
response = qa.ask("How do I get started?")
print(response['answer']) # Natural language answer with frame references
Features
- 🎙️ Transcript Extraction - Whisper-powered speech-to-text
- 🖼️ Smart Frame Extraction - Captures key visual moments
- 🧠 Multimodal Embeddings - CLIP + Sentence Transformers
- 🔍 Semantic Search - Find by meaning, not keywords
- 🤖 LLM Q&A - Optional Claude integration (requires API key)
How It Works
Video → Transcript (Whisper) → Frames (OpenCV) → Embeddings (CLIP) → Search (LanceDB) → [Optional] Q&A (Claude)
Core Pipeline (no API keys needed):
- Extract audio transcript with timestamps
- Capture key frames at important moments
- Generate multimodal embeddings
- Search by semantic similarity
Optional LLM Layer:
- Add natural language Q&A with Claude
- Requires pip install framewise[llm] and an API key
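The last core step, searching by semantic similarity, comes down to ranking stored vectors by how close they are to a query vector. The sketch below uses cosine similarity on tiny hand-made vectors to show the idea. It is an illustration only: the vectors and labels are made up, and the distance metric FrameWise or LanceDB actually uses is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, items, k=3):
    """Return the k (score, label) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), label) for label, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
frames = [
    ("export dialog", [0.9, 0.1, 0.0]),
    ("login screen", [0.0, 0.2, 0.9]),
    ("save menu", [0.7, 0.6, 0.1]),
]
query = [1.0, 0.2, 0.0]

best = top_k(query, frames, k=2)
for score, label in best:
    print(f"{score:.3f}  {label}")
```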
Installation
Requirements
- Python 3.9+
- ffmpeg (for video processing)
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
Install FrameWise
# Core features only
pip install framewise
# With LLM Q&A support
pip install "framewise[llm]"
# From source
git clone https://github.com/mesmaeili73/framewise.git
cd framewise
pip install -e .
Configuration
Frame Extraction
extractor = FrameExtractor(
    strategy="hybrid",  # "scene", "transcript", or "hybrid"
    max_frames_per_video=20,
    scene_threshold=0.3,
    quality_threshold=0.5,
)
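To build intuition for what a scene threshold and a frame cap can do, here is a minimal sketch of threshold-based scene-change detection on toy grayscale frames. It is an assumption for illustration, not FrameWise's implementation: the source does not say how `scene_threshold` is computed, and the `frame_difference` and `select_scene_changes` helpers are hypothetical.

```python
def frame_difference(prev, curr):
    """Mean absolute pixel difference, scaled to 0..1 for 8-bit grayscale."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be non-empty and the same size")
    total = sum(abs(p - c) for p, c in zip(prev, curr))
    return total / (255 * len(prev))


def select_scene_changes(frames, scene_threshold=0.3, max_frames=20):
    """Indices of frames that differ from their predecessor by more than the threshold."""
    selected = []
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > scene_threshold:
            selected.append(i)
            if len(selected) == max_frames:  # respect the per-video cap
                break
    return selected


# Toy 4-pixel frames: a dark scene, then a cut to a bright one at index 2.
toy_frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 210, 190, 205],
    [201, 209, 191, 204],
]

print(select_scene_changes(toy_frames))
```

Under this reading, a lower threshold would flag more frames as scene changes and a higher one fewer.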
Embeddings
embedder = FrameWiseEmbedder(
    text_model="all-MiniLM-L6-v2",
    vision_model="openai/clip-vit-base-patch32",
    device="cuda",  # or "cpu"
)
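Hard-coding `device="cuda"` fails on machines without a GPU, so it can help to choose the device at runtime. A small sketch, assuming PyTorch is the backend behind the embedding models (likely for CLIP and Sentence Transformers, but not stated here); `pick_device` is a hypothetical helper, not a FrameWise function:

```python
def pick_device(preferred: str = "cuda") -> str:
    """Return "cuda" only when PyTorch is installed and sees a GPU, else "cpu"."""
    if preferred != "cuda":
        return preferred
    try:
        import torch  # optional dependency; fall back to CPU if missing
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The result could then be passed as the `device` argument shown above.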
LLM Q&A (Optional)
# Set the environment variable (shell)
export ANTHROPIC_API_KEY=your_key_here
# Or pass the key directly (Python)
qa = FrameWiseQA(
    vector_store=store,
    embedder=embedder,
    model="claude-3-5-sonnet-20241022",
    api_key="your_key_here",
)
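Since the key can come from either place, an application may want to fail early with a clear message when neither is set. A minimal sketch of that check; `resolve_api_key` is a hypothetical helper, and giving the explicit argument precedence over the environment is an assumption, not documented FrameWise behavior:

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the explicit key if given, else ANTHROPIC_API_KEY from the environment."""
    env = os.environ if env is None else env
    key = explicit_key or env.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "No Anthropic API key found: set ANTHROPIC_API_KEY or pass api_key=..."
        )
    return key


# A fake environment is passed in so the example runs without a real key.
print(resolve_api_key(env={"ANTHROPIC_API_KEY": "your_key_here"}))
```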
Current Limitations (V1)
- Vector Store: LanceDB only (local storage)
- LLM Provider: Claude/Anthropic only
- Embedding Models: Fixed (CLIP + Sentence Transformers)
Future versions will support multiple vector stores (Qdrant, Elasticsearch), LLM providers (OpenAI, VertexAI), and configurable embedding models.
Examples
See the examples/ directory for complete examples:
- extract_transcript.py - Basic transcript extraction
- extract_frames.py - Frame extraction strategies
- complete_pipeline.py - Full end-to-end workflow
Use Cases
- Product Teams: Build AI assistants for tutorial libraries
- EdTech: Make educational videos searchable
- Documentation: Create interactive video knowledge bases
Performance
For 50 videos (5 min each):
- Processing: ~15 min on GPU to ~90 min on CPU
- Search: <50ms per query
- Q&A: ~2-3 seconds (with LLM)
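These totals are easier to compare per video. The arithmetic below assumes the lower processing figure (15 min) is the GPU case and the upper (90 min) the CPU case, as the "GPU vs CPU" note suggests; the helper names are illustrative only:

```python
VIDEOS = 50
MINUTES_PER_VIDEO = 5
TOTAL_VIDEO_MINUTES = VIDEOS * MINUTES_PER_VIDEO  # 250 minutes of footage


def per_video_seconds(processing_minutes: float) -> float:
    """Average processing time per video, in seconds."""
    return processing_minutes * 60 / VIDEOS


def realtime_factor(processing_minutes: float) -> float:
    """How many minutes of footage are processed per minute of compute."""
    return TOTAL_VIDEO_MINUTES / processing_minutes


for label, minutes in [("GPU", 15), ("CPU", 90)]:
    print(
        f"{label}: {per_video_seconds(minutes):.0f} s per video, "
        f"{realtime_factor(minutes):.1f}x faster than playback"
    )
```

So the quoted range works out to roughly 18 seconds per 5-minute video at the fast end and about 108 seconds at the slow end.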
Contributing
Contributions welcome! This is an open-source project.
License
MIT License - see LICENSE file
Built With
- OpenAI Whisper - Speech recognition
- CLIP - Vision-language embeddings
- Sentence Transformers - Text embeddings
- LanceDB - Vector database
- LangChain - LLM orchestration
- Anthropic Claude - Language model
FrameWise: See the right frame at the right time 🎬