AI-powered video tutorial assistant with intelligent frame extraction and multimodal RAG

These details have not been verified by PyPI

Project links

Project description

🎬 FrameWise

Your AI guide that knows exactly which frame matters

What is FrameWise?

FrameWise is an intelligent virtual assistant that transforms tutorial videos into instant, visual support. Instead of watching entire videos, users get the exact screenshot and explanation they need—right when they need it.

The Problem: You have 50+ tutorial videos (5 min each) showing users how to use your products. Users don't want to watch full videos to find one answer.

The Solution: FrameWise combines AI-powered video analysis, multimodal embeddings, and LLM integration to provide instant, accurate answers with visual proof.

✨ Features

🎙️ Transcript Extraction - Automatic speech-to-text with Whisper
🖼️ Smart Frame Extraction - Intelligent keyframe selection at important moments
🧠 Multimodal Embeddings - CLIP for images + sentence-transformers for text
🔍 Semantic Search - Find relevant content by meaning, not just keywords
🤖 LLM Q&A - Natural language answers with Claude (via LangChain)
🔧 Transcript Correction - Fix common speech recognition errors
⚡ GPU Acceleration - Fast processing with CUDA support
📦 Batch Processing - Handle multiple videos efficiently

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/yourusername/framewise.git
cd framewise

# Install dependencies
pip install -e .

# Or with Poetry
poetry install
poetry shell

Setup

Install ffmpeg (required for video processing):

brew install ffmpeg  # macOS
# or
apt-get install ffmpeg  # Linux

Configure API keys (for LLM features):

cp .env.example .env
# Edit .env and add your Claude API key

Basic Usage

from framewise import (
    TranscriptExtractor,
    FrameExtractor,
    FrameWiseEmbedder,
    FrameWiseVectorStore,
    FrameWiseQA,
)

# 1. Extract transcript
transcript_ext = TranscriptExtractor()
transcript = transcript_ext.extract("tutorial.mp4")

# 2. Extract keyframes
frame_ext = FrameExtractor(strategy="hybrid")
frames = frame_ext.extract("tutorial.mp4", transcript=transcript)

# 3. Generate embeddings
embedder = FrameWiseEmbedder()
embeddings = embedder.embed_frames_batch(frames)

# 4. Store in vector database
store = FrameWiseVectorStore()
store.create_table(embeddings)

# 5. Ask questions!
qa = FrameWiseQA(vector_store=store, embedder=embedder)
response = qa.ask("How do I export my data?")

print(response['answer'])
# "To export your data, click the Export button in the top-right corner..."

print(f"See frame at {response['relevant_frames'][0]['timestamp']}s")
# Direct link to the relevant moment

📖 How It Works

The Pipeline

Video Input
    ↓
1. Transcript Extraction (Whisper)
    ↓
2. Frame Extraction (OpenCV + Smart Alignment)
    ↓
3. Multimodal Embeddings (CLIP + Sentence Transformers)
    ↓
4. Vector Database (LanceDB)
    ↓
5. Semantic Search + LLM Q&A (Claude)
    ↓
Natural Language Answer + Visual Proof

Intelligent Frame Extraction

FrameWise uses three strategies to extract the most relevant frames:

Scene Detection: Captures visual transitions and UI changes
Transcript Alignment: Extracts frames when action keywords are mentioned ("click", "select", "open")
Hybrid (Recommended): Combines both for optimal coverage

Multimodal Search

Unlike traditional search that only matches keywords, FrameWise understands meaning:

Query: "export data" → Finds: "Click the export button" ✅
Query: "save file" → Finds: "Click save icon" ✅
Query: "settings" → Finds frames showing settings UI ✅

🎯 Use Cases

For Product Teams

Build an AI assistant for your tutorial library
Reduce support tickets with instant visual answers
Scale documentation without writing more docs

For EdTech Platforms

Make educational videos searchable
Provide instant answers to student questions
Improve learning outcomes with visual guidance

For Documentation Teams

Augment written docs with video content
Create interactive video knowledge bases
Enable natural language search across videos

📊 Performance

For 50 tutorial videos (5 min each):

Task	CPU	GPU (Single)	GPU (Multi)
Transcript Extraction	~50 min	~15 min	~8 min
Frame Extraction	~25 min	~10 min	~5 min
Embedding Generation	~15 min	~3 min	~1.5 min
Total Processing	~90 min	~28 min	~15 min

Search Speed: <50ms per query
Answer Generation: ~2-3 seconds (with Claude)

🧪 Examples

Example 1: Simple Search

from framewise import FrameWiseVectorStore, FrameWiseEmbedder

store = FrameWiseVectorStore(db_path="tutorials.db")
embedder = FrameWiseEmbedder()

results = store.search_by_text(
    "How do I export?",
    embedder=embedder,
    limit=3
)

for result in results:
    print(f"{result['timestamp']}s: {result['text']}")

Example 2: Q&A with Claude

from framewise import FrameWiseQA

qa = FrameWiseQA(vector_store=store, embedder=embedder)
response = qa.ask("How do I get started?")

print(response['answer'])
# Natural language answer from Claude

for frame in response['relevant_frames']:
    print(f"See: {frame['frame_path']} at {frame['timestamp']}s")

Example 3: Batch Processing

from framewise import TranscriptExtractor, FrameExtractor

# Process multiple videos
videos = ["tutorial1.mp4", "tutorial2.mp4", "tutorial3.mp4"]

transcript_ext = TranscriptExtractor()
transcripts = transcript_ext.extract_batch(videos, output_dir="transcripts/")

# Extract frames from all
frame_ext = FrameExtractor(strategy="hybrid")
for video, transcript in zip(videos, transcripts):
    frames = frame_ext.extract(video, transcript=transcript)

Example 4: Transcript Correction

from framewise.utils.transcript_corrections import TranscriptCorrector

# Fix common transcription errors
corrector = TranscriptCorrector({
    "Defali": "Definely",
    "expot": "export",
})

corrected_transcript = corrector.correct_transcript(transcript)

🛠️ Configuration

Frame Extraction Settings

frame_extractor = FrameExtractor(
    strategy="hybrid",           # "scene", "transcript", or "hybrid"
    max_frames_per_video=20,     # Limit number of frames
    scene_threshold=0.3,         # Scene change sensitivity (0-1)
    quality_threshold=0.5        # Minimum frame quality (0-1)
)

Embedding Models

embedder = FrameWiseEmbedder(
    text_model="all-MiniLM-L6-v2",              # Fast, good quality
    vision_model="openai/clip-vit-base-patch32", # Balanced
    device="cuda"                                # Use GPU
)

LLM Configuration

qa = FrameWiseQA(
    vector_store=store,
    embedder=embedder,
    model="claude-3-5-sonnet-20241022",  # Claude model
    max_tokens=1024,                      # Response length
    temperature=0.7                       # Creativity (0-1)
)

📁 Project Structure

framewise/
├── framewise/              # Main package
│   ├── core/              # Video processing
│   │   ├── transcript_extractor.py
│   │   └── frame_extractor.py
│   ├── embeddings/        # Embedding generation
│   │   └── embedder.py
│   ├── retrieval/         # Vector search & Q&A
│   │   ├── vector_store.py
│   │   └── qa_system.py
│   └── utils/             # Utilities
│       └── transcript_corrections.py
├── examples/              # Usage examples
├── tests/                 # Test suite
└── docs/                  # Documentation

🧪 Testing

# Run unit tests
pytest tests/ -v

# Test transcript extraction
python test_it_yourself.py

# Test vector search
python test_search.py

# Test Q&A with Claude
python test_qa.py

# Test with Tableau video
python test_tableau.py

🤝 Contributing

Contributions are welcome! This is an open-source project.

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Built with:

OpenAI Whisper - Speech recognition
CLIP - Vision-language model
Sentence Transformers - Text embeddings
LanceDB - Vector database
LangChain - LLM orchestration
Anthropic Claude - Language model

🎯 Why "FrameWise"?

Because wisdom isn't about watching everything—it's about seeing the right frame at the right time.

Built for product teams who want their users to succeed, one frame at a time. 🎬✨

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.3

Oct 19, 2025

This version

0.1.2

Oct 18, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

framewise-0.1.2.tar.gz (21.0 kB view details)

Uploaded Oct 18, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

framewise-0.1.2-py3-none-any.whl (22.8 kB view details)

Uploaded Oct 18, 2025 Python 3

File details

Details for the file framewise-0.1.2.tar.gz.

File metadata

Download URL: framewise-0.1.2.tar.gz
Upload date: Oct 18, 2025
Size: 21.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.10

File hashes

Hashes for framewise-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`68eb7fc80d4ad74c2f2ce6b36a153e4d3afc17e3b997bfefed4f65d63a08c39f`
MD5	`00a4569a8e292e80be8261e1e5ea7819`
BLAKE2b-256	`47f1c0e7f9f4e521b0b203307a38e4790fb3d96bd7891749869d254c1a75ff12`

See more details on using hashes here.

File details

Details for the file framewise-0.1.2-py3-none-any.whl.

File metadata

Download URL: framewise-0.1.2-py3-none-any.whl
Upload date: Oct 18, 2025
Size: 22.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.10

File hashes

Hashes for framewise-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3e2e93886cb7880f1bd25d6137e39e327fe95c2c474aa7ae34fc5109a86f5bb7`
MD5	`b9b099deb11e714827d805fdaef91ebe`
BLAKE2b-256	`f853e382969a4ae7583806b6baa54332bf2f3dc34d33fb685c5ff7bd77aae5c7`

See more details on using hashes here.

framewise 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🎬 FrameWise

What is FrameWise?

✨ Features

🚀 Quick Start

Installation

Setup

Basic Usage

📖 How It Works

The Pipeline

Intelligent Frame Extraction

Multimodal Search

🎯 Use Cases

For Product Teams

For EdTech Platforms

For Documentation Teams

📊 Performance

🧪 Examples

Example 1: Simple Search

Example 2: Q&A with Claude

Example 3: Batch Processing

Example 4: Transcript Correction

🛠️ Configuration

Frame Extraction Settings

Embedding Models

LLM Configuration

📁 Project Structure

🧪 Testing

🤝 Contributing

📄 License

🙏 Acknowledgments

🎯 Why "FrameWise"?

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes