AI-powered video tutorial assistant with intelligent frame extraction and multimodal RAG

🎬 FrameWise

AI-powered video search and Q&A for tutorial content

Python 3.9+ | License: MIT

Transform tutorial videos into searchable, interactive content. Find exact moments by meaning, not just keywords.

Quick Start

pip install framewise

# Or install with LLM features
pip install "framewise[llm]"  # quotes keep zsh from globbing the brackets

Basic Usage

from framewise import TranscriptExtractor, FrameExtractor, FrameWiseEmbedder, FrameWiseVectorStore

# 1. Process video
transcript = TranscriptExtractor().extract("tutorial.mp4")
frames = FrameExtractor().extract("tutorial.mp4", transcript)

# 2. Create searchable index
embedder = FrameWiseEmbedder()
embeddings = embedder.embed_frames_batch(frames)

store = FrameWiseVectorStore()
store.create_table(embeddings)

# 3. Search
results = store.search_by_text("How do I export?", embedder, limit=3)
for r in results:
    print(f"{r['timestamp']}s: {r['text']}")
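The search results above carry a `timestamp` in seconds. As a minimal sketch, a helper can turn that value into a label that is easier to match against a video player. The name `format_timestamp` is hypothetical and not part of FrameWise, and the sample results are made up; they only mirror the `timestamp` and `text` keys used above.

```python
def format_timestamp(seconds: float) -> str:
    """Render a timestamp in seconds as mm:ss, or h:mm:ss past one hour."""
    total = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


# Made-up results shaped like the dictionaries printed above.
sample_results = [
    {"timestamp": 75.4, "text": "Open the File menu and choose Export"},
    {"timestamp": 3725.0, "text": "Pick a format and confirm"},
]

for r in sample_results:
    print(f"[{format_timestamp(r['timestamp'])}] {r['text']}")
```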

With LLM Q&A (Optional)

from framewise import FrameWiseQA

# Requires: pip install "framewise[llm]"
# Set ANTHROPIC_API_KEY environment variable

qa = FrameWiseQA(vector_store=store, embedder=embedder)
response = qa.ask("How do I get started?")
print(response['answer'])  # Natural language answer with frame references

Features

  • 🎙️ Transcript Extraction - Whisper-powered speech-to-text
  • 🖼️ Smart Frame Extraction - Captures key visual moments
  • 🧠 Multimodal Embeddings - CLIP + Sentence Transformers
  • 🔍 Semantic Search - Find by meaning, not keywords
  • 🤖 LLM Q&A - Optional Claude integration (requires API key)

How It Works

Video → Transcript (Whisper) → Frames (OpenCV) → Embeddings (CLIP) → Search (LanceDB) → [Optional] Q&A (Claude)

Core Pipeline (no API keys needed):

  1. Extract audio transcript with timestamps
  2. Capture key frames at important moments
  3. Generate multimodal embeddings
  4. Search by semantic similarity
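Step 4 of the pipeline can be illustrated without any of the libraries involved. The sketch below ranks toy three-dimensional vectors by cosine similarity, which is a common way to compare embeddings; whether FrameWise or LanceDB uses exactly this metric is an assumption here. The functions `cosine_similarity` and `top_k`, and the sample vectors, are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=3):
    """Return the k (score, timestamp) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), ts) for ts, vec in index]
    scored.sort(reverse=True)
    return scored[:k]


# Toy index: (timestamp in seconds, embedding). Real embeddings have
# hundreds of dimensions; three are enough to show the ranking.
index = [
    (12.0, [0.9, 0.1, 0.0]),
    (48.5, [0.1, 0.9, 0.2]),
    (93.0, [0.7, 0.3, 0.1]),
]
query = [1.0, 0.0, 0.0]

best = top_k(query, index, k=2)
for score, ts in best:
    print(f"{ts}s scored {score:.3f}")
```

The frame at 12.0s ranks first because its vector points almost exactly in the query's direction, even though no keyword matching is involved.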

Optional LLM Layer:

  • Add natural language Q&A with Claude
  • Requires the llm extra (pip install "framewise[llm]") and an Anthropic API key

Installation

Requirements

  • Python 3.9+
  • ffmpeg (for video processing)

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg
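Before processing a video, it can help to confirm that ffmpeg is actually on the PATH. A minimal sketch using only the standard library follows; the helper name `ffmpeg_available` is hypothetical and not part of FrameWise.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if ffmpeg_available():
    print("ffmpeg found")
else:
    print("ffmpeg not found; install it with one of the commands above")
```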

Install FrameWise

# Core features only
pip install framewise

# With LLM Q&A support
pip install "framewise[llm]"

# From source
git clone https://github.com/mesmaeili73/framewise.git
cd framewise
pip install -e .

Configuration

Frame Extraction

extractor = FrameExtractor(
    strategy="hybrid",        # "scene", "transcript", or "hybrid"
    max_frames_per_video=20,
    scene_threshold=0.3,
    quality_threshold=0.5
)
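To give a feel for what a scene threshold and a frame cap do together, here is a sketch under stated assumptions. It is not FrameWise's implementation: the function `select_scene_changes` and the idea of a per-frame change score between 0 and 1 are illustrative, chosen only to mirror the `scene_threshold` and `max_frames_per_video` parameters above.

```python
def select_scene_changes(diff_scores, scene_threshold=0.3, max_frames=20):
    """Pick frame indices whose change score exceeds the threshold.

    diff_scores[i] is a hypothetical 0-1 measure of how much frame i
    differs from frame i-1. If more frames qualify than max_frames,
    the strongest changes are kept.
    """
    candidates = [
        (score, i) for i, score in enumerate(diff_scores)
        if score > scene_threshold
    ]
    strongest = sorted(candidates, reverse=True)[:max_frames]
    # Return the kept frames in chronological order.
    return sorted(i for _, i in strongest)


scores = [0.0, 0.05, 0.42, 0.1, 0.8, 0.31, 0.02]
print(select_scene_changes(scores))                # [2, 4, 5]
print(select_scene_changes(scores, max_frames=2))  # [2, 4]
```

In this sketch, raising the threshold yields fewer and more distinct frames, while lowering it captures subtler changes at the cost of more near-duplicates.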

Embeddings

embedder = FrameWiseEmbedder(
    text_model="all-MiniLM-L6-v2",
    vision_model="openai/clip-vit-base-patch32",
    device="cuda"  # or "cpu"
)
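Hard-coding `device="cuda"` fails on machines without a GPU. One way to choose the value at runtime is sketched below, assuming PyTorch is the backend that decides GPU availability; `pick_device` is a hypothetical helper, not a FrameWise function.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no way to check for a GPU, so fall back.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The result can then be passed as the `device` argument shown above.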

LLM Q&A (Optional)

# Option 1: set the environment variable in your shell before running Python:
#   export ANTHROPIC_API_KEY=your_key_here

# Option 2: pass the key directly
qa = FrameWiseQA(
    vector_store=store,
    embedder=embedder,
    model="claude-3-5-sonnet-20241022",
    api_key="your_key_here"
)
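The two ways of supplying the key can be combined so that a missing key fails early with a clear message. This is a sketch, not FrameWise's own lookup logic: `resolve_api_key` is a hypothetical helper, and the precedence (explicit argument first, then environment) is an assumption.

```python
import os


def resolve_api_key(explicit_key=None, env_var="ANTHROPIC_API_KEY"):
    """Return the explicit key if given, else the environment variable.

    Raises RuntimeError when neither source provides a key.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: pass one explicitly or set {env_var}"
        )
    return key


# An explicit key wins over whatever is in the environment.
key = resolve_api_key(explicit_key="your_key_here")
```

The returned value could then be passed as `api_key` to the constructor shown above.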

Current Limitations (V1)

  • Vector Store: LanceDB only (local storage)
  • LLM Provider: Claude/Anthropic only
  • Embedding Models: Fixed (CLIP + Sentence Transformers)

Future versions will support multiple vector stores (Qdrant, Elasticsearch), LLM providers (OpenAI, Vertex AI), and configurable embedding models.

Examples

See the examples/ directory for complete examples:

  • extract_transcript.py - Basic transcript extraction
  • extract_frames.py - Frame extraction strategies
  • complete_pipeline.py - Full end-to-end workflow

Use Cases

  • Product Teams: Build AI assistants for tutorial libraries
  • EdTech: Make educational videos searchable
  • Documentation: Create interactive video knowledge bases

Performance

For 50 videos (5 min each):

  • Processing: ~15 min on GPU, up to ~90 min on CPU
  • Search: <50ms per query
  • Q&A: ~2-3 seconds (with LLM)
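The batch figures above translate into per-video numbers with a little arithmetic. The sketch below uses only the values stated in this section (50 videos, 5 minutes each, 15 and 90 minutes of processing); the helper names are illustrative.

```python
def per_video_seconds(total_minutes: float, videos: int) -> float:
    """Average processing time per video, in seconds."""
    return total_minutes * 60 / videos


def realtime_ratio(total_minutes: float, videos: int, video_minutes: float) -> float:
    """Processing time as a fraction of total footage length."""
    return total_minutes / (videos * video_minutes)


for label, minutes in (("lower estimate", 15), ("upper estimate", 90)):
    seconds = per_video_seconds(minutes, 50)
    ratio = realtime_ratio(minutes, 50, 5)
    print(f"{label}: {seconds:.0f} s per video, {ratio:.2f}x footage length")
```

That works out to roughly 18 seconds per video at the lower estimate and 108 seconds at the upper one, both well under the 5-minute running time of each video.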

Contributing

Contributions welcome! This is an open-source project.

License

MIT License - see LICENSE file

FrameWise: See the right frame at the right time 🎬
