video2ai
Convert video to AI-ingestable format: frames + transcript + LLM analysis.
Turn any video into AI-ready structured content. Extract frames, transcribe audio, auto-detect key moments — all running locally on your Mac's Neural Engine.
No cloud. No API keys. No PyTorch. Just Apple Silicon doing what it does best.
Installation
Prerequisites
- macOS (Apple Vision framework required)
- ffmpeg — brew install ffmpeg
Option 1: pip (recommended)
pip install video2ai
With Apple Vision support (OCR, embeddings, classification):
pip install "video2ai[vision]"
Option 2: Homebrew
brew tap sameeeeeeep/video2ai https://github.com/sameeeeeeep/video2ai.git
brew install video2ai
Option 3: Download binary
Grab the latest pre-built macOS binary from Releases — no Python required:
curl -L https://github.com/sameeeeeeep/video2ai/releases/latest/download/video2ai -o video2ai
chmod +x video2ai
sudo mv video2ai /usr/local/bin/
Option 4: Install from source
git clone https://github.com/sameeeeeeep/video2ai.git && cd video2ai
pip install -e ".[vision]"
Optional extras
pip install openai-whisper # transcription (local, base model is fine)
brew install yt-dlp # URL downloads (YouTube, Vimeo, etc.)
The Problem
You have a video. You need an AI to understand it. But LLMs can't watch videos — they need frames + text. Manually scrubbing through to pick the right frames is tedious. Existing tools are slow, memory-hungry, or require cloud APIs.
The Solution
Video → ffmpeg + Whisper + Apple Vision → structured content in seconds
Drop a video in. Get back:
- Timestamped transcript — Whisper, fully local
- Key frames auto-selected per transcript segment — Apple Vision Neural Engine embeddings + cosine similarity
- Visual theme clusters — k-means on frame embeddings, filter out talking heads, keep product shots
- Lightweight Markdown export — local image paths, no base64 bloat, AI reads text instantly and loads images on demand
- Self-contained HTML export — images embedded inline, for human viewing
- Screen capture — record any tab/screen directly from the browser, bypasses all platform download restrictions
Quick Start
Web UI
video2ai --web
# → http://localhost:8910
Three input modes:
- Upload — drag a video file
- Paste URL — YouTube, Threads, Vimeo, anything yt-dlp supports
- Screen Capture — share any browser tab or screen, record at 1fps + audio, process through the same pipeline. Works with Instagram, TikTok, Netflix — anything on screen.
CLI
video2ai video.mp4 -o output/
Claude Code Skill
# Invoke from Claude Code:
/video2ai /path/to/video.mp4
The skill runs the full pipeline and outputs a lightweight Markdown file, with local image paths, that Claude can read.
How It Works
Video file / URL / Screen capture
│
├─ ffmpeg ──────────── frames (1/sec, JPEG)
│
├─ Whisper ─────────── transcript segments + timestamps
│
├─ Apple Vision ────── 768-dim embedding per frame (Neural Engine)
│ │
│ ├─ per-segment ── cosine distance → visual state changes → key frame suggestions
│ │
│ └─ global ─────── k-means clustering → visual theme groups
│
├─ Apple Vision OCR ── optional, on-demand text extraction from key frames
│
└─ Apple Intelligence ── on-device OCR summary via FoundationModels (auto-launches server)
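The ffmpeg step in the diagram above boils down to one command. A minimal sketch of building it in Python — the function name, flags, and output naming pattern are illustrative, not the project's actual code:

```python
def ffmpeg_frame_cmd(video: str, out_dir: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that extracts one JPEG per second
    (the '1/sec, JPEG' step in the pipeline diagram).
    Run it with subprocess.run(cmd, check=True)."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", video,
        "-vf", f"fps={fps}",      # sample at the given frame rate
        "-q:v", "2",              # high-quality JPEG output
        f"{out_dir}/frame_%05d.jpg",
    ]
```

The `-q:v 2` quality setting and `%05d` numbering are common conventions; the real pipeline may differ.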
The key insight: frame selection is a vector math problem, not an LLM problem. Embed every frame, embed (or timestamp-match) every transcript segment, pick the frames with the highest visual distinctiveness per segment. Runs in seconds, not minutes.
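The vector-math framing above can be sketched in a few lines of numpy. This is an illustrative reconstruction of the idea (timestamp-match frames to segments, then flag the biggest visual state change per segment via cosine distance), not the project's actual implementation:

```python
import numpy as np

def suggest_key_frames(embeddings, frame_times, segments):
    """Per transcript segment, suggest the frame right after the largest
    visual state change (max cosine distance between consecutive frames)."""
    # L2-normalize so dot products are cosine similarities
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    picks = []
    for start, end in segments:
        # timestamp-match: which frames fall inside this segment?
        idx = np.where((frame_times >= start) & (frame_times < end))[0]
        if idx.size == 0:
            continue
        if idx.size == 1:
            picks.append(int(idx[0]))
            continue
        # cosine distance between each frame and its predecessor
        dist = 1.0 - np.sum(normed[idx[1:]] * normed[idx[:-1]], axis=1)
        picks.append(int(idx[1 + np.argmax(dist)]))
    return picks
```

No model inference happens here at all — once the embeddings exist, selection is pure array arithmetic, which is why it runs in seconds.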
Zero ML overhead in Python. VNGenerateImageFeaturePrintRequest runs on the Neural Engine — the Python process just shuffles bytes. No PyTorch, no CLIP, no transformers loaded into RAM.
The Workflow
- Upload, paste URL, or screen capture — any input mode
- Pipeline runs — probe → extract → transcribe → embed → suggest
- Review — transcript sidebar, frame grid per segment, pre-selected key frames
- Filter by visual theme — click to deselect/select all frames in a theme, right-click to suppress
- OCR (optional) — run Apple Vision OCR on selected key frames, auto-summarized by Apple Intelligence on-device
- Export — Markdown (for AI) or HTML (for humans). OCR summary included by default, raw OCR opt-in.
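The "filter by visual theme" step rests on k-means clusters of the frame embeddings. A minimal numpy sketch of that grouping, assuming a deterministic init on the first k frames (function name and init strategy are illustrative, not the project's actual clustering code):

```python
import numpy as np

def kmeans_themes(embeddings, k: int, iters: int = 50) -> np.ndarray:
    """Group frame embeddings into k visual themes with plain k-means.
    Returns one theme label per frame."""
    X = np.asarray(embeddings, dtype=float)
    centers = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each frame to its nearest theme center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute centers; keep the old center if a cluster empties
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

With themes in hand, "deselect all talking-head frames" is just dropping every frame whose label matches the suppressed cluster.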
Export Formats
| Format | Mode | Best for |
|---|---|---|
| Markdown | Download for AI | AI consumption — lightweight text + local image paths, ~150 lines vs 170k tokens |
| HTML | Download HTML | Human viewing — self-contained, base64 images, opens in any browser |
| HTML (AI) | ?mode=ai | Compressed thumbnails, still self-contained |
Architecture
| Module | What it does |
|---|---|
| probe.py | ffprobe wrapper — duration, resolution, codecs, audio detection |
| frames.py | ffmpeg frame extraction at configurable intervals |
| transcribe.py | Whisper speech-to-text, returns timed segments |
| clip_match.py | Apple Vision embeddings, visual change detection, k-means clustering |
| vision.py | Apple Vision OCR + image classification + Apple Intelligence summarization |
| llm.py | Ollama LLM analysis — optional, for summaries |
| web.py | Flask web UI — upload, URL, screen capture, review, export |
| embed.py | Bake metadata into video via ffmpeg |
Why Not Just Use CLIP?
We tried. CLIP + PyTorch eats ~2GB RAM and requires loading a 600MB model. Apple Vision's VNGenerateImageFeaturePrintRequest runs on the Neural Engine with near-zero memory overhead — it's already on your machine, already optimized, and produces 768-dim embeddings that work great for frame similarity.
For transcript↔frame matching, we don't even need cross-modal embeddings. The transcript gives us timestamps → we know which frames belong to which segment → we pick the most visually distinct ones within each segment. Simple, fast, accurate.
Contributing
git clone https://github.com/sameeeeeeep/video2ai.git && cd video2ai
make dev
Releasing
make release VERSION=0.2.0
This bumps the version, commits, tags, and pushes. GitHub Actions handles PyPI publishing, binary builds, and Homebrew formula updates automatically.
License
Built with Claude Code.