Extract frames and transcripts from video files for LLM context and multimodal pipelines.

clip2context

Extract frames and transcripts from video files — structured output ready for LLM context, multimodal pipelines, or archival.

Given one or more video files, clip2context produces:

  • Frames — high-quality WebP images at a configurable frame rate, plus a JSON manifest mapping each frame to its timestamp.
  • Transcript — plain text, timestamped JSON segments, and a human-readable timed text file, generated by OpenAI Whisper.

Requirements

Install FFmpeg via your package manager:

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Windows (winget)
winget install ffmpeg

Installation

pip install clip2context

Or from source with uv:

git clone <repo-url>
cd clip2context
uv sync

Usage

Command line

python main.py <video_path> [<video_path> ...] [options]

Arguments

Argument              Description
video_paths           One or more video files or directories containing videos.
--output-dir DIR      Base directory for all output (default: output/).
--fps FLOAT           Frames per second to extract (default: 1.0). Use 0.5 for one frame every two seconds.
--quality 1-100       WebP compression quality (default: 95). Lower = smaller files.
--only-frames         Extract frames only; skip transcription.
--only-transcripts    Extract transcripts only; skip frame extraction.
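
To get a feel for what a given --fps value means before running a long job, the arithmetic can be sketched as below. This is a rough estimate only: the helper names are hypothetical and not part of clip2context, and the real frame count depends on how FFmpeg samples the video.

```python
import math


def frame_interval_seconds(fps: float) -> float:
    """Seconds between consecutive extracted frames for a given --fps value."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1.0 / fps


def estimated_frame_count(duration_seconds: float, fps: float) -> int:
    """Rough number of frames for a clip of the given duration (an estimate)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return math.ceil(duration_seconds * fps)


if __name__ == "__main__":
    # A 10-minute recording sampled at --fps 0.5
    print(frame_interval_seconds(0.5))      # seconds between frames
    print(estimated_frame_count(600, 0.5))  # approximate number of WebP files
```

Multiplying the estimated count by a typical frame size gives a quick idea of how much disk space a run may need at a given --quality.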

Examples

# Process a single video with defaults (1 fps, quality 95)
python main.py interview.mp4

# Process all videos in a folder, 1 frame every 2 seconds
python main.py ./recordings/ --fps 0.5

# Transcripts only, custom output directory
python main.py lecture.mp4 --only-transcripts --output-dir ./results

# Frames only, lower quality for smaller file sizes
python main.py demo.mov --only-frames --fps 2 --quality 75

Python API

from main import run

# Full extraction (frames + transcript)
run("interview.mp4")

# Custom options
run(
    "lecture.mp4",
    output_base="results/",
    fps=0.5,
    quality=80,
    do_frames=True,
    do_transcript=True,
)

You can also use the individual extractors directly:

from extract_frames import extract_frames
from extract_transcript import extract_transcript

# Extract frames → returns (output_dir, frame_count)
output_dir, count = extract_frames("video.mp4", "output/frames", fps=1.0, quality=95)

# Transcribe audio → returns output_dir
output_dir = extract_transcript("video.mp4", "output/transcript")

Output layout

Each video produces output under <output_dir>/<video_stem>/:

output/
└── interview/
    ├── frames/
    │   ├── frame_0001.webp
    │   ├── frame_0002.webp
    │   ├── …
    │   └── frames_manifest.json
    └── transcript/
        ├── transcript_raw.txt
        ├── transcript_timestamped.json
        └── transcript_timed.txt
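
For downstream scripts it can help to compute these paths rather than hard-code them. The following is a minimal sketch that mirrors the layout shown above; `expected_outputs` is a hypothetical helper, not a function exported by clip2context.

```python
from pathlib import Path


def expected_outputs(video_path, output_dir="output"):
    """Return the paths clip2context is documented to write for one video.

    Hypothetical helper: it only mirrors the documented layout and does not
    check that the files exist.
    """
    base = Path(output_dir) / Path(video_path).stem
    frames = base / "frames"
    transcript = base / "transcript"
    return {
        "frames_dir": frames,
        "frames_manifest": frames / "frames_manifest.json",
        "transcript_raw": transcript / "transcript_raw.txt",
        "transcript_timestamped": transcript / "transcript_timestamped.json",
        "transcript_timed": transcript / "transcript_timed.txt",
    }


if __name__ == "__main__":
    for name, path in expected_outputs("interview.mp4").items():
        print(f"{name}: {path}")
```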

frames_manifest.json

Maps each frame file to its timestamp:

[
  {
    "frame_filename": "frame_0001.webp",
    "timestamp_seconds": 0.0,
    "timestamp_formatted": "00:00:00"
  },
  ...
]
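
As an illustration of how the manifest might be consumed, the sketch below loads it and looks up the frame closest to a given time. It assumes only the three fields shown above; `load_manifest` and `nearest_frame` are hypothetical names, not part of the package.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Read frames_manifest.json and return its list of frame entries."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def nearest_frame(manifest, t_seconds):
    """Return the manifest entry whose timestamp is closest to t_seconds."""
    if not manifest:
        raise ValueError("manifest is empty")
    return min(manifest, key=lambda e: abs(e["timestamp_seconds"] - t_seconds))


if __name__ == "__main__":
    manifest_path = Path("output/interview/frames/frames_manifest.json")
    if manifest_path.exists():
        entry = nearest_frame(load_manifest(manifest_path), 12.3)
        print(entry["frame_filename"], entry["timestamp_formatted"])
```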

transcript_timestamped.json

Segment-level start and end times (in seconds) from Whisper:

[
  {
    "start": 0.0,
    "end": 4.28,
    "text": "Welcome to today's session."
  },
  ...
]
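
One way to combine the two JSON files for LLM context is to attach to each transcript segment the frames captured while it was spoken. This is a sketch under the assumption that both files use the field names shown in this README; the function names are hypothetical.

```python
def frames_for_segment(segment, manifest):
    """Filenames of frames whose timestamp falls in [start, end) of a segment."""
    return [
        entry["frame_filename"]
        for entry in manifest
        if segment["start"] <= entry["timestamp_seconds"] < segment["end"]
    ]


def align_segments_with_frames(segments, manifest):
    """Return one record per segment with its text, times, and matching frames."""
    return [
        {
            "start": seg["start"],
            "end": seg["end"],
            "text": seg["text"].strip(),
            "frames": frames_for_segment(seg, manifest),
        }
        for seg in segments
    ]


if __name__ == "__main__":
    segments = [{"start": 0.0, "end": 4.28, "text": "Welcome to today's session."}]
    manifest = [
        {"frame_filename": f"frame_{i + 1:04d}.webp", "timestamp_seconds": float(i)}
        for i in range(6)
    ]
    for record in align_segments_with_frames(segments, manifest):
        print(record["text"], record["frames"])
```

The half-open interval keeps a frame that lands exactly on a segment boundary from being assigned to two segments.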

transcript_timed.txt

Human-readable transcript with timestamps:

[00:00:00] Welcome to today's session.
[00:00:04] Let's get started.
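
The timed format can be reproduced from the JSON segments with a few lines of Python. The sketch below assumes seconds are truncated to whole numbers; how clip2context itself rounds is not stated here, so treat that as an assumption. Both function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS (fractional part truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_timed_lines(segments) -> str:
    """Render segments as '[HH:MM:SS] text' lines, one per segment."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    )


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 4.28, "text": "Welcome to today's session."},
        {"start": 4.28, "end": 6.1, "text": "Let's get started."},
    ]
    print(render_timed_lines(segments))
```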

Supported formats

.mp4 .mov .avi .mkv .webm .m4v .flv .wmv
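
If you want to pre-filter inputs before handing them to clip2context, an extension check along these lines may be useful. It is a sketch only: the case-insensitive match and the non-recursive directory scan are assumptions, and the tool's own discovery logic may differ.

```python
from pathlib import Path

# Extensions listed under "Supported formats" in this README.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".m4v", ".flv", ".wmv"}


def is_supported_video(path) -> bool:
    """True if the path has one of the listed extensions (case-insensitive)."""
    return Path(path).suffix.lower() in VIDEO_EXTENSIONS


def collect_videos(path):
    """Return supported videos for a file path or the top level of a directory."""
    p = Path(path)
    if p.is_dir():
        return sorted(f for f in p.iterdir() if f.is_file() and is_supported_video(f))
    return [p] if is_supported_video(p) else []


if __name__ == "__main__":
    for video in collect_videos("./recordings"):
        print(video)
```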

License

MIT
