Extract frames and transcripts from video files for LLM context and multimodal pipelines.

clip2context

Extract frames and transcripts from video files — structured output ready for LLM context, multimodal pipelines, or archival.

Given one or more video files, clip2context produces:

  • Frames — high-quality WebP images at a configurable frame rate, plus a JSON manifest mapping each frame to its timestamp.
  • Transcript — plain text, timestamped JSON segments, and a human-readable timed text file, generated by OpenAI Whisper.

Requirements

  • Python 3.12+
  • FFmpeg (must be on PATH)

Install FFmpeg via your package manager:

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Windows (winget)
winget install ffmpeg
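
Because FFmpeg must be on PATH, it can be worth checking for it before starting a long batch. The following is a minimal sketch using only the Python standard library; `ffmpeg_on_path` is a hypothetical helper name, not part of clip2context.

```python
import shutil


def ffmpeg_on_path() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    # shutil.which mirrors the shell's lookup and returns None when nothing matches.
    return shutil.which("ffmpeg") is not None


if not ffmpeg_on_path():
    print("FFmpeg not found on PATH; install it before running clip2context.")
```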

Installation

pip install clip2context

Or from source with uv:

git clone <repo-url>
cd clip2context
uv sync

Usage

clip2context <video_path> [<video_path> ...] [options]

Arguments

Argument              Description
video_paths           One or more video files or directories containing videos.
--output-dir DIR      Base directory for all output (default: output/).
--fps FLOAT           Frames per second to extract (default: 1.0). Use 0.5 for one frame every two seconds.
--quality 1-100       WebP compression quality (default: 95). Lower = smaller files.
--start-time TIME     Start time (format: SS, MM:SS, or HH:MM:SS). Process from this point onward.
--end-time TIME       End time (format: SS, MM:SS, or HH:MM:SS). Process up to this point.
--only-frames         Extract frames only; skip transcription.
--only-transcripts    Extract transcripts only; skip frame extraction.
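
The time formats accepted by --start-time and --end-time (SS, MM:SS, HH:MM:SS) all reduce to a number of seconds. How clip2context parses them internally is not documented here, so the following is only a sketch of one way to do the conversion; `parse_time` is a hypothetical helper name.

```python
def parse_time(value: str) -> float:
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' into a number of seconds."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Unsupported time format: {value!r}")
    seconds = 0.0
    # Each field to the left is worth 60 times the one to its right.
    for part in parts:
        seconds = seconds * 60 + float(part)
    return seconds


print(parse_time("30"))       # 30.0
print(parse_time("05:00"))    # 300.0
print(parse_time("1:00:00"))  # 3600.0
```

Note that a three-field value is always read as hours, minutes, seconds, so "1:00:00" means one hour, while one minute would be "01:00" or "00:01:00".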

Examples

# Process a single video with defaults (1 fps, quality 95)
clip2context interview.mp4

# Process all videos in a folder, 1 frame every 2 seconds
clip2context ./recordings/ --fps 0.5

# Transcripts only, custom output directory
clip2context lecture.mp4 --only-transcripts --output-dir ./results

# Frames only, lower quality for smaller file sizes
clip2context demo.mov --only-frames --fps 2 --quality 75

# Process multiple videos at once
clip2context video1.mp4 video2.mp4 video3.mp4

# Extract only from 5 minutes to 10 minutes (MM:SS format)
clip2context long-video.mp4 --start-time 05:00 --end-time 10:00

# Extract first 30 seconds
clip2context video.mp4 --end-time 30

# Skip the first minute, extract frames and transcript for the rest (HH:MM:SS format)
clip2context video.mp4 --start-time 00:01:00

Python API

You can also use clip2context programmatically:

from clip2context.main import run

# Full extraction (frames + transcript)
run("interview.mp4")

# Custom options
run(
    "lecture.mp4",
    output_base="results/",
    fps=0.5,
    quality=80,
    do_frames=True,
    do_transcript=True,
)

# Extract only a specific time range (in seconds)
run(
    "long-video.mp4",
    start_time=300,  # Start at 5 minutes
    end_time=600,    # End at 10 minutes
)

Or use the individual extractors directly:

from clip2context.extract_frames import extract_frames
from clip2context.extract_transcript import extract_transcript

# Extract frames → returns (output_dir, frame_count)
output_dir, count = extract_frames("video.mp4", "output/frames", fps=1.0, quality=95)

# Transcribe audio → returns output_dir
output_dir = extract_transcript("video.mp4", "output/transcript")

Output layout

Each video produces output under <output_dir>/<video_stem>/:

output/
└── interview/
    ├── frames/
    │   ├── frame_0001.webp
    │   ├── frame_0002.webp
    │   ├── …
    │   └── frames_manifest.json
    └── transcript/
        ├── transcript_raw.txt
        ├── transcript_timestamped.json
        └── transcript_timed.txt
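
Given the layout above, the location of each output file can be derived from the video path and the output directory. This is a minimal sketch under that assumption; `expected_outputs` is a hypothetical helper, not a clip2context function, and it only builds paths without checking that the files exist.

```python
from pathlib import Path


def expected_outputs(video_path: str, output_dir: str = "output") -> dict[str, Path]:
    """Build the documented output paths for one video."""
    # <output_dir>/<video_stem>/, where the stem is the file name without extension.
    base = Path(output_dir) / Path(video_path).stem
    frames = base / "frames"
    transcript = base / "transcript"
    return {
        "frames_dir": frames,
        "frames_manifest": frames / "frames_manifest.json",
        "transcript_raw": transcript / "transcript_raw.txt",
        "transcript_timestamped": transcript / "transcript_timestamped.json",
        "transcript_timed": transcript / "transcript_timed.txt",
    }


for name, path in expected_outputs("interview.mp4").items():
    print(f"{name}: {path}")
```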

frames_manifest.json

Maps each frame file to its timestamp:

[
  {
    "frame_filename": "frame_0001.webp",
    "timestamp_seconds": 0.0,
    "timestamp_formatted": "00:00:00"
  },
  ...
]
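
One common use of the manifest is finding the frame closest to a given moment in the video. The sketch below assumes only the field names shown above; `load_manifest` and `nearest_frame` are hypothetical helpers, and the sample entries are made up for illustration (they assume the default 1 fps).

```python
import json
from pathlib import Path


def load_manifest(path: str) -> list[dict]:
    """Read a frames_manifest.json file into a list of entries."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def nearest_frame(manifest: list[dict], seconds: float) -> dict:
    """Return the entry whose timestamp is closest to `seconds`."""
    # Raises ValueError if the manifest is empty.
    return min(manifest, key=lambda entry: abs(entry["timestamp_seconds"] - seconds))


# Made-up sample in the documented shape.
sample = [
    {"frame_filename": "frame_0001.webp", "timestamp_seconds": 0.0, "timestamp_formatted": "00:00:00"},
    {"frame_filename": "frame_0002.webp", "timestamp_seconds": 1.0, "timestamp_formatted": "00:00:01"},
    {"frame_filename": "frame_0003.webp", "timestamp_seconds": 2.0, "timestamp_formatted": "00:00:02"},
]
print(nearest_frame(sample, 1.4)["frame_filename"])  # frame_0002.webp
```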

transcript_timestamped.json

Segment boundaries from Whisper, with start and end times in seconds:

[
  {
    "start": 0.0,
    "end": 4.28,
    "text": "Welcome to today's session."
  },
  ...
]
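
For multimodal prompts it is often useful to pair each transcript segment with the frames captured while it was spoken. The sketch below does this with the two JSON shapes documented above; `frames_in_segment` is a hypothetical helper, the half-open interval [start, end) is a design choice of this example, and the sample data is made up (1 fps assumed, manifest entries trimmed to the fields used).

```python
def frames_in_segment(manifest: list[dict], segment: dict) -> list[str]:
    """Return filenames of frames whose timestamp falls in [start, end)."""
    return [
        entry["frame_filename"]
        for entry in manifest
        if segment["start"] <= entry["timestamp_seconds"] < segment["end"]
    ]


# Made-up sample data: six frames at 0, 1, 2, 3, 4 and 5 seconds.
manifest = [
    {"frame_filename": f"frame_{i + 1:04d}.webp", "timestamp_seconds": float(i)}
    for i in range(6)
]
segment = {"start": 0.0, "end": 4.28, "text": "Welcome to today's session."}

print(frames_in_segment(manifest, segment))
```

With this sample, the frames at 0 through 4 seconds fall inside the segment and the frame at 5 seconds does not.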

transcript_timed.txt

Human-readable transcript with timestamps:

[00:00:00] Welcome to today's session.
[00:00:04] Let's get started.
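
The timed format can be reproduced from the timestamped segments, which is handy when filtering or merging segments before handing them to an LLM. This is a sketch, not clip2context's own code: `format_timestamp` and `render_timed_lines` are hypothetical helpers, truncating fractional seconds is an assumption, and the start and end times of the second sample segment are made up.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, dropping the fractional part."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_timed_lines(segments: list[dict]) -> list[str]:
    """Turn segments into '[HH:MM:SS] text' lines."""
    return [f"[{format_timestamp(s['start'])}] {s['text'].strip()}" for s in segments]


segments = [
    {"start": 0.0, "end": 4.28, "text": "Welcome to today's session."},
    {"start": 4.28, "end": 6.1, "text": "Let's get started."},  # made-up times
]
print("\n".join(render_timed_lines(segments)))
```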

Supported formats

.mp4 .mov .avi .mkv .webm .m4v .flv .wmv
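
To preview which files in a folder match the supported extensions, a filter like the one below can be used. It is a sketch only: `find_videos` is a hypothetical helper, and the non-recursive scan and case-insensitive matching are assumptions of this example, not documented clip2context behavior.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".m4v", ".flv", ".wmv"}


def find_videos(directory: str) -> list[Path]:
    """List files directly inside `directory` that have a supported extension."""
    root = Path(directory)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For example, `find_videos("./recordings")` would return the matching files in that folder, sorted by path.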

License

MIT
