
Command-line tool for analyzing audio/video with the Gemini API


Gemini media interpreter

🍿 vid2md

Simple Python command-line tool that uses Google's Gemini API to extract data from audio and video files and output markdown.

Cobbled together at AI Engineer World's Fair 2025, using code from a workshop by @philschmid. 🙏


It can interpret:

  • Audio files on your computer (mp3, m4a, etc)
  • Video files on your computer (mp4, mov, etc)
  • YouTube videos (by URL)

It generates:

  • A title
  • A TLDR
  • A one-paragraph summary
  • A table of contents
  • A transcript
  • A cleaned-up transcript

Usage

Get a Google Gemini API key at aistudio.google.com/apikey

Set the GEMINI_API_KEY environment variable:

export GEMINI_API_KEY=YOUR_API_KEY
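If you call the tool from your own Python script, it can help to fail early when the key is missing. The snippet below is a minimal sketch of such a check; `require_api_key` is a hypothetical helper written for this illustration, not a function provided by vid2md.

```python
import os


def require_api_key(env=os.environ, name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; vid2md may validate the key differently.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running vid2md")
    return key
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.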

Run without installing this repo (requires uv):

uvx vid2md <video_path_or_youtube_url>

Tip: you can also run gmi or gemini-media-interpreter for the same tool.

Or run from a clone (install deps first):

pip install -r requirements.txt
python video.py <video_path_or_youtube_url>

If the vid2md command is already installed on your PATH, run it directly:

vid2md <video_path_or_youtube_url>

You can provide either a path to a local audio or video file or a YouTube URL as the first argument.
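One way a script could tell the two kinds of argument apart is sketched below. This is an assumption about how such a check might look, not the logic vid2md actually uses; `classify_source` and the `YOUTUBE_HOSTS` set are names invented for this example.

```python
from urllib.parse import urlparse

# Hostnames treated as YouTube in this sketch (an assumption, not the tool's list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(arg: str) -> str:
    """Return "youtube" for an http(s) YouTube URL, otherwise "file"."""
    parsed = urlparse(arg)
    if parsed.scheme in ("http", "https") and parsed.hostname in YOUTUBE_HOSTS:
        return "youtube"
    # Anything else is treated as a path to a local audio or video file.
    return "file"
```

The function only classifies the string; it does not check that a local file exists or that the URL resolves.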

By default, the tool uses a built-in prompt (or ./prompt.md if present). You can specify a custom prompt file with the --prompt flag:

vid2md <video_path_or_youtube_url> --prompt custom_prompt.md
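The prompt lookup described above can be read as a three-step fallback: an explicit --prompt file, then ./prompt.md, then the built-in prompt. The sketch below illustrates that order under the assumption that ./prompt.md takes precedence over the built-in prompt when present; `resolve_prompt` and the placeholder `BUILT_IN_PROMPT` text are hypothetical and do not come from the vid2md source.

```python
from pathlib import Path

# Placeholder text only; the tool's real built-in prompt is not shown here.
BUILT_IN_PROMPT = "Summarize this media as markdown."


def resolve_prompt(prompt_flag=None, cwd="."):
    """Pick the prompt text: --prompt file, then ./prompt.md, then the built-in."""
    if prompt_flag is not None:
        # An explicit --prompt value wins over everything else.
        return Path(prompt_flag).read_text(encoding="utf-8")
    local = Path(cwd) / "prompt.md"
    if local.is_file():
        return local.read_text(encoding="utf-8")
    return BUILT_IN_PROMPT
```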

You can also specify which Gemini model to use with the --model flag (default: gemini-2.5-flash-preview-05-20):

vid2md <video_path_or_youtube_url> --model gemini-2.5-pro-preview-06-05
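For readers curious how a command line like this is typically parsed in Python, here is a minimal argparse sketch covering the positional argument and the two flags described above. It is an illustration only; the actual vid2md parser may be organized differently, and `build_parser` is a name chosen for this example.

```python
import argparse

# Default model name as stated in this README.
DEFAULT_MODEL = "gemini-2.5-flash-preview-05-20"


def build_parser():
    """Build a parser with one positional source plus --prompt and --model."""
    parser = argparse.ArgumentParser(prog="vid2md")
    parser.add_argument("source", help="local audio/video path or YouTube URL")
    parser.add_argument("--prompt", default=None, help="path to a custom prompt file")
    parser.add_argument("--model", default=DEFAULT_MODEL, help="Gemini model name")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.source, args.prompt, args.model)
```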

See the Gemini API Models page for the list of available models.

Examples

Analyze a local audio file (mp3, m4a, etc):

vid2md sample.m4a

Analyze a local video file (mp4, mov, etc):

vid2md sample.mov

Analyze a YouTube video directly by URL:

vid2md "https://www.youtube.com/watch?v=dwgmfSOZNoQ"
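If you want to drive these commands from a script, the sketch below assembles the argument list for one run using only the flags documented above. `build_command` is a hypothetical helper, and the sketch assumes the vid2md command is on your PATH; this README does not say where the markdown is written, so capturing the result is left to the caller.

```python
def build_command(source, prompt=None, model=None):
    """Return the argv list for one vid2md run (hypothetical helper)."""
    cmd = ["vid2md", source]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if model is not None:
        cmd += ["--model", model]
    return cmd


# Example: one command per local file.
commands = [build_command(name) for name in ("sample.m4a", "sample.mov")]
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the standard library. Passing a list rather than a shell string avoids quoting problems with URLs that contain `?` or `&`.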

Example output

## Title

*   Extracting video metadata: an initial problem.
*   A brief introduction to video metadata challenges.
*   Understanding metadata loss in video uploads.

## TLDR

*   Learn about metadata loss when uploading videos.
*   Discover issues preserving video effects online.
*   Identify challenges in video metadata retention.

## One-paragraph summary

This video provides a brief, introductory look into the concept of extracting video metadata. The speaker shares his experience recording a video for his team using OBS, an open-source software, to add various visual enhancements like a green screen. He then highlights a common challenge: when these videos are uploaded to platforms like Loom, the added visual "bells and whistles" (metadata) often fail to transfer. This short clip effectively sets the stage by introducing the problem of metadata degradation in video sharing, signaling the speaker's intention to explore solutions for extraction and preservation.

This video serves as an initial segment, introducing the topic of video metadata extraction by illustrating a practical problem. The speaker explains how he utilized OBS to create a video with specific visual effects, but found that upon uploading it to Loom, these enhancements were lost. Viewers can gain an understanding of the common issue where valuable visual metadata isn't retained across different platforms, highlighting the need for methods to extract and manage such information effectively.

In this short introductory video, the presenter discusses the upcoming topic of extracting video metadata. He recounts how he created a video using OBS to incorporate advanced visual elements like a green screen. The main point conveyed is that when this video was subsequently uploaded to Loom, the specific visual metadata he had added was not preserved. This segment therefore clarifies a key challenge in video content management: ensuring that embedded information and visual effects remain intact when shared across different video platforms.

## TOC

0:00 Introduction to video metadata extraction
0:07 Recording videos with OBS with special effects
0:29 The problem of metadata loss on video platforms
0:44 Speaker restarts video

## Transcript

Hey all you cool cats and kittens, I want to show you how to extract metadata from videos. So today, earlier today, I recorded this video for the team about our client libraries bake-off. And I wanted to add some bells and whistles to it using a green screen and stuff, so I used a product called OBS, which is an open source, uh, piece of software that you install on your Mac for recording. Um, so that's cool. But the thing that's not cool is when I upload that video to Fern, which is the sort of, uh, website that we use to share videos. You don't get any of the cool, um, bells and whistles that come with Fern. Or, did I say Fern? I meant Loom. I'm going to start over.

## Clean transcript

Hey all you cool cats and kittens, I want to show you how to extract metadata from videos. Earlier today, I recorded this video for the team about our client libraries bake-off. I added bells and whistles using a green screen. I used OBS, an open-source software you install on your Mac for recording. That's cool. The thing that's not cool is when I upload that video to Loom, the website we use to share videos, you don't get any of the cool bells and whistles that come with Loom. I meant Loom. I'm going to start over.
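Because the output uses `## ` headings and `m:ss` timestamps, it is straightforward to post-process. The following is a minimal sketch that splits the markdown into sections and converts TOC lines into seconds. Both function names are hypothetical, and the sketch assumes the output keeps the shape shown in the example above (it does not handle `h:mm:ss` timestamps).

```python
import re


def split_sections(markdown: str) -> dict:
    """Map each "## Heading" to the text that follows it."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def parse_toc(toc_text: str) -> list:
    """Turn "m:ss Label" lines into (seconds, label) tuples."""
    entries = []
    for line in toc_text.splitlines():
        match = re.match(r"^(\d+):(\d{2})\s+(.*)$", line.strip())
        if match:
            minutes, seconds, label = match.groups()
            entries.append((int(minutes) * 60 + int(seconds), label))
    return entries
```

For example, `parse_toc(split_sections(output)["TOC"])` would yield the chapter offsets for the sample above, assuming `output` holds the generated markdown.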
