Command-line tool for analyzing audio/video with the Gemini API
🍿 vid2md
A simple Python command-line tool that uses Google's Gemini API to extract data from audio and video files and output Markdown.
Cobbled together at AI Engineer World's Fair 2025, using code from this workshop by @philschmid. 🙏
It can interpret:
- Audio files on your computer (mp3, m4a, etc.)
- Video files on your computer (mp4, mov, etc.)
- YouTube videos (by URL)
It generates:
- A title
- A TLDR
- A one-paragraph summary
- A table of contents
- A transcript
- A cleaned-up transcript
Usage
Get a Google Gemini API key at aistudio.google.com/apikey
Set the GEMINI_API_KEY environment variable:
export GEMINI_API_KEY=YOUR_API_KEY
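Before running the tool, it can help to confirm that the key is visible to Python. The following is a minimal sketch, assuming only that the key is read from the `GEMINI_API_KEY` environment variable as described above. The helper name `get_api_key` is hypothetical and is not part of vid2md.

```python
import os


def get_api_key(env=None):
    """Return the Gemini API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not vid2md's own API.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; get a key at aistudio.google.com/apikey"
        )
    return key
```

Calling `get_api_key()` with no arguments checks the real environment; passing a dict makes the check easy to test.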
Run without installing this repo (requires uv):
uvx vid2md <video_path_or_youtube_url>
Tip: you can also run gmi or gemini-media-interpreter for the same tool.
Or run from a clone (install deps first):
pip install -r requirements.txt
python video.py <video_path_or_youtube_url>
If vid2md is already installed, run the tool directly:
vid2md <video_path_or_youtube_url>
You can provide either a path to a local audio or video file or a YouTube URL as the first argument.
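One way to tell the two kinds of argument apart is to check whether the value parses as a YouTube URL. This is a rough sketch, not the tool's actual logic: the function name `classify_source` and the list of hostnames are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hostnames treated as YouTube here; this list is an assumption for illustration.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(arg):
    """Return "youtube" for a YouTube URL, otherwise "file" for a local path."""
    parsed = urlparse(arg)
    if parsed.scheme in ("http", "https") and parsed.hostname in YOUTUBE_HOSTS:
        return "youtube"
    # Anything else is treated as a path to a local audio or video file.
    return "file"
```

For example, `classify_source("sample.m4a")` returns `"file"`, while the YouTube URL from the examples in this README returns `"youtube"`.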
By default, the tool uses a built-in prompt (or ./prompt.md if present). You can specify a custom prompt file with the --prompt flag:
vid2md <video_path_or_youtube_url> --prompt custom_prompt.md
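The prompt lookup described above can be sketched as a small fallback chain. This is an illustration only: the function name `resolve_prompt`, the placeholder `BUILTIN_PROMPT` text, and the assumption that `--prompt` takes precedence over `./prompt.md` are not confirmed by the tool's source.

```python
from pathlib import Path

# Placeholder text; the real built-in prompt ships with the tool.
BUILTIN_PROMPT = "Summarize this media as markdown."


def resolve_prompt(prompt_flag=None, cwd="."):
    """Pick the prompt text: --prompt file, then ./prompt.md, then the built-in.

    The precedence order is an assumption made for this sketch.
    """
    if prompt_flag is not None:
        return Path(prompt_flag).read_text(encoding="utf-8")
    local = Path(cwd) / "prompt.md"
    if local.is_file():
        return local.read_text(encoding="utf-8")
    return BUILTIN_PROMPT
```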
You can also specify which Gemini model to use with the --model flag (default: gemini-2.5-flash-preview-05-20):
vid2md <video_path_or_youtube_url> --model gemini-2.5-pro-preview-06-05
See the list of available models here: Gemini API Models
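For readers curious how a command line like this is typically wired up, here is a minimal `argparse` sketch that mirrors the documented interface: one positional argument plus the `--prompt` and `--model` flags. It is an assumption about structure, not a copy of the tool's code; `build_parser` and the argument name `source` are hypothetical.

```python
import argparse

# Default model name as stated in this README.
DEFAULT_MODEL = "gemini-2.5-flash-preview-05-20"


def build_parser():
    """Build a parser that mirrors the documented vid2md command line."""
    parser = argparse.ArgumentParser(prog="vid2md")
    parser.add_argument(
        "source", help="path to a local audio/video file, or a YouTube URL"
    )
    parser.add_argument("--prompt", default=None, help="path to a custom prompt file")
    parser.add_argument("--model", default=DEFAULT_MODEL, help="Gemini model name")
    return parser
```

With this sketch, `build_parser().parse_args(["sample.mov"])` yields the default model, and adding `--model gemini-2.5-pro-preview-06-05` overrides it.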
Examples
Analyze a local audio file (mp3, m4a, etc.):
vid2md sample.m4a
Analyze a local video file (mp4, mov, etc.):
vid2md sample.mov
Analyze a YouTube video directly by URL:
vid2md "https://www.youtube.com/watch?v=dwgmfSOZNoQ"
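To process several inputs from a script, the same commands can be driven from Python. The sketch below assumes that `vid2md` is on your PATH and that it prints its markdown to standard output; neither is confirmed here. The helper names `build_command` and `run_to_file` are hypothetical.

```python
import subprocess


def build_command(source, prompt=None, model=None):
    """Build the argv list for one vid2md run using the documented flags."""
    command = ["vid2md", source]
    if prompt:
        command += ["--prompt", prompt]
    if model:
        command += ["--model", model]
    return command


def run_to_file(source, out_path, **options):
    """Run vid2md and save its stdout.

    Assumes the markdown is printed to stdout, which is not confirmed.
    """
    result = subprocess.run(
        build_command(source, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)
```

For instance, `run_to_file("sample.m4a", "sample.md")` would run the first example above and write the result to `sample.md`.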
Example output
## Title
* Extracting video metadata: an initial problem.
* A brief introduction to video metadata challenges.
* Understanding metadata loss in video uploads.
## TLDR
* Learn about metadata loss when uploading videos.
* Discover issues preserving video effects online.
* Identify challenges in video metadata retention.
## One-paragraph summary
This video provides a brief, introductory look into the concept of extracting video metadata. The speaker shares his experience recording a video for his team using OBS, an open-source software, to add various visual enhancements like a green screen. He then highlights a common challenge: when these videos are uploaded to platforms like Loom, the added visual "bells and whistles" (metadata) often fail to transfer. This short clip effectively sets the stage by introducing the problem of metadata degradation in video sharing, signaling the speaker's intention to explore solutions for extraction and preservation.
This video serves as an initial segment, introducing the topic of video metadata extraction by illustrating a practical problem. The speaker explains how he utilized OBS to create a video with specific visual effects, but found that upon uploading it to Loom, these enhancements were lost. Viewers can gain an understanding of the common issue where valuable visual metadata isn't retained across different platforms, highlighting the need for methods to extract and manage such information effectively.
In this short introductory video, the presenter discusses the upcoming topic of extracting video metadata. He recounts how he created a video using OBS to incorporate advanced visual elements like a green screen. The main point conveyed is that when this video was subsequently uploaded to Loom, the specific visual metadata he had added was not preserved. This segment therefore clarifies a key challenge in video content management: ensuring that embedded information and visual effects remain intact when shared across different video platforms.
## TOC
0:00 Introduction to video metadata extraction
0:07 Recording videos with OBS with special effects
0:29 The problem of metadata loss on video platforms
0:44 Speaker restarts video
## Transcript
Hey all you cool cats and kittens, I want to show you how to extract metadata from videos. So today, earlier today, I recorded this video for the team about our client libraries bake-off. And I wanted to add some bells and whistles to it using a green screen and stuff, so I used a product called OBS, which is an open source, uh, piece of software that you install on your Mac for recording. Um, so that's cool. But the thing that's not cool is when I upload that video to Fern, which is the sort of, uh, website that we use to share videos. You don't get any of the cool, um, bells and whistles that come with Fern. Or, did I say Fern? I meant Loom. I'm going to start over.
## Clean transcript
Hey all you cool cats and kittens, I want to show you how to extract metadata from videos. Earlier today, I recorded this video for the team about our client libraries bake-off. I added bells and whistles using a green screen. I used OBS, an open-source software you install on your Mac for recording. That's cool. The thing that's not cool is when I upload that video to Loom, the website we use to share videos, you don't get any of the cool bells and whistles that come with Loom. I meant Loom. I'm going to start over.
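Because the output uses `##` headings and `M:SS` timestamps, it is straightforward to post-process. The following sketch splits output like the example above into sections and converts the TOC lines to seconds. It assumes the heading and timestamp layout shown in the example, which may vary with the prompt and model; `split_sections` and `parse_toc` are hypothetical helpers, not part of vid2md.

```python
import re


def split_sections(markdown):
    """Map each "## Heading" to the text beneath it."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


# Matches "M:SS Label" and "H:MM:SS Label".
TOC_LINE = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{2})\s+(.*)$")


def parse_toc(toc_text):
    """Turn timestamped TOC lines into (seconds, label) pairs."""
    entries = []
    for line in toc_text.splitlines():
        match = TOC_LINE.match(line.strip())
        if match:
            hours, minutes, seconds, label = match.groups()
            total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            entries.append((total, label))
    return entries
```

Applied to the example output, `parse_toc(split_sections(text)["TOC"])` would give four entries, starting at 0 seconds and ending with `(44, "Speaker restarts video")`.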
File details
Details for the file vid2md-0.1.0.tar.gz.
File metadata
- Download URL: vid2md-0.1.0.tar.gz
- Upload date:
- Size: 10.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.22 {"installer":{"name":"uv","version":"0.9.22","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 365dc1ccffc3982a2213aea78307fff9fea734a05f00898e7d6d8fc4b87624da |
| MD5 | 511c2e37b4f8f69e09517a337d0f4cda |
| BLAKE2b-256 | ac7705e1e5e514d33485a9a1fc986048c956e8cc028074247dc4ea8f50592b89 |
File details
Details for the file vid2md-0.1.0-py3-none-any.whl.
File metadata
- Download URL: vid2md-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.22 {"installer":{"name":"uv","version":"0.9.22","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8382b10700f64c575fd929a2e1519f40acff24ea8c2d78dbfa0d87a6626bef1a |
| MD5 | b0488b86d30c38ea0e40690a4e1df2bc |
| BLAKE2b-256 | 83b337690756dd37bf08a29d829166ee3614be2ebd8f6a4cfda825c30662a30e |
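If you download one of these files by hand, you can compare its SHA256 digest against the value listed above. This is a generic sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches_published` are hypothetical, and the file path in the usage note is whatever location you saved the download to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the wheel, that would be `matches_published("vid2md-0.1.0-py3-none-any.whl", "8382b10700f64c575fd929a2e1519f40acff24ea8c2d78dbfa0d87a6626bef1a")`.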