crawl4ai for video & audio — turn any YouTube video, podcast episode, or local recording into clean, timestamped, LLM-ready markdown
hearsay
crawl4ai for video & audio. One command turns any YouTube video, podcast episode, or local recording into clean, timestamped, chunked, LLM-ready markdown — for RAG pipelines and AI agents.
Why
Getting a transcript into your RAG pipeline usually means gluing together
yt-dlp, Whisper, and a pile of timestamp-wrangling scripts — and you still end
up with one line per caption fragment or an undifferentiated wall of text.
hearsay does the whole thing in one command and gives you back markdown a human
and a model can read: readable paragraphs, real timestamps, chapter headings,
and an optional JSON sidecar with a stable schema.
Install
uv tool install hearsay # recommended
# or
pipx install hearsay
# transcription + MCP server support:
uv tool install "hearsay[mcp]"
Pre-release: hearsay isn't on PyPI yet. Until the first release, install from a checkout:
git clone https://github.com/mudassar531/hearsay
cd hearsay
uv tool install . # puts `hearsay` on your PATH
# or, for development:
uv sync && uv run hearsay --help
System requirement: ffmpeg on your PATH.
30-second quickstart
# YouTube → markdown via captions (fast — no download)
hearsay "https://www.youtube.com/watch?v=VIDEO_ID"
# Local audio/video → markdown via local Whisper (runs on CPU)
hearsay talk.mp3
# Force Whisper on a YouTube URL, pick a model, also emit JSON
hearsay "https://youtu.be/VIDEO_ID" --transcribe --model small --json
# Music/song? Add --no-vad so the lyrics aren't filtered out as "non-speech"
hearsay "https://youtu.be/SONG_ID" --no-vad
# A podcast feed or YouTube playlist: list, then ingest a selection
hearsay "https://example.com/feed.xml"
hearsay "https://example.com/feed.xml" --all --limit 3 --output-dir ./out
No captions on a video? hearsay falls back to local Whisper automatically.
What you get
---
title: "You Would Be a Terrible Leader"
source: "https://www.youtube.com/watch?v=rStL7niR7gs"
channel: "CGP Grey"
duration: "00:18:13"
ingested: "2026-06-13T10:00:00Z"
method: "captions"
language: "en"
---
# You Would Be a Terrible Leader
## [00:00:00 – 00:05:21]
**[00:00:00]** Do you want to rule? Do you see the problems in your country and
know how to fix them? If only you had the power to do so. Well. You've come to
the right place. But, before we begin this lesson in political power, ask
yourself, why don't rulers see as clearly as you...
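For code that consumes the markdown directly, the bold timestamp markers can be pulled out with a small regular expression. This is a minimal sketch, assuming every paragraph opens with one `**[HH:MM:SS]**` marker as in the sample above; `to_seconds` and `split_paragraphs` are hypothetical helper names, not part of hearsay.

```python
import re

# Paragraph-opening marker such as **[00:05:21]** (format assumed from the sample output).
MARKER = re.compile(r"\*\*\[(\d{2}):(\d{2}):(\d{2})\]\*\*\s*")


def to_seconds(hours: str, minutes: str, seconds: str) -> int:
    """Convert HH, MM, SS strings into a total number of seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


def split_paragraphs(markdown: str) -> list[tuple[int, str]]:
    """Return (start_seconds, text) pairs, one per timestamped paragraph."""
    matches = list(MARKER.finditer(markdown))
    paragraphs = []
    for i, match in enumerate(matches):
        # A paragraph runs from the end of its marker to the start of the next marker.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        body = markdown[match.end():end]
        # Drop heading lines (section ranges, title) that sit between paragraphs.
        lines = [ln for ln in body.splitlines() if not ln.lstrip().startswith("#")]
        text = " ".join(" ".join(lines).split())
        paragraphs.append((to_seconds(*match.groups()), text))
    return paragraphs
```

Each pair carries the paragraph's start offset in seconds, which is handy for building deep links back into the source media.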
Pass --json for a sidecar matching the Transcript schema:
metadata plus chunks[], each with start_s, end_s, section, and text —
ready to embed.
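A sketch of reading the sidecar before embedding might look like the following. The chunk fields (`start_s`, `end_s`, `section`, `text`) come from the description above; the top-level `chunks` key name and the helper names `format_timestamp` and `load_chunks` are assumptions made for illustration, so check them against an actual sidecar file.

```python
import json
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Render a second offset as HH:MM:SS."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def load_chunks(sidecar_path: str) -> list[dict]:
    """Read a sidecar file and return one embedding-ready record per chunk."""
    data = json.loads(Path(sidecar_path).read_text(encoding="utf-8"))
    records = []
    for chunk in data["chunks"]:  # assumed top-level key
        start = format_timestamp(chunk["start_s"])
        end = format_timestamp(chunk["end_s"])
        records.append({
            "text": chunk["text"],
            "section": chunk["section"],
            "span": f"{start}-{end}",
        })
    return records
```

Keeping the time span next to each chunk's text lets a retrieval result cite the exact moment in the recording.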
How it compares
| | hearsay | DIY yt-dlp + Whisper | markitdown / docling |
|---|---|---|---|
| Input | video & audio | video & audio (you wire it) | documents (pdf/docx/pptx) |
| One command | ✅ | ✗ multi-step plumbing | ✅ (for docs) |
| Captions-first (no download) | ✅ | ✗ usually re-transcribes | n/a |
| Timestamps + paragraph grouping | ✅ readable | ✗ raw segments | n/a |
| Chapters → sections | ✅ | ✗ manual | n/a |
| Podcasts · playlists · batch | ✅ | ✗ manual | ✗ |
| JSON sidecar for RAG | ✅ stable schema | ✗ manual | varies |
| MCP server for agents | ✅ | ✗ | varies |
hearsay does media; document tools like markitdown and docling do documents. Use both.
Give your agent ears
hearsay ships an MCP server so AI agents can
ingest media themselves. It exposes two tools — ingest_url(url, transcribe?, lang?)
and ingest_file(path) — that each return clean, timestamped markdown.
uv tool install "hearsay[mcp]"
hearsay mcp # stdio MCP server (Ctrl-C to stop)
Claude Code:
claude mcp add hearsay -- hearsay mcp
or add to .mcp.json (project) / ~/.claude.json (user):
{
"mcpServers": {
"hearsay": {
"type": "stdio",
"command": "hearsay",
"args": ["mcp"]
}
}
}
Claude Desktop — add to claude_desktop_config.json (Settings → Developer →
Edit Config; macOS: ~/Library/Application Support/Claude/, Windows:
%APPDATA%\Claude\):
{
"mcpServers": {
"hearsay": {
"type": "stdio",
"command": "hearsay",
"args": ["mcp"],
"env": {
"HEARSAY_MODEL": "small"
}
}
}
}
If hearsay is not on the host's PATH, use the absolute path (which hearsay),
or "command": "python", "args": ["-m", "hearsay", "mcp"].
Server configuration (env vars, since MCP tool signatures are fixed):
| Variable | Default | Effect |
|---|---|---|
| HEARSAY_MODEL | small | Whisper model size (tiny…large-v3) |
| HEARSAY_LANG | (unset) | Default language: English captions, else Whisper auto-detect |
| HEARSAY_VAD | 1 | Voice-activity filter; set 0 for music/songs |
Speech vs. music: hearsay is tuned for spoken audio (podcasts, talks, interviews, meetings), where transcription is accurate. For music, pass
--no-vad so the vocals aren't discarded, but expect a rough, approximate lyric transcript, since Whisper is a speech model, not a lyrics transcriber.
CLI reference
hearsay <SOURCE> [options]
SOURCE = YouTube video/playlist URL, podcast RSS, or local file
-o, --output PATH Output file for a single source (default ./<id>.md)
--output-dir PATH Output directory for batch (playlist/feed) ingestion (default ./hearsay-out)
--lang CODE Language: captions default to English; transcription auto-detects
--transcribe Force local Whisper even when captions exist
--model SIZE Whisper model: tiny | base | small | medium | large-v3 (default small)
--no-vad Disable voice-activity filtering (use for music/songs)
--json Also write a .json sidecar (Transcript schema)
--latest Batch: ingest only the most recent item
--episode N Batch: ingest only item N (1-indexed)
--all [--limit N] Batch: ingest all items (optionally capped)
--version Show version
hearsay mcp Run the MCP stdio server
Requirements
- Python 3.11+
- ffmpeg on your PATH. hearsay decodes most audio/video directly (faster-whisper bundles its own decoder), but ffmpeg is the safe baseline and is used for some yt-dlp format merges.
| OS | Install ffmpeg |
|---|---|
| macOS (Homebrew) | brew install ffmpeg |
| Debian / Ubuntu | sudo apt install ffmpeg |
| Fedora | sudo dnf install ffmpeg |
| Arch | sudo pacman -S ffmpeg |
| Windows (winget) | winget install Gyan.FFmpeg |
| Windows (Chocolatey) | choco install ffmpeg |
The first transcription downloads the chosen Whisper model once (tens of MB to ~1.5 GB), then caches it for offline use.
Contributing
See CONTRIBUTING.md and the good first issues. hearsay does one thing well — media → great markdown — and aims to keep doing exactly that.
License