crawl4ai for video & audio — turn any YouTube video, podcast episode, or local recording into clean, timestamped, LLM-ready markdown

hearsay

crawl4ai for video & audio. One command turns any YouTube video, podcast episode, or local recording into clean, timestamped, chunked, LLM-ready markdown — for RAG pipelines and AI agents.

Why

Getting a transcript into your RAG pipeline usually means gluing together yt-dlp, Whisper, and a pile of timestamp-wrangling scripts — and you still end up with one line per caption fragment or an undifferentiated wall of text. hearsay does the whole thing in one command and gives you back markdown a human and a model can read: readable paragraphs, real timestamps, chapter headings, and an optional JSON sidecar with a stable schema.

Install

uv tool install hearsay          # recommended
# or
pipx install hearsay
# transcription + MCP server support:
uv tool install "hearsay[mcp]"

Pre-release: hearsay isn't on PyPI yet. Until the first release, install from a checkout:

git clone https://github.com/mudassar531/hearsay
cd hearsay
uv tool install .               # puts `hearsay` on your PATH
# or, for development:  uv sync && uv run hearsay --help

System requirement: ffmpeg on your PATH.

30-second quickstart

# YouTube → markdown via captions (fast — no download)
hearsay "https://www.youtube.com/watch?v=VIDEO_ID"

# Local audio/video → markdown via local Whisper (runs on CPU)
hearsay talk.mp3

# Force Whisper on a YouTube URL, pick a model, also emit JSON
hearsay "https://youtu.be/VIDEO_ID" --transcribe --model small --json

# Music/song? Add --no-vad so the lyrics aren't filtered out as "non-speech"
hearsay "https://youtu.be/SONG_ID" --no-vad

# A podcast feed or YouTube playlist: list, then ingest a selection
hearsay "https://example.com/feed.xml"
hearsay "https://example.com/feed.xml" --all --limit 3 --output-dir ./out

No captions on a video? hearsay falls back to local Whisper automatically.

What you get

---
title: "You Would Be a Terrible Leader"
source: "https://www.youtube.com/watch?v=rStL7niR7gs"
channel: "CGP Grey"
duration: "00:18:13"
ingested: "2026-06-13T10:00:00Z"
method: "captions"
language: "en"
---

# You Would Be a Terrible Leader

## [00:00:00 – 00:05:21]

**[00:00:00]** Do you want to rule? Do you see the problems in your country and
know how to fix them? If only you had the power to do so. Well. You've come to
the right place. But, before we begin this lesson in political power, ask
yourself, why don't rulers see as clearly as you...
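
The timestamped paragraphs in the output above are straightforward to read back programmatically. The following is a minimal sketch, not part of hearsay itself, that assumes every paragraph starts with a `**[HH:MM:SS]**` marker as in the sample and that paragraphs are separated by blank lines. The function names are hypothetical.

```python
import re

# Matches a paragraph that begins with a bold timestamp such as **[00:01:05]**.
STAMP = re.compile(r"^\*\*\[(\d{2}):(\d{2}):(\d{2})\]\*\*\s*(.*)$", re.DOTALL)


def strip_front_matter(md: str) -> str:
    """Drop a leading '---' ... '---' metadata block if one is present."""
    lines = md.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[i + 1:])
    return md


def timestamped_paragraphs(md: str) -> list[tuple[int, str]]:
    """Return (start_seconds, text) for each timestamped paragraph."""
    body = strip_front_matter(md)
    out = []
    for block in re.split(r"\n\s*\n", body):
        match = STAMP.match(block.strip())
        if match:
            hours, minutes, seconds, text = match.groups()
            start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            # Collapse hard-wrapped lines into a single line of text.
            out.append((start, " ".join(text.split())))
    return out
```

Headings and the title are skipped because they do not match the timestamp pattern, so the result contains only transcript text.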

Pass --json for a sidecar matching the Transcript schema: metadata plus chunks[], each with start_s, end_s, section, and text — ready to embed.
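
As a rough illustration of "ready to embed", the sketch below turns a sidecar into flat records for a vector store. It assumes the file is a JSON object with top-level `metadata` and `chunks` keys and that `metadata` carries a `source` field like the markdown front matter; the exact layout is an assumption here, so check it against the Transcript schema. The function names and record shape are hypothetical.

```python
import json
from pathlib import Path


def chunks_to_records(data: dict) -> list[dict]:
    """Flatten a sidecar dict into one record per chunk.

    Assumes {"metadata": {...}, "chunks": [{"start_s", "end_s",
    "section", "text"}, ...]}; this layout is an assumption.
    """
    meta = data.get("metadata", {})
    source = meta.get("source", "unknown")
    records = []
    for index, chunk in enumerate(data.get("chunks", [])):
        records.append({
            "id": f"{source}#{index}",   # hypothetical ID scheme
            "text": chunk["text"],       # the string you would embed
            "start_s": chunk["start_s"],
            "end_s": chunk["end_s"],
            "section": chunk.get("section"),
        })
    return records


def load_records(sidecar_path: str) -> list[dict]:
    """Read a .json sidecar from disk and flatten it."""
    data = json.loads(Path(sidecar_path).read_text(encoding="utf-8"))
    return chunks_to_records(data)
```

Keeping `start_s` and `end_s` on each record lets a retrieval result point back to the exact moment in the recording.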

How it compares

| | hearsay | DIY yt-dlp + Whisper | markitdown / docling |
|---|---|---|---|
| Input | video & audio | video & audio (you wire it) | documents (pdf/docx/pptx) |
| One command | ✅ | ❌ multi-step plumbing | ✅ (for docs) |
| Captions-first (no download) | ✅ | ✗ usually re-transcribes | n/a |
| Timestamps + paragraph grouping | ✅ readable | ✗ raw segments | n/a |
| Chapters → sections | ✅ | ✗ manual | n/a |
| Podcasts · playlists · batch | ✅ | ✗ manual | |
| JSON sidecar for RAG | ✅ stable schema | ✗ manual | varies |
| MCP server for agents | ✅ | | varies |

hearsay does media; document tools like markitdown and docling do documents. Use both.

Give your agent ears

hearsay ships an MCP server so AI agents can ingest media themselves. It exposes two tools — ingest_url(url, transcribe?, lang?) and ingest_file(path) — that each return clean, timestamped markdown.

uv tool install "hearsay[mcp]"
hearsay mcp                      # stdio MCP server (Ctrl-C to stop)

Claude Code:

claude mcp add hearsay -- hearsay mcp

or add to .mcp.json (project) / ~/.claude.json (user):

{
  "mcpServers": {
    "hearsay": {
      "type": "stdio",
      "command": "hearsay",
      "args": ["mcp"]
    }
  }
}

Claude Desktop — add to claude_desktop_config.json (Settings → Developer → Edit Config; macOS: ~/Library/Application Support/Claude/, Windows: %APPDATA%\Claude\):

{
  "mcpServers": {
    "hearsay": {
      "type": "stdio",
      "command": "hearsay",
      "args": ["mcp"],
      "env": {
        "HEARSAY_MODEL": "small"
      }
    }
  }
}

If hearsay is not on the host's PATH, use the absolute path (which hearsay), or "command": "python", "args": ["-m", "hearsay", "mcp"].
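
The PATH fallback above can be automated when generating a config file. This is a small sketch under stated assumptions: it resolves `hearsay` with `shutil.which` and otherwise falls back to the `python -m hearsay mcp` form, using the current interpreter (`sys.executable`) in place of a bare `python`. The function name and the injectable `which` parameter are hypothetical.

```python
import json
import shutil
import sys


def mcp_config(which=shutil.which) -> dict:
    """Build an mcpServers entry that does not depend on the host's PATH."""
    path = which("hearsay")
    if path:
        # Absolute path to the installed executable.
        entry = {"type": "stdio", "command": path, "args": ["mcp"]}
    else:
        # Module form; only works if hearsay is installed for this interpreter.
        entry = {
            "type": "stdio",
            "command": sys.executable,
            "args": ["-m", "hearsay", "mcp"],
        }
    return {"mcpServers": {"hearsay": entry}}


def render(config: dict) -> str:
    """Serialize the config in the same style as the snippets above."""
    return json.dumps(config, indent=2)
```

Calling `print(render(mcp_config()))` prints a block that can be pasted into `.mcp.json` or `claude_desktop_config.json`.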

Server configuration (env vars, since MCP tool signatures are fixed):

| Variable | Default | Effect |
|---|---|---|
| HEARSAY_MODEL | small | Whisper model size (tiny through large-v3) |
| HEARSAY_LANG | (unset) | Default language: English captions, else Whisper auto-detect |
| HEARSAY_VAD | 1 | Voice-activity filter; set 0 for music/songs |
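
When the server is launched from a script rather than from an MCP host config, the same variables can be set on the child process environment. The sketch below only builds that environment; the helper name and its parameters are hypothetical, and it uses nothing beyond the three variables listed above.

```python
import os


def server_env(model=None, lang=None, vad=None, base=None) -> dict:
    """Return an environment dict with hearsay server settings applied.

    Values left as None are not set, so the server's defaults apply.
    """
    env = dict(os.environ if base is None else base)
    if model is not None:
        env["HEARSAY_MODEL"] = model
    if lang is not None:
        env["HEARSAY_LANG"] = lang
    if vad is not None:
        # The table documents 1 (filter on) and 0 (filter off).
        env["HEARSAY_VAD"] = "1" if vad else "0"
    return env
```

For example, `subprocess.Popen(["hearsay", "mcp"], env=server_env(model="base", vad=False))` would start the stdio server with a smaller model and the voice-activity filter disabled.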

Speech vs. music: hearsay is tuned for spoken audio (podcasts, talks, interviews, meetings), where transcription is accurate. For music, pass --no-vad so the vocals aren't discarded — but expect a rough, approximate lyric transcript, since Whisper is a speech model, not a lyrics transcriber.

CLI reference

hearsay <SOURCE> [options]      SOURCE = YouTube video/playlist URL, podcast RSS, or local file

  -o, --output PATH    Output file for a single source (default ./<id>.md)
  --output-dir PATH    Output directory for batch (playlist/feed) ingestion (default ./hearsay-out)
  --lang CODE          Language: captions default to English; transcription auto-detects
  --transcribe         Force local Whisper even when captions exist
  --model SIZE         Whisper model: tiny | base | small | medium | large-v3 (default small)
  --no-vad             Disable voice-activity filtering (use for music/songs)
  --json               Also write a .json sidecar (Transcript schema)
  --latest             Batch: ingest only the most recent item
  --episode N          Batch: ingest only item N (1-indexed)
  --all [--limit N]    Batch: ingest all items (optionally capped)
  --version            Show version

hearsay mcp            Run the MCP stdio server
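
For scripted use, the options above can be assembled into a command line from Python. This is a hedged sketch that shells out to the CLI and uses only the flags listed in this reference; `build_command` and `ingest` are hypothetical helper names, and treating a non-zero exit code as failure is an assumption about the CLI's behavior.

```python
import subprocess

# Model sizes as listed under --model.
MODELS = ("tiny", "base", "small", "medium", "large-v3")


def build_command(source, *, output=None, lang=None, transcribe=False,
                  model=None, no_vad=False, json_sidecar=False) -> list[str]:
    """Build the argv for a single-source hearsay run."""
    if model is not None and model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    cmd = ["hearsay", source]
    if output is not None:
        cmd += ["-o", str(output)]
    if lang is not None:
        cmd += ["--lang", lang]
    if transcribe:
        cmd.append("--transcribe")
    if model is not None:
        cmd += ["--model", model]
    if no_vad:
        cmd.append("--no-vad")
    if json_sidecar:
        cmd.append("--json")
    return cmd


def ingest(source, **options):
    """Run hearsay and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(source, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `?` or `&`.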

Requirements

  • Python 3.11+
  • ffmpeg on your PATH. hearsay decodes most audio/video directly (faster-whisper bundles its own decoder), but ffmpeg is the safe baseline and is used for some yt-dlp format merges.

| OS | Install ffmpeg |
|---|---|
| macOS (Homebrew) | brew install ffmpeg |
| Debian / Ubuntu | sudo apt install ffmpeg |
| Fedora | sudo dnf install ffmpeg |
| Arch | sudo pacman -S ffmpeg |
| Windows (winget) | winget install Gyan.FFmpeg |
| Windows (Chocolatey) | choco install ffmpeg |
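
A script that wraps hearsay can check for ffmpeg before starting a long batch. The following is a minimal sketch using the standard library's `shutil.which`; the helper name is hypothetical and the check only confirms that an executable is on the PATH, not that it works.

```python
import shutil


def missing_tools(required=("ffmpeg",), which=shutil.which) -> list[str]:
    """Return the names of required executables not found on the PATH."""
    return [tool for tool in required if which(tool) is None]
```

An empty list means every required tool was found; otherwise the returned names can be shown to the user alongside the install commands above.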

The first transcription downloads the chosen Whisper model once (tens of MB to ~1.5 GB), then caches it for offline use.

Contributing

See CONTRIBUTING.md and the good first issues. hearsay does one thing well — media → great markdown — and aims to keep doing exactly that.

License

MIT
