media-pipeline-cli
Composable JSONL-streaming CLI tools for YouTube media extraction and local transcription.
Quick Start: Export YouTube Watch Later URLs
1. Install dependencies
From the project root:
uv sync
2. Sign into YouTube in Chrome
The yt-watchlist command reads the Watch Later playlist through yt-dlp using your browser cookies.
Before running it:
- Open Chrome.
- Sign into the Google account that owns the target YouTube Watch Later playlist.
- Confirm that https://www.youtube.com/playlist?list=WL shows your Watch Later items in that browser.
3. Run the watchlist export
uv run yt-watchlist list
By default, the CLI reads cookies from Chrome (`--browser chrome`).
If you use another browser:
uv run yt-watchlist list --browser brave
uv run yt-watchlist list --browser edge
uv run yt-watchlist list --browser firefox
If you need a specific browser profile:
uv run yt-watchlist list --browser chrome --browser-profile "Default"
4. Save the output to a file
uv run yt-watchlist list --output watchlist.jsonl
Each output line is a JSON object like:
{"video_id":"abc123","title":"Video title","url":"https://www.youtube.com/watch?v=abc123","source":"youtube"}
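Each line parses with any standard JSON reader. A minimal Python sketch that checks the documented fields on one record (the sample line is taken from the example above):

```python
import json

line = '{"video_id":"abc123","title":"Video title","url":"https://www.youtube.com/watch?v=abc123","source":"youtube"}'
record = json.loads(line)

# Every watchlist record carries these four fields.
assert {"video_id", "title", "url", "source"} <= set(record)
print(record["title"])
```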
5. Test with a small sample
uv run yt-watchlist list --limit 5
Troubleshooting
- No output: the YouTube Data API does not expose Watch Later items. This CLI uses `yt-dlp` plus browser cookies instead.
- Wrong Watch Later list: sign into the browser profile that owns the target YouTube Watch Later playlist.
- Cookie extraction issues: try `--browser-profile "Default"` for Chrome-based browsers.
- Browser mismatch: if Chrome is not where you are logged in, pass `--browser firefox`, `--browser brave`, or another supported browser.
YouTube Data API Note
YouTube Data API v3 still works for normal playlists that are accessible to the authenticated caller, including:
- playlists you own
- public playlists
- unlisted playlists you can access
- channel-associated playlists like uploads and likes
It does not return items for the special Watch Later (WL) or Watch History (HL) playlists.
Download Media with yt-fetch
yt-fetch reads JSONL records from stdin or --input, downloads media with yt-dlp, and emits the original records enriched with audio_path or video_path.
Download audio from the watchlist
uv run yt-watchlist list --limit 5 \
| uv run yt-fetch audio --output-dir ./data \
> audio.jsonl
You can also read from a file:
uv run yt-fetch audio --input watchlist.jsonl --output-dir ./data > audio.jsonl
Example output record:
{"video_id":"abc123","title":"Video title","url":"https://www.youtube.com/watch?v=abc123","source":"youtube","audio_path":"data/abc123.mp3"}
Download video instead of audio
uv run yt-fetch video --input watchlist.jsonl --output-dir ./data > video.jsonl
Example output record:
{"video_id":"abc123","title":"Video title","url":"https://www.youtube.com/watch?v=abc123","source":"youtube","video_path":"data/abc123.mp4"}
Common yt-fetch options
- `--output-dir ./data`: target directory for downloaded files
- `--format mp3|wav|mp4`: output format, defaults to `mp3` for audio and `mp4` for video
- `--parallel 2`: concurrent download workers
- `--cache-dir ./.cache/yt-fetch`: optional yt-dlp cache/archive directory
- `--force`: redownload even if the target file already exists
Examples:
uv run yt-fetch audio --input watchlist.jsonl --output-dir ./data --format wav
uv run yt-fetch audio --input watchlist.jsonl --output-dir ./data --parallel 4
uv run yt-fetch video --input watchlist.jsonl --output-dir ./data --format mp4
yt-fetch behavior
- Output filenames are deterministic: `{video_id}.{ext}`
- Existing files are reused unless `--force` is passed
- Unknown input fields are preserved in the output JSONL
- Per-record failures are emitted back as JSON with `error` and `stage`
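The deterministic-filename and reuse rules can be sketched in Python. This is an illustration of the documented behavior, not yt-fetch internals; `target_path` and `should_download` are hypothetical names:

```python
from pathlib import Path

def target_path(record: dict, output_dir: str, ext: str) -> Path:
    # Deterministic output name: {video_id}.{ext}
    return Path(output_dir) / f"{record['video_id']}.{ext}"

def should_download(record: dict, output_dir: str, ext: str, force: bool = False) -> bool:
    # Existing files are reused unless --force is passed.
    return force or not target_path(record, output_dir, ext).exists()

rec = {"video_id": "abc123"}
print(target_path(rec, "data", "mp3"))
```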
Transcribe Audio with audio-transcribe
audio-transcribe reads JSONL records that contain audio_path, runs local Whisper transcription, and emits the original records enriched with transcript and optionally segments.
Transcribe downloaded audio
uv run audio-transcribe --input audio.jsonl --model base > transcripts.jsonl
Or as a pipeline:
uv run yt-watchlist list --limit 5 \
| uv run yt-fetch audio --output-dir ./data \
| uv run audio-transcribe --model base \
> transcripts.jsonl
Example output record:
{"video_id":"abc123","title":"Video title","url":"https://www.youtube.com/watch?v=abc123","source":"youtube","audio_path":"data/abc123.mp3","transcript":"full text...","segments":[{"start":0.0,"end":4.2,"text":"hello"},{"start":4.2,"end":8.1,"text":"world"}]}
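Because `segments` carries start/end times in seconds, downstream tools can render subtitles directly. A small consumer sketch (`segments_to_srt` is a hypothetical helper, not part of this package) that turns the documented segment schema into SRT-style text:

```python
def to_timestamp(seconds: float) -> str:
    # Render seconds as an SRT-style HH:MM:SS,mmm timestamp.
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    # Number each segment and join them into one SRT document.
    blocks = [
        f"{i}\n{to_timestamp(seg['start'])} --> {to_timestamp(seg['end'])}\n{seg['text']}"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n\n".join(blocks)

segments = [{"start": 0.0, "end": 4.2, "text": "hello"},
            {"start": 4.2, "end": 8.1, "text": "world"}]
print(segments_to_srt(segments))
```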
Common audio-transcribe options
- `--model tiny|base|small|medium|large`: Whisper model size, default `base`
- `--device auto|cpu|cuda`: device selection, default `auto`
- `--language auto|en|...`: optional language hint
- `--output-format json|text`: include segments with `json`, transcript-only with `text`
- `--cache-dir ./.cache/whisper`: optional cache directory for transcript reuse
- `--force`: recompute transcript even if the record already has one
Examples:
uv run audio-transcribe --input audio.jsonl --model tiny
uv run audio-transcribe --input audio.jsonl --model base --language en
uv run audio-transcribe --input audio.jsonl --model small --device cpu
uv run audio-transcribe --input audio.jsonl --output-format text
audio-transcribe behavior
- The Whisper model is loaded once per run
- Timestamps are preserved when `--output-format json` is used
- Cached transcripts can be reused when `--cache-dir` is set
- Unknown input fields are preserved in the output JSONL
- Per-record failures are emitted back as JSON with `error` and `stage`
End-to-End Pipeline
Fetch Watch Later URLs, download audio, and transcribe locally:
uv run yt-watchlist list \
| uv run yt-fetch audio --output-dir ./data \
| uv run audio-transcribe --model base \
> transcripts.jsonl
Run the stages separately:
uv run yt-watchlist list --output watchlist.jsonl
uv run yt-fetch audio --input watchlist.jsonl --output-dir ./data > audio.jsonl
uv run audio-transcribe --input audio.jsonl --model base > transcripts.jsonl
JSONL Contract
Each stage preserves upstream fields and appends new ones.
Watchlist records:
{"video_id":"abc123","title":"Video title","url":"https://www.youtube.com/watch?v=abc123","source":"youtube"}
After audio download:
{"video_id":"abc123","title":"Video title","url":"https://www.youtube.com/watch?v=abc123","source":"youtube","audio_path":"data/abc123.mp3"}
After transcription:
{"video_id":"abc123","title":"Video title","url":"https://www.youtube.com/watch?v=abc123","source":"youtube","audio_path":"data/abc123.mp3","transcript":"full text...","segments":[{"start":0.0,"end":4.2,"text":"hello"}]}
Failure records keep the original fields and add:
{"video_id":"abc123","url":"https://www.youtube.com/watch?v=abc123","error":"message","stage":"yt-fetch"}
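Because failure records carry `error` and `stage` while successes do not, a downstream consumer can separate the two streams with a simple field check. A sketch (`split_records` is a hypothetical helper):

```python
import json

def split_records(lines):
    # Failure records carry "error" and "stage"; successes do not.
    ok, failed = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        (failed if "error" in rec else ok).append(rec)
    return ok, failed

sample = [
    '{"video_id":"abc123","url":"https://www.youtube.com/watch?v=abc123","transcript":"full text..."}',
    '{"video_id":"def456","url":"https://www.youtube.com/watch?v=def456","error":"message","stage":"yt-fetch"}',
]
ok, failed = split_records(sample)
print(len(ok), len(failed))
```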
Install as a CLI Tool
This project defines console entry points for:
- `yt-watchlist`
- `yt-fetch`
- `audio-transcribe`
You can run them with uv run ... from the repo, or install them as top-level CLI tools.
Install from the local repo
From the project root:
uv tool install .
That installs the package and exposes the CLI commands globally in your user tool environment.
If you update the repo later:
uv tool upgrade --from . media-pipeline-cli
Install directly from GitHub
Once the repository is public, you can install it without cloning first:
uv tool install git+https://github.com/neurongraph/media-pipeline-cli-tools.git
Alternative: pipx
If you prefer pipx:
pipx install .
Build and Package
Build the distributable artifacts from the project root:
uv build
This produces standard Python package artifacts in dist/, typically:
- a source distribution (`.tar.gz`)
- a wheel (`.whl`)
These artifacts can be installed locally for validation:
uv tool install dist/*.whl
Publish to PyPI
When you are ready to publish:
uv publish
Typical release flow:
1. Update the version in `pyproject.toml`
2. Run `uv sync`
3. Run validation checks
4. Build with `uv build`
5. Publish with `uv publish`
After publishing, users can install the CLI directly from PyPI:
uv tool install media-pipeline-cli
Packaging Notes
- Project metadata and CLI entry points are defined in pyproject.toml
- The package is built with `hatchling`
- Wheel packaging is configured explicitly because the repo contains multiple top-level Python packages
- `uv sync` prepares the local environment, while `uv tool install` installs the package as a reusable CLI
File details
Details for the file media_pipeline_cli-0.1.0.tar.gz.
File metadata
- Download URL: media_pipeline_cli-0.1.0.tar.gz
- Upload date:
- Size: 13.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c0c9369960213c6f714d6def167ae5d12fb37a89559ebf69e5c5dcdc04485da7` |
| MD5 | `08c118e31d3f2f97c3e2c4da339b19cf` |
| BLAKE2b-256 | `44048a33fbd51d9a27d054022e931da39b85a844b33924fd239545fabf66e058` |
File details
Details for the file media_pipeline_cli-0.1.0-py3-none-any.whl.
File metadata
- Download URL: media_pipeline_cli-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c3c8a19c4a432e40bf5fed0a7ca7d62eeaa839401b2da3e293bb82da182c2ba3` |
| MD5 | `52e915b33342abc82d91f4287277b930` |
| BLAKE2b-256 | `51042d420b11dfcc97840029404ec9a04681bb9ff0ba05e1a170c4e80f7f0d85` |