Download YouTube audio as MP3 and transcribe with Parakeet-MLX (Apple Silicon/MLX)

podkeet

Download a YouTube video's audio as MP3 with yt-dlp and transcribe it using Parakeet-MLX (MLX on Apple Silicon).

Requirements

  • macOS on Apple Silicon (M1/M2/M3/M4)
  • Python >= 3.13
  • ffmpeg (for yt-dlp post-processing)
    • Install on macOS: brew install ffmpeg

Parakeet-MLX is installed as a dependency and runs on MLX (Metal) on Apple Silicon when --device auto is selected.

Quick start

Run directly with uvx (no virtualenv needed):

uvx podkeet transcribe "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --out-dir ./outputs

Note: uvx fetches podkeet into a cache on the first run and reuses it afterwards. For a persistent install that puts the podkeet command on your PATH:

uv tool install podkeet

Or, to work from a clone of the repository in a virtual environment:

uv venv --python 3.13
uv sync --extra dev
uv run podkeet transcribe "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --out-dir ./outputs

This will:

  • Check for ffmpeg and instruct you to install it if missing.
  • Download the best audio stream and convert it to MP3.
  • Transcribe the MP3 with Parakeet-MLX and save the transcript next to the audio in --out-dir.
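
The first step, the ffmpeg check, can be sketched in a few lines of Python. This is a minimal illustration and not podkeet's actual code: the function name ffmpeg_install_hint and the wording of the hint are assumptions. It relies only on shutil.which from the standard library.

```python
import shutil
from typing import Optional


def ffmpeg_install_hint(executable: str = "ffmpeg") -> Optional[str]:
    """Return None if the executable is on PATH, otherwise an install hint.

    Hypothetical helper for illustration; podkeet's real check may differ.
    """
    if shutil.which(executable) is not None:
        return None
    return f"{executable} not found on PATH. Install it with: brew install {executable}"


if __name__ == "__main__":
    hint = ffmpeg_install_hint()
    print(hint if hint else "ffmpeg is available")
```

Returning a message instead of raising keeps the decision about how to report the problem (exit code, coloured output) with the caller.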

Installation (PyPI)

podkeet is published on PyPI. To install it into the active virtual environment:

uv pip install -U podkeet

CLI reference

  • podkeet download URL --out-dir PATH [--no-timing]
  • podkeet transcribe URL_OR_FILE --out-dir PATH [--keep-audio] [--language auto|en|…] [--model NAME] [--format txt|srt|vtt|json] [--device auto|mps|cpu] [--no-timing] [--version]

Notes:

  • If ffmpeg is missing, a clear message explains how to install it.
  • The first transcription may download Parakeet-MLX models; subsequent runs use the local cache.
  • On Apple Silicon, --device auto prefers MLX (mps) and falls back to CPU if needed.
  • Timing: The CLI shows elapsed time for download and transcription; hide with --no-timing.
  • JSON: When --format json is used, the CLI prints a compact JSON summary to stdout (suitable for automation).

Robustness

  • Filenames with special characters: We detect the actual file written by yt-dlp instead of guessing by title, avoiding path mismatches.
  • Large files / memory: If a full-file transcription hits a Metal/MLX memory error, the tool automatically falls back to chunked transcription (~10-minute segments) and merges results with correct timestamps.
  • Network hiccups: The downloader uses retries, socket timeouts, and exponential backoff to handle transient network failures.
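
The chunked fallback depends on shifting each chunk's timestamps by the chunk's start offset before merging. The sketch below shows that bookkeeping only, under stated assumptions: Segment, chunk_bounds, and merge_chunks are hypothetical names, the 600-second default mirrors the "~10-minute segments" mentioned above, and the actual transcription call is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds, relative to the audio the segment came from
    end: float
    text: str


def chunk_bounds(duration: float, chunk_seconds: float = 600.0) -> list[tuple[float, float]]:
    """Split [0, duration) into consecutive (start, end) windows."""
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        bounds.append((start, end))
        start = end
    return bounds


def merge_chunks(chunks: list[tuple[float, list[Segment]]]) -> list[Segment]:
    """Merge per-chunk segments into one timeline.

    Each entry is (chunk_start_offset, segments), where segment times are
    relative to the start of that chunk.
    """
    merged = []
    for offset, segments in chunks:
        for seg in segments:
            merged.append(Segment(seg.start + offset, seg.end + offset, seg.text))
    return merged
```

For a 25-minute file, chunk_bounds(1500.0) yields three windows, and a segment that starts 0.5 s into the second chunk ends up at 600.5 s in the merged transcript.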

Examples

# Download only
podkeet download "https://www.youtube.com/watch?v=8P7v1lgl-1s" --out-dir ./podcasts

# Transcribe from URL with a specific start (yt-dlp handles t=)
podkeet transcribe "https://www.youtube.com/watch?v=8P7v1lgl-1s&t=121s" --out-dir ./podcasts

# Transcribe a local file to SRT
podkeet transcribe ./podcasts/example.mp3 --out-dir ./podcasts --format srt

# JSON summary output (includes timings), pretty-printed with jq
podkeet transcribe "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --out-dir ./outputs --format json | jq .

Development

Install dev extras and set up the environment:

uv venv --python 3.13
uv sync --extra dev

Format with Ruff:

uvx ruff format

Lint with Ruff:

uvx ruff check
uvx ruff check --fix

Type-check with Ty (pre-release):

uvx ty check

Run tests:

uv run pytest -q

Build package (sdist + wheel):

uvx --from build pyproject-build
ls dist/

CI/CD

  • CI (lint, type check, tests, build) runs on pushes and pull requests.
  • Releases are automated:
    • Conventional Commits drive version bumps and CHANGELOG.md via Python Semantic Release.
    • A tag vX.Y.Z is created on main.
    • The Release workflow builds and publishes to PyPI using OIDC (Trusted Publishing).

Commit message hints (Conventional Commits):

  • feat: … → minor version bump
  • fix: … → patch version bump
  • feat!: … or footer BREAKING CHANGE: → major version bump
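
To make the mapping concrete, here is a small sketch that classifies a commit message using only the three rules listed above. It is not Python Semantic Release's parser, which is configurable and recognises more commit types; bump_for is a hypothetical name used for illustration.

```python
import re
from typing import Optional

# Header shape: type, optional (scope), optional "!", then ": ".
_HEADER = re.compile(r"^(?P<type>[A-Za-z]+)(\([^)]*\))?(?P<breaking>!)?: ")


def bump_for(message: str) -> Optional[str]:
    """Return "major", "minor", "patch", or None for a commit message."""
    lines = message.splitlines()
    if not lines:
        return None
    match = _HEADER.match(lines[0])
    if match is None:
        return None  # not a Conventional Commit header
    has_breaking_footer = any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:")) for line in lines[1:]
    )
    if match.group("breaking") or has_breaking_footer:
        return "major"
    kind = match.group("type").lower()
    if kind == "feat":
        return "minor"
    if kind == "fix":
        return "patch"
    return None  # e.g. docs:, chore: trigger no bump under these three rules
```

For example, bump_for("fix(cli): handle missing ffmpeg") gives "patch", while the same header followed by a BREAKING CHANGE: footer gives "major".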

Troubleshooting

  • ffmpeg not found: brew install ffmpeg (then re-run).
  • MLX out-of-memory: The tool switches to chunked transcription automatically; if it still fails, try a smaller model.
  • Network or YouTube rate limiting: The downloader retries with backoff; re-run later if persistent.
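
The "retries with backoff" behaviour can be pictured with the generic sketch below. It is an assumption-laden illustration rather than podkeet's downloader: retry_with_backoff and its parameters (attempts, base_delay, factor) are hypothetical, and only OSError and its subclasses (such as ConnectionError and TimeoutError) are treated as transient here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on OSError with exponentially growing waits."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= factor  # 1s, 2s, 4s, ... with the defaults
    raise AssertionError("unreachable")
```

Passing sleep as a parameter makes the waits easy to observe or skip in tests. If the final attempt also fails, the error propagates, which matches the advice to re-run later when the problem persists.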

License

MIT

Download files

Source Distribution

podkeet-1.1.1.tar.gz (85.4 kB view details)

Uploaded Source

Built Distribution

podkeet-1.1.1-py3-none-any.whl (11.8 kB view details)

Uploaded Python 3

File details

Details for the file podkeet-1.1.1.tar.gz.

File metadata

  • Download URL: podkeet-1.1.1.tar.gz
  • Upload date:
  • Size: 85.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for podkeet-1.1.1.tar.gz
Algorithm Hash digest
SHA256 5c6d9c34817c493caa1f247c435ac9948a0809e821dbb8d76aa6b20358cc6513
MD5 7f85e19fd36ac549f0eef9cf95b8e642
BLAKE2b-256 c68042178689718cc88e221692a41693336ee5a23745bc988e7ea4656145e0ec

Provenance

The following attestation bundles were made for podkeet-1.1.1.tar.gz:

Publisher: release.yml on chrisdoc/podkeet

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file podkeet-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: podkeet-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 11.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for podkeet-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 797013e7d2aa9f808f97171227367b153725bca49a047c8d361910f957c6867e
MD5 1b200c8a838376d65eba71ce3d7c0ed1
BLAKE2b-256 ca21e92a36b9807398cf3c36aed1effd2600a4ec6b1da19d8173bd933fb4f1f2

Provenance

The following attestation bundles were made for podkeet-1.1.1-py3-none-any.whl:

Publisher: release.yml on chrisdoc/podkeet
