tubefetch

YouTube video metadata, transcript, and media fetcher

Maintainers

pointmatic

A Python CLI and library that fetches and extracts structured metadata and transcripts from YouTube videos, producing LLM-ready plain text, content hashes for change detection, and unified video bundles with batch processing, caching, and retry logic.

TubeFetch is a Python tool that extracts structured, AI-ready content from YouTube videos. Given one or more video IDs, URLs, playlists, or channels, it produces normalized metadata, transcripts, and optional media in formats optimized for downstream AI/LLM pipelines (summarization, fact-checking, RAG, search indexing, etc.). It provides content hashes for change detection, optional token count estimates, and unified video bundles. The tool supports both CLI and library usage, with batch processing, caching that skips already-fetched data, configurable retries via gentlify, and rate limiting.

Features

Metadata — title, channel, duration, tags, upload date via yt-dlp (or YouTube Data API v3)
Transcripts — fetched via youtube-transcript-api with language preference and fallback
Media — optional video/audio download via yt-dlp
Export formats — JSON, plain text, WebVTT (.vtt), SubRip (.srt)
Batch processing — concurrent workers with per-video error isolation
Caching — skip already-fetched data; selective --force overrides
Retry — powered by gentlify with exponential backoff and jitter on transient errors
Rate limiting — token bucket algorithm, shared across workers
CLI + Library — use from the command line or import as a Python package

Installation

Requires Python 3.14+.

pip install tubefetch

Optional: YouTube Data API v3

Install the youtube-api extra and set an API key to fetch age-restricted or geo-restricted videos:

pip install "tubefetch[youtube-api]"
export TUBEFETCH_YT_API_KEY="your-api-key"

The YouTube Data API backend is used when:

Videos are age-restricted (require sign-in)
yt-dlp is blocked by YouTube's bot detection
You need higher rate limits

Get a free API key from Google Cloud Console. See Troubleshooting for setup instructions.

Note: The CLI accepts video IDs or URLs as positional arguments. Run tubefetch VIDEO_ID for the default behavior (metadata + transcript), or use the metadata, transcript, or media subcommands to fetch only that content.
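How tubefetch normalizes URLs internally is not documented here. A minimal sketch of turning a bare ID or a single-video URL into the 11-character video ID, assuming the common watch, youtu.be, shorts, embed, and live URL shapes (playlist and channel URLs are out of scope; `extract_video_id` is a hypothetical helper):

```python
import re
from urllib.parse import parse_qs, urlparse

_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Return the video ID from a bare ID or a single-video YouTube URL.

    Hypothetical helper; raises ValueError for unrecognized input.
    """
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value
    # Allow scheme-less input such as "youtube.com/watch?v=...".
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live", "v"}:
                candidate = parts[1]
    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    raise ValueError(f"not a YouTube video ID or URL: {value!r}")


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42s"))
```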

Quick Start

CLI

# Fetch a single video
tubefetch dQw4w9WgXcQ

# Multiple videos
tubefetch VIDEO_ID_1 VIDEO_ID_2 VIDEO_ID_3

# From a file
tubefetch --file video_ids.txt

# With media download
tubefetch VIDEO_ID --download video

# Batch from a file with an explicit worker count
tubefetch --file video_ids.txt --workers 3

# Transcript only
tubefetch transcript dQw4w9WgXcQ --languages en,fr

# Metadata only
tubefetch metadata dQw4w9WgXcQ

# Media only (downloads video+audio by default)
tubefetch media dQw4w9WgXcQ

Specialized Commands

Use these subcommands when you need only one kind of data:

# Metadata only
tubefetch metadata VIDEO_ID

# Transcript only
tubefetch transcript VIDEO_ID

# Media only
tubefetch media VIDEO_ID

Library API

from tubefetch import fetch_video, fetch_batch, FetchOptions

# Single video
result = fetch_video("dQw4w9WgXcQ")
print(result.metadata.title)
print(result.transcript.segments[0].text)

# With options
opts = FetchOptions(out="./output", languages=["en", "fr"], download="audio")
result = fetch_video("dQw4w9WgXcQ", opts)

# Batch
results = fetch_batch(["dQw4w9WgXcQ", "abc12345678"], opts)
print(f"{results.succeeded}/{results.total} succeeded")

Output Structure

out/
├── <video_id>/
│   ├── metadata.json
│   ├── transcript.json
│   ├── transcript.txt
│   ├── transcript.vtt
│   ├── transcript.srt
│   └── media/
│       ├── video.mp4
│       └── audio.m4a
└── summary.json

Configuration

Options are resolved in the following order of precedence (earlier sources override later ones):

CLI flags
Environment variables (prefix TUBEFETCH_)
YAML config file (tubefetch.yaml)
Defaults
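The precedence list amounts to layering dictionaries from lowest to highest priority. A minimal sketch, assuming unset CLI flags arrive as `None` and using defaults taken from the CLI flags table (`resolve` is a hypothetical helper, not tubefetch's config loader):

```python
import copy
from collections.abc import Mapping
from typing import Any

# Defaults from the CLI flags table.
DEFAULTS: dict[str, Any] = {
    "out": "./out",
    "languages": ["en"],
    "download": "none",
    "retries": 3,
    "rate_limit": 2.0,
    "workers": 3,
}


def resolve(
    cli: Mapping[str, Any],
    env: Mapping[str, Any],
    yaml_cfg: Mapping[str, Any],
    defaults: Mapping[str, Any] = DEFAULTS,
) -> dict[str, Any]:
    """Merge layers so CLI > environment > YAML > defaults."""
    resolved = copy.deepcopy(dict(defaults))  # never mutate the shared defaults
    for layer in (yaml_cfg, env, cli):  # lowest to highest precedence
        resolved.update({k: v for k, v in layer.items() if v is not None})
    return resolved


if __name__ == "__main__":
    print(resolve({"workers": 5, "out": None}, {"out": "./env-out"}, {"out": "./yaml-out"}))
```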

CLI Flags

Flag	Description	Default
`--id`	Video ID or URL (repeatable)	—
`--file`	Text/CSV file with IDs	—
`--jsonl`	JSONL file with IDs	—
`--id-field`	Field name in CSV/JSONL	`id`
`--out`	Output directory	`./out`
`--languages`	Comma-separated language codes	`en`
`--allow-generated`	Allow auto-generated transcripts	`true`
`--allow-any-language`	Fall back to any language	`false`
`--download`	`none`, `video`, `audio`, `both`	`none`
`--max-height`	Max video height (e.g. 720)	—
`--format`	Video format	`best`
`--audio-format`	Audio format	`best`
`--force`	Force re-fetch everything	`false`
`--force-metadata`	Force re-fetch metadata only	`false`
`--force-transcript`	Force re-fetch transcript only	`false`
`--force-media`	Force re-download media only	`false`
`--retries`	Max retries per request	`3`
`--rate-limit`	Requests per second	`2.0`
`--workers`	Parallel workers for batch	`3`
`--fail-fast`	Stop on first failure	`false`
`--strict`	Exit code 2 on partial failure	`false`
`--verbose`	Verbose output	`false`

Environment Variables

All options can be set via environment variables with the TUBEFETCH_ prefix:

export TUBEFETCH_OUT=./output
export TUBEFETCH_LANGUAGES=en,fr
export TUBEFETCH_DOWNLOAD=video
export TUBEFETCH_YT_API_KEY=your-api-key
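Environment values are always strings, so they need type coercion before they can override other sources. A minimal sketch, assuming variable names map to option names by stripping the TUBEFETCH_ prefix and lowercasing (so TUBEFETCH_RATE_LIMIT becomes rate_limit) and that comma-separated values become lists; `options_from_env` is a hypothetical helper and the type sets below are illustrative:

```python
import os
from collections.abc import Mapping
from typing import Any

PREFIX = "TUBEFETCH_"
_INT = {"retries", "workers", "max_height"}
_FLOAT = {"rate_limit"}
_LIST = {"languages"}
_BOOL = {"allow_generated", "allow_any_language", "force", "fail_fast", "strict", "verbose"}


def _to_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def options_from_env(environ: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Collect TUBEFETCH_* variables into typed option values."""
    options: dict[str, Any] = {}
    for key, raw in environ.items():
        if not key.startswith(PREFIX):
            continue
        name = key[len(PREFIX):].lower()  # TUBEFETCH_RATE_LIMIT -> rate_limit
        if name in _INT:
            options[name] = int(raw)
        elif name in _FLOAT:
            options[name] = float(raw)
        elif name in _LIST:
            options[name] = [part.strip() for part in raw.split(",") if part.strip()]
        elif name in _BOOL:
            options[name] = _to_bool(raw)
        else:
            options[name] = raw  # e.g. out, download, yt_api_key stay strings
    return options
```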

YAML Config File

Create tubefetch.yaml in the working directory:

out: ./output
languages:
  - en
  - fr
download: none
allow_generated: true
retries: 3
rate_limit: 2.0
workers: 3

Retry Configuration

tubefetch uses gentlify to manage retries, applying exponential backoff with jitter.

How Retries Work

Transient errors (rate limits, network errors, service errors) are automatically retried
Permanent errors (video not found, transcripts disabled) fail immediately without retry
Configurable attempts: Set --retries N to control max retry attempts (default: 3)
Disable retries: Set --retries 0 for external retry management (e.g., with your own gentlify configuration)
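gentlify's own API is not shown in this document. To illustrate the behavior described above, here is a generic full-jitter backoff loop that retries only transient errors, assuming `retries` counts extra attempts after the first call (so `retries=0` means a single attempt). `TransientError`, `PermanentError`, and `call_with_retries` are hypothetical names, not gentlify's or tubefetch's implementation:

```python
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical: rate limits, network failures, service errors."""


class PermanentError(Exception):
    """Hypothetical: video not found, transcripts disabled."""


def call_with_retries(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying TransientError up to `retries` extra times.

    The delay before retry n (1-based) is drawn uniformly from
    [0, min(max_delay, base_delay * 2**(n - 1))] ("full jitter").
    PermanentError and any other exception propagate immediately.
    """
    attempt = 0
    while True:
        try:
            return func()
        except TransientError:
            if attempt >= retries:
                raise  # retry budget exhausted: surface the last error
            attempt += 1
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

Injecting `sleep` keeps the loop testable without real waits; jitter spreads retries from parallel workers so they do not all hit YouTube again at the same moment.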

Examples

from tubefetch import fetch_video, FetchOptions

# Default: 3 retry attempts
result = fetch_video("dQw4w9WgXcQ")

# Custom retry count
opts = FetchOptions(retries=5)
result = fetch_video("dQw4w9WgXcQ", opts)

# Disable internal retries (for external retry management)
opts = FetchOptions(retries=0)
result = fetch_video("dQw4w9WgXcQ", opts)

CLI:

# Custom retry count
tubefetch dQw4w9WgXcQ --retries 5

# Disable retries
tubefetch dQw4w9WgXcQ --retries 0

Exit Codes

Code	Meaning
0	Success (or partial failure without `--strict`)
1	Generic error (e.g. no IDs provided)
2	Partial failure with `--strict`
3	All videos failed
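The exit-code table maps directly onto batch results. A minimal sketch of that mapping, assuming "all videos failed" takes priority over --strict and that an empty input counts as the generic error; `exit_code` is a hypothetical helper, not tubefetch's code:

```python
def exit_code(total: int, failed: int, strict: bool = False) -> int:
    """Map batch results to the exit codes in the table above (sketch)."""
    if total == 0:
        return 1  # generic error, e.g. no IDs provided
    if failed == total:
        return 3  # all videos failed
    if failed and strict:
        return 2  # partial failure with --strict
    return 0  # success, or partial failure without --strict
```

A CLI entry point would then end with something like `sys.exit(exit_code(total, failed, strict))`.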

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run unit tests
python -m pytest tests/

# Run with coverage
python -m pytest tests/ --cov=tubefetch --cov-report=term-missing

# Run integration tests (requires network)
RUN_INTEGRATION=1 python -m pytest tests/integration/

License

MPL-2.0

Release history

1.4.2

Apr 14, 2026

1.4.1

Mar 9, 2026

0.9.6

Mar 7, 2026

0.9.4

Mar 4, 2026

0.9.3 (this version)

Mar 4, 2026

0.9.2

Mar 4, 2026

0.9.1

Mar 4, 2026

0.9.0

Mar 4, 2026

0.8.2

Mar 4, 2026

0.8.1

Mar 4, 2026

0.8.0

Mar 3, 2026

Download files

Source Distribution

tubefetch-0.9.3.tar.gz (51.3 kB)

Uploaded Mar 4, 2026 Source

Built Distribution

tubefetch-0.9.3-py3-none-any.whl (38.7 kB)

Uploaded Mar 4, 2026 Python 3

File details

Details for the file tubefetch-0.9.3.tar.gz.

File metadata

Download URL: tubefetch-0.9.3.tar.gz
Upload date: Mar 4, 2026
Size: 51.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tubefetch-0.9.3.tar.gz
Algorithm	Hash digest
SHA256	`c8037aab67e3afab297f96ed06ec0aa61f8424f2234f36d8ed289ed5b0a80124`
MD5	`30c67c1a4712e1ed13df875269975caa`
BLAKE2b-256	`3dfcb07b7242f433d9427d1fc02f7b53fc128e97bdd9935aff7ed3e53f5a44d2`

Provenance

The following attestation bundles were made for tubefetch-0.9.3.tar.gz:

Publisher: release.yml on pointmatic/tubefetch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tubefetch-0.9.3.tar.gz
- Subject digest: c8037aab67e3afab297f96ed06ec0aa61f8424f2234f36d8ed289ed5b0a80124
- Sigstore transparency entry: 1025614421
- Sigstore integration time: Mar 4, 2026
Source repository:
- Permalink: pointmatic/tubefetch@0c038c56a2f5b6446afb6c284cfe9338206b66c2
- Branch / Tag: refs/tags/v0.9.3
- Owner: https://github.com/pointmatic
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0c038c56a2f5b6446afb6c284cfe9338206b66c2
- Trigger Event: push

File details

Details for the file tubefetch-0.9.3-py3-none-any.whl.

File metadata

Download URL: tubefetch-0.9.3-py3-none-any.whl
Upload date: Mar 4, 2026
Size: 38.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tubefetch-0.9.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`207d0eacf6fa27a448e2148c17dc081a0254e697f3e09a2f351dbf3c3e6dc020`
MD5	`294e5cfcb9eb0e762b22465b358de7ba`
BLAKE2b-256	`e99ac3d782605fb54a4c5515247eae7240c4838fdbc2db99d1ef999c79faf732`

Provenance

The following attestation bundles were made for tubefetch-0.9.3-py3-none-any.whl:

Publisher: release.yml on pointmatic/tubefetch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tubefetch-0.9.3-py3-none-any.whl
- Subject digest: 207d0eacf6fa27a448e2148c17dc081a0254e697f3e09a2f351dbf3c3e6dc020
- Sigstore transparency entry: 1025614489
- Sigstore integration time: Mar 4, 2026
Source repository:
- Permalink: pointmatic/tubefetch@0c038c56a2f5b6446afb6c284cfe9338206b66c2
- Branch / Tag: refs/tags/v0.9.3
- Owner: https://github.com/pointmatic
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0c038c56a2f5b6446afb6c284cfe9338206b66c2
- Trigger Event: push