YTFetcher lets you fetch YouTube transcripts in bulk with metadata like titles, publish dates, and thumbnails. Great for ML, NLP, and dataset generation.


YTFetcher

⚡ Build structured YouTube datasets for NLP, ML, sentiment analysis & RAG in minutes.

A Python tool for quickly fetching thousands of videos from a YouTube channel, along with structured transcripts and additional metadata. Export data easily as CSV, TXT, or JSON.

📚 Table of Contents

Installation
Quick CLI Usage
Basic Usage (Python API)
Features
Fetching Specific Channel Tabs (Videos / Shorts / Streams)
Using Different Fetchers
Retrieve Different Languages
Filtering
Converting Fetch Results to Rows
SQLite Cache
Failed Transcripts & Retry Behavior
Fetching Only Manually Created Transcripts
Exporting
Fetching Comments
Other Methods
Proxy Configuration
Advanced HTTP Configuration (Optional)
CLI (Advanced)
Docker Quick Start
Contributing
Running Tests
Related Projects
License
Contributors

Installation

Install from PyPI:

pip install ytfetcher

Quick CLI Usage

Fetch 50 video transcripts + metadata from a channel and save as JSON:

ytfetcher channel TheOffice -m 50 -f json

Basic Usage (Python API)

Use the from_channel method to get transcripts and metadata (title, description, publish date, and so on) for the videos of a single channel:

from ytfetcher import YTFetcher

fetcher = YTFetcher.from_channel(
    channel_handle="TheOffice",
    max_results=2
)

channel_data = fetcher.fetch_youtube_data()
for video in channel_data:
    print(video.metadata.title)
    print(video.metadata.description)
    print(video.transcripts)

This will return a list of ChannelData with metadata in DLSnippet objects:

[
ChannelData(
    video_id='video1',
    transcripts=[
        Transcript(
            text="Hey there",
            start=0.0,
            duration=1.54
        ),
        Transcript(
            text="Happy coding!",
            start=1.56,
            duration=4.46
        )
    ],
    metadata=DLSnippet(
        video_id='video1',
        title='VideoTitle',
        description='VideoDescription',
        url='https://youtu.be/video1',
        duration=120,
        view_count=1000,
        thumbnails=[{'url': 'thumbnail_url'}]
    )
),
# Other ChannelData objects...
]
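
For downstream NLP work it is often handy to collapse the timed segments of a video into one string. The following is a minimal sketch that does not import ytfetcher: `Transcript` here is a stand-in dataclass that only mirrors the `text`, `start`, and `duration` fields shown above, and `join_segments` is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Stand-in with the same fields as the transcript entries shown above."""
    text: str
    start: float
    duration: float


def join_segments(segments: list[Transcript]) -> str:
    """Concatenate segment texts in time order, skipping empty segments."""
    ordered = sorted(segments, key=lambda s: s.start)
    return " ".join(s.text.strip() for s in ordered if s.text.strip())


segments = [
    Transcript(text="Hey there", start=0.0, duration=1.54),
    Transcript(text="Happy coding!", start=1.56, duration=4.46),
]

full_text = join_segments(segments)
# End time of the last segment, in seconds
end_time = max(s.start + s.duration for s in segments)
print(full_text)
```

With real data, the same function could be applied to `video.transcripts` for each item in the fetched list, assuming the segment objects expose the fields shown above.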

You can also preview this data using the PreviewRenderer class from ytfetcher.services.

from ytfetcher.services import PreviewRenderer

channel_data = fetcher.fetch_with_comments(max_comments=10)
#print(channel_data)
preview = PreviewRenderer()
preview.render(data=channel_data, limit=4)

This will preview the first 4 results of the data in a beautifully formatted terminal view, including metadata, transcript snippets, and comments.

Features

Fetch full transcripts from a YouTube channel.
Get video metadata: title, description, thumbnails, published date.
Support for fetching by channel handle, playlist ID, custom video IDs, or search query.
Fetch comments in bulk.
Concurrent fetching for high performance.
Built-in cache support.
Export fetched data as txt, csv or json.
CLI support.

Fetching Specific Channel Tabs (Videos / Shorts / Streams)

Use the tab parameter in from_channel() to select which section of a channel to fetch.

Available options:

'videos' (default)
'shorts'
'streams'

If not specified, the fetcher defaults to the Videos tab.

# Fetch regular videos (default)
YTFetcher.from_channel(channel_handle="handle")

# Fetch Shorts
YTFetcher.from_channel(channel_handle="handle", tab="shorts")

# Fetch live streams
YTFetcher.from_channel(channel_handle="handle", tab="streams")

Using Different Fetchers

Besides from_channel, ytfetcher supports several other fetching methods:

Fetching from a playlist ID with the from_playlist_id method.
Fetching from video IDs with the from_video_ids method.
Fetching from a search query with the from_search method.

Fetching from Playlist ID

Use from_playlist_id to retrieve metadata and transcripts for every video within a public or unlisted YouTube playlist.

from ytfetcher import YTFetcher

fetcher = YTFetcher.from_playlist_id(
    playlist_id="playlistid1254"
)

# Rest is same ...

Fetching With Custom Video IDs

If you already have specific video identifiers, from_video_ids allows you to target them directly. This is the most efficient way to fetch data when you have an external list of URLs or IDs.

from ytfetcher import YTFetcher

fetcher = YTFetcher.from_video_ids(
    video_ids=['video1', 'video2', 'video3']
)

# Rest is same ...
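
If your external list contains URLs rather than bare IDs, the IDs have to be extracted first, since the example above passes IDs only. The following is a minimal standard-library sketch; `extract_video_id` is a hypothetical helper (not part of ytfetcher), and it only handles the common `youtu.be/<id>`, `watch?v=<id>`, and `/shorts/`, `/embed/`, `/live/` URL shapes.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id: str) -> Optional[str]:
    """Return the video ID from a common YouTube URL, or the input if it is a bare ID."""
    parsed = urlparse(url_or_id)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            return parts[1]
        return None

    # No scheme and no host: treat the input as a bare video ID
    if not parsed.scheme and not parsed.netloc:
        return url_or_id or None
    return None


urls = [
    "https://www.youtube.com/watch?v=video1&t=10s",
    "https://youtu.be/video2",
    "https://www.youtube.com/shorts/video3",
    "video4",
]
video_ids = [vid for vid in map(extract_video_id, urls) if vid]
print(video_ids)
```

The resulting list can then be passed as `video_ids` to `from_video_ids`.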

Fetching With Search Query

The from_search method allows you to discover videos based on a keyword or phrase, similar to using the YouTube search bar. You can control the breadth of the search using the max_results parameter.

from ytfetcher import YTFetcher

# Searches for the top 10 videos matching 'Artificial Intelligence'
fetcher = YTFetcher.from_search(
    query="Artificial Intelligence",
    max_results=10
)

YTFetcher Options

YTFetcher provides a simple interface for customizing your fetching process with several optional parameters:

languages: Specify preferred transcript languages (e.g., ["en", "tr"]).
filters: Apply filters to video metadata before transcripts are fetched.
manually_created: Fetch only manually created transcripts, which are typically more precise than auto-generated ones.
proxy_config: Provide custom proxy settings to reduce the risk of bans.
http_config: Define custom HTTP headers.
cache_enabled: Enable or disable the SQLite transcript cache. Enabled by default.
cache_path: Choose where the cache file (cache.sqlite3) is stored.
max_concurrent_requests: Control how many transcript requests run at the same time.

These options can be passed to any of the fetcher methods (from_channel, from_video_ids, from_playlist_id, or from_search) to tailor the fetching process to your needs. Use the FetchOptions dataclass from ytfetcher.config to configure them.

See the sections below for usage examples.

Retrieve Different Languages

Use the languages parameter to choose which transcript languages to retrieve, listed in order of preference. The default is en.

from ytfetcher import YTFetcher
from ytfetcher.config import FetchOptions

options = FetchOptions(
    languages=['tr', 'en']
)

video_ids = ['video1', 'video2', 'video3']  # placeholder IDs
fetcher = YTFetcher.from_video_ids(video_ids=video_ids, options=options)

The same option is available in the CLI with --languages:

ytfetcher channel TheOffice -m 50 -f csv --languages tr en

ytfetcher first tries to fetch the Turkish transcript. If it's not available, it falls back to English.
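
The fallback order described above can be illustrated with a small, self-contained sketch. This is not ytfetcher's implementation; `pick_language` is a hypothetical function that shows the "first preferred language that exists wins" rule.

```python
from typing import Optional, Sequence


def pick_language(preferred: Sequence[str], available: Sequence[str]) -> Optional[str]:
    """Return the first preferred language code that is available, else None."""
    available_set = set(available)
    for code in preferred:
        if code in available_set:
            return code
    return None


# Turkish is preferred, but only English and German transcripts exist
chosen = pick_language(["tr", "en"], ["en", "de"])
print(chosen)  # en
```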

Controlling Transcript Concurrency

By default, ytfetcher fetches up to 20 transcripts concurrently. You can lower this value for slower networks or stricter rate limits, or raise it when your network and proxy setup can handle more parallel requests.

from ytfetcher import YTFetcher
from ytfetcher.config import FetchOptions

options = FetchOptions(
    max_concurrent_requests=10
)

fetcher = YTFetcher.from_channel(
    channel_handle="TheOffice",
    max_results=50,
    options=options
)

The same setting is available in the CLI with --max-concurrency:

ytfetcher channel TheOffice -m 50 -f json --max-concurrency 10
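
To see what a concurrency cap does in general, here is a minimal sketch using the standard library's thread pool. It does not show how ytfetcher schedules its requests internally; `fake_fetch` and `ConcurrencyTracker` are hypothetical stand-ins that simulate network latency and record how many calls overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_REQUESTS = 10


class ConcurrencyTracker:
    """Context manager that records the peak number of overlapping calls."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self) -> "ConcurrencyTracker":
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info) -> None:
        with self._lock:
            self.active -= 1


tracker = ConcurrencyTracker()


def fake_fetch(video_id: str) -> str:
    """Pretend to fetch a transcript; the sleep simulates network latency."""
    with tracker:
        time.sleep(0.01)
        return f"transcript for {video_id}"


video_ids = [f"video{i}" for i in range(50)]

# max_workers caps how many fetches run at the same time
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_REQUESTS) as pool:
    results = list(pool.map(fake_fetch, video_ids))

print(len(results), "transcripts, peak concurrency:", tracker.peak)
```

Lowering the cap trades throughput for fewer simultaneous requests, which is the same trade-off the max_concurrent_requests option exposes.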

Filtering

ytfetcher allows you to filter videos before fetching transcripts, which helps you focus on specific content and save processing time. Filters are applied to video metadata (duration, view count, title) and work with all fetcher methods.

Available Filter Functions

The following filter functions are available in ytfetcher.filters:

min_duration(sec: float) - Filter videos with duration greater than or equal to specified seconds
max_duration(sec: float) - Filter videos with duration less than or equal to specified seconds
min_views(n: int) - Filter videos with view count greater than or equal to specified number
max_views(n: int) - Filter videos with view count less than or equal to specified number
filter_by_title(search_query: str) - Filter videos whose title contains the search query (case-insensitive)
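
Conceptually, each filter is a predicate over video metadata, and a video is kept only if every predicate accepts it. The sketch below illustrates that idea with stand-ins only: `VideoMeta` and the functions `at_least_views`, `at_least_duration`, `title_contains`, and `apply_filters` are hypothetical names, not the functions in ytfetcher.filters, and the real implementation may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class VideoMeta:
    """Stand-in for video metadata; only the fields the filters need."""
    title: str
    duration: float  # seconds
    view_count: int


Predicate = Callable[[VideoMeta], bool]


def at_least_views(n: int) -> Predicate:
    return lambda video: video.view_count >= n


def at_least_duration(sec: float) -> Predicate:
    return lambda video: video.duration >= sec


def title_contains(query: str) -> Predicate:
    needle = query.lower()  # case-insensitive match
    return lambda video: needle in video.title.lower()


def apply_filters(videos: Iterable[VideoMeta], filters: list[Predicate]) -> list[VideoMeta]:
    """Keep only the videos that satisfy every predicate."""
    return [video for video in videos if all(f(video) for f in filters)]


videos = [
    VideoMeta(title="Python Tutorial", duration=900, view_count=12000),
    VideoMeta(title="Short clip", duration=45, view_count=50000),
    VideoMeta(title="Advanced TUTORIAL", duration=1200, view_count=800),
]

kept = apply_filters(
    videos,
    [at_least_views(5000), at_least_duration(600), title_contains("tutorial")],
)
print([video.title for video in kept])
```

Because filtering runs on metadata alone, videos that are rejected never need a transcript request.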

Using Filters in Python API

Pass a list of filter functions to the filters parameter when creating a fetcher:

from ytfetcher import YTFetcher
from ytfetcher.config import FetchOptions
from ytfetcher.filters import min_duration, min_views, filter_by_title

options = FetchOptions(
    filters=[
        min_views(5000),
        min_duration(600),  # At least 10 minutes
        filter_by_title("tutorial")
    ]
)

fetcher = YTFetcher.from_channel(
    channel_handle="TheOffice",
    max_results=50,
    options=options
)

Using Filters in CLI

You can use filter arguments directly in the CLI:

# Filter by minimum views
ytfetcher channel TheOffice -m 50 -f json --min-views 1000

# Filter by minimum duration (in seconds)
ytfetcher channel TheOffice -m 50 -f csv --min-duration 300

# Filter by title substring
ytfetcher channel TheOffice -m 50 -f json --includes-title "episode"

# Combine multiple filters
ytfetcher channel TheOffice -m 50 -f json --min-views 1000 --min-duration 300 --includes-title "tutorial"

Converting Fetch Results to Rows

If you want a flat, row-based structure for ML workflows (Pandas, HuggingFace datasets, JSON/Parquet), you can use the helper in ytfetcher.utils to join transcript segments. It accepts any fetch result returned by the public API, including ChannelData, VideoTranscript, VideoComments, and DLSnippet lists.

from ytfetcher import YTFetcher
from ytfetcher.utils import channel_data_to_rows

fetcher = YTFetcher.from_channel(channel_handle="TheOffice", max_results=2)
channel_data = fetcher.fetch_with_comments(max_comments=5)

rows = channel_data_to_rows(channel_data, include_comments=True)

When comments are available, pass include_comments=True to include comment text in the output rows.
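
Once you have flat rows, they can be written in a line-oriented format that most dataset tools read. The following sketch assumes the rows are a list of plain dictionaries; that is an assumption, and the keys used here (`video_id`, `title`, `text`) are hypothetical, so inspect the actual return value of the helper before relying on them.

```python
import io
import json

# Hypothetical rows: the real keys depend on what the helper returns.
rows = [
    {"video_id": "video1", "title": "VideoTitle", "text": "Hey there Happy coding!"},
    {"video_id": "video2", "title": "OtherTitle", "text": "Second transcript"},
]


def rows_to_jsonl(rows: list[dict]) -> str:
    """Serialize each row as one JSON object per line (JSON Lines)."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


jsonl = rows_to_jsonl(rows)

# Round-trip check: read the lines back into dictionaries
loaded = [json.loads(line) for line in io.StringIO(jsonl)]
print(len(loaded), "rows written")
```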

SQLite Cache

ytfetcher now uses a local SQLite cache for transcripts. This significantly speeds up repeated fetches by reusing transcripts that were already fetched with the same transcript options.

Python API cache options

from ytfetcher import YTFetcher
from ytfetcher.config import FetchOptions

options = FetchOptions(
    cache_enabled=True,
    cache_path="./.ytfetcher_cache"
)

fetcher = YTFetcher.from_channel(
    channel_handle="TheOffice",
    max_results=20,
    options=options,
)

Disable cache when needed:

from ytfetcher.config import FetchOptions

options = FetchOptions(cache_enabled=False)

Control cache expiration with TTL (days):

from ytfetcher.config import FetchOptions

# Keep cached transcripts for 3 days
options = FetchOptions(cache_ttl=3)

# Disable expiration entirely
options = FetchOptions(cache_ttl=0)
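
The two cache rules described in this section (entries are reused only for the same transcript options, and a TTL of 0 disables expiration) can be illustrated with a small in-memory SQLite sketch. This is not ytfetcher's cache: `TranscriptCache`, `make_key`, and the table layout are hypothetical and only show one way such a cache could work.

```python
import json
import sqlite3
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def make_key(video_id: str, languages: list[str], manually_created: bool) -> str:
    """Key on the video and the transcript options so different options never share an entry."""
    return json.dumps([video_id, languages, manually_created])


class TranscriptCache:
    def __init__(self, ttl_days: int, path: str = ":memory:") -> None:
        self.ttl_days = ttl_days
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcripts "
            "(key TEXT PRIMARY KEY, payload TEXT NOT NULL, fetched_at REAL NOT NULL)"
        )

    def put(self, key: str, payload: list, now: Optional[float] = None) -> None:
        fetched_at = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?)",
            (key, json.dumps(payload), fetched_at),
        )
        self.conn.commit()

    def get(self, key: str, now: Optional[float] = None) -> Optional[list]:
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT payload, fetched_at FROM transcripts WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        payload, fetched_at = row
        # ttl_days == 0 means entries never expire
        if self.ttl_days > 0 and now - fetched_at > self.ttl_days * SECONDS_PER_DAY:
            return None
        return json.loads(payload)


cache = TranscriptCache(ttl_days=3)
key = make_key("video1", ["tr", "en"], False)
cache.put(key, [{"text": "Hey there", "start": 0.0, "duration": 1.54}], now=0.0)

fresh = cache.get(key, now=2 * SECONDS_PER_DAY)  # within 3 days: hit
stale = cache.get(key, now=4 * SECONDS_PER_DAY)  # older than 3 days: miss
other = cache.get(make_key("video1", ["en"], False), now=0.0)  # different options: miss
print(fresh is not None, stale is None, other is None)
```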

CLI cache options

Use --no-cache to skip reading/writing cache for a command:

ytfetcher channel TheOffice -m 20 --no-cache -f json

Set a custom cache directory:

ytfetcher channel TheOffice -m 20 --cache-path ./my_cache -f json

Set cache TTL in days (0 disables expiration):

ytfetcher channel TheOffice -m 20 --cache-ttl 3 -f json

Clear cached transcripts:

ytfetcher cache --clean

Or clear a custom cache path:

ytfetcher cache --clean --cache-path ./my_cache

Failed Transcripts & Retry Behavior

ytfetcher keeps transcript failures in a structured list and retries transient failures once automatically.

Transient failures (for example temporary YouTube-side issues) are retried after a short delay.
Permanent failures are tracked and can be inspected after any fetch operation.
When cache is enabled, transient failures are not persisted as permanent cache failures.

from ytfetcher import YTFetcher

fetcher = YTFetcher.from_channel(channel_handle="TheOffice", max_results=20)
results = fetcher.fetch_youtube_data()

failed = fetcher.get_failed_transcripts()
for item in failed:
    print(item.video_id, item.reason, item.message)

Use this when you want to audit gaps in your dataset and decide whether to re-run or skip problematic videos.
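
The "retry transient failures once, record the rest" behavior can be sketched in a few lines. This is an illustration under stated assumptions, not ytfetcher's code: `TransientError`, `PermanentError`, `fetch_all`, and `fake_fetch` are hypothetical names, and the failure records are plain dictionaries rather than the library's objects.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that may succeed on retry."""


class PermanentError(Exception):
    """Hypothetical error type for failures that will not succeed on retry."""


def fetch_all(video_ids, fetch_one, retry_delay=0.0):
    """Fetch every ID, retrying transient failures once and recording what still fails."""
    results, failed = {}, []
    for vid in video_ids:
        for attempt in (1, 2):  # first try plus a single retry
            try:
                results[vid] = fetch_one(vid)
                break
            except TransientError as exc:
                if attempt == 2:
                    failed.append({"video_id": vid, "reason": "transient", "message": str(exc)})
                else:
                    time.sleep(retry_delay)  # short pause before the retry
            except PermanentError as exc:
                failed.append({"video_id": vid, "reason": "permanent", "message": str(exc)})
                break
    return results, failed


attempts = {}


def fake_fetch(vid):
    """Simulated fetcher: 'flaky' fails once, 'down' always fails, 'disabled' fails permanently."""
    attempts[vid] = attempts.get(vid, 0) + 1
    if vid == "flaky" and attempts[vid] == 1:
        raise TransientError("temporary error")
    if vid == "down":
        raise TransientError("still unavailable")
    if vid == "disabled":
        raise PermanentError("transcripts disabled")
    return f"transcript for {vid}"


results, failed = fetch_all(["ok", "flaky", "down", "disabled"], fake_fetch)
for item in failed:
    print(item["video_id"], item["reason"], item["message"])
```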

Fetching Only Manually Created Transcripts

ytfetcher can fetch only manually created transcripts from a channel, which are typically more precise than auto-generated ones.

from ytfetcher import YTFetcher
from ytfetcher.config import FetchOptions

options = FetchOptions(
    manually_created=True
)
fetcher = YTFetcher.from_channel(channel_handle="TEDx", options=options)

You can also enable this feature in the CLI with the --manually-created argument.

ytfetcher channel TEDx -f csv --manually-created

Exporting

Use the exporter classes to export ChannelData or any other supported fetch result in csv, json, or txt.

from ytfetcher.services import JSONExporter # OR you can import other exporters: TXTExporter, CSVExporter

channel_data = fetcher.fetch_youtube_data()

exporter = JSONExporter(
    channel_data=channel_data,
    allowed_metadata_list=['title'],   # You can customize this
    timing=True,                       # Include transcript start/duration
    filename='my_export',              # Base filename
    output_dir='./exports'             # Optional output directory
)

exporter.write()

Exporting With CLI

When exporting from the CLI, you can choose whether to exclude timings and which metadata fields to keep.

ytfetcher channel TheOffice -m 20 -f json --no-timing --metadata title description

This command will exclude timings from transcripts and keep only title and description as metadata.

Fetching Comments

ytfetcher lets you fetch comments in bulk, either together with metadata and transcripts or on their own.

Performance: Comment fetching is a resource-intensive process. The speed of extraction depends significantly on the user's internet connection and the total volume of comments being retrieved.

Fetch Comments With Transcripts And Metadata

To fetch comments alongside transcripts and metadata, use the fetch_with_comments method.

fetcher = YTFetcher.from_channel("TheOffice", max_results=5)

channel_data_with_comments = fetcher.fetch_with_comments(max_comments=10)

This fetches the top 10 comments for every video alongside the transcript data.

Here's an example structure:

[
    ChannelData(
        video_id='id1',
        transcripts=list[Transcript(...)],
        metadata=DLSnippet(...),
        comments=list[Comment(
            text='Comment one.',
            like_count=20,
            author='@author',
            time_text='8 days ago'
        )]
    )
]

Fetch Only Comments

To fetch comments without transcripts, use the fetch_comments method.

fetcher = YTFetcher.from_channel("TheOffice", max_results=5)

comments = fetcher.fetch_comments(max_comments=20)

This will return a list of VideoComments objects like this:

[
    VideoComments(
        video_id='id1',
        comments=[
            Comment(
                text='Comment one.',
                like_count=20,
                author='@author',
                time_text='8 days ago'
            )
        ]
    )
]
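
A common next step is ranking comments across all fetched videos, for example by likes. The sketch below uses stand-in dataclasses that only mirror the fields shown above; it does not import ytfetcher, and `top_comments` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    """Stand-in mirroring the comment fields shown above."""
    text: str
    like_count: int
    author: str
    time_text: str


@dataclass
class VideoComments:
    """Stand-in mirroring the per-video comment container shown above."""
    video_id: str
    comments: list[Comment] = field(default_factory=list)


def top_comments(all_comments: list[VideoComments], n: int) -> list[tuple[str, Comment]]:
    """Return the n most-liked comments across all videos as (video_id, comment) pairs."""
    flat = [(vc.video_id, c) for vc in all_comments for c in vc.comments]
    flat.sort(key=lambda pair: pair[1].like_count, reverse=True)
    return flat[:n]


data = [
    VideoComments("id1", [
        Comment("Comment one.", 20, "@author", "8 days ago"),
        Comment("Comment two.", 3, "@other", "2 days ago"),
    ]),
    VideoComments("id2", [
        Comment("Comment three.", 57, "@third", "1 month ago"),
    ]),
]

best = top_comments(data, 2)
for video_id, comment in best:
    print(video_id, comment.like_count, comment.text)
```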

Fetching Comments With CLI

Comments can also be fetched from the CLI.

To fetch comments together with transcripts, use the --comments flag. Use --max-comments to choose how many comments to fetch per video:

ytfetcher channel TheOffice -m 20 --comments --max-comments 10 -f json

To fetch only comments you can use the --comments-only mode:

ytfetcher channel TheOffice -m 20 --comments-only --max-comments 10 -f json

Other Methods

You can also fetch only transcript data or metadata with video IDs using fetch_transcripts and fetch_snippets.

Fetch Transcripts

fetcher = YTFetcher.from_channel(channel_handle="TheOffice", max_results=2)
data = fetcher.fetch_transcripts()

print(data)

fetch_transcripts() returns list[VideoTranscript]. Each item contains the video_id and that video's transcript segments.

Fetch Snippets

data = fetcher.fetch_snippets()
print(data)

fetch_snippets() returns list[DLSnippet] with video metadata only.

Proxy Configuration

YTFetcher supports proxy usage for fetching YouTube transcripts:

from ytfetcher import YTFetcher
from ytfetcher.config import GenericProxyConfig, WebshareProxyConfig, FetchOptions

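# Pass exactly one proxy config: GenericProxyConfig or WebshareProxyConfig.
# The constructor arguments below are assumed to match youtube-transcript-api.
options = FetchOptions(
    proxy_config=WebshareProxyConfig(
        proxy_username="<USERNAME>",
        proxy_password="<PASSWORD>"
    )
)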

fetcher = YTFetcher.from_channel(
    channel_handle="TheOffice",
    max_results=3,
    options=options
)

Advanced HTTP Configuration (Optional)

YTFetcher already sends custom headers to mimic real browser behavior. To change them, pass a custom HTTPConfig:

from ytfetcher import YTFetcher
from ytfetcher.config import HTTPConfig, FetchOptions

custom_config = HTTPConfig(
    headers={"User-Agent": "ytfetcher/1.0"}
)

options = FetchOptions(
    http_config=custom_config
)

fetcher = YTFetcher.from_channel(
    channel_handle="TheOffice",
    max_results=10,
    options=options
)

CLI (Advanced)

CLI Overview

YTFetcher comes with a simple CLI so you can fetch data directly from your terminal.

ytfetcher -h

usage: ytfetcher [-h] {channel,playlist,video,search} ...

Fetch YouTube transcripts for a channel

positional arguments:
  {channel,playlist,video,search}
    channel             Fetch data from channel handle with max_results.
    playlist            Fetch data from a specific playlist id.
    video               Fetch data from your custom video ids.
    search              Fetch data from youtube with search query.

options:
  -h, --help            show this help message and exit

Basic Usage

ytfetcher channel <CHANNEL_HANDLE> -m <MAX_RESULTS> -f <FORMAT>

Fetching Different Channel Tabs (Videos / Shorts / Streams)

Use --tab to choose which channel feed should be fetched.

# Default: videos
ytfetcher channel TheOffice -m 20 --tab videos -f json

# Fetch from the Shorts tab
ytfetcher channel TheOffice -m 20 --tab shorts -f json

# Fetch from the Live/Streams tab
ytfetcher channel TheOffice -m 20 --tab streams -f json

Fetching by Video IDs

ytfetcher video video_id1 video_id2 ... -f json

Fetching From Playlist Id

ytfetcher playlist playlistid123 -f csv -m 25

Fetching with Search Method

ytfetcher search "AI Getting Jobs" -f json -m 25

Using Webshare Proxy

ytfetcher channel <CHANNEL_HANDLE> -f json --webshare-proxy-username "<USERNAME>" --webshare-proxy-password "<PASSWORD>"

Using Custom Proxy

ytfetcher channel <CHANNEL_HANDLE> -f json --http-proxy "http://user:pass@host:port" --https-proxy "https://user:pass@host:port"

Controlling Transcript Concurrency

ytfetcher channel TheOffice -m 50 -f json --max-concurrency 10

Use --max-concurrency to control how many transcript requests run at the same time. The default is 20.

Docker Quick Start

The recommended way to run or develop YTFetcher is using Docker to ensure a clean, stable environment without needing local Python or dependency management.

docker-compose build

Use docker-compose run to execute your desired command inside the container.

docker-compose run ytfetcher poetry run ytfetcher channel TheOffice -m 20 -f json

Contributing

git clone https://github.com/kaya70875/ytfetcher.git
cd ytfetcher
poetry install

Running Tests

poetry run pytest

Running Type Check

All type checks must pass before you contribute to ytfetcher.

poetry run mypy ytfetcher

Related Projects

youtube-transcript-api

License

This project is licensed under the MIT License; see the LICENSE file for details.

Contributors

Thanks to everyone who has contributed to ytfetcher ❤️

⭐ If you find this useful, please star the repo or open an issue with feedback!

Release history

2.4 (this version): Jul 4, 2026
2.3.2: Apr 25, 2026
2.3.1: Apr 19, 2026
2.3: Apr 14, 2026
2.2: Mar 8, 2026
2.1: Feb 15, 2026
2.0: Jan 31, 2026
1.5.3: Jan 8, 2026
1.5: Dec 31, 2025
1.4.1: Nov 10, 2025
1.4: Oct 26, 2025
1.3.1: Oct 18, 2025
1.3: Oct 18, 2025
1.2: Oct 11, 2025
1.1: Oct 2, 2025
1.0.1: Oct 1, 2025
1.0: Sep 27, 2025
0.4.1: Aug 13, 2025
0.4.0: Aug 10, 2025
0.3.0: Aug 9, 2025
0.2.1: Aug 8, 2025
0.2.0: Aug 7, 2025
0.1.1: Aug 3, 2025
0.1.0: Aug 3, 2025

Download files

Source Distribution

ytfetcher-2.4.tar.gz (40.9 kB)

Uploaded Jul 4, 2026 Source

Built Distribution

ytfetcher-2.4-py3-none-any.whl (43.3 kB)

Uploaded Jul 4, 2026 Python 3

File details

Details for the file ytfetcher-2.4.tar.gz.

File metadata

Download URL: ytfetcher-2.4.tar.gz
Upload date: Jul 4, 2026
Size: 40.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.14.0 Windows/11

File hashes

Hashes for ytfetcher-2.4.tar.gz
Algorithm	Hash digest
SHA256	`1956b6f3f5aa2d418ae405214314b06436da956a3946f3be02cf74ff68fb1eef`
MD5	`44000279b9d60d3f05f47763945cc7f0`
BLAKE2b-256	`cd4218d3dcf6d31d49e71371cab6d4a5c70e70dc86e252812224f4def813837e`

File details

Details for the file ytfetcher-2.4-py3-none-any.whl.

File metadata

Download URL: ytfetcher-2.4-py3-none-any.whl
Upload date: Jul 4, 2026
Size: 43.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.14.0 Windows/11

File hashes

Hashes for ytfetcher-2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a4ca52ee96f9eb0fe4c955fb1963034e5340348a73b4fb183fa7d89a2bca31a7`
MD5	`a09daeab1b4f71fcde885522356e7b26`
BLAKE2b-256	`aa4a5bfb5c0dc8d62329b3d698af31bfbed6a1aad0f07875816876a05a5e8dd5`
